CN111737090B - Log simulation method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111737090B
Authority
CN
China
Prior art keywords
log
simulation
statistical characteristic
sample
samples
Legal status
Active
Application number
CN202010860245.7A
Other languages
Chinese (zh)
Other versions
CN111737090A (en)
Inventor
巩国栋
严朝豪
薛野
宋洋
孙凯
Current Assignee
Beijing Zhixiang Technology Co Ltd
Original Assignee
Beijing Zhixiang Technology Co Ltd
Application filed by Beijing Zhixiang Technology Co Ltd
Priority to CN202010860245.7A
Publication of CN111737090A
Application granted
Publication of CN111737090B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a log simulation method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring a target basic log sample and generating a preset number of simulation log samples according to the target basic log sample; performing, according to a preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set; and performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.

Description

Log simulation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a log simulation method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, techniques have emerged that analyze host logs to detect the operating state of a computer. When such analysis is performed, the types and number of available host logs are particularly important to the accuracy of the analysis result.
Current methods for acquiring host logs require either that logs accumulate naturally on a computer device or that technicians run the actual processes for each scenario to produce the corresponding host logs. Natural accumulation takes a long time, and having technicians run real processes is laborious, so neither approach can meet the demand for data to support routine analysis of a computer's operating state.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a log simulation method, apparatus, computer device and storage medium for solving the above technical problems.
A method of log simulation, the method comprising:
acquiring a target basic log sample, and generating a preset number of simulation log samples according to the target basic log sample;
performing, according to a preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set;
and performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
In one embodiment, acquiring a target basic log sample and generating a preset number of simulation log samples according to the target basic log sample includes:
preprocessing an original log sample to obtain a basic log sample, and storing the basic log sample into a basic log sample library;
extracting a preset number of target basic log samples from the basic log sample library;
and copying the extracted target basic log samples to obtain the preset number of simulation log samples.
In one embodiment, performing, according to the preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set includes:
generating different identity identification fields according to a preset log identity identification generation rule and a preset random number generator;
and writing each generated identity identification field into the corresponding identity identification field position of a simulation log sample to obtain the first simulation log set.
In one embodiment, performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set includes:
performing statistical characteristic analysis on the target basic log sample to obtain a reference statistical characteristic parameter of the target basic log sample;
carrying out statistical characteristic analysis on the simulation log sample to obtain a simulation statistical characteristic parameter of the simulation log sample;
and when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, modifying the simulation log sample according to the distribution rule of the target basic log sample until the simulation statistical characteristic parameter is consistent with the reference statistical characteristic parameter, thereby obtaining a second simulation log set.
In one embodiment, the reference statistical characteristic parameters comprise a reference log quantity statistical characteristic parameter and a reference field statistical characteristic parameter, and the simulation statistical characteristic parameters comprise a simulation log quantity statistical characteristic parameter and a simulation field statistical characteristic parameter; and modifying the simulation log sample according to the distribution rule of the target basic log sample, when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, until the two are consistent includes:
when the reference log quantity statistical characteristic parameter is inconsistent with the simulation log quantity statistical characteristic parameter, increasing or decreasing the number of simulation log samples according to the log quantity distribution rule of the target basic log sample until the simulation log quantity statistical characteristic parameter is consistent with the reference log quantity statistical characteristic parameter;
and when the reference field statistical characteristic parameter is inconsistent with the simulation field statistical characteristic parameter, generating a corresponding data set with a random number generator according to the field distribution rule in the target basic log sample, and modifying the fields of the simulation log sample until the simulation field statistical characteristic parameter is consistent with the reference field statistical characteristic parameter.
In one embodiment, the method further comprises:
determining, according to a received log simulation request for a computer target operation scene, the log types contained in that scene from a preset correspondence between computer operation scenes and the log types each scene contains;
and calling the generation rules and random number generators for the different log types according to the determined log types, and performing the corresponding type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
In one embodiment, the method further comprises:
setting, according to the log label carried by each basic log sample in the target basic log sample, the same log label on the simulation logs generated from that basic log sample, so that the simulation log samples in the simulation log sets carry log labels of the corresponding types, the simulation log sets comprising the first simulation log set and the second simulation log set;
after calling the generation rules and random number generators for the different log types according to the log types and performing the type-specific log simulation processing on the simulation logs in the second simulation log set to obtain the third simulation log set, the method further comprises:
verifying the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the computer target operation scene;
and when the third simulation log set contains a simulation log that does not carry a log label, adding a log label to that simulation log according to its log type in the computer target operation scene.
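The label verification just described can be sketched in Python. This is an illustrative sketch only: the dictionary-shaped logs, the `log_type` and `label` keys, and the example mapping are hypothetical, and flagging (rather than overwriting) a label that disagrees with the mapping is an assumed policy, because the text only specifies what to do when a label is missing.

```python
def verify_labels(logs, type_to_label):
    """Verify log labels against a scenario's log-type -> label correspondence.

    Returns (checked_logs, mismatched_logs). Logs without a label get the label
    mapped from their log type; logs whose label disagrees are flagged, not changed.
    """
    checked, mismatched = [], []
    for log in logs:
        updated = dict(log)  # copy so the caller's logs are not mutated
        expected = type_to_label.get(updated["log_type"])
        if not updated.get("label"):
            # Missing label: add the one implied by the log type in this scenario.
            updated["label"] = expected
        elif expected is not None and updated["label"] != expected:
            # Assumed policy: report disagreements for review instead of overwriting.
            mismatched.append(updated)
        checked.append(updated)
    return checked, mismatched


# Hypothetical correspondence for one target operation scenario.
mapping = {"file_operation": "file", "network_connection": "network", "process": "process"}
third_set = [
    {"log_type": "file_operation", "label": "file"},
    {"log_type": "network_connection"},           # carries no label
    {"log_type": "process", "label": "network"},  # label disagrees with mapping
]
checked, mismatched = verify_labels(third_set, mapping)
```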
A log simulation device, the device comprising:
an acquisition module, configured to acquire a target basic log sample and generate a preset number of simulation log samples according to the target basic log sample;
a first simulation module, configured to perform, according to a preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set;
and a second simulation module, configured to perform statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a target basic log sample, and generating a preset number of simulation log samples according to the target basic log sample;
performing, according to a preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set;
and performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a target basic log sample, and generating a preset number of simulation log samples according to the target basic log sample;
performing, according to a preset log identity identification generation rule, identity identification simulation on each simulation log among the preset number of generated simulation log samples to obtain a first simulation log set;
and performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
In the log simulation method, log simulation device, computer equipment, and storage medium above, a target basic log sample is acquired and a preset number of simulation log samples are generated according to it; identity identification simulation is performed on each simulation log among the preset number of generated simulation log samples according to a preset log identity identification generation rule to obtain a first simulation log set; and statistical characteristic simulation is performed on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
Drawings
FIG. 1 is a flow diagram of a log simulation method in one embodiment;
FIG. 2 is a flow diagram of a method for pre-processing raw log samples in one embodiment;
FIG. 3 is a flow diagram of a first-level log simulation method in one embodiment;
FIG. 4 is a flow diagram of a second-level log simulation method in one embodiment;
FIG. 5 is a flow diagram of the statistical characteristic simulation steps of the second-level simulation in one embodiment;
FIG. 6 is a flow diagram of a third-level log simulation method in one embodiment;
FIG. 7 is a flow diagram of a method for setting log labels in one embodiment;
FIG. 8 is a diagram of an application example of the log simulation method in one embodiment;
FIG. 9 is a structural block diagram of a log simulation device in one embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a log simulation method is provided. This embodiment is described using a computer device as an example; it should be understood that the method may also be applied to other electronic devices with a logging function, which is not limited in this embodiment. The method includes the following steps:
Step 101: acquiring a target basic log sample, and generating a preset number of simulation log samples according to the target basic log sample.
In implementation, the computer device acquires a target basic log sample and generates a preset number of simulation logs from it as simulation log samples. The simulation logs are produced by copying the target basic logs in the target basic log sample, so each has the same content as the target basic log it was copied from.
Optionally, the preset number of simulation log samples may be set by a user entering the required number of logs through a terminal device, or preset in a configuration file; the specific implementation is not limited in the embodiments of the present application.
Step 102: performing identity identification simulation on each simulation log among the preset number of generated simulation log samples according to a preset log identity identification generation rule to obtain a first simulation log set.
In implementation, the computer device performs identity identification simulation on each simulation log among the preset number of simulation log samples according to the preset log identity identification generation rule to obtain a first simulation log set. Simulating the identity identification of the simulation logs can be regarded as first-level log simulation of the log samples.
The log identity identification may include, for example, characteristic information of a file log such as the file name, file size, and file path, and information of a network log such as the connection traffic size.
Step 103: performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
In implementation, the statistical characteristics of the simulation log samples are required to match those of logs generated by actually running processes, so the statistical characteristics of the first simulation log set must also be simulated. The computer device therefore performs statistical characteristic simulation of the log distribution rule on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set. This statistical characteristic simulation can be regarded as second-level log simulation of the log samples.
In the above log simulation method, the computer device acquires a target basic log sample and generates a preset number of simulation log samples from it; it then performs identity identification simulation on each generated simulation log according to a preset log identity identification generation rule to obtain a first simulation log set; finally, it performs statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
In one embodiment, as shown in fig. 2, the specific processing procedure of step 101 is as follows:
Step 1011: preprocessing original log samples to obtain basic log samples, and storing the basic log samples in a basic log sample library.
In implementation, the computer device preprocesses the original log samples, generated by actual operation and accumulated in memory, by filtering out logs with obviously missing data; the remaining logs form the basic log samples, which are then stored in the basic log sample library.
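As a minimal sketch of this preprocessing, assume each raw log is a dictionary and that "obviously missing data" means a required field is absent or empty. The `REQUIRED_FIELDS` names below are hypothetical; a real deployment would derive them from its own log schema.

```python
# Hypothetical fields a log must carry to be usable for analysis.
REQUIRED_FIELDS = ("host_id", "timestamp", "log_type")


def preprocess(raw_logs, required=REQUIRED_FIELDS):
    """Keep only logs whose required fields are all present and non-empty."""
    return [
        log for log in raw_logs
        if all(log.get(field) not in (None, "") for field in required)
    ]


raw_logs = [
    {"host_id": "H001", "timestamp": 1600000000, "log_type": "file"},
    {"host_id": "", "timestamp": 1600000005, "log_type": "file"},    # empty host_id
    {"host_id": "H002", "timestamp": None, "log_type": "network"},   # missing time
    {"host_id": "H003", "timestamp": 1600000010, "log_type": "process"},
]
basic_log_library = preprocess(raw_logs)  # would be persisted as the basic log sample library
```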
Step 1012: extracting a preset number of target basic log samples from the basic log sample library.
In implementation, when log simulation is required, the computer device extracts a preset number of target basic log samples from the basic log sample library to serve as masters for the simulation logs.
Optionally, depending on the distribution characteristics of the basic log samples in a given library, different statistical random sampling methods may be used to extract the target basic log samples; the specific implementation is not limited in the embodiments of the present application.
Optionally, the number of target basic log samples extracted is determined by the total number of basic log samples in the library and the number of simulation logs to be generated, which is not limited in the embodiments of the present application.
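The text leaves the sampling method open. The sketch below shows two standard options, simple random sampling and proportional stratified sampling by log type; choosing stratification, the `log_type` key, and the rounding rule are assumptions for illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(library, k, seed=None):
    """Simple random sampling without replacement."""
    return random.Random(seed).sample(library, k)


def stratified_sample(library, k, key="log_type", seed=None):
    """Proportional stratified sampling: each stratum gets about k * its share.

    Per-stratum sizes are rounded (minimum 1), so the total can differ slightly from k.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for log in library:
        strata[log[key]].append(log)
    picked = []
    for group in strata.values():
        n = max(1, round(k * len(group) / len(library)))
        picked.extend(rng.sample(group, min(n, len(group))))
    return picked
```

Stratification keeps rare log types represented in a small sample, which matters when the library is dominated by one type.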
Step 1013: copying the extracted target basic log samples to obtain the preset number of simulation log samples.
In implementation, the computer device copies each extracted target basic log enough times that the total reaches the preset number of simulation log samples. For example, the computer device uses 50 extracted target basic logs as simulation log masters and makes 49 copies of each, so each master yields 50 logs (itself plus 49 copies) and the resulting simulation log sample contains 50 × 50 = 2500 logs.
In this embodiment, the originally accumulated logs are preprocessed to screen out logs whose data is seriously missing and which therefore cannot be used for data analysis, which improves the quality of the log samples. The resulting high-quality basic log samples are stored in the basic log sample library, which serves as the data source for log simulation. When log simulation is required, samples are drawn from the library; by statistical principles, a small number of basic log samples can reflect the characteristic attributes of all logs in the library, which makes characteristic analysis of the library more efficient. A preset number of simulation log samples is then generated from the sampled target basic log samples so that step-by-step simulation processing can be applied to them.
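The copying step can be sketched as follows, assuming logs are dictionaries and that, as the 2500-log example implies, each master is kept alongside its 49 copies. Deep copies are used so that later per-log edits (identity fields, statistical adjustments) cannot leak between logs.

```python
import copy


def replicate(masters, copies_per_master):
    """Return, for each master, the master itself followed by `copies_per_master` copies."""
    samples = []
    for master in masters:
        # Deep copies so later edits to one simulated log never touch another.
        samples.append(copy.deepcopy(master))
        samples.extend(copy.deepcopy(master) for _ in range(copies_per_master))
    return samples


masters = [{"host_id": f"H{i:03d}", "log_type": "file"} for i in range(50)]
simulation_samples = replicate(masters, 49)
print(len(simulation_samples))  # 2500 = 50 masters x (1 + 49)
```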
In one embodiment, as shown in FIG. 3, the specific process of step 102 is as follows:
Step 1021: generating distinct identity identification fields according to a preset log identity identification generation rule and a preset random number generator.
In implementation, the computer device generates distinct identity identification fields according to the preset log identity identification generation rule and the preset random number generator.
Specifically, for example, based on a preset host ID generation rule, a random number generator randomly generates a preset number of host ID numbers that satisfy the digit composition required by the rule. As another example, for the time information contained in the logs, a corresponding random number generator generates a preset number of target times within a preset time range. The log identity identification further includes log ID information, which may likewise be simulated with a corresponding generation rule and random number generator; the simulation is similar to that of the host ID number and is not described in detail in the embodiments of the present application.
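A sketch of the identity field generators, using Python's `random.Random`. The host ID rule (`HOST-` plus eight digits), the one-day time window, and the function names are hypothetical stand-ins for the preset generation rules, which the text does not specify.

```python
import random
from datetime import datetime, timedelta


def generate_host_ids(n, rng, digits=8, prefix="HOST-"):
    """Generate n distinct host IDs under a hypothetical rule: prefix + fixed-width digits."""
    if n > 10 ** digits:
        raise ValueError("not enough distinct IDs for this width")
    numbers = rng.sample(range(10 ** digits), n)  # distinct by construction
    return [f"{prefix}{num:0{digits}d}" for num in numbers]


def generate_timestamps(n, rng, start, end):
    """Draw n times uniformly within [start, end], returned in chronological order."""
    span = (end - start).total_seconds()
    return sorted(start + timedelta(seconds=rng.uniform(0, span)) for _ in range(n))


rng = random.Random(42)
host_ids = generate_host_ids(2500, rng)
start = datetime(2020, 8, 1)
times = generate_timestamps(2500, rng, start, start + timedelta(days=1))
```

Sampling from `range(10 ** digits)` guarantees distinct IDs without a retry loop, and a seeded generator makes a simulation run reproducible.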
Step 1022: writing the generated identity identification fields into the corresponding identity identification field positions of the simulation log samples to obtain a first simulation log set.
In implementation, the computer device writes the generated identity identification fields into the corresponding positions of the simulation log samples to obtain the first simulation log set, that is, the simulation log samples after first-level feature simulation.
Specifically, writing the identity identification fields produced by the corresponding generation rule and random number generator into the corresponding positions of each simulation log amounts to modifying the identity information of simulation logs that, having been produced by simple copying, share the same identity information, so that each simulation log has distinct identity information and distinct log samples are simulated. For example, among the 2500 simulation logs copied from 50 simulation log masters, the logs copied from the same master carry identical identity information; such samples still contain only the feature information of the original 50 masters and have no data analysis value.
In this embodiment, the identity information of the simulation log samples is simulated through a preset identity identification generation rule and a random number generator, so that the generated log samples carry different identity information, which adds feature information to the simulation log samples for data analysis.
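Writing the generated fields back into the copied logs might look like the sketch below. The field names and the column-per-field layout of `identity_columns` are assumptions; the point is that every copy of a master ends up with its own identity values while its other fields stay unchanged.

```python
def apply_identities(samples, identity_columns):
    """Write generated identity fields into each simulated log.

    identity_columns maps a field name (e.g. a hypothetical "host_id") to a list
    holding one generated value per simulated log, in the same order as samples.
    """
    for field, values in identity_columns.items():
        if len(values) != len(samples):
            raise ValueError(f"{field}: expected {len(samples)} values, got {len(values)}")
    first_set = []
    for i, log in enumerate(samples):
        updated = dict(log)  # leave the copied logs untouched
        for field, values in identity_columns.items():
            updated[field] = values[i]
        first_set.append(updated)
    return first_set


copies = [{"host_id": "H000", "log_id": 1, "file_name": "plan.doc"} for _ in range(3)]
first_set = apply_identities(copies, {
    "host_id": ["HOST-00000001", "HOST-00000002", "HOST-00000003"],
    "log_id": [101, 102, 103],
})
```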
In one embodiment, as shown in fig. 4, the specific processing procedure of step 103 is as follows:
Step 1031: performing statistical characteristic analysis on the target basic log sample to obtain a reference statistical characteristic parameter of the target basic log sample.
In implementation, the computer device performs statistical characteristic analysis on the initially sampled target basic log sample to obtain its statistical characteristic parameters, which serve as the reference for characteristic simulation and are therefore called reference statistical characteristic parameters. Specifically, depending on the distribution (for example, normal or uniform) of each sample parameter in the target basic log sample, statistical characteristic parameters of the corresponding data may be obtained, such as the expectation, variance, standard deviation, and mode; the embodiments of the present application are not limited in this respect.
Step 1032: performing statistical characteristic analysis on the simulation log sample to obtain a simulation statistical characteristic parameter of the simulation log sample.
In implementation, the computer device performs statistical characteristic analysis on the simulation log sample to obtain each of its simulation statistical characteristic parameters. Specifically, matching the parameters of the target basic log sample, the characteristic parameters of the simulation log sample data may likewise be the expectation, variance, standard deviation, mode, and so on, which is not limited in the embodiments of the present application.
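The reference and simulation parameters named in steps 1031 and 1032 can be computed with Python's standard `statistics` module. The sketch treats one numeric field at a time and uses population formulas, both of which are assumptions; the file-size values are made up for illustration.

```python
import statistics


def statistical_parameters(values):
    """Expectation, variance, standard deviation, and mode of one numeric log field.

    Population formulas are used here; the text does not say which variant is meant.
    """
    return {
        "expectation": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
        "mode": statistics.mode(values),
    }


# Hypothetical file sizes (KB) from a target basic log sample and from simulated logs.
reference = statistical_parameters([2, 4, 4, 4, 5, 5, 7, 9])
simulated = statistical_parameters([2, 4, 4, 4, 5, 5, 7, 9, 9, 9])
# reference: expectation 5, variance 4, std 2, mode 4
```

Since Python 3.8, `statistics.mode` returns the first mode encountered when the data are multimodal, which is the case for the simulated list here (4 and 9 each appear three times).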
Step 1033: when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, modifying the simulation log sample according to the distribution rule of the target basic log sample until the simulation statistical characteristic parameter is consistent with the reference statistical characteristic parameter, thereby obtaining a second simulation log set.
In implementation, the computer device compares the reference statistical characteristic parameters with the simulation statistical characteristic parameters of the simulation log samples. When they are inconsistent, the computer device performs statistical characteristic simulation on the simulation log samples corresponding to the inconsistent parameters: it modifies those samples according to the distribution rule of the target basic log sample, rechecks the simulation statistical characteristic parameters of the modified samples, and stops modifying once they are consistent with the reference statistical characteristic parameters.
Optionally, the computer device may also check, according to a preset log generation rule, the reasonableness of the associations between logs of the same type and between logs of different types among the generated simulation logs, as well as the reasonableness of the fields within simulation logs of the same type. For example, within document logs, the files read and written by Word should all have a .doc suffix; and across log types, the size of a file read or written by a browser (file log) should correspond to the associated network traffic size (network log). These checks ensure that the associations among the simulation logs in the simulation log samples are reasonable and that no abnormal simulation log sample set is produced.
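The compare-modify-recheck loop of step 1033 can be sketched for a single numeric field. Two assumptions are made explicit in the code: "consistent" is taken to mean every parameter is within a relative tolerance of the reference, and the modification resamples values from the target basic log sample with replacement (a bootstrap draw) so that they follow its empirical distribution.

```python
import random
import statistics


def params(values):
    """Parameters compared in this sketch: expectation and population standard deviation."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def consistent(reference, simulated, rel_tol=0.05):
    """Assumed criterion: every parameter lies within rel_tol of its reference value."""
    return all(abs(simulated[k] - reference[k]) <= rel_tol * abs(reference[k]) for k in reference)


def simulate_until_consistent(reference_values, simulated_values, rng, rel_tol=0.05, max_rounds=100):
    """Compare, modify, and recheck until the simulated parameters match the reference.

    Returns (values, rounds), where rounds is the number of modifications made.
    """
    reference = params(reference_values)
    values = list(simulated_values)
    rounds = 0
    while not consistent(reference, params(values), rel_tol):
        if rounds == max_rounds:
            raise RuntimeError("no consistent sample within max_rounds")
        # Modify: resample the field from the reference values with replacement,
        # i.e. follow the empirical distribution of the target basic log sample.
        values = rng.choices(reference_values, k=len(values))
        rounds += 1
    return values, rounds


reference_sizes = [float(kb) for kb in range(1, 51)]  # hypothetical field values
simulated_sizes = [1000.0] * 2500                      # clearly inconsistent start
adjusted, rounds = simulate_until_consistent(reference_sizes, simulated_sizes, random.Random(0))
```

When a reference parameter is zero, a relative tolerance collapses to exact equality, so such fields would likely need an absolute tolerance instead.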
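Both example rules can be expressed as simple checks. This sketch assumes dictionary logs with hypothetical `process`, `file_name`, `pid`, `bytes`, and `file_size` fields, identifies Word by its `winword.exe` executable name, and reads "corresponds" as "the process sent at least as many bytes as the file size", which is the relation the data-exfiltration scenario in step 104 states.

```python
from collections import defaultdict


def word_suffix_violations(file_logs, suffixes=(".doc",)):
    """Document logs written by Word whose file name lacks an allowed suffix."""
    return [
        log for log in file_logs
        if log["process"].lower() == "winword.exe"
        and not log["file_name"].lower().endswith(suffixes)
    ]


def traffic_violations(browser_file_logs, network_logs):
    """Browser file logs whose process sent less network traffic than the file size."""
    traffic = defaultdict(int)
    for log in network_logs:
        traffic[log["pid"]] += log["bytes"]  # total traffic per process id
    return [log for log in browser_file_logs if traffic[log["pid"]] < log["file_size"]]


file_logs = [
    {"process": "WINWORD.EXE", "file_name": "report.doc"},
    {"process": "winword.exe", "file_name": "notes.txt"},  # unreasonable for Word
    {"process": "chrome.exe", "file_name": "setup.bin"},   # not checked by this rule
]
browser_files = [{"pid": 10, "file_size": 5000}, {"pid": 11, "file_size": 8000}]
network = [{"pid": 10, "bytes": 3000}, {"pid": 10, "bytes": 2500}, {"pid": 11, "bytes": 1000}]
```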
In one embodiment, as shown in fig. 5, the reference statistical characteristic parameter includes a reference log quantity statistical characteristic parameter and a reference field statistical characteristic parameter, and the simulation statistical characteristic parameter includes a simulation log quantity statistical characteristic parameter and a simulation field statistical characteristic parameter, then the specific processing procedure of step 1033 is as follows:
step 10331, when the statistical characteristic parameter of the reference log quantity is not consistent with the statistical characteristic parameter of the simulation log quantity, increasing or decreasing the number of the simulation log samples according to the log quantity distribution rule of the target basic log sample until the statistical characteristic parameter of the simulation log quantity is consistent with the statistical characteristic parameter of the reference log quantity.
In implementation, the statistical characteristics of the log sample can be reflected in the distribution of the log amount and the field value of the log sample, for the statistical characteristic distribution of the log quantity, for example, the statistical characteristic distribution of the log quantity corresponding to each application program of the file system, the statistical characteristic distribution of the log quantity of different types of logs corresponding to different time periods, the statistical characteristic distribution of the internal log quantity and the external log quantity of the network connection log, and the like, therefore, the computer device firstly simulates the statistical characteristic distribution of the log quantity, comparing the statistical characteristic of the reference log quantity with the statistical characteristic parameter of the simulated log quantity, when the statistical characteristic parameter of the reference log quantity obtained by calculation is inconsistent with the statistical characteristic parameter of the simulated log quantity, the computer equipment according to the log quantity distribution rule of the target basic log sample, and increasing and decreasing the number of the simulation log samples until the statistical characteristic parameters of the simulation log amount are consistent with the statistical characteristic parameters of the reference log amount.
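Adjusting log counts to a reference distribution can be sketched per group, here per application. The grouping key `app`, the proportional targets, and the choice to add logs by duplicating randomly chosen ones and remove logs by random subsampling are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def match_count_distribution(reference_logs, simulated_logs, total, rng, key="app"):
    """Resize each group of simulated logs so group sizes follow the reference proportions.

    Groups absent from the reference are dropped. Per-group targets are rounded,
    so the returned total can differ from `total` by a few logs.
    """
    ref_counts = Counter(log[key] for log in reference_logs)
    ref_total = sum(ref_counts.values())
    groups = defaultdict(list)
    for log in simulated_logs:
        groups[log[key]].append(log)
    adjusted = []
    for value, count in ref_counts.items():
        target = round(total * count / ref_total)
        # Fall back to the reference logs if the simulated set lacks this group entirely.
        pool = groups[value] or [log for log in reference_logs if log[key] == value]
        if target <= len(pool):
            adjusted.extend(rng.sample(pool, target))  # decrease
        else:
            adjusted.extend(pool)                      # increase by duplication
            adjusted.extend(dict(rng.choice(pool)) for _ in range(target - len(pool)))
    return adjusted


reference = [{"app": "editor"}] * 30 + [{"app": "browser"}] * 10  # 3:1 in the target sample
simulated = [{"app": "editor", "n": i} for i in range(50)] + [{"app": "browser", "n": i} for i in range(50)]
resized = match_count_distribution(reference, simulated, 100, random.Random(3))
```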
Step 10332, when the statistical characteristic parameter of the reference field is not consistent with the statistical characteristic parameter of the simulation field, generating a corresponding data set by using a random number generator according to the field distribution rule in the target basic log sample, and modifying the field of the simulation log sample until the statistical characteristic parameter of the simulation field is consistent with the statistical characteristic parameter of the reference field.
In implementation, the computer device simulates the statistical characteristic distribution of fields (namely field values), compares the statistical characteristic of a reference field with the statistical characteristic parameter of a simulation field, and generates a corresponding data set by using a preset random number generator according to the field (field value) distribution rule in a target basic log sample when the statistical characteristic parameter of the reference field is inconsistent with the statistical characteristic parameter of the simulation field, and modifies the corresponding fields in the simulation log sample until the statistical characteristic parameter of the simulation field is consistent with the statistical characteristic parameter of the reference field. For example, the word frequency ratio of the filenames of the file logs included in the target basic log set conforms to normal distribution and corresponds to a set of values of expectation and standard deviation, then the word frequency ratio of the filenames of the file logs in the simulation log sample set also conforms to normal distribution, and the corresponding expectation and standard deviation of the filenames are the same as the reference parameter values of the target basic log sample set.
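For a field whose reference values follow a normal distribution, the sketch below generates a data set with `random.gauss` and then standardizes and rescales it so the expectation and standard deviation match the reference exactly. The rescaling is a substituted technique, not the redraw-until-consistent loop the text describes, and the field name and parameter values are hypothetical.

```python
import random
import statistics


def normal_field_values(n, ref_mean, ref_std, rng):
    """Draw n normal values, then shift and scale them to hit ref_mean and ref_std exactly.

    Needs n >= 2 so the standard deviation of the draws is non-zero.
    """
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = statistics.fmean(draws), statistics.pstdev(draws)
    return [ref_mean + ref_std * (x - m) / s for x in draws]


def write_field(logs, field, values):
    """Return copies of logs with `field` replaced by the generated values."""
    if len(values) != len(logs):
        raise ValueError("one value per log is required")
    return [{**log, field: value} for log, value in zip(logs, values)]


rng = random.Random(7)
logs = [{"file_name": f"f{i}.doc"} for i in range(1000)]
# Hypothetical reference parameters of a file-name word-frequency ratio.
values = normal_field_values(len(logs), 0.12, 0.03, rng)
second_set = write_field(logs, "name_word_freq_ratio", values)
```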
In this embodiment, the statistical characteristics of the simulation log samples are simulated so that the resulting simulation log set follows the same distribution rules (statistical characteristics) as the target basic log samples. Log data analysis can therefore be performed on a sufficiently large set of simulation log samples that carries the required statistical characteristics, and the analysis results reflect the operating-state information represented by the target basic log samples. This second-level log simulation processing adds further characteristic information to the log samples, and because the second-level and first-level simulation processing are independent of each other, the simulation effect of each level can be verified separately.
In one embodiment, as shown in fig. 6, the log simulation method further includes:
Step 104: according to a received log simulation request for a computer target operation scenario, determine the log types contained in the target operation scenario from a preset correspondence between computer operation scenarios and the log types they contain.
In implementation, the computer device stores in advance a correspondence between computer operation scenarios and log types, that is, between each operation scenario and the types and numbers of logs that its actual operation would generate. On receiving a log simulation request for a computer target operation scenario, the computer device looks up the log types contained in that target scenario in this preset correspondence.
Consider, for example, a scenario in which an employee packages core company data and then sends it out through network file-sharing software. The scenario involves the following host actions: within a certain time period, the same process reads a large number of core data files; the process then creates a new file; the new file is opened by the sharing software; and the sharing software produces outbound network traffic no smaller than the size of that file. Throughout, the host's human-computer interaction is also recorded. From this sequence of actions, the corresponding log types are file operation logs, process logs, network connection logs and host interaction logs, each with its own time sequence and field values.
Optionally, the computer operation scenarios in this correspondence may also include combined scenarios, other user-defined scenarios and the like, each mapped to preset log types. The embodiments of the present application therefore do not specifically limit the correspondence between scenarios and log types stored in the computer device.
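The preset correspondence of step 104 can be pictured as a lookup table. In this sketch the scenario names, the `scenarios` request key and the extra `bulk_file_copy` entry are hypothetical; a combined scenario is modelled, as an assumption, as the ordered union of the log types of its member scenarios.

```python
# Hypothetical preset correspondence: scenario name -> log types it produces.
SCENARIO_LOG_TYPES = {
    "core_data_exfiltration": (
        "file_operation",
        "process",
        "network_connection",
        "host_interaction",
    ),
    "bulk_file_copy": ("file_operation", "process"),
}


def log_types_for(request, table=SCENARIO_LOG_TYPES):
    """Return the log types for the scenario(s) named in a simulation request."""
    types = []
    for name in request["scenarios"]:
        if name not in table:
            raise ValueError(f"no preset log types for scenario {name!r}")
        for log_type in table[name]:
            if log_type not in types:  # union without duplicates, order preserved
                types.append(log_type)
    return types
```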
Step 105: according to the log types, call the generation rules and random number generators for the different types of logs, and perform type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
In implementation, the computer device calls, according to the log types, the corresponding log generation rules and their matched random number generators, and performs type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
Specifically, the computer device queries the second simulation log set for each log type contained in the computer target operation scenario. When the second simulation log set lacks a log type that the target scenario contains, the computer device calls the generation rule for that log type and its matched random number generator to generate simulation logs of that type and adds them to the second simulation log set.
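The gap-filling behaviour described above might look like the following sketch, in which each hypothetical generation rule is a function of a seeded `random.Random` instance. The log types, field names and value ranges are illustrative assumptions, not values from the source; a required type without a rule raises `KeyError`.

```python
import random

# Hypothetical generation rules: each takes an RNG and returns one log record.
GENERATION_RULES = {
    "process": lambda rng: {"type": "process", "pid": rng.randint(1000, 65535)},
    "network_connection": lambda rng: {
        "type": "network_connection",
        "dst_port": rng.choice([443, 8080]),
        "bytes_out": rng.randint(10_000, 10_000_000),
    },
    "host_interaction": lambda rng: {
        "type": "host_interaction",
        "event": rng.choice(["keypress", "window_focus", "usb_insert"]),
    },
}


def add_missing_log_types(logs, required_types, per_type=1, seed=0):
    """Return a copy of `logs` extended with generated logs for every required
    type that does not yet appear in it."""
    rng = random.Random(seed)
    present = {log["type"] for log in logs}
    completed = list(logs)
    for log_type in required_types:
        if log_type not in present:
            completed.extend(GENERATION_RULES[log_type](rng) for _ in range(per_type))
    return completed
```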
In this embodiment, the computer device performs scenario-specific simulation processing on the simulation log samples according to a received log simulation request for a computer target operation scenario: it determines the log types contained in the target scenario from the preset correspondence between computer operation scenarios and the log types they contain, then calls the generation rules and random number generators for the different log types and performs type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set. With this method, log samples for a scenario can be obtained without actually staging the scenario, which avoids the difficulty and high cost of reproducing a specific scenario. At the same time, the diversity of the simulation log samples in the third simulation log set is ensured, which helps prevent algorithms from overfitting when the log data are analyzed.
In one embodiment, as shown in fig. 7, the log simulation method further includes:
Step 701: according to the log label carried by each basic log sample in the target basic log samples, set the same log label on the simulation logs generated from that basic log sample, so that the simulation log samples in the simulation log set carry log labels of the corresponding types. Here the simulation log set comprises the first simulation log set and the second simulation log set.
In implementation, the computer device gives the simulation logs generated from each basic log sample the same log label that the basic log sample carries. That is, when a target basic log sample serves as the template for simulation logs, every simulation log sample copied from that template carries the same log label. Correspondingly, each simulation log in the second simulation log set, which is obtained by further simulation operations on these simulation log samples, also carries the same label.
Optionally, when a target basic log sample carries no log label, the computer device may let a user supplement the label manually through human-computer interaction, based on the content of the log.
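Label inheritance during copying (step 701) can be sketched as below, assuming each log is a dict with a hypothetical `label` entry holding a main class and subclass. Unlabelled templates are rejected here to reflect the optional manual labelling step; whether the device in the source blocks or merely flags such templates is not stated.

```python
import copy


def replicate_templates(templates, copies_per_template):
    """Copy each labelled template log; every copy inherits the template's label."""
    simulated = []
    for template in templates:
        if not template.get("label"):
            # Mirrors the optional manual step: unlabelled templates must be labelled first.
            raise ValueError(f"template {template!r} has no log label")
        for _ in range(copies_per_template):
            # Deep copy so later simulation levels can edit a copy without touching the template.
            simulated.append(copy.deepcopy(template))
    return simulated
```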
Step 702: check the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the computer target operation scenario; when the third simulation log set contains a simulation log that carries no log label, add a log label to that simulation log according to its log type in the computer target operation scenario.
In implementation, the computer device checks the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the computer target operation scenario. On detecting a simulation log that was newly added during the third-level log simulation processing and carries no log label, the computer device adds a log label to it according to the log type in the computer target operation scenario.
Specifically, a log label describes the main class and the subclass of a log, and its content may be defined according to the specific data analysis target and business scenario. For example, when simulating the scenario in which an employee sends out core company data, a simulation log in the third simulation log set may be given the main class "security alarm" and the subclass "internal violation".
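The check-and-add logic of step 702 reduces to one pass over the third simulation log set with a type-to-label table. The table below, including its label strings, is a hypothetical example built from the employee data-exfiltration scenario; logs whose type has no preset label are left unlabelled in this sketch.

```python
# Hypothetical preset correspondence for one target scenario: log type -> label.
SCENARIO_LABELS = {
    "core_data_exfiltration": {
        "network_connection": {"main": "security_alarm", "sub": "internal_violation"},
        "process": {"main": "security_alarm", "sub": "internal_violation"},
    },
}


def fill_missing_labels(logs, scenario):
    """Label every unlabelled log from the scenario's type->label map.
    Returns the number of labels added."""
    mapping = SCENARIO_LABELS[scenario]
    added = 0
    for log in logs:
        if not log.get("label") and log["type"] in mapping:
            log["label"] = dict(mapping[log["type"]])  # copy so logs don't share one dict
            added += 1
    return added
```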
In this embodiment, the log labels of the simulation log samples are derived from the log labels carried by the target basic log samples. This preserves the characteristic information carried by the target basic log samples and identifies each type of simulation log among the simulation log samples. In addition, log labels can be set for newly added simulation logs according to the corresponding target operation scenario, which enriches the characteristic information of the simulation log sample set.
In an embodiment, as shown in fig. 8, an example of a log simulation method is provided, and a specific processing procedure is as follows:
First, the format and the number of the required log samples are set, and basic log samples are acquired and copied: target basic log samples are obtained by partitioning the original log samples and by extracting and preprocessing sample logs, and the target basic log samples are copied to obtain the zero-level simulation result. Second, the basic feature requirements are simulated on the zero-level simulation result: taking the field that holds the identity identification information as the target field, first-level simulation is performed using the preset value range of the identity identification field (the value range of the target field) and the corresponding basic field value simulator (a random number generator), yielding the first-level simulation result. Third, the statistical characteristic requirements are simulated on the first-level simulation result: according to the statistical characteristic rules of the target basic log samples, a statistical characteristic simulator simulates the statistical characteristics of the first-level result to obtain the second-level simulation result. Fourth, a user-defined operation scenario is simulated on the second-level simulation result: according to the log types corresponding to that scenario, a user-defined characteristic simulator adds the required log types to the second-level result to obtain the third-level simulation result. At the same time, label information is generated as required and added to the third-level simulation result, so that each simulation log in it carries a log label of the corresponding main class and subclass. Finally, the simulation result is stored in a log database for data analysis of the log samples.
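The level structure of this example can be outlined as a chain of stages, each taking and returning a list of logs. The sketch below assumes that representation and keeps every level's output separately, in line with the statement that the levels are independent and can be verified individually; the stage functions used with it are hypothetical placeholders for the first-, second- and third-level simulators.

```python
from typing import Callable, Dict, List

Log = Dict[str, object]
Stage = Callable[[List[Log]], List[Log]]


def run_levels(base_logs: List[Log], copies: int, stages: List[Stage]) -> Dict[int, List[Log]]:
    """Run zero-level copying followed by the given simulation levels,
    keeping every level's output so each level can be checked on its own."""
    results = {0: [dict(log) for log in base_logs for _ in range(copies)]}
    current = results[0]
    for level, stage in enumerate(stages, start=1):
        # Hand each stage shallow copies so earlier results stay intact
        # (use copy.deepcopy instead if log records contain nested structures).
        current = stage([dict(log) for log in current])
        results[level] = current
    return results
```

Each `results[k]` can then be compared against the expectations for level k without rerunning the earlier levels.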
It should be understood that although the steps in the flowcharts of figs. 1-7 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1-7 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, a log simulation apparatus 900 is provided, comprising an obtaining module 910, a first simulation module 920 and a second simulation module 930, wherein:
the obtaining module 910 is configured to obtain a target basic log sample, and generate a preset number of simulation log samples according to the target basic log sample.
The first simulation module 920 is configured to perform identity simulation on each simulation log in a preset number of generated simulation log samples according to a preset log identity generation rule, so as to obtain a first simulation log set.
The second simulation module 930 is configured to perform statistical characteristic simulation on the first simulation log set according to the statistical characteristic rule of the target basic log sample to obtain a second simulation log set.
In one embodiment, the obtaining module 910 is specifically configured to: preprocess original log samples to obtain basic log samples and store the basic log samples in a basic log sample library;
extract a preset number of target basic log samples from the basic log sample library; and
copy the extracted target basic log samples to obtain a preset number of simulation log samples.
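The three configured operations of the obtaining module can be sketched as follows. The `key=value` raw log format and the parsing rules used as "preprocessing" are assumptions made for illustration; the source does not specify the preprocessing steps.

```python
import random


def preprocess(raw_lines):
    """Hypothetical preprocessing: parse 'key=value key=value' lines into dicts,
    discarding blank or malformed lines. Returns the basic log sample library."""
    library = []
    for line in raw_lines:
        pairs = [token.split("=", 1) for token in line.split()]
        if pairs and all(len(pair) == 2 for pair in pairs):
            library.append(dict(pairs))
    return library


def build_simulation_samples(library, n_targets, n_copies, seed=0):
    """Draw n_targets target basic samples and copy each one n_copies times."""
    rng = random.Random(seed)
    targets = rng.sample(library, n_targets)
    return targets, [dict(t) for t in targets for _ in range(n_copies)]
```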
In one embodiment, the first simulation module 920 is specifically configured to: generate distinct identity identification fields according to the preset log identity identification generation rule and a preset random number generator; and
write the generated identity identification fields into the identity identification field positions of the simulation log samples to obtain the first simulation log set.
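One way to realise an identity identification generation rule with a random number generator is sketched below. The rule itself (distinct source IPs drawn from a preset subnet, plus a derived `HOST-` identifier) is a hypothetical example; the subnet, field names and identifier format are not from the source.

```python
import ipaddress
import random


def assign_identities(logs, subnet="10.20.0.0/16", seed=0):
    """Give each simulated log a distinct source IP from `subnet` and a matching host ID."""
    rng = random.Random(seed)
    net = ipaddress.ip_network(subnet)
    # Distinct host offsets, skipping the network and broadcast addresses.
    offsets = rng.sample(range(1, net.num_addresses - 1), len(logs))
    for log, offset in zip(logs, offsets):
        log["src_ip"] = str(net.network_address + offset)  # overwrite the identity fields
        log["host_id"] = f"HOST-{offset:05d}"
    return logs
```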
In one embodiment, the second simulation module 930 is specifically configured to: perform statistical characteristic analysis on the target basic log samples to obtain their reference statistical characteristic parameters;
perform statistical characteristic analysis on the simulation log samples to obtain their simulation statistical characteristic parameters; and
when the reference statistical characteristic parameters are inconsistent with the simulation statistical characteristic parameters, modify the simulation log samples according to the distribution rules of the target basic log samples until the simulation statistical characteristic parameters are consistent with the reference statistical characteristic parameters, thereby obtaining the second simulation log set.
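The comparison that drives this modification loop needs an explicit notion of "consistent", which the text leaves open. This sketch assumes it means that the mean and sample standard deviation of a numeric field agree within a relative tolerance (5% by default, an arbitrary choice); the log count is deliberately left out of the comparison because the simulated set is meant to be larger than the reference.

```python
import math
import statistics


def stat_params(logs, field):
    """Statistical characteristic parameters of one numeric field."""
    values = [float(log[field]) for log in logs]
    return {"count": len(values), "mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def inconsistent_params(reference, simulated, rel_tol=0.05):
    """Names of the parameters that differ by more than rel_tol (count is ignored)."""
    return [name for name in ("mean", "stdev")
            if not math.isclose(reference[name], simulated[name], rel_tol=rel_tol)]
```

An empty result would end the loop; otherwise the listed parameters indicate which modification (log quantity or field values) to apply next.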
In one embodiment, the reference statistical characteristic parameters comprise reference log quantity statistical characteristic parameters and reference field statistical characteristic parameters, and the simulation statistical characteristic parameters comprise simulation log quantity statistical characteristic parameters and simulation field statistical characteristic parameters. The second simulation module 930 is specifically configured to: when the reference log quantity statistical characteristic parameters are inconsistent with the simulation log quantity statistical characteristic parameters, increase or decrease the number of simulation log samples according to the log quantity distribution rule of the target basic log samples until the simulation log quantity statistical characteristic parameters are consistent with the reference log quantity statistical characteristic parameters; and
when the reference field statistical characteristic parameters are inconsistent with the simulation field statistical characteristic parameters, generate a corresponding data set with a random number generator according to the field distribution rule in the target basic log samples, and modify the fields of the simulation log samples until the simulation field statistical characteristic parameters are consistent with the reference field statistical characteristic parameters.
In one embodiment, the apparatus 900 further comprises:
a determining module, configured to determine, according to a received log simulation request for a computer target operation scenario, the log types contained in the target operation scenario from the preset correspondence between computer operation scenarios and the log types they contain; and
a third simulation module, configured to call the generation rules and random number generators for the different types of logs according to the log types, and to perform type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
In one embodiment, the apparatus 900 further comprises:
a label module, configured to set, according to the log label carried by each basic log sample in the target basic log samples, the same log label on the simulation logs generated from that basic log sample, so that the simulation log samples in the simulation log set carry log labels of the corresponding types, the simulation log set comprising the first simulation log set and the second simulation log set;
a checking module, configured to check the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the computer target operation scenario; and
a correction module, configured to add, when the third simulation log set contains a simulation log that carries no log label, a log label to that simulation log according to the log type in the computer target operation scenario.
The log simulation apparatus acquires target basic log samples and generates a preset number of simulation log samples from them; performs identity identification simulation on each generated simulation log according to a preset log identity identification generation rule to obtain a first simulation log set; and performs statistical characteristic simulation on the first simulation log set according to the statistical characteristic rules of the target basic log samples to obtain a second simulation log set.
For specific limitations of the log simulation apparatus, reference may be made to the limitations of the log simulation method above; details are not repeated here. Each module in the log simulation apparatus may be implemented wholly or partly by software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and wireless communication may be implemented via Wi-Fi, an operator network, NFC (near field communication) or other technologies. When executed by the processor, the computer program implements a log simulation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touchpad on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring target basic log samples, and generating a preset number of simulation log samples from the target basic log samples;
performing, according to a preset log identity identification generation rule, identity identification simulation on each simulation log in the preset number of generated simulation log samples to obtain a first simulation log set; and
performing statistical characteristic simulation on the first simulation log set according to the statistical characteristic rules of the target basic log samples to obtain a second simulation log set.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
preprocessing original log samples to obtain basic log samples, and storing the basic log samples in a basic log sample library;
extracting a preset number of target basic log samples from the basic log sample library; and
copying the extracted target basic log samples to obtain a preset number of simulation log samples.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
generating distinct identity identification fields according to a preset log identity identification generation rule and a preset random number generator; and
writing the generated identity identification fields into the identity identification field positions of the simulation log samples to obtain a first simulation log set.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
performing statistical characteristic analysis on the target basic log samples to obtain their reference statistical characteristic parameters;
performing statistical characteristic analysis on the simulation log samples to obtain their simulation statistical characteristic parameters; and
when the reference statistical characteristic parameters are inconsistent with the simulation statistical characteristic parameters, modifying the simulation log samples according to the distribution rules of the target basic log samples until the simulation statistical characteristic parameters are consistent with the reference statistical characteristic parameters, thereby obtaining a second simulation log set.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
when the reference log quantity statistical characteristic parameters are inconsistent with the simulated log quantity statistical characteristic parameters, increasing or decreasing the number of simulation log samples according to the log quantity distribution rule of the target basic log samples until the simulated log quantity statistical characteristic parameters are consistent with the reference log quantity statistical characteristic parameters; and
when the reference field statistical characteristic parameters are inconsistent with the simulation field statistical characteristic parameters, generating a corresponding data set with a random number generator according to the field distribution rule in the target basic log samples, and modifying the fields of the simulation log samples until the simulation field statistical characteristic parameters are consistent with the reference field statistical characteristic parameters.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining, according to a received log simulation request for a computer target operation scenario, the log types contained in the target operation scenario from a preset correspondence between computer operation scenarios and the log types they contain; and
calling the generation rules and random number generators for the different types of logs according to the log types, and performing type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
setting, according to the log label carried by each basic log sample in the target basic log samples, the same log label on the simulation logs generated from that basic log sample, so that the simulation log samples in the simulation log set carry log labels of the corresponding types, the simulation log set comprising a first simulation log set and a second simulation log set;
and, after calling the generation rules and random number generators for the different types of logs according to the log types and performing type-specific log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set, further performing the following steps:
checking the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the computer target operation scenario; and
when the third simulation log set contains a simulation log that carries no log label, adding a log label to that simulation log according to the log type in the computer target operation scenario.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A log simulation method, the method comprising:
acquiring a target basic log sample, and generating a preset number of simulation log samples according to the target basic log sample;
according to a preset log identity identification generation rule, performing identity identification simulation on each simulation log in the preset number of generated simulation log samples to obtain a first simulation log set;
performing statistical characteristic analysis on the target basic log sample to obtain a reference statistical characteristic parameter of the target basic log sample;
carrying out statistical characteristic analysis on simulation log samples in the first simulation log set to obtain simulation statistical characteristic parameters of the simulation log samples;
and when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, modifying the simulation log sample according to the distribution rule of the target basic log sample until the simulation statistical characteristic parameter is consistent with the reference statistical characteristic parameter, and obtaining a second simulation log set.
2. The method of claim 1, wherein acquiring a target basic log sample and generating a preset number of simulation log samples according to the target basic log sample comprises:
preprocessing an original log sample to obtain a basic log sample, and storing the basic log sample into a basic log sample library;
extracting a preset number of target basic log samples from the basic log sample library;
and copying the extracted target basic log samples to obtain a preset number of simulation log samples.
3. The method according to claim 1, wherein the performing, according to a preset log identity generation rule, identity simulation on each of the simulation logs in the generated simulation log samples of a preset number to obtain a first simulation log set comprises:
generating different identity identification fields according to a preset log identity identification generation rule and a preset random number generator;
and writing the generated identity identification fields into the identity identification field positions of the simulation log samples to obtain a first simulation log set.
4. The method of claim 1, wherein the reference statistical characteristic parameters comprise reference log quantity statistical characteristic parameters and reference field statistical characteristic parameters, and the simulation statistical characteristic parameters comprise simulated log quantity statistical characteristic parameters and simulation field statistical characteristic parameters; and wherein, when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, modifying the simulation log sample according to the distribution rule of the target basic log sample until the simulation statistical characteristic parameter is consistent with the reference statistical characteristic parameter comprises:
when the statistical characteristic parameter of the reference log quantity is inconsistent with the statistical characteristic parameter of the simulated log quantity, increasing or decreasing the number of the simulated log samples according to the log quantity distribution rule of the target basic log sample until the statistical characteristic parameter of the simulated log quantity is consistent with the statistical characteristic parameter of the reference log quantity;
and when the statistical characteristic parameters of the reference field are inconsistent with the statistical characteristic parameters of the simulation field, generating a corresponding data set by using a random number generator according to a field distribution rule in the target basic log sample, and modifying the fields of the simulation log sample until the statistical characteristic parameters of the simulation field are consistent with the statistical characteristic parameters of the reference field.
5. The method of claim 1, further comprising:
according to a received log simulation request of a computer target operation scene, determining a log type contained in the computer target operation scene in a preset corresponding relation between the computer operation scene and a log type contained in the operation scene;
and calling generation rules and random number generators of different types of logs according to the log types, and performing different types of log simulation processing on the simulation logs in the second simulation log set to obtain a third simulation log set.
6. The method of claim 5, further comprising:
setting, for each simulation log generated from a basic log sample in the target basic log sample, the same log label as the one carried by that basic log sample, so that every simulation log sample in the simulation log set carries the log label of its type, the simulation log set comprising the first simulation log set and the second simulation log set; and
after invoking the generation rules and random number generators for the corresponding log types and performing type-specific log simulation processing on the simulation logs in the second simulation log set to obtain the third simulation log set, the method further comprises:
verifying the log label carried by each simulation log in the third simulation log set according to the preset correspondence between log types and log labels in the target computer operation scenario; and
when the third simulation log set contains a simulation log that carries no log label, adding a log label to that simulation log according to its log type in the target computer operation scenario.
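A sketch of the label handling in claim 6, assuming labels are stored under a `label` key and log types under a `log_type` key, with a hypothetical type-to-label table for one scenario. The claim only prescribes what to do with unlabeled logs; this sketch merely reports logs whose existing label disagrees with the table.

```python
from typing import Dict, List

Log = Dict[str, object]

# Hypothetical preset correspondence between log types and log labels
# for one target operation scenario.
TYPE_TO_LABEL: Dict[str, str] = {"auth": "security", "access": "web", "netflow": "network"}


def inherit_label(base: Log, simulated: Log) -> Log:
    """Give a simulation log the same label as the basic log sample it came from."""
    if "label" in base:
        simulated["label"] = base["label"]
    return simulated


def verify_and_fill_labels(third_set: List[Log], type_to_label: Dict[str, str]) -> List[int]:
    """Add missing labels from the type table; return indices of conflicting labels."""
    conflicts = []
    for i, log in enumerate(third_set):
        expected = type_to_label.get(log.get("log_type"))
        if expected is None:
            continue  # type not covered by this scenario's table
        if "label" not in log:
            log["label"] = expected  # the fill-in step of claim 6
        elif log["label"] != expected:
            conflicts.append(i)
    return conflicts
```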
7. A log simulation apparatus, comprising:
an acquisition module, configured to acquire a target basic log sample and generate a preset number of simulation log samples from the target basic log sample;
a first simulation module, configured to perform identity identifier simulation on each of the preset number of generated simulation log samples according to a preset log identity identifier generation rule, to obtain a first simulation log set; and
a second simulation module, configured to: perform statistical characteristic analysis on the target basic log sample to obtain a reference statistical characteristic parameter of the target basic log sample; perform statistical characteristic analysis on the simulation log samples in the first simulation log set to obtain a simulation statistical characteristic parameter of the simulation log samples; and, when the reference statistical characteristic parameter is inconsistent with the simulation statistical characteristic parameter, modify the simulation log samples according to the distribution rule of the target basic log sample until the simulation statistical characteristic parameter is consistent with the reference statistical characteristic parameter, to obtain a second simulation log set.
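One way to read the apparatus of claim 7 is as three modules wired in sequence. In this hypothetical sketch the acquisition module draws deep copies of base samples, with replacement, until the preset number is reached; the two simulation modules are injected as callables, so any identifier-generation or statistics-alignment routine can be plugged in. The class and function names are assumptions.

```python
import copy
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Log = Dict[str, object]


def acquire(base_samples: List[Log], preset_number: int, rng: random.Random) -> List[Log]:
    """Acquisition module: draw preset_number deep copies of base samples (with replacement)."""
    return [copy.deepcopy(rng.choice(base_samples)) for _ in range(preset_number)]


@dataclass
class LogSimulator:
    # First simulation module: simulation log samples -> first simulation log set.
    id_module: Callable[[List[Log]], List[Log]]
    # Second simulation module: (base samples, first set) -> second simulation log set.
    stats_module: Callable[[List[Log], List[Log]], List[Log]]
    preset_number: int
    seed: int = 0

    def run(self, base_samples: List[Log]) -> List[Log]:
        rng = random.Random(self.seed)
        samples = acquire(base_samples, self.preset_number, rng)
        first_set = self.id_module(samples)
        return self.stats_module(base_samples, first_set)
```

Deep copies keep the base samples untouched, which matters because both simulation modules modify the generated logs in place or by replacement.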
8. The apparatus of claim 7, wherein the first simulation module is specifically configured to: generate distinct identifier fields according to the preset log identity identifier generation rule and a preset random number generator;
and write each generated identifier field into the identifier-field position of the corresponding simulation log sample, to obtain the first simulation log set.
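A sketch of the identifier simulation in claim 8. The preset generation rule is assumed here to be `<prefix>-<6-digit sequence>-<8 random hex digits>`: the sequence number guarantees that the IDs are distinct, and a seeded `random.Random` stands in for the "preset random number generator". Because Python dicts preserve insertion order, assigning to an existing key replaces the value without moving the key, which mirrors writing the new ID back into the original identifier-field position.

```python
import random
from typing import Dict, List

Log = Dict[str, object]


def make_id(prefix: str, seq: int, rng: random.Random) -> str:
    """Hypothetical preset rule: <prefix>-<6-digit sequence>-<8 random hex digits>."""
    return f"{prefix}-{seq:06d}-{rng.getrandbits(32):08x}"


def simulate_ids(samples: List[Log], id_field: str = "event_id",
                 prefix: str = "SIM", seed: int = 0) -> List[Log]:
    """Write a distinct generated ID into each sample's ID field (in place)."""
    rng = random.Random(seed)
    for seq, log in enumerate(samples):
        # Assigning to an existing key keeps its position in the dict, so the
        # new ID lands where the old one was; a missing key is appended instead.
        log[id_field] = make_id(prefix, seq, rng)
    return samples
```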
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
CN202010860245.7A (priority 2020-08-25, filed 2020-08-25): Log simulation method and device, computer equipment and storage medium. Status: Active (granted as CN111737090B)

Priority Applications (1)

CN202010860245.7A (CN111737090B); priority date: 2020-08-25; filing date: 2020-08-25; title: Log simulation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

CN202010860245.7A (CN111737090B); priority date: 2020-08-25; filing date: 2020-08-25; title: Log simulation method and device, computer equipment and storage medium

Publications (2)

CN111737090A: published 2020-10-02
CN111737090B: published 2020-12-01

Family ID: 72658777

Family Applications (1)

CN202010860245.7A (CN111737090B), status: Active; priority date: 2020-08-25; filing date: 2020-08-25; title: Log simulation method and device, computer equipment and storage medium

Country Status (1)

CN (1): CN111737090B

Families Citing this family (1)

* Cited by examiner, † Cited by third party
CN112329273B *; priority date: 2020-12-17; published: 2023-10-24; assignee: 芯天下技术股份有限公司; title: Method and device for improving chip verification efficiency, storage medium and terminal

Citations (2)

* Cited by examiner, † Cited by third party
CN105635301A *; priority date: 2016-01-14; published: 2016-06-01; assignee: 郑州悉知信息科技股份有限公司; title: Access log merging method and log processing server and system
CN111125040A *; priority date: 2018-10-31; published: 2020-05-08; assignee: 华为技术有限公司; title: Method, apparatus and storage medium for managing redo log

Family Cites Families (1)

* Cited by examiner, † Cited by third party
US9501501B2 *; priority date: 2013-03-15; published: 2016-11-22; assignee: Amazon Technologies, Inc.; title: Log record management

Also Published As

CN111737090A: published 2020-10-02

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant