CN115757302B - Data analysis method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115757302B
Authority
CN
China
Prior art keywords
data
program
sample data
analyzed
log data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211338656.5A
Other languages
Chinese (zh)
Other versions
CN115757302A (en)
Inventor
郭飞
刘焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211338656.5A
Publication of CN115757302A
Application granted
Publication of CN115757302B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The specification discloses a method, an apparatus, a device, and a storage medium for data analysis. By determining the program that performs a calling operation on the data to be analyzed, the log data generated by that program can be picked out of a large amount of log data as candidate log data, which reduces the amount of log data that needs to be screened. Target log data is then screened from the candidate log data, and the data to be analyzed is analyzed according to the target log data.

Description

Data analysis method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for data analysis.
Background
With the development of internet technology, internet service providers are paying increasing attention to the user privacy data involved in their application services. To protect this data, a provider needs to know which business operations in an application service call the user's privacy data.
A common approach is to screen the log data recorded by the application service for entries related to personal privacy data, determine from the screened logs which business operations in the application service call the user privacy data, and protect the user privacy data accordingly.
However, an application service typically records a large amount of log data, which makes it extremely difficult to screen out every piece of log data related to user privacy data.
Disclosure of Invention
The present specification provides a method, apparatus, device, and storage medium for data analysis, so as to partially solve the problems existing in the prior art.
The technical scheme adopted in the specification is as follows:
the present specification provides a method of data analysis, comprising:
acquiring data to be analyzed;
determining a program for executing calling operation on the data to be analyzed;
determining the log data generated by executing the program from the log data as candidate log data;
and screening candidate log data generated by calling the data to be analyzed by the program from the candidate log data, taking the candidate log data as target log data, and carrying out data analysis on the data to be analyzed according to the target log data.
Optionally, determining a tag of a program for executing a calling operation on the data to be analyzed specifically includes:
inputting the data to be analyzed into a pre-trained analysis model to determine a program for executing calling operation on the data to be analyzed through the analysis model.
Optionally, training the analysis model specifically includes:
constructing each sample data;
inputting the sample data into the analysis model for each sample data to determine a program for executing calling operation on the sample data through the analysis model as a program corresponding to the sample data;
and training the analysis model by taking the deviation between the program corresponding to the sample data and the program for actually executing the calling operation on the sample data as an optimization target.
Optionally, constructing each sample data specifically includes:
for any two original sample data, judging whether the two original sample data are matched according to the any two original sample data and a program corresponding to the any two original sample data;
if yes, carrying out normalization processing on the program information of the programs corresponding to the arbitrary two original sample data to obtain normalized program information;
and taking the program corresponding to the normalized program information as the program corresponding to any two original sample data to obtain each sample data.
Optionally, constructing each sample data specifically includes:
determining programs corresponding to the original sample data as target programs;
judging whether the number of the original sample data corresponding to each target program exceeds a preset threshold value or not according to each target program;
if yes, splitting the target program into subprograms, and determining each piece of original sample data corresponding to each subprogram;
and constructing each sample data according to the subprogram corresponding to each original sample data.
Optionally, constructing each sample data specifically includes:
acquiring original log data;
normalizing the data format of the data contained in each original log data to obtain each processed log data;
and constructing each sample data according to each processed log data.
Optionally, candidate log data generated by the program calling the data to be analyzed is selected from the candidate log data as target log data, which specifically includes:
extracting characteristic representations of the data to be analyzed and characteristic representations of each candidate log data through the analysis model;
and screening candidate log data generated by calling the data to be analyzed by the program from the candidate log data according to the similarity between the characteristic representation of the data to be analyzed and the characteristic representation of each candidate log data, and taking the candidate log data as target log data.
The present specification provides an apparatus for data analysis, comprising:
the acquisition module is used for acquiring data to be analyzed;
the determining module is used for determining a program for executing calling operation on the data to be analyzed;
the matching module is used for determining the log data generated by executing the program from the log data to serve as candidate log data;
and the execution module is used for screening candidate log data generated by the program calling the data to be analyzed from the candidate log data, taking the candidate log data as target log data, and carrying out data analysis on the data to be analyzed according to the target log data.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor performs a method of data analysis as described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of data analysis as described above when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
in the method for analyzing data provided in the present specification, first, data to be analyzed is obtained, a program for executing a calling operation on the data to be analyzed is determined, log data generated by executing the program for executing the calling operation on the data to be analyzed is determined from each log data, the log data are used as candidate log data, candidate log data generated by calling the data to be analyzed by the program for executing the calling operation on the data to be analyzed are screened from the candidate log data, the candidate log data are used as target log data, and data analysis is performed on the data to be analyzed according to the target log data.
According to the method, the program for executing the calling operation on the data to be analyzed can be determined, so that the log data of the program for executing the calling operation on the data to be analyzed can be determined from a large amount of log data, the number of the log data to be screened is reduced, the target log data can be screened from the candidate log data, and the data to be analyzed is analyzed according to the target log data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the present specification and, together with their description, serve to explain it; they are not intended to limit the specification unduly.
In the drawings:
FIG. 1 is a flow chart of a method of data analysis provided in the present specification;
FIG. 2 is a schematic diagram of a method for determining target log data provided in the present specification;
FIG. 3 is a schematic diagram of an apparatus for data analysis provided herein;
FIG. 4 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for data analysis provided in the present specification, which includes the following steps:
S100: Acquire the data to be analyzed.
In this specification, the service platform may acquire the data to be analyzed, screen the log data of the service platform for log data related to the data to be analyzed, take it as target log data, and analyze the data to be analyzed according to the target log data. The log data here may be, for example, log data recorded by programs such as systems, applications, and interfaces for operations such as data calls and data processing.
The data to be analyzed may be data that needs to be analyzed according to the service requirement, for example: personal privacy data of the user, etc.
In addition, the acquired data to be analyzed may have an inconsistent data format, and its content may contain garbled characters and similar problems. The service platform can therefore clean the data to be analyzed after acquiring it and then analyze the cleaned data. Data cleaning here may include, for example: removing garbled characters from the data to be analyzed; unifying inconsistently formatted values such as times, dates, numbers, and full-width and half-width symbols; and filling in missing values.
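The cleaning steps listed above can be illustrated with a minimal sketch. The specification does not prescribe an implementation, so everything here is an assumption: the function names `clean_record` and `normalize_date`, the use of Unicode NFKC folding for full-width and half-width symbols, and the choice to fill missing values from caller-supplied defaults.

```python
import re
import unicodedata


def clean_record(record: dict, defaults: dict) -> dict:
    """Clean one record. Only the keys listed in `defaults` are kept (hypothetical schema)."""
    cleaned = {}
    for key, default in defaults.items():
        value = record.get(key)
        if value is None or value == "":
            cleaned[key] = default  # fill a missing value with the supplied default
            continue
        text = unicodedata.normalize("NFKC", str(value))  # fold full-width to half-width
        text = text.replace("\ufffd", "")  # drop Unicode replacement characters (garbled bytes)
        text = "".join(ch for ch in text if ch.isprintable())  # drop control characters
        cleaned[key] = text.strip()
    return cleaned


def normalize_date(text: str) -> str:
    """Rewrite dates such as 2022/10/28 or 2022.10.28 as 2022-10-28; leave other text alone."""
    match = re.fullmatch(r"(\d{4})[./-](\d{1,2})[./-](\d{1,2})", text)
    if not match:
        return text
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"
```

A real deployment would likely need field-specific rules (for example, different fill strategies for numeric and text fields); the sketch only shows the three kinds of cleaning the paragraph names.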
In the present specification, the execution subject of the method for implementing data analysis may refer to a designated device such as a server provided on a service platform, or may refer to a terminal device such as a desktop computer or a notebook computer, and for convenience of description, the method for implementing data analysis provided in the present specification will be described below using only the server as an example of the execution subject.
S102: Determine a program that performs a calling operation on the data to be analyzed.
After acquiring the data to be analyzed, the server can determine the program that performs the calling operation on it according to a preset business rule. The business rule may be a correspondence between data to be analyzed and programs, derived from the programs that corresponded to historical data to be analyzed.
In addition, after acquiring the data to be analyzed, the server may input it into a pre-trained analysis model, extract the feature representation of the data to be analyzed through the analysis model, and determine the program that performs the calling operation on the data to be analyzed according to the extracted feature representation. The program here may be a system program, an application program, an interface program, a method program, and the like.
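One way to picture such a business rule is as a lookup table built from historical records. The sketch below is an assumption for illustration only: the names `build_business_rules` and `lookup_program`, the `(data_type, program)` record shape, and the choice of keeping the most frequent historical program are not taken from the specification.

```python
from collections import Counter, defaultdict


def build_business_rules(history):
    """Map each data type to the program that most often called it in the history.

    `history` is an iterable of (data_type, program) pairs (hypothetical record shape).
    """
    counts = defaultdict(Counter)
    for data_type, program in history:
        counts[data_type][program] += 1
    return {data_type: c.most_common(1)[0][0] for data_type, c in counts.items()}


def lookup_program(rules, data_type):
    """Return the program recorded for this data type, or None if no rule applies."""
    return rules.get(data_type)
```

When no rule covers the data to be analyzed, the lookup returns None, which is the case where a model-based determination could take over.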
The analysis model may be trained as follows: construct the sample data; for each piece of sample data, input it into the analysis model, which determines a program that performs the calling operation on the sample data, and take that program as the program corresponding to the sample data; then train the analysis model with the optimization target of minimizing the deviation between the program corresponding to the sample data and the program that actually performs the calling operation on the sample data. Here, sample data refers to at least part of the data in the log data acquired by the server, together with the program information of the program that performs the calling operation on that part of the data.
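The training loop described above can be sketched with a very small classifier. The specification does not state the model architecture, so this example substitutes a multiclass perceptron over character trigrams purely for illustration; `featurize`, `ProgramClassifier`, and the sample strings are all hypothetical. The update step plays the role of reducing the deviation between the predicted program and the program that actually made the call.

```python
from collections import defaultdict


def featurize(text, n=3):
    """Bag of character n-grams; a stand-in for a learned feature representation."""
    text = text.lower()
    feats = defaultdict(float)
    for i in range(max(len(text) - n + 1, 1)):
        feats[text[i:i + n]] += 1.0
    return feats


class ProgramClassifier:
    """Multiclass perceptron with one weight vector per program label."""

    def __init__(self, labels):
        self.weights = {label: defaultdict(float) for label in labels}

    def score(self, feats, label):
        weights = self.weights[label]
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())

    def predict(self, text):
        feats = featurize(text)
        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.weights), key=lambda label: self.score(feats, label))

    def fit(self, samples, epochs=10):
        """`samples` is a list of (data, actual_program) pairs."""
        for _ in range(epochs):
            errors = 0
            for text, actual in samples:
                predicted = self.predict(text)
                if predicted != actual:
                    errors += 1
                    # Move weights toward the actual program and away from the wrong one.
                    for f, v in featurize(text).items():
                        self.weights[actual][f] += v
                        self.weights[predicted][f] -= v
            if errors == 0:
                break
        return self
```

A usage example under the same assumptions: `ProgramClassifier(["getUserId", "getPassword"]).fit(samples).predict("user_id=3003")`, where `samples` pairs strings such as `"user_id=1001"` with `"getUserId"`.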
The sample data may be constructed as follows: obtain at least part of the data in the historical log data as original sample data, and determine the program that performs the calling operation on each piece of original sample data as the program corresponding to that original sample data. For any two pieces of original sample data, judge whether they match according to the two pieces of original sample data and the program information of their corresponding programs. If they match, normalize the program information of the two corresponding programs to obtain normalized program information, and take the program corresponding to the normalized program information as the program corresponding to both pieces of original sample data, thereby obtaining the sample data.
It should be noted that the program information may be the program name, and different developers may give the same program different names. For example, a method for obtaining the user number may be named getuser_id, getuser_no, getuser_id, and so on. These are all essentially the same method for obtaining the user's number; they differ only because developers have different naming habits.
Therefore, when the server determines that the difference between two pieces of program information is small and the similarity between their corresponding original sample data is high, it can normalize the two pieces of program information to obtain normalized program information, and take the program corresponding to the normalized program information as the program corresponding to both pieces of original sample data.
Note that this normalization does not modify the actual program information, such as the program's name. Instead, the program information is extracted and combined with the original sample data to form the sample data, and the normalization is applied to this extracted copy. In other words, the program information serves as a label indicating the program that performs the calling operation on the part of the log data contained in the sample data.
Further, to judge whether two pieces of original sample data match, the server may use a preset edit distance algorithm to determine the edit distance between the program information of the two corresponding programs and the edit distance between the two pieces of original sample data. If the edit distance between the two pieces of program information is smaller than a preset first threshold and the edit distance between the two pieces of original sample data is smaller than a preset second threshold, the two pieces of original sample data are considered to match.
Here, the edit distance between two character strings is the minimum number of single-character edits (for example, insertions, deletions, or substitutions) required to convert one string into the other.
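The matching and label normalization described above can be sketched as follows. `edit_distance` is the standard Levenshtein dynamic program; the threshold values, the `(data, program)` tuple shape, and the rule of reusing the label of the first matching earlier sample are assumptions made for the example, not details from the specification.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def samples_match(sample_a, sample_b, name_threshold=3, data_threshold=5):
    """Two (data, program) samples match when both edit distances are below their thresholds."""
    (data_a, program_a), (data_b, program_b) = sample_a, sample_b
    return (edit_distance(program_a, program_b) < name_threshold
            and edit_distance(data_a, data_b) < data_threshold)


def normalize_programs(samples, name_threshold=3, data_threshold=5):
    """Relabel each sample with the label of the first earlier sample it matches."""
    normalized = []
    for data, program in samples:
        label = program
        for prev_data, prev_label in normalized:
            if samples_match((data, program), (prev_data, prev_label),
                             name_threshold, data_threshold):
                label = prev_label
                break
        normalized.append((data, label))
    return normalized
```

Only the label attached to the sample changes; the program's real name in the code base is left untouched, in line with the note above about labels.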
In addition, because the programs corresponding to the original sample data are used to train the analysis model, the number of pieces of original sample data corresponding to each program should be as balanced as possible so that the analysis model trains well. The server can therefore split a program that corresponds to a large number of pieces of original sample data into several subprograms, so that the counts of original sample data per program differ less.
Specifically, the server may determine the program corresponding to each piece of original sample data, that is, the program that performs the calling operation on it, as a target program. For each target program, the server judges whether the number of pieces of original sample data corresponding to it exceeds a preset third threshold. If so, the server splits the target program into subprograms, determines the original sample data corresponding to each subprogram, and constructs the sample data according to the subprogram corresponding to each piece of original sample data.
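The effect of this splitting on the label distribution can be shown with a small sketch. The specification does not say how a target program is divided into subprograms, so the example simply chunks the samples and invents sub-labels of the form `program#index`; the function name `split_large_programs` and that naming scheme are hypothetical.

```python
def split_large_programs(samples_by_program, max_samples):
    """Split any program with more than `max_samples` samples into sub-labels.

    `samples_by_program` maps a program name to its list of original samples.
    Chunking by position is an assumption; a real split would presumably follow
    the structure of the program itself.
    """
    result = {}
    for program, samples in samples_by_program.items():
        if len(samples) <= max_samples:
            result[program] = list(samples)
            continue
        for start in range(0, len(samples), max_samples):
            index = start // max_samples
            result[f"{program}#{index}"] = list(samples[start:start + max_samples])
    return result
```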
Further, for each target program, the server may judge whether the number of pieces of original sample data corresponding to it is lower than a preset fourth threshold. If so, the server merges the original sample data of this target program with the original sample data of another target program whose count is also below the fourth threshold, and takes both target programs as the programs corresponding to the merged original sample data.
For example, suppose the program getUserName corresponds to three pieces of original sample data numbered 1 to 3, and the program sum corresponds to three pieces numbered 4 to 6. The two groups can then be treated as a single set of original sample data, with getUserName and sum together serving as the programs corresponding to all six pieces.
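The merging step in this example can be sketched as below. The function name `merge_small_programs`, the joined label `getUserName+sum`, and the decision to pool every under-populated program into a single group are assumptions for illustration; the specification only describes merging programs whose sample counts fall below the fourth threshold.

```python
def merge_small_programs(samples_by_program, min_samples):
    """Pool the samples of every program that has fewer than `min_samples` samples.

    The pooled group is labelled with all contributing program names joined by "+".
    """
    merged_names, merged_samples, result = [], [], {}
    for program, samples in samples_by_program.items():
        if len(samples) < min_samples:
            merged_names.append(program)
            merged_samples.extend(samples)
        else:
            result[program] = list(samples)
    if merged_names:
        result["+".join(merged_names)] = merged_samples
    return result
```

With a threshold of 4, the two three-sample programs from the example end up under one combined label holding samples 1 to 6, while any program with four or more samples keeps its own label.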
In addition, the original log data acquired by the server may come from different hardware devices and different software, so its format is not uniform. The server may therefore normalize the data format of the data contained in each piece of original log data to obtain processed log data, and construct the sample data from at least part of the data in the processed log data.
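As a rough illustration of format normalization, the sketch below converts two invented log formats, a JSON object and space-separated `key=value` pairs, into one common dictionary shape. The formats, the field names, and the function name `normalize_log` are assumptions; the specification does not list the formats it handles.

```python
import json


def normalize_log(raw: str) -> dict:
    """Parse a raw log line into a dict with lower-case keys and string values."""
    raw = raw.strip()
    if raw.startswith("{"):
        fields = json.loads(raw)  # JSON-style log line
    else:
        # key=value style log line; tokens without "=" are ignored
        fields = dict(part.split("=", 1) for part in raw.split() if "=" in part)
    return {str(key).lower(): str(value) for key, value in fields.items()}
```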
S104: Determine, from the log data, the log data generated by executing the program as candidate log data.
In this specification, once the program that performs the calling operation on the data to be analyzed has been determined, the server may check whether that program is consistent with the program corresponding to each piece of log data, and thereby select, from the log data, the log data generated by that program as the candidate log data.
It should be noted that the log data acquired by the server may not be in a uniform format and may contain a good deal of interfering data, such as the storage location and storage time of the log data. This data is not needed when the server determines the log data generated by the program that performs the calling operation on the data to be analyzed, and it can get in the way. The server can therefore extract only part of each piece of log data, recording the program and the data it calls in the form of key-value pairs. For example, if a piece of log data records that an application program called a user's account password data from a user table, the extracted key-value pair consists of the application program that made the call and the account password data.
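Step S104 and the key-value extraction above can be sketched together. The field names (`program`, `data`, `storage_path`, `stored_at`) and both function names are hypothetical; the sketch assumes each log entry has already been parsed into a dictionary.

```python
# Hypothetical names of fields treated as interference (storage location and time).
INTERFERENCE_KEYS = {"storage_path", "stored_at"}


def extract_call_info(log: dict) -> dict:
    """Keep only the key-value pairs that describe the call itself."""
    return {key: value for key, value in log.items() if key not in INTERFERENCE_KEYS}


def select_candidates(logs, program):
    """Return the extracted entries whose recorded program equals the determined program."""
    return [info for info in map(extract_call_info, logs) if info.get("program") == program]
```

Filtering on the program first is what shrinks the search space: only the surviving candidates need to be compared against the data to be analyzed in the next step.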
S106: Screen, from the candidate log data, the candidate log data generated by the program calling the data to be analyzed, take it as target log data, and perform data analysis on the data to be analyzed according to the target log data.
The server can extract the characteristic representation of the data to be analyzed and the characteristic representation of each candidate log data through an analysis model, screen the candidate log data generated by calling the data to be analyzed by the program from each candidate log data according to the similarity between the characteristic representation of the data to be analyzed and the characteristic representation of each candidate log data, serve as target log data, and conduct data analysis on the data to be analyzed according to the screened target log data.
It should be noted that the target log data screened out by the server is the log data corresponding to the calling operations that called the data to be analyzed. Detailed information about those calling operations can therefore be determined from the target log data, for example the time of the calling operation, the calling method, and where the data to be analyzed was called from. Corresponding tasks can then be executed according to this information, for example analyzing whether the data to be analyzed could have leaked while being called, so that any places where leakage is possible can be repaired.
The server may screen the target log data from the candidate log data in many ways, for example with an approximate nearest neighbor (ANN) vector matching algorithm, a tree search algorithm, and so on.
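The similarity-based screening can be sketched with a brute-force cosine comparison. This is not an ANN or tree search implementation: it compares the query against every candidate, which is the exact search those algorithms approximate at larger scale. The trigram `embed` function stands in for the analysis model's feature extractor, and the threshold value and function names are assumptions.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Character n-gram counts; a stand-in for the model's feature representation."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_target_logs(data, candidates, threshold=0.4):
    """Return candidates whose similarity to `data` reaches the threshold, best first."""
    query = embed(data)
    scored = [(cosine(query, embed(candidate)), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for score, candidate in scored if score >= threshold]
```

Because candidate feature representations do not depend on the query, they could be computed once in advance and reused across queries.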
In order to further describe the above details, the present disclosure also provides a schematic diagram of a method for determining the target log data, as shown in fig. 2.
FIG. 2 is a schematic diagram of a method for determining target log data provided in the present specification.
As can be seen from FIG. 2, after acquiring the data to be analyzed, the server may clean it, removing garbled characters and similar content, and then input the cleaned data to be analyzed into a preset analysis model.
The preset analysis model determines the program that performs the calling operation on the data to be analyzed and extracts the feature representation of the data to be analyzed. Candidate log data can then be selected from a large amount of log data according to the determined program, and the target log data can be determined according to the similarity between the extracted feature representation of the data to be analyzed and the feature representations of the candidate log data extracted in advance through the analysis model.
From the above, it can be seen that a program for executing a call operation on data to be analyzed can be determined through a preset analysis model, so that each candidate log data related to the analyzed program can be selected from a large number of log data according to the analyzed program, so as to reduce the number of log data to be screened, further, target log data can be screened from each candidate log data, and the data to be analyzed can be analyzed according to the target log data.
The above is the method for data analysis provided by one or more embodiments of the present specification. Based on the same concept, the present specification further provides a corresponding apparatus for data analysis, as shown in FIG. 3.
Fig. 3 is a schematic diagram of an apparatus for data analysis provided in the present specification, including:
an acquisition module 301, configured to acquire data to be analyzed;
a determining module 302, configured to determine a program that performs a calling operation on the data to be analyzed;
a matching module 303, configured to determine, from among the log data, log data generated by executing the program, as candidate log data;
and the execution module 304 is configured to screen candidate log data generated by the program calling the data to be analyzed from the candidate log data, take it as target log data, and perform data analysis on the data to be analyzed according to the target log data.
Optionally, the determining module 302 is specifically configured to input the data to be analyzed into a pre-trained analysis model, so as to determine, through the analysis model, a program for executing a calling operation on the data to be analyzed.
Optionally, the apparatus further comprises: a training module 305;
the training module 305 is specifically configured to construct each sample data; inputting the sample data into the analysis model for each sample data to determine a program for executing calling operation on the sample data through the analysis model as a program corresponding to the sample data; and training the analysis model by taking the deviation between the program corresponding to the sample data and the program for actually executing the calling operation on the sample data as an optimization target.
Optionally, the apparatus further comprises: a build module 306;
the construction module 306 is specifically configured to determine, for any two pieces of original sample data, whether the two pieces of original sample data are matched according to the two pieces of original sample data and a program corresponding to the two pieces of original sample data; if yes, carrying out normalization processing on the program information of the programs corresponding to the arbitrary two original sample data to obtain normalized program information; and taking the program corresponding to the normalized program information as the program corresponding to any two original sample data to obtain each sample data.
Optionally, the construction module 306 is specifically configured to determine a program corresponding to each original sample data as each target program; judging whether the number of the original sample data corresponding to each target program exceeds a preset threshold value or not according to each target program; if yes, splitting the target program into subprograms, and determining each piece of original sample data corresponding to each subprogram; and constructing each sample data according to the subprogram corresponding to each original sample data.
Optionally, the construction module 306 is specifically configured to obtain each original log data; normalizing the data format of the data contained in each original log data to obtain each processed log data; and constructing each sample data according to each processed log data.
Optionally, the executing module 304 is specifically configured to extract, through the analysis model, a feature representation of the data to be analyzed and a feature representation of each candidate log data; and screening candidate log data generated by calling the data to be analyzed by the program from the candidate log data according to the similarity between the characteristic representation of the data to be analyzed and the characteristic representation of each candidate log data, and taking the candidate log data as target log data.
The present specification also provides a computer readable storage medium storing a computer program operable to perform a method of data analysis as provided in figure 1 above.
The present specification also provides a schematic structural diagram of an electronic device corresponding to fig. 1 shown in fig. 4. At the hardware level, as in fig. 4, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, although it may include hardware required for other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs to implement the method of data analysis of fig. 1 described above. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 90 s of the 20 th century, improvements to one technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software (improvements to the process flow). However, with the development of technology, many improvements of the current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, an improvement of a method flow cannot be said to be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before the compiling is also written in a specific programming language, which is called hardware description language (Hardware Description Language, HDL), but not just one of the hdds, but a plurality of kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), lava, lola, myHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently most commonly used. 
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by describing the method flow, with a little logic programming, in one of the hardware description languages above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. The means for performing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments in which tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing is merely an example of the present specification and is not intended to limit the present specification. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (7)

1. A method of data analysis, the method comprising:
acquiring data to be analyzed;
inputting the data to be analyzed into a pre-trained analysis model, so as to determine, through the analysis model, a program that performs a calling operation on the data to be analyzed;
determining, from log data, the log data generated by executing the program as candidate log data;
screening, from the candidate log data, the candidate log data generated when the program calls the data to be analyzed as target log data, and performing data analysis on the data to be analyzed according to the target log data;
wherein training the analysis model specifically comprises: constructing sample data, inputting the sample data into the analysis model, determining, through the analysis model, a program that performs a calling operation on the sample data as the program corresponding to the sample data, and training the analysis model with minimizing the deviation between the program corresponding to the sample data and the program that actually performs the calling operation on the sample data as the optimization target;
and wherein constructing the sample data specifically comprises: determining the programs corresponding to original sample data as target programs; for each target program, judging whether the number of original sample data corresponding to the target program exceeds a preset threshold; if so, splitting the target program into subprograms, determining the original sample data corresponding to each subprogram, and constructing the sample data according to the subprograms corresponding to the original sample data.
2. The method of claim 1, wherein constructing the sample data specifically comprises:
for any two original sample data, judging whether the two original sample data match according to the two original sample data and the programs corresponding to them;
if so, normalizing the program information of the programs corresponding to the two original sample data to obtain normalized program information;
and taking the program corresponding to the normalized program information as the program corresponding to the two original sample data, to obtain the sample data.
3. The method of claim 1, wherein constructing the sample data specifically comprises:
acquiring original log data;
normalizing the data format of the data contained in each original log data to obtain processed log data;
and constructing the sample data according to the processed log data.
4. The method of claim 1, wherein screening, from the candidate log data, the candidate log data generated when the program calls the data to be analyzed as target log data specifically comprises:
extracting, through the analysis model, a feature representation of the data to be analyzed and a feature representation of each candidate log data;
and screening, from the candidate log data, the candidate log data generated when the program calls the data to be analyzed as the target log data, according to the similarity between the feature representation of the data to be analyzed and the feature representation of each candidate log data.
5. An apparatus for data analysis, comprising:
an acquisition module, configured to acquire data to be analyzed;
a determining module, configured to input the data to be analyzed into a pre-trained analysis model, so as to determine, through the analysis model, a program that performs a calling operation on the data to be analyzed;
a matching module, configured to determine, from log data, the log data generated by executing the program as candidate log data;
an execution module, configured to screen, from the candidate log data, the candidate log data generated when the program calls the data to be analyzed as target log data, and to perform data analysis on the data to be analyzed according to the target log data;
a training module, configured to construct sample data, input the sample data into the analysis model, determine, through the analysis model, a program that performs a calling operation on the sample data as the program corresponding to the sample data, and train the analysis model with minimizing the deviation between the program corresponding to the sample data and the program that actually performs the calling operation on the sample data as the optimization target;
a construction module, configured to determine the programs corresponding to original sample data as target programs; for each target program, judge whether the number of original sample data corresponding to the target program exceeds a preset threshold; if so, split the target program into subprograms, determine the original sample data corresponding to each subprogram, and construct the sample data according to the subprograms corresponding to the original sample data.
6. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-4.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-4 when executing the program.
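
As a non-limiting illustration, the screening recited in claims 1 and 4 might be sketched in Python as follows. The log record layout, the names `toy_predict_program`, `toy_embed`, `VOCAB`, and `min_similarity`, and the choice of cosine similarity are assumptions made for illustration only; the claims above fix neither a similarity measure nor a threshold, and the two toy functions merely stand in for the pre-trained analysis model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Log = Dict[str, str]  # hypothetical record layout: {"program": ..., "message": ...}

# Hypothetical fixed vocabulary used by the toy feature extractor below.
VOCAB = ["user", "id", "42", "order", "paid", "refund"]


def toy_predict_program(data: str) -> str:
    """Stand-in for the analysis model's program prediction (assumption)."""
    return "payment_service" if "order" in data.lower() else "account_service"


def toy_embed(text: str) -> List[float]:
    """Stand-in feature representation: token counts over VOCAB (assumption)."""
    tokens = Counter(text.lower().split())
    return [float(tokens[word]) for word in VOCAB]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_target_logs(
    data: str,
    logs: List[Log],
    predict_program: Callable[[str], str],
    embed: Callable[[str], List[float]],
    min_similarity: float = 0.5,  # illustrative threshold, not from the claims
) -> List[Log]:
    # Step 1: determine the program that calls the data to be analyzed.
    program = predict_program(data)
    # Step 2: keep only log data generated by executing that program.
    candidates = [log for log in logs if log["program"] == program]
    # Step 3: keep candidates whose features are similar enough to the data.
    query = embed(data)
    return [
        log
        for log in candidates
        if cosine_similarity(query, embed(log["message"])) >= min_similarity
    ]


logs = [
    {"program": "payment_service", "message": "order 42 paid"},
    {"program": "payment_service", "message": "refund issued user 7"},
    {"program": "account_service", "message": "order 42 viewed"},
]
targets = screen_target_logs("order 42", logs, toy_predict_program, toy_embed)
```

In this toy run only the first record survives both steps: the third is discarded because it was produced by a different program, and the second because its feature representation shares no component with that of the data to be analyzed.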
CN202211338656.5A 2022-10-28 2022-10-28 Data analysis method, device, equipment and storage medium Active CN115757302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211338656.5A CN115757302B (en) 2022-10-28 2022-10-28 Data analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211338656.5A CN115757302B (en) 2022-10-28 2022-10-28 Data analysis method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115757302A (en) 2023-03-07
CN115757302B (en) 2023-06-27

Family

ID=85355954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211338656.5A Active CN115757302B (en) 2022-10-28 2022-10-28 Data analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115757302B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650446A (en) * 2016-12-26 2017-05-10 北京邮电大学 Identification method and system of malicious program behavior, based on system call
CN110502269A (en) * 2019-07-24 2019-11-26 深圳壹账通智能科技有限公司 Application program optimization method, equipment, storage medium and device
CN113971282A (en) * 2020-07-24 2022-01-25 武汉安天信息技术有限责任公司 AI model-based malicious application program detection method and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582650B (en) * 2018-11-09 2021-05-25 金色熊猫有限公司 Module calling amount processing method and device, electronic equipment and storage medium
CN112307191A (en) * 2020-11-03 2021-02-02 平安普惠企业管理有限公司 Multi-system interactive log query method, device, equipment and storage medium
CN114780370A (en) * 2022-05-10 2022-07-22 中国平安财产保险股份有限公司 Data correction method and device based on log, electronic equipment and storage medium
CN115061874A (en) * 2022-06-14 2022-09-16 中国工商银行股份有限公司 Log information verification method, device, equipment and medium

Also Published As

Publication number Publication date
CN115757302A (en) 2023-03-07

Similar Documents

Publication Publication Date Title
CN110569428B (en) Recommendation model construction method, device and equipment
CN110502227B (en) Code complement method and device, storage medium and electronic equipment
CN111966334A (en) Service processing method, device and equipment
CN116049761A (en) Data processing method, device and equipment
CN111488510B (en) Method and device for determining related words of applet, processing equipment and search system
CN116402165B (en) Operator detection method and device, storage medium and electronic equipment
CN115757302B (en) Data analysis method, device, equipment and storage medium
CN112231531A (en) Data display method, equipment and medium based on openstb
CN115878654A (en) Data query method, device, equipment and storage medium
CN115391426A (en) Data query method and device, storage medium and electronic equipment
CN110704742B (en) Feature extraction method and device
CN111242195B (en) Model, insurance wind control model training method and device and electronic equipment
CN113435950A (en) Bill processing method and device
CN110245136B (en) Data retrieval method, device, equipment and storage equipment
CN116340469B (en) Synonym mining method and device, storage medium and electronic equipment
CN117076650B (en) Intelligent dialogue method, device, medium and equipment based on large language model
CN117252183B (en) Semantic-based multi-source table automatic matching method, device and storage medium
CN117035695B (en) Information early warning method and device, readable storage medium and electronic equipment
CN117421214A (en) Batch counting method, device, electronic equipment and computer readable storage medium
CN116595969A (en) Text generation method and device, storage medium and electronic equipment
CN116822606A (en) Training method, device, equipment and storage medium of anomaly detection model
CN117593003A (en) Model training method and device, storage medium and electronic equipment
CN117931672A (en) Query processing method and device applied to code change
CN117828360A (en) Model training method, model training device, model code generating device, storage medium and storage medium
CN117591217A (en) Information display method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant