CN116431698B - Data extraction method, device, equipment and storage medium - Google Patents


Publication number
CN116431698B
Authority
CN
China
Prior art keywords
log data
instruction
spl
field
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310118319.3A
Other languages
Chinese (zh)
Other versions
CN116431698A (en
Inventor
张大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youtejie Information Technology Co ltd
Original Assignee
Beijing Youtejie Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youtejie Information Technology Co ltd filed Critical Beijing Youtejie Information Technology Co ltd
Priority to CN202310118319.3A priority Critical patent/CN116431698B/en
Publication of CN116431698A publication Critical patent/CN116431698A/en
Application granted granted Critical
Publication of CN116431698B publication Critical patent/CN116431698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a data extraction method, device, equipment and storage medium. The method comprises: acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted; generating custom SPL instructions according to acquired instruction configuration information and each field position, wherein each custom SPL instruction includes a field position; and determining a target SPL instruction from the custom SPL instructions, and cutting the log data to be extracted based on the target SPL instruction to obtain target log data. The field position of each first specified field in the log data to be extracted is determined and combined with the acquired instruction configuration information to generate the custom SPL instructions; once the target SPL instruction is determined, the log data to be extracted can be cut by the target SPL instruction to obtain the target log data. Data extraction across multiple scenarios can thus be realized, better meeting users' daily operation and maintenance analysis and business analysis needs.

Description

Data extraction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data extraction method, apparatus, device, and storage medium.
Background
A business system is a system that an enterprise uses for operations, management, and similar activities to meet business requirements. Types of business systems include sales and marketing information systems, manufacturing and production information systems, financial and accounting information systems, human resource information systems, and the like. A large amount of enterprise business data is stored in business systems, and analyzing these systems can give enterprises a reliable reference for finding economic growth points and optimizing operating strategies.
At present, when integrating with a business system, the application log of the business system must first be acquired and then processed through built-in functions to extract the key data the user requires.
However, application logs are often massive, and some special scenarios cannot be handled by the built-in function commands of the prior art, for example when other systems must be integrated during data processing, when data must be processed recursively, or when special data analysis requirements arise. As a result, users' daily operation and maintenance analysis and business analysis needs cannot be met.
Disclosure of Invention
The invention provides a data extraction method, a device, equipment and a storage medium, which are used for realizing data extraction under different service scenes.
According to an aspect of the present invention, there is provided a data extraction method, the method comprising:
acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted;
generating custom SPL instructions according to acquired instruction configuration information and each field position, wherein each custom SPL instruction includes a field position;
determining a target SPL instruction from the custom SPL instructions, and cutting the log data to be extracted based on the target SPL instruction to obtain target log data.
Optionally, acquiring the log data to be extracted includes: determining a start identifier and an end identifier of the original log data according to an acquired line-break rule, and determining a start position and an end position of each log event in the original log data according to the start identifier and the end identifier; for each pair of adjacent log events, taking the end position of the preceding original log data and the start position of the following original log data as splice points; and merging the log data contained in the adjacent log events according to the splice points to generate the log data to be extracted.
Optionally, determining the field position of each first specified field in the log data to be extracted includes: screening the log data to be extracted according to the first specified field to obtain associated log data matching the first specified field; and taking the position of the associated log data within the log data to be extracted as the field position.
Optionally, generating the custom SPL instructions according to the acquired instruction configuration information and the field positions includes: determining the instruction type contained in the instruction configuration information, wherein the instruction configuration information includes an instruction name, a program name, a resource tag, the application to which the instruction belongs, program parameters, a batch execution setting, and an instruction type; and combining the instruction configuration information and the field positions with the SPL template corresponding to the instruction type to generate each custom SPL instruction.
Optionally, cutting the log data to be extracted based on the target SPL instruction to obtain target log data includes: determining a cutting position according to the field position contained in the target SPL instruction; and cutting the log data to be extracted at the cutting position to obtain cut log data, and generating the target log data from the cut log data.
Optionally, generating the target log data from the cut log data includes: determining the second specified fields contained in the cut log data, and standardizing each second specified field to generate a renamed field; replacing the corresponding second specified field in the cut log data with the renamed field to generate replaced cut log data; and taking the replaced cut log data as the target log data.
Optionally, after cutting the log data to be extracted based on the target SPL instruction to obtain target log data, the method further includes: arranging the target log data according to a preset arrangement rule to generate a log data list; and displaying the log data list.
According to another aspect of the present invention, there is provided a data extraction apparatus comprising:
a log data acquisition and field position determination module, configured to acquire log data to be extracted and determine the field position of each first specified field in the log data to be extracted;
a custom SPL instruction generation module, configured to generate custom SPL instructions according to acquired instruction configuration information and each field position, wherein each custom SPL instruction includes a field position;
a target log data acquisition module, configured to determine a target SPL instruction from the custom SPL instructions and cut the log data to be extracted based on the target SPL instruction to obtain target log data.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data extraction method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data extraction method according to any one of the embodiments of the present invention.
According to this technical solution, the field position of each first specified field in the log data to be extracted is determined and combined with the acquired instruction configuration information to generate the custom SPL instructions; a target SPL instruction is determined from the custom SPL instructions; and the log data to be extracted is cut by the target SPL instruction to obtain the target log data. Data extraction across multiple scenarios can thus be realized, better meeting users' daily operation and maintenance analysis and business analysis needs.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data extraction method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of another data extraction method according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data extraction device according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device implementing a data extraction method according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data extraction method according to an embodiment of the present invention, where the method may be performed by a data extraction device, and the data extraction device may be implemented in hardware and/or software, and the data extraction device may be configured in a computer. As shown in fig. 1, the method includes:
S110, acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted.
Log data refers to procedural event records generated while servers, network equipment, security equipment, databases, middleware, and business systems run in an IT production environment. The log data to be extracted refers to log data that the controller generates by merging the collected original log data. The first specified field refers to a field, specified by the user, that is related to an upper-level node; the user is a worker or technician who performs the data extraction. The field position refers to the position in the log data to be extracted that contains a first specified field. It may be a region formed by a plurality of coordinate points, through which the location of the first specified field in the log data to be extracted can be determined.
Optionally, acquiring the log data to be extracted includes: determining a start identifier and an end identifier of the original log data according to an acquired line-break rule, and determining a start position and an end position of each log event in the original log data according to the start identifier and the end identifier; for each pair of adjacent log events, taking the end position of the preceding original log data and the start position of the following original log data as splice points; and merging the log data contained in the adjacent log events according to the splice points to generate the log data to be extracted.
Specifically, when acquiring the log data to be extracted, the controller first acquires the original log data, which is massive and unstructured, and processes it according to the line-break rule. That is, the controller determines the start position of each log event in the original log data according to the start identifier of the original log, the position of the start identifier in the original log data being the start position; correspondingly, the controller determines the end position of each log event according to the end identifier of the original log, the position of the end identifier in the original log data being the end position. The start identifier and the end identifier are set by the user in the line-break rule according to the log type. Log types may include internet application logs, instant messaging logs, data block logs, attack/scan logs, file transfer logs, remote control logs, mail logs, and the like, which is not limited in this embodiment.
Further, after determining the start positions and end positions, the controller takes the end position of the preceding original log data and the start position of the following original log data in each pair of adjacent log events as splice points, and then merges the log data contained in the adjacent log events into one line of data according to the splice points to generate the log data to be extracted.
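The merging step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name merge_log_events, the regular-expression form of the start and end identifiers, and the sample "<event>" markers are all hypothetical.

```python
import re
from typing import Iterable, List


def merge_log_events(raw_lines: Iterable[str], start_pattern: str,
                     end_pattern: str, joiner: str = " ") -> List[str]:
    """Merge the physical lines of each log event into a single line.

    start_pattern / end_pattern stand in for the start and end identifiers
    of the line-break rule (assumed here to be regular expressions).
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    events: List[str] = []
    current: List[str] = []
    for line in raw_lines:
        line = line.rstrip("\n")
        if start_re.search(line) and current:
            # A new start identifier closes any unfinished event.
            events.append(joiner.join(current))
            current = []
        current.append(line)
        if end_re.search(line):
            # The end identifier marks the end position of the event.
            events.append(joiner.join(current))
            current = []
    if current:
        events.append(joiner.join(current))
    return events


raw = [
    "<event> type=login",
    "ip=001 port=8080",
    "version=1.2 </event>",
    "<event> type=logout ip=002 </event>",
]
merged = merge_log_events(raw, r"^<event>", r"</event>$")
```

Here each line boundary inside an event plays the role of a splice point: the lines between a start identifier and the matching end identifier are joined into one line of data.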
Optionally, determining the field position of each first specified field in the log data to be extracted includes: screening the log data to be extracted according to the first specified field to obtain associated log data matching the first specified field; and taking the position of the associated log data within the log data to be extracted as the field position.
Specifically, the first specified field refers to a field, set by the user, that is related to the upper-level node pbu_id; for example, the first specified field may be type, ip, port, block, sender_compid, target_compid, or version. The controller screens the log data to be extracted according to the first specified field, determines the associated log data matching the first specified field within the range set by the user, and then determines the position of the associated log data within the log data to be extracted. This position may be the coordinate information of the associated log data, which the controller uses as the field position. For example, for the first specified field "ip", the associated log data may be determined to be "ip=001"; the controller takes the position of "ip=001" as the field position of the first specified field "ip", so that the associated log data "ip=001" can be located through that field position.
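As a rough illustration of locating "ip=001" by position, the sketch below represents a field position as a (start, end) character offset pair. The offset representation, the key=value log format, and the function name find_field_positions are assumptions made for this example; the source only says the position may be coordinate information.

```python
import re
from typing import Dict, List, Tuple


def find_field_positions(log_line: str,
                         fields: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return the (start, end) offsets of each 'name=value' match."""
    positions: Dict[str, Tuple[int, int]] = {}
    for name in fields:
        # \b keeps "ip" from matching inside a longer name such as "zip".
        match = re.search(rf"\b{re.escape(name)}=(\S+)", log_line)
        if match:
            positions[name] = match.span()
    return positions


line = "type=login ip=001 port=8080"
positions = find_field_positions(line, ["ip", "port", "version"])
ip_start, ip_end = positions["ip"]
associated = line[ip_start:ip_end]  # the associated log data for "ip"
```

Fields that do not occur in the line (here "version") simply have no entry, which mirrors the screening step: only log data matching a first specified field yields a field position.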
S120, generating custom SPL instructions according to the acquired instruction configuration information and the field positions.
Specifically, the instruction configuration information refers to the information a user enters when registering an instruction on the instruction configuration page of the client. It includes an instruction name, a program name, a resource tag, the application to which the instruction belongs, program parameters, a batch execution setting, and an instruction type. A custom SPL instruction is an SPL instruction generated from the instruction configuration information entered by the user and the field positions; there are multiple custom SPL instructions, from which the user can select. SPL is a programming language for structured data computation that innovates on the shortcomings of SQL: it redefines and extends many structured data operations, adds discreteness, strengthens ordered computation, achieves thorough aggregation, supports object references, and encourages step-by-step computation.
The instruction name is the name used in SPL. The program name is the name of the program, uploaded to the client, that the user selects; note that a developer may write custom programs with the log easy SDK for the user to choose from, which enables flexible data analysis. The resource tag can be selected from existing tags or newly created. The program parameters are the parameter names supported in the py file when the created instruction supports parameters; multiple parameters are separated by commas. When batch execution is enabled, the built-in program is started once for each execution; when batch execution is disabled, the built-in program is started once and subsequent data is processed through thread communication. There are four instruction types: generating data commands, distributable processing commands, centralized processing commands, and format conversion commands. A generating data command is used to generate data and is typically the first command of a command or sub-command, such as makerests or times. A distributable processing command executes row by row without context dependence, such as eval or parse. A centralized processing command depends on the order of the input data and must be processed centrally, such as sort or stats. A format conversion command processes the input as a whole and requires all the data to produce a result, such as transfer or reduce.
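The configuration items listed above could be modeled along the following lines. This is only a sketch: the class names, attribute names, enum values, and the sample instruction "cutlog" are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class InstructionType(Enum):
    """The four instruction types named in the description."""
    GENERATING = "generating"                # produces data, usually first
    DISTRIBUTABLE = "distributable"          # row by row, no context needed
    CENTRALIZED = "centralized"              # depends on input order
    FORMAT_CONVERSION = "format_conversion"  # needs all data at once


@dataclass
class InstructionConfig:
    """Instruction configuration information entered at registration."""
    instruction_name: str   # name used in SPL
    program_name: str       # uploaded program selected by the user
    resource_tag: str
    application: str        # application the instruction belongs to
    program_params: str     # comma-separated parameter names
    batch_execution: bool
    instruction_type: InstructionType

    def param_names(self) -> List[str]:
        """Split the comma-separated program parameters."""
        return [p.strip() for p in self.program_params.split(",") if p.strip()]


config = InstructionConfig(
    instruction_name="cutlog",
    program_name="cutlog.py",
    resource_tag="ops",
    application="search",
    program_params="field, start ,end",
    batch_execution=True,
    instruction_type=InstructionType.DISTRIBUTABLE,
)
names = config.param_names()
```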
Optionally, generating the custom SPL instructions according to the acquired instruction configuration information and the field positions includes: determining the instruction type contained in the instruction configuration information, wherein the instruction configuration information includes an instruction name, a program name, a resource tag, the application to which the instruction belongs, program parameters, a batch execution setting, and an instruction type; and combining the instruction configuration information and the field positions with the SPL template corresponding to the instruction type to generate each custom SPL instruction.
Specifically, the user can register a custom SPL instruction by clicking "New" on the instruction configuration page of the client. After entering the instruction configuration information, the user clicks "Apply" to create the new instruction. The controller then determines the instruction type contained in the instruction configuration information and, according to that type, combines the instruction name, program name, resource tag, application, program parameters, batch execution setting, and field positions with the corresponding SPL template to generate the custom SPL instruction. Unstructured data can be processed into structured data through the custom SPL instruction. In addition, the user can create, edit, authorize, and delete instructions on the instruction configuration page.
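One way to picture the combination of configuration information and field positions with a per-type template is the sketch below. The template strings are invented for illustration and are not the actual SPL syntax of any product; the function name build_custom_spl and the instruction name "cutlog" are likewise hypothetical.

```python
from string import Template
from typing import Dict, Tuple

# Hypothetical templates keyed by instruction type (not real SPL syntax).
SPL_TEMPLATES: Dict[str, Template] = {
    "distributable": Template("| $name field=$field start=$start end=$end"),
    "centralized": Template("| $name by $field"),
}


def build_custom_spl(instruction_type: str, name: str, field: str,
                     position: Tuple[int, int]) -> str:
    """Fill the template for the given type with name and field position."""
    template = SPL_TEMPLATES[instruction_type]
    start, end = position
    # substitute() ignores keyword arguments the template does not use.
    return template.substitute(name=name, field=field, start=start, end=end)


instruction = build_custom_spl("distributable", "cutlog", "ip", (11, 17))
```

Keeping one template per instruction type means that the field position is embedded in the generated instruction text, which matches the statement that each custom SPL instruction includes a field position.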
S130, determining a target SPL instruction from the custom SPL instructions, and cutting the log data to be extracted based on the target SPL instruction to obtain target log data.
Optionally, after cutting the log data to be extracted based on the target SPL instruction to obtain target log data, the method further includes: arranging the target log data according to a preset arrangement rule to generate a log data list; and displaying the log data list.
The target SPL instruction is the instruction that the user selects from the candidate custom SPL instructions to process the log data to be extracted. Based on the target SPL instruction, the controller can determine the field position of the corresponding target log data in the log data to be extracted, and then cut logs whose fields were not extracted well in order to extract the target log data.
According to this technical solution, the field position of each first specified field in the log data to be extracted is determined and combined with the acquired instruction configuration information to generate the custom SPL instructions; a target SPL instruction is determined from the custom SPL instructions; and the log data to be extracted is cut by the target SPL instruction to obtain the target log data. Data extraction across multiple scenarios can thus be realized, better meeting users' daily operation and maintenance analysis and business analysis needs.
Example 2
Fig. 2 is a flowchart of a data extraction method according to a second embodiment of the present invention. Building on the first embodiment, this embodiment adds the specific process of cutting the log data to be extracted based on the target SPL instruction to obtain target log data. Steps S210 to S220 are substantially the same as steps S110 to S120 in the first embodiment and are therefore not described in detail here. As shown in fig. 2, the method includes:
S210, acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted.
Optionally, acquiring the log data to be extracted includes: determining a start identifier and an end identifier of the original log data according to an acquired line-break rule, and determining a start position and an end position of each log event in the original log data according to the start identifier and the end identifier; for each pair of adjacent log events, taking the end position of the preceding original log data and the start position of the following original log data as splice points; and merging the log data contained in the adjacent log events according to the splice points to generate the log data to be extracted.
Optionally, determining the field position of each first specified field in the log data to be extracted includes: screening the log data to be extracted according to the first specified field to obtain associated log data matching the first specified field; and taking the position of the associated log data within the log data to be extracted as the field position.
S220, generating custom SPL instructions according to the acquired instruction configuration information and the field positions.
Optionally, generating the custom SPL instructions according to the acquired instruction configuration information and the field positions includes: determining the instruction type contained in the instruction configuration information, wherein the instruction configuration information includes an instruction name, a program name, a resource tag, the application to which the instruction belongs, program parameters, a batch execution setting, and an instruction type; and combining the instruction configuration information and the field positions with the SPL template corresponding to the instruction type to generate each custom SPL instruction.
S230, determining a target SPL instruction from the custom SPL instructions.
S240, determining a cutting position according to the field position contained in the target SPL instruction.
S250, cutting the log data to be extracted at the cutting position to obtain cut log data, and generating the target log data from the cut log data.
Specifically, because the target SPL instruction includes the field position of the target log data, the controller can execute the target SPL instruction to cut the log data to be extracted and obtain the target log data. Further, the controller can count the target log data and display each item in list form. For example, during cutting, the mvjoin function may be used to partition the multi-value field raw_message into a plurality of strings by the "\" symbol.
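Steps S240 and S250 can be pictured with the following sketch, which treats each field position as a (start, end) character offset pair and slices the log line at those offsets. The offset representation, the key=value format, and the function name cut_log are assumptions for illustration only.

```python
from typing import Dict, Tuple


def cut_log(log_line: str,
            positions: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Cut the log line at each field position and split 'key=value'."""
    cut: Dict[str, str] = {}
    for _name, (start, end) in positions.items():
        fragment = log_line[start:end]           # the cutting position
        key, _, value = fragment.partition("=")  # e.g. "ip=001"
        cut[key] = value
    return cut


line = "type=login ip=001 port=8080"
cut_data = cut_log(line, {"ip": (11, 17), "port": (18, 27)})
count = len(cut_data)  # number of extracted items, as in the counting step
```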
Optionally, generating the target log data from the cut log data includes: determining the second specified fields contained in the cut log data, and standardizing each second specified field to generate a renamed field; replacing the corresponding second specified field in the cut log data with the renamed field to generate replaced cut log data; and taking the replaced cut log data as the target log data.
Specifically, a second specified field is a field that the user has marked for renaming; renaming the second specified fields makes the cut log data more standardized. The controller determines the second specified fields contained in the cut log data and then standardizes each of them to generate a renamed field. Standardization here means applying a renaming rule set by the user, from which each second specified field and its corresponding renamed field can be determined. The controller then replaces the corresponding second specified fields in the cut log data with the renamed fields to generate replaced cut log data, and takes the replaced cut log data as the target log data. For example, if the renaming rule is to replace "from" with "source", the controller takes "from" as the second specified field, replaces the "from" contained in the cut log data with "source", and takes the replaced cut log data as the target log data.
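The "from" to "source" example can be sketched as a simple key mapping. The dictionary representation of the cut log data and the function name rename_fields are assumptions made for this illustration.

```python
from typing import Dict


def rename_fields(record: Dict[str, str],
                  rename_rule: Dict[str, str]) -> Dict[str, str]:
    """Replace each second specified field name with its renamed field."""
    # Fields not covered by the renaming rule keep their original names.
    return {rename_rule.get(key, key): value for key, value in record.items()}


cut_record = {"from": "10.0.0.1", "port": "8080"}
target_record = rename_fields(cut_record, {"from": "source"})
```

Building a new dictionary leaves the cut log data unchanged, so the replaced copy can be treated as the target log data while the original remains available for inspection.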
Optionally, after cutting the log data to be extracted based on the target SPL instruction to obtain target log data, the method further includes: arranging the target log data according to a preset arrangement rule to generate a log data list; and displaying the log data list.
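The source does not say what a preset arrangement rule looks like. As one possible reading, the sketch below sorts the target log data by a chosen field and renders it as a plain-text list; the function names and the sort-by-field rule are assumptions.

```python
from operator import itemgetter
from typing import Dict, List


def arrange_logs(records: List[Dict[str, str]], sort_field: str,
                 descending: bool = False) -> List[Dict[str, str]]:
    """Arrange target log data by one field (the assumed arrangement rule)."""
    return sorted(records, key=itemgetter(sort_field), reverse=descending)


def format_log_list(records: List[Dict[str, str]]) -> str:
    """Render the arranged records as one 'key=value' line per record."""
    return "\n".join(
        " ".join(f"{key}={value}" for key, value in record.items())
        for record in records
    )


target_logs = [
    {"source": "10.0.0.2", "port": "8080"},
    {"source": "10.0.0.1", "port": "9090"},
]
log_list = arrange_logs(target_logs, "source")
listing = format_log_list(log_list)
```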
According to this technical solution, the field position of each first specified field in the log data to be extracted is determined and combined with the acquired instruction configuration information to generate the custom SPL instructions; a target SPL instruction is determined from the custom SPL instructions; the cutting position is determined according to the field position contained in the target SPL instruction; the log data to be extracted is cut to obtain cut log data; and the fields in the cut log data are renamed to generate the target log data. Data extraction across multiple scenarios can thus be realized, better meeting users' daily operation and maintenance analysis and business analysis needs.
Example 3
Fig. 3 is a schematic structural diagram of a data extraction device according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: the log data acquisition and field position determination module 310 is configured to acquire log data to be extracted, and determine a field position of each first specified field in the log data to be extracted; the custom spl instruction generating module 320 is configured to generate a custom spl instruction according to the obtained instruction configuration information and each field position, where the custom spl instruction includes a field position; the target log data obtaining module 330 is configured to determine a target spl instruction from the custom spl instructions, and cut log data to be extracted based on the target spl instruction to obtain target log data.
Optionally, the log data acquisition and field position determination module 310 specifically includes a to-be-extracted log acquisition unit, configured to: determine a start identifier and an end identifier of the original log data according to the acquired line feed rule, and determine a start position and an end position of each log event in the original log data according to the start identifier and the end identifier; take the end position of the preceding original log data and the start position of the following original log data in each pair of adjacent log events as splicing points; and merge the log data contained in the adjacent log events at the splicing points to generate the log data to be extracted.
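One possible reading of this merging step is conventional multi-line log merging, sketched below. The sketch assumes that the line feed rule is a regular expression marking the start of a new log event, and that any line not matching it is spliced onto the end of the preceding event. The pattern, the sample lines, and the function name `merge_multiline_events` are assumptions made for illustration and are not taken from the embodiment.

```python
import re


def merge_multiline_events(raw_lines, start_pattern):
    """Merge continuation lines into the preceding log event.

    start_pattern plays the role of a hypothetical line feed rule: a line
    that matches it starts a new event; any other line is spliced onto the
    end of the previous event.
    """
    start_re = re.compile(start_pattern)
    events = []
    for line in raw_lines:
        if start_re.match(line) or not events:
            events.append(line)           # start position of a new event
        else:
            events[-1] += "\n" + line     # splice at the end of the previous event
    return events


raw = [
    "2023-02-03 10:00:01 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 3, in handle',
    "2023-02-03 10:00:02 INFO request ok",
]
events = merge_multiline_events(raw, r"\d{4}-\d{2}-\d{2} ")
print(len(events))
```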
Optionally, the log data acquisition and field position determination module 310 further includes a field position determination unit, configured to: screen the log data to be extracted according to the first specified field to obtain associated log data matching the first specified field; and take the position of the associated log data in the log data to be extracted as the field position.
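A minimal sketch of field position determination follows. It assumes, purely for illustration, that the log line uses a `name=value` layout and that a field position is the (start, end) character span of the matched value; the embodiment does not prescribe a log format, and the function name `find_field_positions` is hypothetical.

```python
import re


def find_field_positions(log_line, field_names):
    """Return the (start, end) span of each specified field's value.

    Assumes a hypothetical name=value layout; fields that do not occur in
    the log line are simply omitted from the result.
    """
    positions = {}
    for name in field_names:
        match = re.search(rf"\b{re.escape(name)}=(\S+)", log_line)
        if match:
            positions[name] = match.span(1)
    return positions


log_line = "from=10.0.0.1 status=200 path=/index"
positions = find_field_positions(log_line, ["from", "status", "missing"])
print(positions)
```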
Optionally, the custom spl instruction generation module 320 is specifically configured to: determine the instruction type contained in the instruction configuration information, where the instruction configuration information includes an instruction name, a program name, a resource tag, the application to which the instruction belongs, program parameters, batch execution, and the instruction type; and combine the configuration information of each instruction and the field positions with the corresponding spl template according to the instruction type to generate each custom spl instruction.
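The template-combination step might be sketched as below. The text does not disclose the spl syntax or the template contents, so the template string, the `extract` instruction type, the `cutlog` instruction name, and the function name `build_custom_spl` are all hypothetical placeholders; only the idea of selecting a template by instruction type and filling it with configuration values and field positions comes from the embodiment.

```python
from string import Template

# Hypothetical templates keyed by instruction type. The spl syntax shown is
# illustrative only and is not taken from any real product.
SPL_TEMPLATES = {
    "extract": Template("| $instruction_name field=$field start=$start end=$end"),
}


def build_custom_spl(config, field_positions):
    """Fill the template selected by instruction type, once per field position."""
    template = SPL_TEMPLATES[config["instruction_type"]]
    return [
        template.substitute(
            instruction_name=config["instruction_name"],
            field=name,
            start=start,
            end=end,
        )
        for name, (start, end) in field_positions.items()
    ]


config = {"instruction_name": "cutlog", "instruction_type": "extract"}
instructions = build_custom_spl(config, {"from": (5, 13)})
print(instructions[0])
```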
Optionally, the target log data acquisition module 330 specifically includes: a cutting position determination unit, configured to determine a cutting position according to the field position contained in the target spl instruction; and a target log data generation unit, configured to cut the log data to be extracted according to the cutting position to obtain cut log data, and to generate the target log data from the cut log data.
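Cutting by position can be sketched with plain string slicing. The example assumes that each cutting position is a (start, end) character span carried by the target spl instruction and that the result is a dictionary of field values; the span format and the function name `cut_log` are illustrative assumptions.

```python
def cut_log(log_line, cutting_positions):
    """Slice the log line at each (start, end) cutting position."""
    return {
        name: log_line[start:end]
        for name, (start, end) in cutting_positions.items()
    }


log_line = "from=10.0.0.1 status=200 path=/index"
cut = cut_log(log_line, {"from": (5, 13), "status": (21, 24)})
print(cut)
```

The unstructured line is thereby turned into structured name/value pairs, which is the form in which a later renaming or display step can operate.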
Optionally, the device further includes a data display module, configured to: after the log data to be extracted is cut based on the target spl instruction to obtain the target log data, arrange the target log data according to a preset arrangement rule to generate a log data list; and display the log data list.
In the technical solution of this embodiment, custom spl instructions are generated from the field position of each first specified field in the log data to be extracted and the acquired instruction configuration information; a target spl instruction is determined from the custom spl instructions; and the log data to be extracted is cut by means of the target spl instruction to obtain the target log data. Multi-scenario data extraction is thereby achieved, which better meets users' needs for daily operation and maintenance analysis and business analysis.
The data extraction device provided by the embodiment of the present invention can execute the data extraction method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example IV
Fig. 4 shows a schematic structural diagram of an electronic device 10 that may be used to implement embodiments of the invention. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 4, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data extraction method, namely: acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted; generating custom spl instructions according to the acquired instruction configuration information and each field position, where the custom spl instructions include the field positions; and determining a target spl instruction from the custom spl instructions, and cutting the log data to be extracted based on the target spl instruction to obtain target log data.
In some embodiments, the data extraction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the data extraction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data extraction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present invention may be written in any combination of one or more programming languages. Such a computer program may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the computer program, when executed by the processor, causes the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, which is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A data extraction method, comprising:
acquiring log data to be extracted, and determining the field position of each first specified field in the log data to be extracted;
generating custom spl instructions according to the acquired instruction configuration information and each field position, wherein the custom spl instructions comprise the field positions;
determining a target spl instruction from the custom spl instructions, and cutting the log data to be extracted based on the target spl instruction to obtain target log data;
wherein the generating custom spl instructions according to the acquired instruction configuration information and each field position comprises:
determining an instruction type contained in the instruction configuration information, wherein the instruction configuration information comprises an instruction name, a program name, a resource tag, an application to which the instruction configuration information belongs, a program parameter, batch execution and an instruction type;
and combining the configuration information of each instruction and the field position with a corresponding spl template according to the instruction type to generate each custom spl instruction, and processing unstructured data into structured data through the custom spl instructions.
2. The method of claim 1, wherein the acquiring log data to be extracted comprises:
determining a starting identifier and an ending identifier of original log data according to the acquired line feed rule, and determining a starting position and an ending position of each log event in the original log data according to the starting identifier and the ending identifier;
taking the end position of the previous original log data and the start position of the next original log data in each adjacent log event as splicing points;
and merging the log data contained in each adjacent log event according to the splicing point to generate log data to be extracted.
3. The method of claim 1, wherein determining a field location of each first specified field in the log data to be extracted comprises:
screening the log data to be extracted according to the first specified field to obtain associated log data matched with the first specified field;
and taking the corresponding position of the associated log data in the log data to be extracted as the field position.
4. The method of claim 1, wherein the cutting the log data to be extracted based on the target spl instruction to obtain target log data comprises:
determining a cutting position according to the field position contained in the target spl instruction;
cutting the log data to be extracted according to the cutting position to obtain cut log data, and generating the target log data according to the cut log data.
5. The method of claim 4, wherein the generating the target log data according to the cut log data comprises:
determining second designated fields contained in the cut log data, and carrying out standardization processing on each second designated field to generate renamed fields;
replacing the corresponding second designated fields in the cut log data according to the renamed fields to generate replaced cut log data;
and taking the replaced cut log data as the target log data.
6. The method of claim 1, further comprising, after the cutting the log data to be extracted based on the target spl instruction to obtain target log data:
according to a preset arrangement rule, arranging the target log data to generate a log data list;
and displaying the log data list.
7. A data extraction apparatus, comprising:
a log data acquisition and field position determination module, configured to acquire log data to be extracted and determine the field position of each first specified field in the log data to be extracted;
a custom spl instruction generation module, configured to generate custom spl instructions according to the acquired instruction configuration information and each field position, wherein the custom spl instructions comprise the field positions;
a target log data acquisition module, configured to determine a target spl instruction from the custom spl instructions, and cut the log data to be extracted based on the target spl instruction to obtain target log data;
the custom spl instruction generation module is specifically configured to: determining an instruction type contained in the instruction configuration information, wherein the instruction configuration information comprises an instruction name, a program name, a resource tag, an application to which the instruction configuration information belongs, a program parameter, batch execution and an instruction type;
and combining the configuration information of each instruction and the field position with a corresponding spl template according to the instruction type to generate each custom spl instruction, and processing unstructured data into structured data through the custom spl instructions.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
9. A computer storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-6.
CN202310118319.3A 2023-02-03 2023-02-03 Data extraction method, device, equipment and storage medium Active CN116431698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310118319.3A CN116431698B (en) 2023-02-03 2023-02-03 Data extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310118319.3A CN116431698B (en) 2023-02-03 2023-02-03 Data extraction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116431698A CN116431698A (en) 2023-07-14
CN116431698B true CN116431698B (en) 2024-01-30

Family

ID=87089649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310118319.3A Active CN116431698B (en) 2023-02-03 2023-02-03 Data extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116431698B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026613A (en) * 2019-12-11 2020-04-17 人教数字出版有限公司 Log processing method and device
CN112163946A (en) * 2020-09-30 2021-01-01 中国工商银行股份有限公司 Accounting processing method and device based on distributed transaction system
CN112162965A (en) * 2020-10-12 2021-01-01 平安科技(深圳)有限公司 Log data processing method and device, computer equipment and storage medium
CN113297245A (en) * 2020-05-29 2021-08-24 阿里巴巴集团控股有限公司 Method and device for acquiring execution information
CN113360521A (en) * 2021-07-08 2021-09-07 北京优特捷信息技术有限公司 Log query method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9361464B2 (en) * 2012-04-24 2016-06-07 Jianqing Wu Versatile log system
US9922099B2 (en) * 2014-09-30 2018-03-20 Splunk Inc. Event limited field picker
US10885026B2 (en) * 2017-07-29 2021-01-05 Splunk Inc. Translating a natural language request to a domain-specific language request using templates
US11042464B2 (en) * 2018-07-16 2021-06-22 Red Hat Israel, Ltd. Log record analysis based on reverse engineering of log record formats

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant