CN117407156A - Target data extraction method, device, computer equipment and storage medium - Google Patents

Publication number
CN117407156A
Authority
CN
China
Prior art keywords
data
computing unit
current
operator
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311243486.7A
Other languages
Chinese (zh)
Inventor
孙雪永
汤乐奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Application filed by DBAPPSecurity Co Ltd
Priority to CN202311243486.7A
Publication of CN117407156A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a target data extraction method, apparatus, computer device and storage medium. The method is applied to a server on which at least one computing unit is deployed, and comprises: determining the data format corresponding to each computing unit; dividing the computing units based on the data format to obtain at least two computing unit domains, and assigning a corresponding data extraction operator to each computing unit domain, where each data extraction operator indicates a corresponding operator data structure; and acquiring data to be extracted, determining the target data to be extracted from it based on the operator data structure, and performing deserialization based on the data extraction operator to obtain the target data. The method improves computing efficiency and reduces the waste of computing resources.

Description

Target data extraction method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and apparatus for extracting target data, a computer device, and a storage medium.
Background
At present, structured data often contains a large number of columns, of which only a few are actually used. In the prior art, serialization and deserialization are typically applied to all fields, so the whole record must be processed even when only a small subset of fields is needed, which seriously wastes computing resources and degrades program performance.
At present, no effective solution to this waste of computing resources has been proposed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target data extraction method, apparatus, computer device, and storage medium.
In a first aspect, the present application provides a target data extraction method applied to a server, where at least one computing unit is deployed on the server, the method including:
determining a data format corresponding to the computing unit;
dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; the data extraction operator is used for indicating a corresponding operator data structure;
and acquiring data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on an operator data structure, and performing deserialization calculation based on a data extraction operator to obtain target data in the data to be extracted.
In one embodiment, the data formats include a data input format and a data output format; dividing the computing units based on the data format to obtain at least two computing unit domains, including:
if the fact that the data input format and the data output format corresponding to the current computing unit in the computing units are different is detected, determining the current computing unit as a dividing point computing unit;
and dividing the computing units based on the dividing point computing units to obtain computing unit domains.
In one embodiment, obtaining target data in data to be extracted includes:
matching: based on a current target operator data structure corresponding to a current data extraction operator in the data extraction operators, matching with a historical operator data structure corresponding to historical data, and if matching is successful, determining a current residual operator data structure in the current target operator data structure based on a matching result; the historical data are obtained from the data to be extracted according to the historical operator data structure;
the extraction step: acquiring initial data from data to be extracted based on a current residual operator data structure;
and obtaining a next target operator data structure corresponding to the next data extraction operator, repeating the matching step and the extraction step until all the data extraction operators are traversed, and fusing the initial data with the historical data to obtain target data.
In one embodiment, the data extraction operator corresponds to a target operator data structure; acquiring initial data from data to be extracted based on a current residual operator data structure, including:
constructing a target storage unit based on the target operator data structure;
acquiring current target historical data from the historical data based on the matching result, and acquiring current initial data based on current residual data corresponding to the current residual operator data structure of the current target historical data;
and sorting the current initial data based on the storage sequence indicated in the target storage unit to obtain initial data.
In one embodiment, after obtaining the target data in the data to be extracted, the method further includes:
acquiring corresponding target data results from the target data based on the unit data structures corresponding to all the computing units;
binding the unit data structure with the corresponding target data result to obtain a final data result.
In one embodiment, obtaining unit data structures corresponding to all computing units includes:
unit data structure matching step: acquiring a calculation unit node map corresponding to all calculation units based on all calculation unit domains;
determining a current computing unit in the computing units, and determining a current unit data structure corresponding to the current computing unit based on the computing unit node diagram;
determining the next computing unit in the computing units, and repeating the unit data structure matching step until all the computing units are traversed to obtain unit data structures corresponding to all the computing units.
In one embodiment, determining a current unit data structure corresponding to a current computing unit based on a computing unit node map includes:
acquiring a father node computing unit aiming at the current computing unit based on the searching of the current computing unit to the father node direction of the computing unit node diagram, and acquiring a next father node computing unit aiming at the current computing unit based on the searching of the current computing unit to the child node direction of the computing unit node diagram;
determining a current computing unit domain corresponding to the current computing unit based on the parent node computing unit and the next parent node computing unit;
determining a current unit data structure for the current computing unit based on the current computing unit domain.
In a second aspect, the present application further provides a target data extraction apparatus. The device comprises:
the acquisition module is used for acquiring data to be extracted and at least one preset calculation unit; wherein the computing unit declares the corresponding data format;
the computing module is used for dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains;
the generation module is used for inputting the data to be extracted into a data extraction unit composed of all calculation units, and acquiring target data based on a data extraction operator.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
determining a data format corresponding to the computing unit;
dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; the data extraction operator is used for indicating a corresponding operator data structure;
and acquiring data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on an operator data structure, and performing deserialization calculation based on a data extraction operator to obtain target data in the data to be extracted.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
determining a data format corresponding to the computing unit;
dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; the data extraction operator is used for indicating a corresponding operator data structure;
and acquiring data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on an operator data structure, and performing deserialization calculation based on a data extraction operator to obtain target data in the data to be extracted.
According to the target data extraction method, the device, the computer equipment and the storage medium, which data in the data to be extracted are required target data can be clarified through the operator data structure, and the target data are correspondingly extracted based on the data extraction operator, so that the calculation efficiency is greatly improved, and the waste of calculation resources generated by calculating redundant data is avoided; furthermore, the computing unit is divided to finish the distribution of the data extraction operators, so that the problem that extraction of data in different structures conflicts when data are extracted in practical application can be avoided, and the possibility of errors when the data are extracted is reduced.
Drawings
FIG. 1 is an application environment diagram of a target data extraction method in one embodiment;
FIG. 2 is a flow chart of a method for extracting target data in one embodiment;
FIG. 3 is a flow chart of a method for extracting target data in a preferred embodiment;
FIG. 4 is a node diagram of multiple computing units in one embodiment;
FIG. 5 is a block diagram of a target data extraction device in one embodiment;
FIG. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target data extraction method provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The server first determines the data format corresponding to each computing unit, then divides the computing units based on the data format to obtain a plurality of computing unit domains and assigns a corresponding data extraction operator to each computing unit domain, where each data extraction operator indicates a corresponding operator data structure. Finally, the server determines the target data to be extracted from the data to be extracted based on the operator data structure, and performs deserialization based on the data extraction operator to obtain the target data. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a target data extraction method is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
step S202, determining a data format corresponding to the computing unit.
Each computing unit has independent computing logic, and the computing units are used for independently completing the computation of data, such as data pulling, data comparison, data type conversion and the like. Further, the different computing units may declare their own data structures, including an input data structure and an output data structure.
Step S204, dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; the data extraction operator is used for indicating the corresponding operator data structure.
The computing units are divided based on a data format, for example, the data format may be int, string, etc., so as to separate the computing units that compute data with different structures. Each calculation unit domain comprises at least one calculation unit, the calculation unit domain determines a required data structure based on the corresponding calculation unit, and the calculation unit domains and the data extraction operators are in one-to-one correspondence, so that an operator data structure corresponding to the data extraction operators is obtained based on the required data structure of the corresponding calculation unit.
Step S206, obtaining data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on an operator data structure, and performing deserialization calculation based on a data extraction operator to obtain target data in the data to be extracted.
In practical applications, the data to be extracted includes a large amount of data of various types. In the prior art, even if only a small part of that data is needed, all of it is serialized and deserialized, which degrades program performance and wastes computing resources. In the present application, after the data to be extracted is obtained, the corresponding target data to be extracted is determined from it based on the operator data structure, and deserialization is performed based on the data extraction operator. Because data is usually serialized before transmission, deserializing only the target data to be extracted is sufficient to extract the target data.
Steps S202 to S206 are mainly applicable to scenarios in which a large amount of data exists but only part of it is actually needed. With the scheme of the application, only the required part of the data is deserialized on demand, without deserializing all of the data, which improves computing efficiency and reduces unnecessary waste of resources. Furthermore, the scheme is flexible: a user can adjust it according to actual conditions, selecting and inserting different data extraction operators to extract various kinds of data. The scheme is minimally invasive to the source program, can greatly optimize data processing performance in specific scenarios, and saves development cost.
In one embodiment, the data formats include a data input format and a data output format; dividing the computing units based on the data format to obtain at least two computing unit domains, including:
if the fact that the data input format and the data output format corresponding to the current computing unit in the computing units are different is detected, determining the current computing unit as a dividing point computing unit;
and dividing the computing units based on the dividing point computing units to obtain computing unit domains.
Specifically, the data input format and data output format declared by each computing unit are collected and compared to check whether the same field appears with different data formats. For example, the following table shows the data formats declared by five computing units a, b, c, d, and e in one embodiment:
For the field set of computing unit c, the data input format is string and the data output format is int, so c is the division point computing unit described above. Dividing at this unit yields two computing unit domains: the domain formed by computing units a, b, and c has the data input type {name: string, set: string}, and the domain formed by computing units d and e has the data input type {set: int, id: string, phone: string}. The input and output fields of the computing units in the same domain are aggregated, so that the input data fields and types of all computing units in the same computing unit domain are the same; the data structure mentioned above is composed of these data types and data formats. This method quickly locates the division point computing units among the computing units, thereby completing the division into computing unit domains and laying a foundation for the subsequent assignment of data extraction operators.
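The division rule described above can be illustrated with a minimal Java sketch. It assumes the computing units form a linear chain and that each unit declares its input and output as a map from field name to type; the names `UnitDecl`, `DomainSplitSketch`, `split`, and `exampleChain`, as well as the per-unit declarations, are hypothetical and are not taken from the embodiment's table. Following the a, b, c / d, e example, the sketch treats a division point as the last unit of its domain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical declaration of one computing unit: field name -> declared type. */
final class UnitDecl {
    final String name;
    final Map<String, String> input;
    final Map<String, String> output;

    UnitDecl(String name, Map<String, String> input, Map<String, String> output) {
        this.name = name;
        this.input = input;
        this.output = output;
    }

    /** True when some field is declared on both sides with different types. */
    boolean isDivisionPoint() {
        for (Map.Entry<String, String> in : input.entrySet()) {
            String outType = output.get(in.getKey());
            if (outType != null && !outType.equals(in.getValue())) {
                return true;
            }
        }
        return false;
    }
}

final class DomainSplitSketch {
    /** Splits an ordered chain of units; a division point closes the current domain. */
    static List<List<String>> split(List<UnitDecl> chain) {
        List<List<String>> domains = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (UnitDecl unit : chain) {
            current.add(unit.name);
            if (unit.isDivisionPoint()) {
                domains.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            domains.add(current);
        }
        return domains;
    }

    /** Illustrative chain (assumed values): unit c converts field "set" from string to int. */
    static List<UnitDecl> exampleChain() {
        return List.of(
            new UnitDecl("a", Map.of("name", "string"), Map.of("name", "string")),
            new UnitDecl("b", Map.of("set", "string"), Map.of("set", "string")),
            new UnitDecl("c", Map.of("set", "string"), Map.of("set", "int")),
            new UnitDecl("d", Map.of("set", "int", "id", "string"), Map.of("id", "string")),
            new UnitDecl("e", Map.of("phone", "string"), Map.of("phone", "string")));
    }
}
```

Calling `DomainSplitSketch.split(DomainSplitSketch.exampleChain())` groups a, b, and c into one domain and d and e into another. A real deployment with branching units would need a graph traversal rather than this linear scan.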
In one embodiment, obtaining target data in data to be extracted includes:
matching: based on a current target operator data structure corresponding to a current data extraction operator in the data extraction operators, matching with a historical operator data structure corresponding to historical data, and if matching is successful, determining a current residual operator data structure in the current target operator data structure based on a matching result; the historical data are obtained from the data to be extracted according to the historical operator data structure;
the extraction step: acquiring initial data from data to be extracted based on a current residual operator data structure;
and obtaining a next target operator data structure corresponding to the next data extraction operator, repeating the matching step and the extraction step until all the data extraction operators are traversed, and fusing the initial data and the historical data to obtain target data.
Specifically, one data extraction operator may correspond to a plurality of operator data structures, and the operator data structures corresponding to different data extraction operators may overlap. Therefore, to reduce unnecessary waste of computing resources, when the operator data structure is determined for a data extraction operator, it is compared with the data structures that have already been extracted. If the match succeeds, the matched operator data structure of the current data extraction operator has already been extracted, so it is skipped and only the remaining operator data structures are extracted; the current residual operator data structure is thus the data structure that has not yet been extracted from the data. These steps are repeated until all data extraction operators have completed extraction, yielding the target data. This method takes into account that overlapping data structures may exist among different data extraction operators in practice, avoids the resource waste this would cause, and further improves computing efficiency.
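The matching and extraction steps can be sketched as a loop that skips fields already present in the history. This is an illustrative simplification in which an operator data structure is reduced to a set of field names and a per-field decoder stands in for deserialization; `IncrementalExtractSketch`, `extractAll`, `fields`, and `decodeCalls` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class IncrementalExtractSketch {
    /** Number of field-level decode calls performed, to make the saving visible. */
    int decodeCalls = 0;

    /** Convenience helper: an ordered set of field names. */
    static Set<String> fields(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /**
     * Walks the operators in order. Fields already present in the history are
     * skipped; only the remaining fields are decoded from the source.
     */
    Map<String, Object> extractAll(List<Set<String>> operatorFields,
                                   Function<String, Object> decodeField) {
        Map<String, Object> history = new LinkedHashMap<>();
        for (Set<String> needed : operatorFields) {
            Set<String> remaining = new LinkedHashSet<>(needed);
            remaining.removeAll(history.keySet()); // matching step
            for (String field : remaining) {       // extraction step
                history.put(field, decodeField.apply(field));
                decodeCalls++;
            }
        }
        return history; // fused result: earlier and newly extracted fields
    }
}
```

With two operators that need {name, set} and {set, id}, the overlapping field set is decoded once, so three decode calls are made instead of four.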
In one embodiment, the data extraction operator corresponds to a target operator data structure; acquiring initial data from data to be extracted based on a current residual operator data structure, including:
constructing a target storage unit based on the target operator data structure;
acquiring current target historical data from the historical data based on the matching result, and acquiring current initial data based on current residual data corresponding to the current residual operator data structure of the current target historical data;
and sorting the current initial data based on the storage sequence indicated in the target storage unit to obtain initial data.
Specifically, in practical applications, a storage unit is first constructed according to the target operator data structure. Preferably, an officially provided data unit may be used as the storage unit; the storage unit stores only the data and not the header of the data, which further reduces program IO. Then, based on the current residual operator data structure, the current residual data is copied from the target historical data to obtain the current initial data, which is the sum of the current target historical data and the current residual data. In practical applications, one data extraction operator generally corresponds to a plurality of target operator data structures, and the corresponding data is inserted at the designated position according to the storage order indicated by the target operator data structures in the target storage unit, which completes the sorting of the current initial data. This method refines the step of acquiring the initial data: because initial data comprising various data structures is stored in order, computing efficiency can be further improved.
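As a rough illustration of the storage unit, the following Java sketch models it as a plain `Object[]` whose slot order comes from the target operator data structure. The names `RowBuildSketch`, `positions`, and `buildRow` are hypothetical, and the plain array is an assumption standing in for whatever officially provided data unit an implementation actually uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowBuildSketch {
    /** Maps each field of the target structure to its slot in the array. */
    static Map<String, Integer> positions(List<String> targetFields) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        for (int i = 0; i < targetFields.size(); i++) {
            pos.put(targetFields.get(i), i);
        }
        return pos;
    }

    /**
     * Builds a header-less row: values already in the history are copied,
     * newly extracted values fill the remaining slots.
     */
    static Object[] buildRow(List<String> targetFields,
                             Map<String, Object> history,
                             Map<String, Object> newlyExtracted) {
        Map<String, Integer> pos = positions(targetFields);
        Object[] row = new Object[targetFields.size()];
        for (String field : targetFields) {
            Object value = history.containsKey(field)
                    ? history.get(field)
                    : newlyExtracted.get(field);
            row[pos.get(field)] = value; // insert at the designated position
        }
        return row;
    }
}
```

Because the row carries no field names, a consumer needs the same ordered structure to interpret it, which is what the binding step described later provides.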
In one embodiment, after obtaining the target data in the data to be extracted, the method further includes:
acquiring corresponding target data results from the target data based on the unit data structures corresponding to all the computing units;
binding the unit data structure with the corresponding target data result to obtain a final data result.
Specifically, after the target data is obtained, the target data does not have header data, i.e. the specific data structure of the target data cannot be known. Therefore, the corresponding target data result is required to be determined based on the unit data structure indicated by the computing unit, and the corresponding target data result is bound, so that the computing unit can understand the corresponding target data result, and the steps are repeated until the corresponding unit data structure is matched for each type of target data result, and the final data result is obtained. The method can be used for completing the supplement of the data, distributing the corresponding data structure for the extracted data, better adapting to the actual application scene and facilitating the processing calculation of the extracted data in the subsequent steps.
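Binding a unit data structure to header-less data can be sketched as pairing an ordered field list with the row so that values can be looked up by name. `BoundRowSketch` and its `get` method are hypothetical names chosen for illustration, and the row is again assumed to be a plain array.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BoundRowSketch {
    private final Map<String, Integer> index = new LinkedHashMap<>();
    private final Object[] row;

    /** Binds an ordered field list (the unit data structure) to a header-less row. */
    BoundRowSketch(List<String> fields, Object[] row) {
        if (fields.size() != row.length) {
            throw new IllegalArgumentException("structure and row differ in length");
        }
        for (int i = 0; i < fields.size(); i++) {
            index.put(fields.get(i), i);
        }
        this.row = row;
    }

    /** Looks a value up by field name; fails loudly for fields outside the structure. */
    Object get(String field) {
        Integer i = index.get(field);
        if (i == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return row[i];
    }
}
```

A computing unit that declares the fields set and id could then read `bound.get("id")` without knowing the physical position of the value.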
In one embodiment, obtaining unit data structures corresponding to all computing units includes:
unit data structure matching step: acquiring a calculation unit node map corresponding to all calculation units based on all calculation unit domains;
determining a current computing unit in the computing units, and determining a current unit data structure corresponding to the current computing unit based on the computing unit node diagram;
determining the next computing unit in the computing units, and repeating the unit data structure matching step until all the computing units are traversed to obtain unit data structures corresponding to all the computing units.
Specifically, the above-mentioned computation unit node map is established based on the computation unit domain, and preferably, the above-mentioned division point is set as a parent node, and a plurality of computation units are regarded as corresponding child nodes, so that the unit data structure corresponding to the computation unit can be determined based on the computation unit node map with high efficiency. By the method, the computing unit domain where the computing unit is located can be determined more efficiently by using a node diagram, so that the unit data structure can be determined quickly.
In one embodiment, determining a current unit data structure corresponding to a current computing unit based on a computing unit node map includes:
acquiring a father node computing unit aiming at the current computing unit based on the searching of the current computing unit to the father node direction of the computing unit node diagram, and acquiring a next father node computing unit aiming at the current computing unit based on the searching of the current computing unit to the child node direction of the computing unit node diagram;
determining a current computing unit domain corresponding to the current computing unit based on the parent node computing unit and the next parent node computing unit;
determining a current unit data structure for the current computing unit based on the current computing unit domain.
Specifically, using the node graph, a search is performed from the current computing unit in the parent node direction until a parent node is found. This parent node is the division point computing unit corresponding to the current computing unit, that is, the parent node computing unit mentioned above. A search is then performed in the opposite direction to find another parent node, that is, the next parent node computing unit, which lies in the next computing unit domain. In this way, the computing unit domain in which the current computing unit is located, and all computing units in that domain, can be determined. The current unit data structure can then be obtained by invoking the inputstatement and outputstatement methods of the nodes through reflection. The inputstatement and outputstatement methods indicate the input data structure and the output data structure of a computing unit, respectively.
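The two-direction search can be sketched for the simple case of a linear chain. The sketch assumes each unit carries a flag saying whether it is a division point and, following the a, b, c / d, e example, treats a division point as the last unit of its own domain; the text leaves this boundary convention open, so it is an assumption. `DomainLocateSketch` and `locate` are hypothetical names.

```java
import java.util.List;

final class DomainLocateSketch {
    /**
     * Returns {start, end} (inclusive indices) of the domain containing index i.
     * divisionPoint.get(k) is true when unit k is a division point.
     */
    static int[] locate(List<Boolean> divisionPoint, int i) {
        int start = i;
        // Search toward the parent side: stop just after the previous division point.
        while (start > 0 && !divisionPoint.get(start - 1)) {
            start--;
        }
        int end = i;
        // Search toward the child side: stop at the next division point (inclusive).
        while (end < divisionPoint.size() - 1 && !divisionPoint.get(end)) {
            end++;
        }
        return new int[] {start, end};
    }
}
```

For a five-unit chain in which only the third unit is a division point, the second unit resolves to the index range 0 to 2 and the fourth unit to the range 3 to 4.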
This embodiment also provides a specific embodiment of the target data extraction method, as shown in FIG. 3, which is a flow chart of the target data extraction method in a preferred embodiment.
In step S301, a deserialization strategy is formulated. First, the data input formats and data output formats declared by the required computing units are collected, and the data input structure and data output structure of each unit are compared to check whether the same field appears with different data types. If so, the unit is marked as a division point computing unit, and the computing units are divided based on the division point computing units to obtain the corresponding computing unit domains. The input and output fields of the computing units in the same domain are then aggregated, so that the operator input data types and fields in the same domain are the same. Further, in practical applications, all computing units may be abstracted as SchemaComponent. SchemaComponent provides an inputstatement method for declaring the input schema and an outputstatement method for declaring the output schema. All schema components are integrated into a schema context, which records their precedence relationships. The schema, that is, the data structure, is composed of the data format corresponding to the computing units and the corresponding operator data types described above.
Step S302, deserializing the data on demand. A corresponding data extraction operator, i.e. the deserialization operator deserialize, is inserted in front of each computing unit domain. The data storage carrier is the row provided officially by Flink. A row can be understood as an array: it stores only the data and not the data header, and the array is the storage unit. In this application, only the data corresponding to the current domain is deserialized, and a field original_message is additionally added to store the original data information. It should be noted that if data of some type already exists in the extracted data, it does not need to be extracted again. In practical application, all data extraction operators use the generic deserialization class CommonDescrializeFunction. CommonDescrializeFunction holds the data structure corresponding to the current domain, constructs a row according to that data structure, and inserts the extracted data into the designated position in the row. Deserialization uses the methods provided by Jackson, as follows:
String name = jsonNode.get("name").asText();
int age = jsonNode.get("age").asInt();
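The on-demand extraction of step S302 can be sketched as follows. To stay free of external dependencies, the sketch models the row as a plain Object array and takes a field reader function in place of Jackson; with Jackson, the reader could be built from calls such as jsonNode.get(field).asText() shown above. The class name, the method extract, its parameters, and the placement of original_message in the last slot are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class OnDemandExtractSketch {
    /**
     * Builds a row that holds only the fields of the current domain, in schema order,
     * followed by one trailing slot for original_message.
     * Fields already present in alreadyExtracted are reused instead of being read again.
     */
    static Object[] extract(String rawMessage,
                            Function<String, Object> fieldReader,
                            List<String> domainFields,
                            Map<String, Object> alreadyExtracted) {
        Object[] row = new Object[domainFields.size() + 1];
        for (int i = 0; i < domainFields.size(); i++) {
            String field = domainFields.get(i);
            if (alreadyExtracted.containsKey(field)) {
                // Extracted by an earlier operator: no second read of the raw data.
                row[i] = alreadyExtracted.get(field);
            } else {
                row[i] = fieldReader.apply(field);
                alreadyExtracted.put(field, row[i]);
            }
        }
        // Assumption: the original message is kept in the last position of the row.
        row[domainFields.size()] = rawMessage;
        return row;
    }
}
```

Because the same alreadyExtracted map is passed to successive operators, a field needed by two domains is read from the raw message only once, and fields that no domain declares are never read at all.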
Step S303, providing an aggregation method for each computing unit. After the target data is extracted, the data has no corresponding data structure; that is, a computing unit cannot find the data corresponding to a given field in order to process it. The application therefore provides an aggregation method that aggregates the data structure (schema) corresponding to the current domain. Specifically, in practical application, a context object may be created to hold all computing units. When a computing unit performs aggregation, the context object looks up all computing units in the computing unit domain where the current computing unit is located and traverses them, calling their input and output declaration methods, so that the acquired data structures are aggregated together. Once the computing unit has the aggregated data structure, it can read its data from the row. The computing units may be simplified into a plurality of nodes, as shown in fig. 4, which is a node diagram corresponding to the computing units in one embodiment. Here b and e are cut-off nodes, and the computing units between two cut-off points belong to the same domain. The task is therefore to determine which domain node c belongs to and to obtain the data structure corresponding to computing unit c in that domain.
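The aggregation in step S303 can be sketched as follows. The sketch assumes that each computing unit's declaration is available as a list of field names and that the aggregated data structure maps each field name to its position in the row; the names aggregate and read are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaAggregationSketch {
    /** Merges the declared fields of all units in one domain into field name -> row position. */
    static Map<String, Integer> aggregate(List<List<String>> declaredFieldsPerUnit) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (List<String> fields : declaredFieldsPerUnit) {
            for (String field : fields) {
                // A field declared by several units keeps the position assigned first.
                positions.putIfAbsent(field, positions.size());
            }
        }
        return positions;
    }

    /** Reads the value of a named field from a row by using the aggregated positions. */
    static Object read(Object[] row, Map<String, Integer> positions, String field) {
        Integer index = positions.get(field);
        if (index == null) {
            throw new IllegalArgumentException("field not declared in this domain: " + field);
        }
        return row[index];
    }
}
```

Since the row stores no header, the aggregated positions are what allow a computing unit to address its data by field name.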
In the method used in the application, following the order recorded in the schema context, the search starts from node c and proceeds in the parent-node direction until the corresponding cut-off node is found; the search then proceeds from the current node in the opposite direction until another cut-off node is found. All child nodes of the current domain, that is, all computing units corresponding to the current domain, are thereby collected, and the data structure corresponding to the current computing unit can be obtained by invoking the inputstatement and outputstatement methods of the nodes through reflection.
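The two-direction search can be sketched as follows. The sketch assumes the nodes form a single chain in the recorded order and, following the description that the next cut-off node belongs to the next domain, treats the upstream cut-off node as part of the current domain and the downstream cut-off node as excluded. The names domainOf, ordered and cutOff are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DomainLookupSketch {
    /** Returns the nodes of the domain that contains the current node. */
    static List<String> domainOf(List<String> ordered, Set<String> cutOff, String current) {
        int pos = ordered.indexOf(current);
        if (pos < 0) {
            throw new IllegalArgumentException("unknown node: " + current);
        }
        // Walk toward the parent side until a cut-off node (or the first node) is reached.
        int start = pos;
        while (start > 0 && !cutOff.contains(ordered.get(start))) {
            start--;
        }
        // Walk toward the child side until the next cut-off node (or the end) is reached.
        int end = pos + 1;
        while (end < ordered.size() && !cutOff.contains(ordered.get(end))) {
            end++;
        }
        return new ArrayList<>(ordered.subList(start, end));
    }
}
```

With the chain a, b, c, d, e, f and cut-off nodes b and e, the domain of node c under these assumptions is b, c, d.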
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a target data extraction device for realizing the target data extraction method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the target data extraction device or devices provided below may refer to the limitation of the target data extraction method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 5, there is provided a target data extraction apparatus including: the device comprises an acquisition module, a calculation module and a generation module, wherein:
and the acquisition module is used for determining the data format corresponding to the calculation unit.
The computing module is used for dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; the data extraction operator is used for indicating the corresponding operator data type.
The generating module is used for acquiring the data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on the operator data type, and performing deserialization calculation based on the data extraction operator to obtain target data in the data to be extracted.
Specifically, the acquisition module acquires a plurality of computing units and determines the data format corresponding to each computing unit, and then transmits the computing units and the corresponding data formats to the computing module. The computing module divides the computing units based on the data formats to obtain a plurality of computing unit domains, each of which is composed of at least one computing unit, and allocates a corresponding data extraction operator to each computing unit domain; the data extraction operator is used for indicating a corresponding operator data type. The computing module transmits the computing unit domains and the corresponding data extraction operators to the generation module. The generation module obtains the data to be extracted, determines the corresponding target data to be extracted from the data to be extracted based on the operator data type, and performs deserialization calculation based on the data extraction operator to obtain the target data. With this device, deserialization on demand can be realized: only the required part of the data is deserialized rather than all of it, which improves calculation efficiency and reduces unnecessary resource waste. Furthermore, the scheme of the application is flexible: users can adjust it according to actual conditions, selecting and inserting different data extraction operators to complete the extraction of various data. The scheme is minimally invasive to the source program, can greatly optimize data processing performance in specific scenarios, and saves development cost.
The modules in the above target data extraction device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing computer program data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of target data extraction.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not thereby to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method for extracting target data, characterized by being applied to a server, wherein at least one computing unit is deployed on the server; the method comprises the following steps:
determining a data format corresponding to the computing unit;
dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains; wherein the data extraction operator is used for indicating a corresponding operator data structure;
and acquiring data to be extracted, determining corresponding target data to be extracted from the data to be extracted based on the operator data structure, and performing deserialization calculation based on the data extraction operator to obtain target data in the data to be extracted.
2. The method of claim 1, wherein the data format comprises a data input format and a data output format; the dividing the computing unit based on the data format to obtain at least two computing unit domains includes:
if the data input format corresponding to the current computing unit in the computing units is detected to be different from the data output format, determining the current computing unit as a dividing point computing unit;
dividing the computing units based on the dividing point computing units to obtain the computing unit domain.
3. The method according to claim 1, wherein the obtaining the target data in the data to be extracted includes:
a matching step: matching a current target operator data structure, corresponding to a current data extraction operator in the data extraction operators, with a historical operator data structure corresponding to historical data, and if the matching is successful, determining a current residual operator data structure in the current target operator data structure based on a matching result; the historical data are acquired from the data to be extracted according to the historical operator data structure;
an extraction step: acquiring initial data from the data to be extracted based on the current residual operator data structure;
and obtaining a next target operator data structure corresponding to a next data extraction operator, repeating the matching step and the extraction step until all the data extraction operators are traversed, and fusing the initial data and the historical data to obtain the target data.
4. A method according to claim 3, wherein the data extraction operator corresponds to a target operator data structure; the obtaining initial data from the data to be extracted based on the current residual operator data structure comprises the following steps:
constructing a target storage unit based on the target operator data structure;
acquiring current target historical data from the historical data based on the matching result, and acquiring current initial data based on the current residual data corresponding to the current residual operator data structure of the current target historical data;
and sequencing the current initial data based on the storage sequence indicated in the target storage unit to obtain the initial data.
5. The method according to claim 1, wherein after the obtaining the target data in the data to be extracted, the method further comprises:
acquiring corresponding target data results from the target data based on the unit data structures corresponding to all the computing units;
binding the unit data structure with the corresponding target data result to obtain a final data result.
6. The method of claim 5, wherein obtaining unit data structures corresponding to all of the computing units comprises:
unit data structure matching step: acquiring a calculation unit node map corresponding to all the calculation units based on all the calculation unit domains;
determining a current computing unit in the computing units, and determining a current unit data structure corresponding to the current computing unit based on the computing unit node diagram;
and determining the next computing unit in the computing units, and repeating the unit data structure matching step until all the computing units are traversed to obtain the unit data structures corresponding to all the computing units.
7. The method of claim 6, wherein the determining a current unit data structure corresponding to the current computing unit based on the computing unit node map comprises:
searching the computing unit node map in the parent node direction starting from the current computing unit to obtain a parent node computing unit for the current computing unit, and searching the computing unit node map in the child node direction starting from the current computing unit to obtain a next parent node computing unit for the current computing unit;
determining a current computing unit domain corresponding to the current computing unit based on the parent node computing unit and the next parent node computing unit;
and determining a current unit data structure corresponding to the current computing unit based on the current computing unit domain.
8. A target data extraction apparatus, the apparatus comprising:
the acquisition module is used for acquiring data to be extracted and at least one preset calculation unit; wherein the computing unit declares a corresponding data format;
the computing module is used for dividing the computing units based on the data format to obtain at least two computing unit domains, and distributing corresponding data extraction operators based on the computing unit domains;
the generation module is used for inputting the data to be extracted into a data extraction unit composed of all the calculation units, and acquiring the target data based on the data extraction operator.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311243486.7A 2023-09-25 2023-09-25 Target data extraction method, device, computer equipment and storage medium Pending CN117407156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311243486.7A CN117407156A (en) 2023-09-25 2023-09-25 Target data extraction method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311243486.7A CN117407156A (en) 2023-09-25 2023-09-25 Target data extraction method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117407156A true CN117407156A (en) 2024-01-16

Family

ID=89489881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311243486.7A Pending CN117407156A (en) 2023-09-25 2023-09-25 Target data extraction method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117407156A (en)

Similar Documents

Publication Publication Date Title
CN112287182A (en) Graph data storage and processing method and device and computer storage medium
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN117407156A (en) Target data extraction method, device, computer equipment and storage medium
CN117539962B (en) Data processing method, device, computer equipment and storage medium
CN116541137A (en) Transaction processing method, apparatus, computer device, storage medium, and program product
CN118152504A (en) Unstructured data indexing method, device, apparatus, medium and program product
CN117459519A (en) Traceable file processing method, traceable file processing device, computer equipment and storage medium
CN116684404A (en) Resource interaction data downloading method, device, computer equipment and storage medium
JP5410155B2 (en) Data division system and data division method
CN117610815A (en) Resource quota data processing method, device, computer equipment and storage medium
CN117131128A (en) Data synchronization method, device, computer equipment and storage medium
CN116932677A (en) Address information matching method, device, computer equipment and storage medium
CN116910115A (en) Group query method, device, computer equipment and storage medium
CN117076476A (en) Object information processing method, device, computer equipment and storage medium
CN117391702A (en) Account data verification method, account data verification device, computer equipment and storage medium
CN116880927A (en) Rule management method, device, computer equipment and storage medium
CN114880311A (en) Data processing method, data processing device, storage medium and computer equipment
CN116069991A (en) Server data acquisition method, device, computer equipment and storage medium
CN116028448A (en) Identification code determining method, device, equipment and storage medium of electronic file
CN117453561A (en) Test script calling method, device, computer equipment and storage medium
CN116506506A (en) Service dynamic change method, device, computer equipment and storage medium
CN117349131A (en) System error information display method and device and computer equipment
CN117910035A (en) Data import and export method, system, computer device and storage medium
CN117278625A (en) Message conversion method, device, computer equipment and storage medium
CN116680729A (en) Data processing method and device based on intelligent contract and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination