CN117763024A - Data fragment extraction method and device - Google Patents

Data fragment extraction method and device

Info

Publication number
CN117763024A
Authority
CN
China
Prior art keywords
data
field
information
query
fragmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311665771.8A
Other languages
Chinese (zh)
Inventor
江峰
褚占峰
张亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd filed Critical Hangzhou Dt Dream Technology Co Ltd
Priority to CN202311665771.8A priority Critical patent/CN117763024A/en
Publication of CN117763024A publication Critical patent/CN117763024A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Abstract

The application provides a data fragment extraction method and device. The method comprises: acquiring a query statement for data information in a data table to be fragmented; executing the query statement to determine, based on calculation logic in the query statement, a calculation result corresponding to the data information of the fragment basis field; and configuring as many data extraction tasks as the number of fragments according to the value of the calculation result. The technical scheme can solve the technical problems in the related art that only a single data query method is available and that data extraction is inefficient.

Description

Data fragment extraction method and device
Technical Field
The present invention relates to the field of network technologies, and in particular, to a method and an apparatus for extracting data fragments.
Background
ETL (Extract-Transform-Load) describes the process of extracting, transforming, and loading data from a source end to a destination end, so as to integrate the scattered, messy, and non-uniform data in an enterprise business system; the integrated data can then provide a basis for data analysis and enterprise decisions. The processing device that implements the ETL functions is therefore a basic component of various data-related applications and plays an important role in providing data support for upper-layer applications, so the implementation of ETL largely determines how well those upper-layer applications operate.
With the development of digitization technology, an ETL task must first extract massive amounts of data. In the related art, data is acquired through a single query statement, which cannot meet the need to extract ever-growing data efficiently; data extraction is therefore inefficient and may even delay the whole data integration process.
Disclosure of Invention
In view of this, the present application provides a data fragment extraction method and apparatus, so as to solve the technical problems of single data query method and low data extraction efficiency in the related art.
In order to achieve the above purpose, the present application provides the following technical solutions:
according to a first aspect of the present application, a data fragment extraction method is provided, the method comprising:
acquiring a query statement for data information in a data table to be fragmented, wherein the query statement comprises a target field, a fragment basis field, the number of fragments, and calculation logic for the fragment basis field; the calculation logic includes performing the following operations in any order: an integer-taking operation, an absolute-value operation, and a remainder operation;
executing the query statement to determine a calculation result corresponding to the data information of the fragment basis field based on the calculation logic;
configuring as many data extraction tasks as the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
According to a second aspect of the present application, there is provided a data fragment extraction apparatus, the apparatus comprising:
an acquisition unit, used for obtaining a query statement for data information in a data table to be fragmented, wherein the query statement comprises a target field, a fragment basis field, the number of fragments, and calculation logic for the fragment basis field; the calculation logic includes performing the following operations in any order: an integer-taking operation, an absolute-value operation, and a remainder operation;
an execution unit, used for executing the query statement to determine a calculation result corresponding to the data information of the fragment basis field based on the calculation logic;
a configuration unit, used for configuring as many data extraction tasks as the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
According to a third aspect of the present application, there is provided an electronic device comprising:
A processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute instructions to implement the method of any of the first aspects above.
According to a fourth aspect of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor implement the steps of the method as described in any of the first aspects above.
According to the above technical scheme, the data information corresponding to the fragment basis field in the data table to be fragmented is calculated using calculation logic different from that in the related art, yielding either a remainder result in positive integer form or the positive integer value of the remainder result. As many data extraction tasks as the number of fragments can then be configured according to the value of the calculation result, so that the data extraction tasks extract the data information of the target field in the data table to be fragmented in parallel. This solves the problem in the related art that, because only a single data query method is available, data information in non-positive-integer form, such as floating point numbers and negative integers, cannot be fragmented and extracted, and it improves data extraction efficiency.
Drawings
FIG. 1 is a flow chart of a method for data fragment extraction according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart of another data fragment extraction method provided in accordance with an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for determining a data table to be fragmented provided in accordance with an exemplary embodiment of the present application;
FIG. 4 is a schematic block diagram of an electronic device in accordance with an exemplary embodiment of the present application;
FIG. 5 is a block diagram of a data fragment extraction apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Business intelligence (abbreviated as BI) describes a series of concepts and methods that assist business decision-making by employing fact-based support systems; that is, data analysis is performed using modern data warehouse technology, online analytical processing technology, data mining, and data presentation technology, so that existing business system data can realize its business value.
The ETL process is an important link in business intelligence and the basis for data analysis: only after the data warehouse is built can an analysis model be built on it and the final results be displayed as required. The quality of the ETL process can even directly determine whether business intelligence decisions succeed. Of course, ETL is not limited to business intelligence applications.
The first step of the ETL process is to extract the data required by the destination data system from the source data system, and the efficiency of this extraction has an important influence on the efficiency of the whole ETL process. However, the data types in source data systems vary, and the original single, fixed data query statement cannot handle queries over data information of diverse types; as a result, extraction of floating point and negative integer data fails and the ETL process is interrupted.
In view of this, the present application provides a data fragment extraction method and apparatus, so as to solve the technical problems of single data query method and low efficiency of the data extraction process in the related art.
Fig. 1 is a flowchart of a data fragment extraction method according to an exemplary embodiment of the present application, and as shown in fig. 1, the method may include the following steps:
Step 101, obtaining a query statement for data information in a data table to be fragmented, wherein the query statement comprises a target field, a fragment basis field, the number of fragments, and calculation logic for the fragment basis field.
In one embodiment, it is determined whether an execution plan matching the received query statement exists in the data cache storage area. If so, the execution plan is called directly to perform the data query, which saves the time needed to compile an execution plan and improves query efficiency. Otherwise, the obtained query statement is verified, and the data query is performed based on a query statement that passes verification. For a query statement that fails verification, the query operation is ended and corresponding prompt information is generated and sent, so that the user receiving the prompt can determine why the query statement failed verification, which makes the query statement quicker to correct.
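The cache-then-verify flow described above might be sketched as follows. This is only an illustration under stated assumptions: the names plan_cache, validate, compile_plan, and get_plan are hypothetical, the "plan" is a stand-in tuple, and the verification rule (accept only SELECT statements) is invented for the example; the application does not specify any of these.

```python
# Hypothetical in-memory stand-in for the "data cache storage area".
plan_cache = {}

def validate(query):
    """Return None if the query passes verification, else a reason string.
    The rule used here (SELECT only) is an assumption for illustration."""
    if not query.strip().upper().startswith("SELECT"):
        return "only SELECT statements are accepted"
    return None

def compile_plan(query):
    """Stand-in for compiling an execution plan."""
    return ("PLAN", query)

def get_plan(query):
    """Return (plan, status). A cached plan is reused without recompiling;
    a query that fails verification yields no plan and a prompt message."""
    if query in plan_cache:
        return plan_cache[query], "cache hit"
    reason = validate(query)
    if reason is not None:
        return None, "rejected: " + reason
    plan = compile_plan(query)
    plan_cache[query] = plan
    return plan, "compiled"
```

Calling get_plan twice with the same valid statement compiles it once and serves the second call from the cache; a statement that fails verification returns a message the caller can show to the user.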
In another embodiment, before the query statement for the data information in the data table to be fragmented is obtained, the method may further include: in the case where the data extraction task is executed on the data table for the first time, determining the data table recording the full amount of data information as the data table to be fragmented.
In the case where the data extraction task is not being executed on the data table to be fragmented for the first time, the data information newly added to the data table within a preset duration is determined, and the data table recording the newly added data information is determined as the data table to be fragmented.
Further, in determining the data information newly added within the preset duration, the historical query users who have performed data extraction tasks can be searched for the user who is currently performing the data query, and that user's historical data extraction task information can then be determined; this information can be looked up in database logs or system log records.
If historical data extraction task information belonging to the current query user exists, the latest data extraction record in that information is determined. If that record and the current user's data extraction task relate to the same target field, the data information newly added within the preset duration among the data information corresponding to the target field is determined, and the data table recording that newly added data information is determined as the data table to be fragmented. The user therefore only needs to perform incremental extraction on the data table to be fragmented, which improves data extraction efficiency.
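The choice between full and incremental extraction could be sketched as below. The record layout (keys user, target_field, extracted_at, added_at) and the function name select_rows are hypothetical, and falling back to full extraction when the latest record concerns a different target field is an assumption, since the text does not say what happens in that case.

```python
from datetime import datetime, timedelta

def select_rows(rows, history, user, target_field, now, window):
    """Return all rows on a first extraction, otherwise only the rows
    added within the preset duration `window` before `now`."""
    same_user = [rec for rec in history if rec["user"] == user]
    if not same_user:
        return rows  # first extraction for this user: full data
    latest = max(same_user, key=lambda rec: rec["extracted_at"])
    if latest["target_field"] != target_field:
        return rows  # assumption: a different target field means full data
    cutoff = now - window
    return [row for row in rows if row["added_at"] > cutoff]

# Illustrative data only.
NOW = datetime(2024, 1, 10)
ROWS = [
    {"id": 1, "added_at": datetime(2024, 1, 1)},
    {"id": 2, "added_at": datetime(2024, 1, 9, 12)},
]
HISTORY = [
    {"user": "alice", "target_field": "field_1",
     "extracted_at": datetime(2024, 1, 9)},
]

recent = select_rows(ROWS, HISTORY, "alice", "field_1", NOW, timedelta(days=1))
print([row["id"] for row in recent])
```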
In yet another embodiment, the fragment basis field in the query statement may be the data information corresponding to any tag information in the data table to be fragmented. The application can determine, for the data information corresponding to any tag information in the data table to be fragmented, either a remainder result in positive integer form or the positive integer value of the remainder result, and then configure the data extraction tasks based on that value. This solves the problem in the related art that data information in non-positive-integer form, such as floating point numbers and negative integers, cannot be fragmented because only a single data query method is available.
Step 102, executing the query statement to determine, based on the calculation logic in the query statement, a calculation result corresponding to the data information of the fragment basis field, wherein the calculation result includes a remainder result, in positive integer form, of the data information of the fragment basis field, or the positive integer value of the remainder result of that data information.
In one embodiment, the calculation logic in the query statement includes performing the following operations in any order: an integer-taking operation, an absolute-value operation, and a remainder operation. After the data information of the fragment basis field has been processed by these three operations in any order, the calculation result may include a remainder result in positive integer form of the data information corresponding to the fragment basis field, or the positive integer value of the remainder result.
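As a minimal sketch of this calculation logic, the function below applies one of the permitted orders (integer-taking, then absolute value, then remainder) to a single value. The name shard_index is hypothetical, and truncation toward zero is assumed as the integer-taking operation.

```python
import math

def shard_index(value, shard_count):
    """Map any number, including floats and negatives, to a fragment
    index in the range 0 .. shard_count - 1."""
    whole = math.trunc(value)       # integer-taking: -7.9 becomes -7
    magnitude = abs(whole)          # absolute value: -7 becomes 7
    return magnitude % shard_count  # remainder: 7 % 4 is 3

for value in (10, -7, 3.6, -7.9):
    print(value, "->", shard_index(value, 4))
```

A plain remainder on -7.9 would give a negative or fractional result that matches no subtask number, which is the case the added integer-taking and absolute-value operations are meant to handle.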
Step 103, configuring as many data extraction tasks as the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
In an embodiment, as many processes as the number of fragments are created according to the value of the calculation result, so that different processes in the node executing the data query perform the logic calculation on the data information of the fragment basis field at the same time, and each process determines, from the result of the logic calculation, the data information that belongs to its own data extraction task.
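The per-fragment workers might look like the sketch below. It swaps in a thread pool for the processes mentioned in the text, purely to keep the example short and portable, and it filters an in-memory list instead of querying a database; ROWS, shard_of, extract_shard, and extract_all are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Illustrative rows; field_2 plays the role of the fragment basis field.
ROWS = [
    {"field_1": "a", "field_2": 10},
    {"field_1": "b", "field_2": -7},
    {"field_1": "c", "field_2": 3.6},
    {"field_1": "d", "field_2": -2.2},
    {"field_1": "e", "field_2": 5},
]

def shard_of(value, shard_count):
    # Integer-taking, absolute value, then remainder.
    return abs(math.trunc(value)) % shard_count

def extract_shard(n, shard_count):
    """Worker n keeps only the rows whose calculation result equals n."""
    return [row["field_1"] for row in ROWS
            if shard_of(row["field_2"], shard_count) == n]

def extract_all(shard_count):
    """Run one worker per fragment and return the results in task order."""
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        parts = pool.map(extract_shard, range(shard_count),
                         [shard_count] * shard_count)
        return list(parts)

print(extract_all(4))
```

Every row lands in exactly one fragment, so the workers never extract the same row twice and no row is missed.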
In another embodiment, the data extraction tasks, as many as the number of fragments, may be issued to a plurality of processing nodes, so that the processing nodes perform the logic calculation on the data information corresponding to the fragment basis field at the same time. Specifically, each processing node can handle a number of data extraction tasks that does not exceed a threshold, so that the processing nodes extract data simultaneously according to the data information corresponding to the fragment basis field, and the node that distributed the data extraction tasks finally gathers the extraction results of the processing nodes.
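One way to distribute the tasks without exceeding a per-node threshold is sketched below. Round-robin assignment, the node names, and the functions assign_tasks and gather are assumptions made for illustration; the text only requires that no node receives more tasks than the threshold and that the distributing node collects the results.

```python
def assign_tasks(task_ids, nodes, max_per_node):
    """Assign tasks to nodes round-robin; refuse if capacity is too small."""
    if len(task_ids) > len(nodes) * max_per_node:
        raise ValueError("not enough node capacity for the requested tasks")
    assignment = {node: [] for node in nodes}
    for position, task in enumerate(task_ids):
        assignment[nodes[position % len(nodes)]].append(task)
    return assignment

def gather(results_by_node):
    """Merge the per-node extraction results on the distributing node."""
    merged = []
    for node in sorted(results_by_node):
        merged.extend(results_by_node[node])
    return merged

print(assign_tasks([0, 1, 2, 3], ["node-a", "node-b"], max_per_node=2))
```

Because the capacity check runs first, round-robin assignment can never give a node more than max_per_node tasks.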
According to this embodiment, after the query statement for the data information in the data table to be fragmented is obtained, the data information corresponding to the fragment basis field is calculated using calculation logic different from that in the related art, yielding either a remainder result in positive integer form or the positive integer value of the remainder result. As many data extraction tasks as the number of fragments can then be configured according to the value of the calculation result, so that the data extraction tasks extract the data information of the target field in parallel. This solves the problem in the related art that data information in non-positive-integer form, such as floating point numbers and negative integers, cannot be fragmented because only a single data query method is available, and it improves data extraction efficiency.
For further description of the present application, please refer to the following examples:
FIG. 2 is a flowchart of another data fragment extraction method according to an exemplary embodiment of the present application. As shown in FIG. 2, the method may include the following steps:
Step 201, determining the query statement corresponding to the query information input by the user.
In an embodiment, a user may input a database query statement according to a business query requirement, and the input query statement may be a statement customized by the user based on the business query requirement.
In another embodiment, the user may input only the field information to be queried to automatically generate, by the device receiving the field information, an executable standard query statement that matches the field information, such as a standard query statement executable by a business system.
Further, the user can input a fuzzy field matched with the field information to be queried, and the device receiving the fuzzy field determines the field to be queried corresponding to the fuzzy field based on a preset mapping relation, and further automatically generates a query statement corresponding to the field to be queried according to the determined field to be queried.
The query statement generated from the query information input by the user may include the identification information of the data table to be fragmented corresponding to the query information, and the target field on which fragmented extraction is performed. Specifically, the target field is the data information to be extracted from the data table to be fragmented. In practice, the target field may be the tag information to be extracted together with all of the data information corresponding to that tag information, or the tag information together with only part of the corresponding data information.
Table 1 below is an exemplary data table used only to explain the principles of the present application, not to limit its technical solution. The data information in the data table is abstracted as data_xy, and the tag information of the data information is abstracted as field_x. Table 1 shows a data table whose identification information is table and which contains n rows and m columns of data information, where the tag information of data_11 to data_n1 is field_1 and the tag information of data_1m to data_nm is field_m. In practical applications, the tag information may be the identifying name of service data such as name, age, or gender, and the data information may be the specific service data corresponding to the tag information; for example, in the case that the tag information field_1 is age, the data information data_11 to data_n1 may be field_11, and li ….
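TABLE 1 (data table identification information: table)
field_1    field_2    …    field_m
data_11    data_12    …    data_1m
data_21    data_22    …    data_2m
…          …          …    …
data_n1    data_n2    …    data_nm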
In determining the query statement, if the fields field_1, field_2, and field_3 in the table are matched according to the query information input by the user, the determined query statement may include the identification information table of the data table to be fragmented and the fields field_1, field_2, and field_3 determined from the query information. In the case that the query statement includes only the fields field_1, field_2, and field_3, the target field queried by the query statement includes all of the data information corresponding to each of those fields. The target field may also be further limited by adding, after the target field, an extraction range for the data information to be extracted; the target field queried by the query statement then includes only the part of the data information corresponding to field_1, field_2, and field_3 that matches the extraction range.
The generated query statement may also include the number of fragments, the fragment basis field, and the calculation logic for the fragment basis field. Specifically, the number of fragments may be a number customized by the user through the input query information, or a number that the device receiving the query information determines automatically according to preset instruction information, so that the query operation is executed in parallel by that number of subtasks.
Further, the device receiving the query information input by the user can determine a number of fragments suited to its own current running condition. Specifically, the device may determine, according to the threshold exceeded by its current running load, that the number corresponding to that threshold is the number of fragments. For example, provided that the device itself completes the fragmentation of all the data information, it may be configured so that the determined number of fragments is smaller when the running load of the device is not high and larger when the running load is too high. The current running load of the device may be the total utilization of all processors, the physical memory occupied by active processes, the total utilization of physical drives, and so on; the application does not limit how the running condition of the device is measured.
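A threshold lookup of this kind could be sketched as follows. The threshold values, the fragment counts, and the name shard_count_for_load are all hypothetical; the only behaviour taken from the text is that the number of fragments is the one associated with the threshold the current load exceeds, with a larger number at higher load.

```python
# Hypothetical (load threshold, fragment count) pairs, highest threshold first.
LOAD_THRESHOLDS = [(0.8, 8), (0.5, 4)]

def shard_count_for_load(load, default=2):
    """Return the fragment count tied to the highest threshold that the
    current load (a utilization between 0 and 1) exceeds."""
    for threshold, count in LOAD_THRESHOLDS:
        if load > threshold:
            return count
    return default

for load in (0.2, 0.6, 0.9):
    print(load, "->", shard_count_for_load(load))
```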
The fragment basis field in the query statement may be determined from tag information specified by the user in the data table to be fragmented, or may be automatically selected by the device receiving the query request input by the user.
Specifically, the fragment basis field may be any tag information in the data table to be fragmented. For example, in the case that the query request input by the user contains tag information field_1, field_2, and field_3, the user may, while inputting information, select the tag information for determining the fragment basis field from the tag information contained in the query request; for example, the user selects field_2.
In another embodiment, after receiving the query request input by the user, the device automatically determines, according to a preconfigured selection rule, which tag information in the query request is used to determine the fragment basis field. The preconfigured selection rule may be that the first tag information in the query request is used; for example, in the case that the query request includes tag information field_1, field_2, and field_3, the first tag information, field_1, is automatically determined as the tag information for determining the fragment basis field.
In still another embodiment, the tag information whose data information has the smallest total data amount among those involved in the query request is determined as the tag information for determining the fragment basis field, so as to improve the operating efficiency of the device during fragmentation. For example, in the case that the query request input by the user contains field_1, field_2, and field_3, the total data amount corresponding to each tag information can be determined separately, that is, for data_11 to data_n1 corresponding to field_1, data_12 to data_n2 corresponding to field_2, and data_13 to data_n3 corresponding to field_3. If these totals are x1, x2, and x3 respectively, and x2 < x1 < x3, then field_2, whose data information data_12 to data_n2 has the smallest total data amount, can be automatically determined as the tag information for determining the fragment basis field.
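The smallest-total selection rule can be sketched in a few lines. How the data amount is measured is not stated in the text, so the character length of each value is used here as a stand-in; data_volume, pick_fragment_basis_field, and the sample values are hypothetical.

```python
def data_volume(values):
    """Assumed measure of data amount: total character length of the values."""
    return sum(len(str(value)) for value in values)

def pick_fragment_basis_field(columns):
    """Return the field whose values have the smallest total data amount."""
    return min(columns, key=lambda name: data_volume(columns[name]))

# Illustrative columns chosen so that x2 < x1 < x3, as in the example above.
COLUMNS = {
    "field_1": ["alpha", "beta", "gamma"],
    "field_2": [10, -7, 3.6],
    "field_3": ["2024-01-01", "2024-01-02", "2024-01-03"],
}

print(pick_fragment_basis_field(COLUMNS))
```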
For the determined fragment basis field, the generated query statement may further include a logic rule for calculating on that field. Specifically, for a fragment basis field on which the remainder operation cannot be performed directly in the related art, the application adds an integer-taking operation and an absolute-value operation in addition to the remainder operation, so that the data to be extracted can be extracted in fragments even when it is not a positive integer.
Further, the calculation steps for the fragment basis field are not limited to a particular order: in executing the calculation logic for the fragment basis field, the integer-taking operation, the absolute-value operation, and the remainder operation may be performed in any order. For example, in the case that field_2 is determined to be the tag information for determining the fragment basis field, any of the following orders may be applied to the data information corresponding to field_2: integer-taking, then absolute value, then remainder; absolute value, then integer-taking, then remainder; remainder, then integer-taking, then absolute value; remainder, then absolute value, then integer-taking; integer-taking, then remainder, then absolute value; or absolute value, then remainder, then integer-taking.
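The six possible orders can be enumerated and compared on sample values, as in the sketch below. It assumes truncation toward zero as the integer-taking operation and a remainder that keeps the sign of the dividend (math.fmod); other integer-taking functions such as rounding up may not agree across orders. The names OPERATIONS, apply_in_order, and results_for are hypothetical.

```python
import math
from itertools import permutations

SHARD_COUNT = 4

OPERATIONS = {
    "integer": math.trunc,
    "absolute": abs,
    "remainder": lambda x: math.fmod(x, SHARD_COUNT),
}

def apply_in_order(value, order):
    """Apply the three named operations to value in the given order."""
    result = value
    for name in order:
        result = OPERATIONS[name](result)
    return int(result)

def results_for(value):
    """Map each of the six operation orders to its result for value."""
    return {order: apply_in_order(value, order)
            for order in permutations(OPERATIONS)}

for order, result in results_for(-7.5).items():
    print(" -> ".join(order), "=", result)
```

For the sample values used here, every order yields the same fragment index, which illustrates why the order can be left open under these assumptions.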
Specifically, the integer-taking operation mentioned in the application may be the ROUND(data), CEIL(data), or FLOOR(data) function in Oracle, among others, and the absolute-value operation may be the ABS(data) function, the fabs(data) function, and so on.
Taking field_2 in the table as the tag information for determining the fragment basis field, and the data information corresponding to the tag information field_1, field_2, and field_3 as the target field to be extracted, the following illustrates how the integer-taking, absolute-value, and remainder operations can be performed in different orders:
When the remainder operation is performed first on the data information corresponding to the tag information field_2, followed by the absolute value operation and then the integer operation, the corresponding query statement may be: SELECT field_1, field_2, field_3 FROM table WHERE TRUNC(ABS(MOD(field_2, p))) = n, where p is the number of fragments and n is the subtask number.
Alternatively, the absolute value operation may be performed first on the data information corresponding to the tag information field_2, followed by the integer operation and then the remainder operation, and the corresponding query statement may be: SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = n, where p is the number of fragments and n is the subtask number.
Similarly, the integer operation may be performed first on the data information corresponding to the tag information field_2, followed by the absolute value operation and then the remainder operation, and the corresponding query statement may be: SELECT field_1, field_2, field_3 FROM table WHERE MOD(ABS(TRUNC(field_2)), p) = n, where p is the number of fragments and n is the subtask number.
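The effect of combining the three operations can be checked outside the database. The following Python sketch is an illustration only, not part of the application: the function names are hypothetical, math.trunc stands in for TRUNC, abs for ABS, and math.fmod for MOD (on the assumption that MOD keeps the sign of the dividend). It evaluates three of the operation orders described above for a negative floating-point value and contrasts them with a bare remainder.

```python
import math


def fragment_abs_trunc_mod(value, p):
    """Emulates MOD(TRUNC(ABS(value)), p): absolute value, integer, remainder."""
    return int(math.fmod(math.trunc(abs(value)), p))


def fragment_trunc_abs_mod(value, p):
    """Emulates MOD(ABS(TRUNC(value)), p): integer, absolute value, remainder."""
    return int(math.fmod(abs(math.trunc(value)), p))


def fragment_mod_abs_trunc(value, p):
    """Emulates TRUNC(ABS(MOD(value, p))): remainder, absolute value, integer."""
    return math.trunc(abs(math.fmod(value, p)))


def bare_remainder(value, p):
    """Remainder alone; the result may be negative or fractional."""
    return math.fmod(value, p)


p = 4  # number of fragments (example value)
for value in (-7.9, 7.9, -7, 7):
    print(
        value,
        fragment_abs_trunc_mod(value, p),
        fragment_trunc_abs_mod(value, p),
        fragment_mod_abs_trunc(value, p),
        bare_remainder(value, p),
    )
```

For these sample values each of the three combined orders yields a whole number between 0 and p - 1 that can be compared with a subtask number, whereas the bare remainder of a negative value is negative and matches no subtask.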
The number of fragments p and the subtask number n in the query statement are illustrated here with an application example: when the task is split into 4 subtasks, that is, p = 4, the subtask number n takes the values 0, 1, 2 and 3. Accordingly, the query statements processed by the four subtasks Task[0], Task[1], Task[2] and Task[3] differ only in the value of n, which is 0, 1, 2 and 3 respectively.
Taking the query statement SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = n as an example, the query statements corresponding to the four subtasks Task[0], Task[1], Task[2] and Task[3] are respectively:
Task[0]:
SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = 0;
Task[1]:
SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = 1;
Task[2]:
SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = 2;
Task[3]:
SELECT field_1, field_2, field_3 FROM table WHERE MOD(TRUNC(ABS(field_2)), p) = 3;
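The per-subtask statements listed above can be generated mechanically. The following Python sketch is one possible way to do so and is not taken from the application: the helper name build_task_queries is hypothetical, and the sketch substitutes the concrete number of fragments for p. Identifiers are interpolated directly into the text, so a real implementation would need to validate them first.

```python
def build_task_queries(target_fields, table, basis_field, num_fragments):
    """Return one query statement per subtask number n = 0 .. num_fragments - 1."""
    columns = ", ".join(target_fields)
    return [
        f"SELECT {columns} FROM {table} "
        f"WHERE MOD(TRUNC(ABS({basis_field})), {num_fragments}) = {n}"
        for n in range(num_fragments)
    ]


queries = build_task_queries(
    ["field_1", "field_2", "field_3"], "table", "field_2", 4
)
for n, query in enumerate(queries):
    print(f"Task[{n}]: {query}")
```

Each generated statement differs from the others only in the subtask number on the right-hand side of the comparison.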
Step 202, judging whether an execution plan matching the received query statement exists in the data cache storage area; if so, entering step 203; otherwise, entering step 206.
Step 203, directly calling the execution plan in the data cache storage area to perform the data query.
Before the determined query statement is executed, it may first be judged whether an execution plan matching the query statement exists in the data cache storage area. If so, the compiled execution plan in the data cache storage area is called directly, which saves the time of compiling the execution plan and improves data query efficiency.
Step 204, executing a verification process on the query statement; when the query statement passes the verification process, entering step 205 to perform the data query on the verified query statement; otherwise, sending prompt information.
When no execution plan matching the determined query statement exists in the data cache storage area, the query statement is verified. The data query is performed for a query statement that passes verification; for a query statement that fails verification, the query operation is terminated, and prompt information is generated and sent.
The verification process performed on the query statement may involve one or more of syntax verification, semantic verification and query permission verification. Syntax verification may involve checking whether the query statement contains unrecognizable words, unrecognizable symbols or ambiguous words. Semantic verification may involve checking whether the table names, field information and the like mentioned in the query statement actually exist, whether they are identifiable query objects, and whether the calculation logic is executable. Query permission verification may involve verifying the permission of the user who sent the query information, so that the query result is returned only to users who have the query permission, and no query operation is performed for users who do not.
The syntax and semantics of the determined query statement are checked through syntax verification and semantic verification. When an error exists in the syntax, the semantics or the query permission, prompt information about the error may be generated and returned to the application program that called the query function, and then presented to the user who input the query information. The prompt information may include the reason for the query error, such as a syntax verification error, a semantic verification error or a query permission verification error, and may further include the precise cause of the verification failure, such as the presence of unrecognizable words, unrecognizable symbols or ambiguous words. The user receiving the prompt information can thus learn the specific cause of the verification failure, which improves the efficiency of handling verification failures.
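The three kinds of verification and the resulting prompt information can be pictured with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the SCHEMA and PERMISSIONS dictionaries, the user names, the regular expression that stands in for a real SQL parser, and the wording of the messages. It only shows how each failed check might contribute a specific reason to the prompt information.

```python
import re

# Hypothetical catalog and permission data, used only for this illustration.
SCHEMA = {"table": {"field_1", "field_2", "field_3"}}
PERMISSIONS = {"alice": {"table"}}

# A very small stand-in for a SQL parser: SELECT <fields> FROM <table> [WHERE ...]
QUERY_PATTERN = re.compile(
    r"^SELECT\s+(?P<fields>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)(\s+WHERE\s+.+)?$",
    re.IGNORECASE,
)


def verify_query(query, user):
    """Return a list of failure reasons; an empty list means the query passed."""
    match = QUERY_PATTERN.match(query.strip().rstrip(";"))
    if match is None:
        return ["syntax verification error: statement is not a recognizable SELECT"]

    table = match.group("table")
    fields = [name.strip() for name in match.group("fields").split(",")]
    reasons = []

    # Semantic verification: the table and every selected field must exist.
    if table not in SCHEMA:
        reasons.append(f"semantic verification error: unknown table '{table}'")
    else:
        for name in fields:
            if name not in SCHEMA[table]:
                reasons.append(
                    f"semantic verification error: unknown field '{name}'"
                )

    # Query permission verification: the user must be allowed to read the table.
    if table not in PERMISSIONS.get(user, set()):
        reasons.append(
            f"query permission verification error: user '{user}' may not query '{table}'"
        )
    return reasons


print(verify_query("SELECT field_1, field_9 FROM table", "alice"))
```

Returning every reason found, rather than stopping at the first one, is a design choice of this sketch; it matches the idea that the prompt information should tell the user the specific cause of the failure.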
When the query statement passes the verification process, a corresponding execution plan may be determined for the verified query statement, and the determined execution plan is stored in the data cache storage area, so that later requests for the same execution plan can be served promptly without recompilation.
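The reuse of compiled execution plans can be sketched as a simple lookup table. The Python class below is a hypothetical illustration rather than the mechanism of any particular database: the class name PlanCache, the whitespace-normalized cache key and the fake_compile function are all assumptions, and verification of the query statement is left out for brevity.

```python
class PlanCache:
    """Keeps compiled execution plans keyed by the normalized query text."""

    def __init__(self, compile_plan):
        self._compile_plan = compile_plan  # callable: query text -> plan
        self._plans = {}
        self.compilations = 0  # counts how often a plan had to be compiled

    @staticmethod
    def _key(query):
        # Collapse runs of whitespace so trivially different texts share a plan.
        return " ".join(query.split())

    def get_plan(self, query):
        key = self._key(query)
        plan = self._plans.get(key)
        if plan is None:
            # No matching plan in the cache: compile once and store the result.
            plan = self._compile_plan(query)
            self._plans[key] = plan
            self.compilations += 1
        return plan


def fake_compile(query):
    """Stand-in for plan compilation; a real system would parse and optimize."""
    return {"sql": query, "steps": ["scan", "filter"]}


cache = PlanCache(fake_compile)
first = cache.get_plan("SELECT field_1 FROM table WHERE MOD(field_2, 4) = 0")
second = cache.get_plan("SELECT field_1  FROM table WHERE MOD(field_2, 4) = 0")
print(cache.compilations)
```

The second call differs from the first only in spacing, so it is answered from the cache and no second compilation takes place.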
Step 206, determining the data table to be fragmented corresponding to the query statement.
In an embodiment, the data table to be fragmented corresponding to the query statement may be determined according to the identification information of the table involved in the query statement. For example, if the table name in the determined query statement is table, the data table named table may be determined as the data table to be fragmented corresponding to the query statement.
In another embodiment, the data table to be fragmented may be determined in real time by combining the query statement with the historical query situation. Fig. 3 is a flowchart of a method for determining the data table to be fragmented according to an exemplary embodiment of the present application. As shown in fig. 3, the method may involve the following specific steps:
Step 301, judging whether the data extraction task is being executed for the first time on the data table to be fragmented corresponding to the query statement; if so, entering step 302; otherwise, entering step 303.
Step 302, determining the data table recording the full amount of data information as the data table to be fragmented.
In the present application, it may be confirmed whether the data table to be fragmented corresponding to the determined query statement is a data table on which the data extraction task is executed for the first time. If so, the data table recording the full amount of data information is determined as the data table to be fragmented, so that the data information in it can be extracted.
Step 303, determining the data information newly added to the data table to be fragmented within a preset duration, and determining the data table recording the newly added data information as the data table to be fragmented.
In the process of determining the data information newly added within the preset duration, the historical query user who is the same as the user currently performing the data query may be identified among the users who have performed data extraction tasks, and the historical data extraction task information of that user is then determined. The historical data extraction task information may be obtained, for example, by querying database logs or system log records.
When historical data extraction task information belonging to the current query user exists, the latest data extraction record in that information is determined. When that data extraction record and the current data extraction task of the user involve the same tag information in the data table, or the same data information under that tag information, the data information newly added under that tag information within the preset duration is determined, and the data table recording this newly added data information is determined as the data table to be fragmented. The user therefore only needs to perform incremental extraction on the data table to be fragmented, which improves data extraction efficiency.
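The choice between full extraction and incremental extraction in steps 301 to 303 can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the function name rows_to_extract, the created_at column, and the representation of the extraction history as a set of (user, table) pairs are all hypothetical, and the history lookup through database or system logs is not modeled.

```python
from datetime import datetime, timedelta


def rows_to_extract(rows, extraction_history, user, table_name, now, preset_duration):
    """Pick the rows that form the data table to be fragmented.

    extraction_history is a set of (user, table_name) pairs for which a data
    extraction task has already been executed (hypothetical representation).
    """
    if (user, table_name) not in extraction_history:
        # First extraction: the full amount of data is extracted.
        return list(rows)
    # Later extractions: only rows added within the preset duration.
    cutoff = now - preset_duration
    return [row for row in rows if row["created_at"] >= cutoff]


now = datetime(2019, 11, 28, 12, 0)
rows = [
    {"field_2": -7.9, "created_at": datetime(2019, 11, 1)},
    {"field_2": 4, "created_at": datetime(2019, 11, 28, 9, 0)},
]

full = rows_to_extract(rows, set(), "alice", "table", now, timedelta(days=1))
incremental = rows_to_extract(
    rows, {("alice", "table")}, "alice", "table", now, timedelta(days=1)
)
print(len(full), len(incremental))
```

In this sample the first extraction covers both rows, while the later extraction covers only the row added within the last day.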
Step 207, determining a target field in the data table to be fragmented corresponding to the query statement.
The data information corresponding to the target field of the query statement is screened out of the data table to be fragmented and determined as the target field corresponding to the query statement.
Step 208, executing the calculation logic for the determined fragment basis field in the data table to be fragmented.
The calculation logic is executed for the determined fragment basis field in the data table to be fragmented to determine the calculation result corresponding to the fragment basis field. After the data information of the fragment basis field has been processed by the integer operation, the absolute value operation and the remainder operation in any order, the calculation result may be the remainder of the positive-integer form of the data information corresponding to the fragment basis field, or the positive-integer value of the remainder of that data information.
Step 209, configuring data extraction tasks equal in number to the number of fragments according to the value of the calculation result, so that the data extraction tasks extract the target fields in the data table to be fragmented in parallel.
That is, one data extraction task is configured for each fragment, and each task extracts the data whose calculation result equals the value assigned to that task, so that the configured tasks can extract the target fields in the data table to be fragmented in parallel.
In an embodiment, processes equal in number to the number of fragments are created according to the value of the calculation result, so that different processes in the node executing the data query simultaneously perform the logic calculation on the fragment basis field, and the data information belonging to the data extraction task of each process is determined according to the result of the logic calculation.
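The single-node embodiment can be sketched in Python as follows. The sketch makes several assumptions for brevity: worker threads from concurrent.futures stand in for the processes mentioned in the text, the rows are held in memory instead of a database table, and the helper names are hypothetical. Each worker applies the same calculation logic to the fragment basis field and keeps only the rows whose result equals its own subtask number.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fragment_number(value, p):
    """Absolute value, integer, remainder: mirrors MOD(TRUNC(ABS(value)), p)."""
    return int(math.fmod(math.trunc(abs(value)), p))


def extract_fragment(rows, basis_field, target_fields, p, n):
    """Extraction task n: keep the target fields of rows that fall in fragment n."""
    return [
        {name: row[name] for name in target_fields}
        for row in rows
        if fragment_number(row[basis_field], p) == n
    ]


def extract_in_parallel(rows, basis_field, target_fields, p):
    """Run p extraction tasks concurrently and return their results in task order."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        futures = [
            pool.submit(extract_fragment, rows, basis_field, target_fields, p, n)
            for n in range(p)
        ]
        return [future.result() for future in futures]


rows = [
    {"field_1": "a", "field_2": -7.9, "field_3": 1},
    {"field_1": "b", "field_2": 4, "field_3": 2},
    {"field_1": "c", "field_2": 1.5, "field_3": 3},
    {"field_1": "d", "field_2": -2, "field_3": 4},
    {"field_1": "e", "field_2": 10, "field_3": 5},
]
results = extract_in_parallel(rows, "field_2", ["field_1", "field_2", "field_3"], 4)
for n, fragment in enumerate(results):
    print(n, [row["field_1"] for row in fragment])
```

Because every row maps to exactly one fragment number, the tasks cover the table without overlap, and their results can simply be concatenated.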
In another embodiment, the data extraction tasks, equal in number to the number of fragments, may be issued to multiple processing nodes, which then simultaneously perform the logic calculation on the fragment basis field.
In the process of determining the processing nodes, the number of nodes may be determined according to the number of fragments, so that the data extraction tasks are issued to the same number of processing nodes and each processing node processes only the one data extraction task it receives. In this case, the data extraction task threshold of each processing node can be understood to be 1. Alternatively, the data extraction tasks may be issued according to the task amount preconfigured for the processing nodes. For example, when the preconfigured task amount of a processing node is 3, no more than 3 data extraction tasks are issued to that node, so that each processing node handles a number of data extraction tasks not exceeding the threshold. After each processing node finishes its data extraction tasks, the node that issued the tasks summarizes the data processing results of the processing nodes.
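The two ways of issuing tasks to processing nodes differ only in the per-node threshold. The Python sketch below illustrates this under assumptions of its own: the function names assign_tasks and summarize are hypothetical, nodes are represented simply as lists of subtask numbers, and the number of nodes is taken to be the smallest number that respects the threshold.

```python
import math


def assign_tasks(num_fragments, threshold=1):
    """Group subtask numbers 0 .. num_fragments - 1 into per-node batches.

    Each batch holds at most `threshold` tasks; threshold=1 gives one node per task.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    node_count = math.ceil(num_fragments / threshold)
    tasks = list(range(num_fragments))
    return [tasks[i * threshold:(i + 1) * threshold] for i in range(node_count)]


def summarize(node_results):
    """Merge the extraction results returned by the processing nodes."""
    merged = []
    for result in node_results:
        merged.extend(result)
    return merged


print(assign_tasks(4))               # one task per node
print(assign_tasks(4, threshold=3))  # at most three tasks per node
```

With four fragments, a threshold of 1 uses four nodes, while a threshold of 3 uses two nodes, the second of which receives a single task.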
Through the above embodiment, the query statement corresponding to the query information input by the user is determined. When the data cache storage area holds no execution plan for the determined query statement and the query statement passes verification, the data table to be fragmented corresponding to the query statement and the data information corresponding to the query statement in that table are determined. The calculation logic is then executed on the data information corresponding to the fragment basis field in the determined table, data extraction tasks equal in number to the number of fragments are configured according to the value of the calculation result, and the configured tasks extract the target field in parallel, which improves data extraction efficiency. In addition, before the data extraction tasks are configured and issued, a query statement applicable to various field types (such as negative integers, positive floating-point numbers and negative floating-point numbers) is determined based on the query information input by the user, and the target fields in the data table to be fragmented are processed according to that query statement, which solves the technical problems in the related art that the data query method is limited and data extraction efficiency is low.
Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and may include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the data fragment extraction device on a logic level. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present application, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
Fig. 5 is a block diagram of a data fragment extraction apparatus according to an exemplary embodiment of the present application. As shown in fig. 5, in a software implementation, the data fragment extraction apparatus may include:
an obtaining unit 501, configured to obtain a query statement for data information in a data table to be fragmented, where the query statement includes a target field, a fragment basis field, the number of fragments, and calculation logic for the fragment basis field;
an execution unit 502, configured to execute the query statement to determine, based on the calculation logic, a calculation result corresponding to the data information of the fragment basis field, where the calculation result includes the remainder of the positive-integer form of the data information of the fragment basis field, or the positive-integer value of the remainder of that data information;
a configuration unit 503, configured to configure data extraction tasks equal in number to the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
Optionally, the execution unit is specifically configured to:
perform the following operations in any order: an integer operation, an absolute value operation and a remainder operation.
Optionally, the apparatus further comprises:
a transmitting unit 504, configured to issue the data extraction tasks, equal in number to the number of fragments, to a plurality of processing nodes, so that each processing node processes a number of data extraction tasks not exceeding a threshold;
a summarizing unit 505, configured to summarize the extraction results of the plurality of processing nodes.
Optionally, for use before the query statement for the data information in the data table to be fragmented is obtained, the apparatus further comprises:
a first determining unit 506, configured to determine, when the data extraction task is executed on the data table to be fragmented for the first time, the data table recording the full amount of data information as the data table to be fragmented;
a second determining unit 507, configured to determine, when the data extraction task is not being executed on the data table to be fragmented for the first time, the data information newly added to the data table to be fragmented within a preset duration, and to determine the data table recording the newly added data information as the data table to be fragmented.
Optionally, for use before the query statement for the data information in the data table to be fragmented is obtained, the apparatus further comprises:
a judging unit 508, configured to judge whether an execution plan matching the received query statement exists in the data cache storage area;
a query unit 509, configured to directly call the execution plan to perform the data query if the execution plan exists, and otherwise to execute the verification process on the query statement and perform the data query based on the query statement that passes the verification process.
The device corresponds to the method, and more details are not repeated.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without creative effort.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method for extracting data fragments, the method comprising:
acquiring a query statement for data information in a data table to be fragmented, wherein the query statement comprises a target field, a fragment basis field, the number of fragments and calculation logic for the fragment basis field; the calculation logic includes performing the following operations in any order: an integer operation, an absolute value operation and a remainder operation;
executing the query statement to determine a calculation result corresponding to the data information of the fragment basis field based on the calculation logic;
and configuring data extraction tasks equal in number to the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
2. The method of claim 1, wherein determining the fragment basis field comprises: determining the tag information with the smallest total amount of data information in the data table to be fragmented as the fragment basis field.
3. The method of claim 1, wherein the number of fragments is determined based on a current operating load of the device.
4. The method of claim 1, wherein the calculation result comprises the remainder of the positive-integer form of the data information of the fragment basis field, or the positive-integer value of the remainder of the data information of the fragment basis field.
5. The method according to claim 1, wherein configuring the data extraction tasks equal in number to the number of fragments according to the value of the calculation result includes:
creating processes equal in number to the number of fragments according to the value of the calculation result, so that different processes in the node executing the data query simultaneously perform the logic calculation on the data information in the fragment basis field, and determining the data information belonging to the data extraction task of each process according to the result of the logic calculation.
6. The method of claim 1, wherein the manner of determining the number of fragments comprises any one of:
determining the number of fragments according to query information input by a user;
automatically determining, by the device receiving the query information input by the user, the number of fragments according to preset instruction information;
determining, by the device receiving the query information input by the user and according to the running condition of the device, a number of fragments suited to the current running condition of the device;
determining, by the device, the number of fragments according to the current running load of the device.
7. The method according to claim 1, wherein the manner of determining the fragment basis field comprises any one of:
determining the fragment basis field according to tag information specified by a user in the data to be fragmented;
automatically selecting the fragment basis field by the device receiving the query request input by the user;
after receiving the query request input by the user, automatically determining, by the device and according to a preconfigured selection rule, tag information in the query request as the tag information for determining the fragment basis field;
determining the tag information with the smallest total amount of data information involved in the query request as the tag information for determining the fragment basis field.
8. A data fragment extraction apparatus, the apparatus comprising:
an obtaining unit, configured to obtain a query statement for data information in a data table to be fragmented, wherein the query statement comprises a target field, a fragment basis field, the number of fragments and calculation logic for the fragment basis field; the calculation logic includes performing the following operations in any order: an integer operation, an absolute value operation and a remainder operation;
an execution unit, configured to execute the query statement to determine a calculation result corresponding to the data information of the fragment basis field based on the calculation logic;
a configuration unit, configured to configure data extraction tasks equal in number to the number of fragments according to the value of the calculation result, so that the data extraction tasks extract, in parallel, the data information corresponding to the target field in the data table to be fragmented.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute instructions to implement the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-7.
CN202311665771.8A 2019-11-28 2019-11-28 Data fragment extraction method and device Pending CN117763024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311665771.8A CN117763024A (en) 2019-11-28 2019-11-28 Data fragment extraction method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911195281.XA CN110928941B (en) 2019-11-28 2019-11-28 Data fragment extraction method and device
CN202311665771.8A CN117763024A (en) 2019-11-28 2019-11-28 Data fragment extraction method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201911195281.XA Division CN110928941B (en) 2019-11-28 2019-11-28 Data fragment extraction method and device

Publications (1)

Publication Number Publication Date
CN117763024A true CN117763024A (en) 2024-03-26

Family

ID=69847604

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201911195281.XA Active CN110928941B (en) 2019-11-28 2019-11-28 Data fragment extraction method and device
CN202311665771.8A Pending CN117763024A (en) 2019-11-28 2019-11-28 Data fragment extraction method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201911195281.XA Active CN110928941B (en) 2019-11-28 2019-11-28 Data fragment extraction method and device

Country Status (1)

Country Link
CN (2) CN110928941B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688159B (en) * 2021-09-08 2024-04-05 京东科技控股股份有限公司 Data extraction method and device
CN115935090B (en) * 2023-03-10 2023-06-16 北京锐服信科技有限公司 Data query method and system based on time slicing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391508B (en) * 2016-05-16 2020-07-17 顺丰科技有限公司 Data loading method and system
CN107436883B (en) * 2016-05-26 2020-06-30 北京京东尚科信息技术有限公司 Data extraction method, device and system based on remainder
CN106294886A (en) * 2016-10-17 2017-01-04 北京集奥聚合科技有限公司 A kind of method and system of full dose extracted data from HBase

Also Published As

Publication number Publication date
CN110928941A (en) 2020-03-27
CN110928941B (en) 2023-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination