CN113204602B - Data processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113204602B
Authority
CN
China
Prior art keywords
target
data
window
field
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110496553.0A
Other languages
Chinese (zh)
Other versions
CN113204602A (en)
Inventor
陈振强
靳峥
赵鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd
Priority to CN202110496553.0A
Publication of CN113204602A
Application granted
Publication of CN113204602B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data processing method, device, equipment and storage medium. The method comprises: in response to a window function call instruction, acquiring a target field contained in the window function call instruction; determining a logical form according to the hash value of the target field; and performing window function processing on the logical form. The hash value of each content of the grouping field carried by the window function call instruction is calculated, and the contents of the grouping field are distributed into logical forms according to their hash values, so that each logical form stores the grouping field contents that share the same hash value. Because one logical form can contain a plurality of grouping field contents with the same hash value, a physical window does not need to be established separately for each grouping field content. Windows are thereby divided more reasonably, and window function processing efficiency is improved.

Description

Data processing method, device, equipment and storage medium
Technical Field
Embodiments of the invention relate to database data processing technology for big data, and in particular to a data processing method, device, equipment and storage medium.
Background
The window function, also called an OLAP (online analytical processing) function, is used to analyze and process database data in real time and is often used in analytical services. While keeping the original table data, the window function displays, in a window, the results of processing attributes of the original table data, such as the ordering of a certain field.
Currently, when responding to a window function, a physical window is established for each distinct content of the grouping field targeted by the window function. For example, if the grouping field contains N distinct contents, N physical windows are established, and computing resources are configured for each physical window.
However, in a big data environment, if each window holds little data but the number of windows is large, the windows occupy a large amount of system memory. If a single window holds a large amount of data, processing of that window is slow. Therefore, how to divide windows reasonably and improve window function processing efficiency has become a problem to be solved urgently.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a storage medium, which are used for reasonably dividing logic windows and improving window function processing efficiency.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
responding to a window function call instruction, and acquiring a target field contained in the window function call instruction;
determining a logic form according to the hash value of the target field;
and performing window function processing on the logic form.
In a second aspect, embodiments of the present invention also provide a server comprising a processor and a memory, the memory for storing instructions that when executed cause the processor to:
responding to a window function call instruction, and acquiring a target field contained in the window function call instruction;
determining a logic form according to the hash value of the target field;
and performing window function processing on the logic form.
In a third aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions for performing a data processing method as shown in embodiments of the present application when executed by a computer processor.
According to the data processing method provided by the invention, in response to a window function call instruction, a target field contained in the window function call instruction is acquired; a logical form is determined according to the hash value of the target field; and window function processing is performed on the logical form. In the prior art, a physical window is configured for each field content, so window allocation is unreasonable and window function processing efficiency is poor. In contrast, one logical form can contain the contents of a plurality of target fields with the same hash value, so a physical window does not need to be established separately for each target field content. Windows are thereby divided more reasonably, and window function processing efficiency is improved.
Drawings
FIG. 1 is a flow chart of a data processing method in a first embodiment of the invention;
FIG. 2 is a flow chart of a data processing method in a second embodiment of the invention;
FIG. 3 is a schematic diagram of a data processing apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a server in a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
The database is built in the server. The client sends a window function call instruction to the server to request that the server execute it; the server processes the original data in the database, and the client receives the processing result fed back by the server.
After the client sends a window function call instruction to the server, the server obtains the contents of the grouping field carried by the instruction. In the existing approach, a physical window table is set for each content of the grouping field, and each physical window table stores the data of one grouping field content. Assuming that the grouping field contains N contents, N physical window tables are allocated to record the data of the N grouping field contents respectively, and system resources are allocated to each physical window table.
Illustratively, assume that the window function call instruction is select rank() over (partition by color order by price) from items. This means that, from the original data (items), the color field is used as the grouping field (partition by), the records in each group are ranked (rank) with the value (price) field as the sorting field (order by), and the ranking is output over that window (over). The original data (items) is shown in Table 1.
TABLE 1
Serial number (ID)    Color    Value (price)
1                     red      10
2                     green    11
3                     blue     6
4                     red      9
5                     gray     9
6                     blue     7
7                     red      7
8                     red      10
9                     pink     12
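The example query can be tried on the data of Table 1 with a minimal Python sketch using the standard-library sqlite3 module. The table layout (id, color, price), the helper name rank_items, and the use of SQLite itself are assumptions made for illustration; the patent does not name a database engine. Descending order is also an assumption, chosen because the ranking shown later in Table 6 places the highest price first. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Rows of Table 1 as (id, color, price); the column names are illustrative.
ITEMS = [
    (1, "red", 10), (2, "green", 11), (3, "blue", 6),
    (4, "red", 9), (5, "gray", 9), (6, "blue", 7),
    (7, "red", 7), (8, "red", 10), (9, "pink", 12),
]


def rank_items(rows):
    """Run the example window query and return a dict {id: rank}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, color TEXT, price INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # Descending order is an assumption that matches the ranks of Table 6.
    query = (
        "SELECT id, rank() OVER (PARTITION BY color ORDER BY price DESC) "
        "FROM items"
    )
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


ranks = rank_items(ITEMS)
```

In this sketch the database engine decides internally how to partition the rows; the embodiments below concern how a server could bound the number of such partitions.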
The current approach is to build a physical window table for each of red, green, blue, gray and pink, and then sort each physical window table by value. In a big data scenario, the distribution of actual data is irregular, and the grouping field can take a great variety of contents. The number of physical window tables cannot be known until the window function call instruction is executed, so the cache size to configure for the physical window tables is uncertain. When there are too many physical window tables, the resources they occupy exceed expectations, and the window function call instruction cannot be responded to. Further, the number of records corresponding to each grouping field content varies widely. When a single physical window table contains a large number of records, the pre-allocated cache cannot support the computation on that table, and the response to the window function call instruction slows down.
Based on the above, the embodiment of the invention provides a window function call instruction response mode capable of controlling the number of windows, so as to reasonably divide the logic windows and improve the window function processing efficiency, and the specific implementation mode is as follows.
Example 1
FIG. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. The method is applicable to processing original data in response to a window function and may be executed by a server. It specifically includes the following steps:
Step 110, in response to the window function call instruction, acquiring a target field contained in the window function call instruction.
The window function call instruction may be parsed according to Structured Query Language (SQL) syntax to obtain the target field contained in the window function call instruction.
The window function call instruction may sort the original data by a target field, or compute an average, a maximum, a minimum and so forth over a target field of the original data. Further, the window function call instruction may perform different operations on multiple target fields of the original data, such as grouping by one target field and, within each group, sorting by another target field.
The window function call instruction is used to analyze the original data, and the object of analysis can be one or more target fields in the original data. In the above example, the field to group by is determined by configuring the grouping field (partition by), and the original data is then grouped according to the contents of that field; the field configured as the grouping field may be the target field. On this basis, the field used as the ordering basis in each logical form is determined by configuring the sorting field (order by); the field configured as the sorting field can be another target field.
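How the grouping field and the sorting field might be pulled out of the instruction text can be sketched as follows. This is a toy regular-expression extractor, not a real SQL parser: it assumes a single-column partition by and an optional single-column order by, and the function name extract_target_fields is hypothetical.

```python
import re


def extract_target_fields(sql):
    """Return (grouping_field, sorting_field) from a simple window call.

    Toy extractor: handles only one PARTITION BY column and an optional
    single ORDER BY column. The sorting field is None when absent.
    """
    pattern = re.compile(
        r"over\s*\(\s*partition\s+by\s+(\w+)(?:\s+order\s+by\s+(\w+))?",
        re.IGNORECASE,
    )
    match = pattern.search(sql)
    if match is None:
        raise ValueError("no window clause found")
    return match.group(1), match.group(2)


fields = extract_target_fields(
    "select rank() over (partition by color order by price) from items"
)
```

A production server would rely on its SQL parser for this step; the sketch only shows which parts of the statement correspond to the two target fields.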
Step 120, determining the logic form according to the hash value of the target field.
The original data is the data contained in the database, and it contains the target field. The original data is composed of a plurality of records, each of which may be represented by a row. The table header includes a plurality of fields, and each record contains the content of each of those fields, including the content of the target field.
In the above example, the header of Table 1 includes three fields: serial number, color, and value. Each row in Table 1 represents a record that includes the contents of these fields; e.g., in the first record, the content of the serial number field is "1", the content of the color field is "red", and the content of the value field is "10".
The target field of each record in the original data is traversed in turn, and a hash value is determined from the content of the target field. Records with the same hash value are placed in one logical form. After the original data has been traversed, the number of logical forms obtained is less than or equal to the number of possible hash values.
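Step 120 can be sketched in Python as below. The choice of hash function is an assumption: the text does not specify one, so the sketch uses zlib.crc32 reduced modulo a chosen number of logical forms, which keeps the result deterministic across runs. The names build_logical_forms and num_forms are illustrative.

```python
import zlib
from collections import defaultdict


def build_logical_forms(records, target_field, num_forms):
    """Distribute records into at most num_forms logical forms.

    Each record is a dict. The hash value is crc32(content) % num_forms,
    an assumed stand-in for the unspecified hash function.
    """
    forms = defaultdict(list)
    for record in records:
        content = str(record[target_field]).encode("utf-8")
        hash_value = zlib.crc32(content) % num_forms
        forms[hash_value].append(record)
    return dict(forms)


records = [
    {"id": 1, "color": "red", "price": 10},
    {"id": 2, "color": "green", "price": 11},
    {"id": 3, "color": "blue", "price": 6},
    {"id": 4, "color": "red", "price": 9},
    {"id": 5, "color": "gray", "price": 9},
]
forms = build_logical_forms(records, "color", num_forms=3)
```

Whatever the data looks like, the number of logical forms never exceeds num_forms, and all records with the same target field content land in the same form.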
Step 130, performing window function processing on the logical form.
Window function processing is performed in each logical form according to the window function call instruction to obtain the processing result of each logical form. The processing results are then combined to obtain the processing result of the window function call instruction.
Optionally, the logical forms may be processed by a compute engine in the server. Multiple logical forms may be processed by the compute engine without configuring processing resources separately for each logical form.
Optionally, since the number of logical forms is fixed, computing resources may be configured for the logical forms based on the resources allocable in the server. The window function processing of each logical form is then executed using the allocated computing resources.
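Processing a fixed number of logical forms with a bounded set of resources and then merging the results can be sketched as follows. The thread pool stands in for the server's compute resources, which is an assumption for illustration; process_logical_forms and max_workers are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_logical_forms(forms, process_one, max_workers=2):
    """Process every logical form with a bounded pool and merge the results."""
    merged = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input forms.
        for partial in pool.map(process_one, forms):
            merged.extend(partial)
    return merged


# Example: the per-form processing sorts prices in descending order.
forms = [[10, 9, 7, 10, 9], [11, 12], [6, 7]]
result = process_logical_forms(forms, lambda form: sorted(form, reverse=True))
```

Because the number of forms is known before execution, the pool size can be chosen up front instead of growing with the number of distinct field contents.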
According to the data processing method provided by the invention, in response to a window function call instruction, a target field contained in the window function call instruction is acquired; a logical form is determined according to the hash value of the target field; and window function processing is performed on the logical form. In the prior art, a physical window is configured for each field content, so window allocation is unreasonable and window function processing efficiency is poor. In contrast, one logical form can contain the contents of a plurality of target fields with the same hash value, so a physical window does not need to be established separately for each target field content. Windows are thereby divided more reasonably, and window function processing efficiency is improved.
Example 2
FIG. 2 is a flowchart of a data processing method according to a second embodiment of the present invention, which further explains the above embodiment. The method includes:
Step 210, in response to the window function call instruction, acquiring a target field contained in the window function call instruction.
In the embodiment of the present application, the target field may be a grouping field.
Step 220, acquiring target grouping data according to the original data and the grouping field, wherein the target grouping data is the content of the grouping field in any one record of the original data.
The original data consists of a plurality of records, which are read one by one. For each record, the content of the grouping field, i.e., the target grouping data, is obtained. In the above example, if the grouping field is the color field, the target grouping data is the content of the color field in the record. For example, for the record with serial number "1", the content of the color field, i.e., the target grouping data, is "red". For the record with serial number "2", the target grouping data is "green".
Step 230, determining the hash value of the target grouping data according to a hash function.
The target grouping data acquired in step 220 is used as the input parameter of the hash function, and the hash function returns the hash value corresponding to that input.
Further, before the hash value of the target grouping data is determined according to the hash function in step 230, the method further includes: determining the hash values of the hash function according to a preset number of logical forms; and configuring the mapping relation between the target grouping data and the hash values. Accordingly, step 230 may be implemented by determining the hash value of the target grouping data according to the mapping relation.
Prior to step 230, the target grouping data contained in the target field is acquired in advance. The same target grouping data may appear multiple times in the original data. The hash values that the hash function can take are determined according to the preset number of logical forms, which may be the number of logical forms expected by the user and is smaller than the number of distinct target grouping data. A hash value is configured for each distinct target grouping data, and the mapping relation between the target grouping data and the hash values is established and stored. When step 230 is performed, the hash value to which the target grouping data maps is determined according to the stored mapping relation.
In the above example, the hash function may be configured as follows:
myhash('red') = 1
myhash('green') = 2
myhash('blue') = 3
myhash('gray') = 1
myhash('pink') = 2
Through the configuration, the mapping relation between the target packet data and the hash value is obtained. If the target packet data is 'red', the hash value obtained according to the hash function is '1'. Similarly, a hash value for each target packet data may be obtained.
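One way such a mapping could be configured from a preset number of logical forms is sketched below. The round-robin assignment in order of first appearance is an assumption, not a rule stated in the text; it is used here because it happens to reproduce the example values of myhash. The function name configure_hash_mapping is hypothetical.

```python
def configure_hash_mapping(values, num_forms):
    """Assign each distinct value a hash value in 1..num_forms.

    Distinct values are numbered round-robin in order of first appearance
    (an assumed policy; any stable assignment would serve).
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) % num_forms + 1
    return mapping


# Color column of Table 1, in record order.
colors = ["red", "green", "blue", "red", "gray", "blue", "red", "red", "pink"]
myhash = configure_hash_mapping(colors, num_forms=3)
```

Looking up myhash["red"] then plays the role of evaluating the hash function in step 230.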
Step 240, determining, according to the hash value, the target logical form to which the target grouping data belongs, wherein the target logical form comprises at least one grouping data with the same hash value.
If the target logical form has already been established, the target grouping data is added to it. According to the header of the target logical form, the contents of the corresponding fields that belong to the same record as the target grouping data are looked up in the original data, where the corresponding fields are the fields contained in the target logical form.
If the target logical form contains target grouping data with different values, identical target grouping data are clustered so that they are arranged contiguously in the target logical form.
If the target logical form has not been established, the header of the processing result is obtained according to the window function call instruction, and the target logical form is established according to that header. The header includes the fields to be analyzed by the window function call instruction.
In the above example, the original data is read sequentially; for each target grouping data read, the target logical form to which it belongs is determined, and the target grouping data is added to that form. Identical target grouping data in a target logical form are arranged contiguously. Table 2 is the target logical form whose target grouping data have the hash value "1" in the above example. Table 3 is the target logical form for the hash value "3", and Table 4 is the target logical form for the hash value "2", consistent with the hash function configured above.
TABLE 2
Serial number (ID)    Color    Value (price)
1                     red      10
4                     red      9
7                     red      7
8                     red      10
5                     gray     9
TABLE 3
Serial number (ID)    Color    Value (price)
3                     blue     6
6                     blue     7
TABLE 4
Serial number (ID)    Color    Value (price)
2                     green    11
9                     pink     12
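The construction of the target logical forms, including the contiguous arrangement of identical grouping data, can be sketched as follows. The rows are those of Table 1 and the mapping is the example myhash configuration; the function name build_target_forms and the tuple layout (id, color, price) are illustrative assumptions.

```python
# Rows of Table 1 as (id, color, price).
ITEMS = [
    (1, "red", 10), (2, "green", 11), (3, "blue", 6),
    (4, "red", 9), (5, "gray", 9), (6, "blue", 7),
    (7, "red", 7), (8, "red", 10), (9, "pink", 12),
]
# The example hash configuration from the text.
MYHASH = {"red": 1, "green": 2, "blue": 3, "gray": 1, "pink": 2}


def build_target_forms(rows, mapping):
    """Group rows into logical forms keyed by hash value.

    Inside a form, rows with the same grouping value are kept contiguous,
    in order of first appearance of that value.
    """
    forms = {}
    for row in rows:
        color = row[1]
        groups = forms.setdefault(mapping[color], {})
        groups.setdefault(color, []).append(row)
    # Flatten each form; dicts preserve insertion order in Python 3.7+.
    return {
        hash_value: [row for group in groups.values() for row in group]
        for hash_value, groups in forms.items()
    }


forms = build_target_forms(ITEMS, MYHASH)
ids = {hash_value: [row[0] for row in rows] for hash_value, rows in forms.items()}
```

The form for hash value 1 comes out in the order 1, 4, 7, 8, 5, which is the row order of Table 2: the four red records first, followed by the gray record.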
Step 250, obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form.
The window function call instruction may be parsed according to Structured Query Language (SQL) syntax to obtain the window instruction type contained in the window function call instruction.
The embodiment of the application can be applied to window function call instructions that analyze data by field. The types of window function call instruction include a first type, based on an ordering result, and a second type, based on an extracted value.
The first type includes the rank instruction (rank) and the row-number instruction (row_number). The row-number instruction (row_number) sorts based on the target field and generates a sequence number for each row of the queried records, numbering the rows consecutively without repetition. The rank instruction (rank) returns the rank of each row within its partition of the result set. Unlike row_number, rank takes into account rows whose sorting field values are the same: such rows receive the same rank. The second type includes a sum instruction (sum), an average instruction (avg), a minimum instruction (min), or a maximum instruction (max).
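The difference between the two first-type instructions can be illustrated with a short Python sketch. Both helpers take values that are already sorted; the function names mirror the SQL functions, but the implementations are illustrative only.

```python
def row_number(sorted_values):
    """Consecutive numbering without repetition: 1, 2, 3, ..."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """Equal neighbouring values share a rank; the next rank skips ahead."""
    ranks = []
    for position, value in enumerate(sorted_values, start=1):
        if position > 1 and value == sorted_values[position - 2]:
            ranks.append(ranks[-1])
        else:
            ranks.append(position)
    return ranks


# Prices of the red records, sorted in descending order.
prices = [10, 10, 9, 7]
numbers = row_number(prices)
ranks = rank(prices)
```

For the two records priced 10, row_number yields 1 and 2, while rank yields 1 for both and continues with 3.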
In one implementation, the window instruction type carried by the window function call instruction is the first type. Step 250, obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form, may then be implemented in the following way:
If the window instruction type carried by the window function call instruction is the first type, the sorting field contained in the window function call instruction is acquired. Target sorting data is determined according to the original data, the target grouping data and the sorting field, where the target sorting data is the content of the sorting field in the record of the original data to which the target grouping data belongs. In the target logical form, sorting is performed according to the target sorting data and the target grouping data.
The first type may be a sort type. The sorting field is acquired from the window function call instruction, and the data in the target logical form are sorted according to the content of the sorting field.
Records in the target logical form are first grouped according to the target grouping data; at this point the target sorting data within each group are not yet sorted. Within each group, the records are then sorted by the target sorting data that lie in the same row of the original data as the target grouping data. The ordering may be ascending or descending.
Optionally, sorting according to the target sorting data and the target grouping data may be implemented by: grouping records in the target logical form according to the target grouping data to obtain record groups, so that the target grouping data within each record group are the same; and, in each record group, ordering the records according to the target sorting data.
A record group consists of the records with the same target grouping data. The order of the records in a record group is determined by the target sorting data in that group.
In the above example, Table 2 contains four records whose target grouping data is red, which form a record group. Within this record group, the records are ordered according to the target sorting data (i.e., the content of the value (price) field), giving the group ordering result shown in Table 5.
TABLE 5
Serial number (ID)    Color    Value (price)
1                     red      10
8                     red      10
4                     red      9
7                     red      7
By analogy, the records in every record group are sorted according to the target sorting data to obtain the sorting result of each record group. All groups are then combined, and the display order of the record groups is arranged according to the value of the target grouping data, giving the window processing result of the window function call instruction.
Illustratively, since the window function call instruction in the above example is of type rank, a numerical rank identifier is derived, on the basis of this ordering, from the value of the target sorting data of each record in its record group. In the result it is shown in a rank field, giving the ranking result shown in Table 6.
TABLE 6
Rank    Serial number (ID)    Color    Value (price)
1       6                     blue     7
2       3                     blue     6
1       2                     green    11
1       5                     gray     9
1       9                     pink     12
1       1                     red      10
1       8                     red      10
3       4                     red      9
4       7                     red      7
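The first-type processing inside one logical form can be sketched as follows: records are split into record groups by color, each group is sorted by descending price, and a rank is attached. Descending order and the tuple layout (id, color, price) are assumptions chosen to match the example tables; rank_logical_form is a hypothetical name.

```python
from itertools import groupby


def rank_logical_form(rows):
    """Rank rows of one logical form by descending price within each color.

    Rows are (id, color, price); returns (rank, id, color, price) tuples.
    """
    ranked = []
    # Sort by color so groupby sees each record group contiguously,
    # then by descending price inside the group (the sort is stable).
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        previous_price, previous_rank = None, 0
        for position, row in enumerate(group, start=1):
            # Ties keep the previous rank; otherwise use the row position.
            current = previous_rank if row[2] == previous_price else position
            ranked.append((current, *row))
            previous_price, previous_rank = row[2], current
    return ranked


# The logical form of Table 2 (hash value 1).
table_2 = [(1, "red", 10), (4, "red", 9), (7, "red", 7), (8, "red", 10), (5, "gray", 9)]
result = rank_logical_form(table_2)
```

For the red record group this gives ranks 1, 1, 3, 4 for records 1, 8, 4, 7, and rank 1 for the single gray record, in agreement with the corresponding rows of Table 6.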
In another implementation, the window instruction type carried by the window function call instruction is the second type. Step 250, obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form, may then be implemented in the following way:
If the window instruction type carried by the window function call instruction is the second type, statistical results of the plurality of target logical forms are acquired respectively, and the window processing result is determined according to the target statistical results and the second type.
The second type may be computing the sum, average, minimum or maximum of a field in the original data.
If the window instruction type carried by the window function call instruction is the second type, a statistical result is calculated according to the second type in each target logical form. The statistical results obtained from the plurality of target logical forms are then aggregated to obtain the window processing result.
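One possible reading of the two-stage computation is sketched below: each logical form produces a partial statistic, and the partial statistics are then combined. The text does not spell out the combining rule, so the details here are assumptions; in particular, the average is derived from the combined sum and count rather than by averaging per-form averages, since the latter would be wrong for forms of unequal size. The function names are hypothetical.

```python
def partial_stats(prices):
    """Statistics of one logical form."""
    return {
        "sum": sum(prices),
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
    }


def combine_stats(partials):
    """Combine per-form statistics into the overall result."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {
        "sum": total,
        "avg": total / count,
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
    }


# Price columns of the three logical forms in the example.
forms = {1: [10, 9, 7, 10, 9], 2: [11, 12], 3: [6, 7]}
overall = combine_stats([partial_stats(prices) for prices in forms.values()])
```

Carrying the count alongside the sum is what makes the average combinable across forms.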
Further, the window instruction type carried by the window function call instruction may also include a parameter limiting the window size, which specifies the window size used during processing. For example, the first-type or second-type processing described above is performed on the N rows before and/or the N rows after a given row of the original data. The parameter may be: 1) unbounded preceding and current row: from the start of the window to the current row; 2) N preceding and current row: the current row and the previous N rows; 3) current row and M following: the current row and the following M rows; or 4) N preceding and M following: the current row, the previous N rows and the following M rows. The data within the window may be processed in the manner described above.
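The four frame specifications can be illustrated with a small Python sketch that computes a sum over a row-based frame around each row. The function name framed_sum and its parameters are hypothetical; preceding=None stands for unbounded preceding.

```python
def framed_sum(values, preceding=None, following=0):
    """Sum over a row-based frame around each row.

    preceding=None means UNBOUNDED PRECEDING; otherwise N rows before.
    following is the number of rows after the current row (M).
    """
    results = []
    for i in range(len(values)):
        start = 0 if preceding is None else max(0, i - preceding)
        end = min(len(values), i + following + 1)
        results.append(sum(values[start:end]))
    return results


prices = [10, 10, 9, 7]
running = framed_sum(prices)                             # unbounded preceding and current row
prev_one = framed_sum(prices, preceding=1)               # 1 preceding and current row
next_one = framed_sum(prices, preceding=0, following=1)  # current row and 1 following
both = framed_sum(prices, preceding=1, following=1)      # 1 preceding and 1 following
```

Frames are clipped at the edges of the data, so the first and last rows simply see fewer neighbours.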
The data processing method provided by the embodiment of the invention can respond to the window function call instruction comprising the first type, the second type and the third type window function type, so that the original data can be rapidly processed by the computing cluster in a big data scene, and the processing efficiency of the window function is improved.
Example 3
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a third embodiment of the present invention, where the present embodiment is applicable to a case where original data is processed in response to a window function, and the apparatus may be executed by a server, and specifically includes: a field acquisition module 310, a logical form determination module 320, and a window function processing module 330.
A field obtaining module 310, configured to obtain, in response to the window function call instruction, a target field included in the window function call instruction;
a logic form determining module 320, configured to determine a logic form according to the hash value of the target field;
and the window function processing module 330 is configured to perform window function processing on the logic form.
On the basis of the above embodiment, the logic form determining module 320 is configured to:
when the target field is a grouping field, acquire target grouping data according to the original data and the grouping field, wherein the target grouping data is the content of the grouping field in any one record of the original data;
determining a hash value of the target packet data according to the hash function;
determining a target logic form to which target packet data belong according to the hash value, wherein the target logic form comprises at least one packet data with the same hash value;
accordingly, the window function processing module 330 is configured to:
and obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logic form.
On the basis of the above embodiment, the window function processing module 330 is configured to:
if the window instruction type carried by the window function call instruction is the first type, acquiring an ordering field contained in the window function call instruction;
determining target sorting data according to the original data, the target grouping data and the sorting field, wherein the target sorting data is the content of the sorting field of the target grouping data record in the original data;
in the target logical form, sorting is performed according to the target sorting data and the target grouping data.
On the basis of the above embodiment, the window function processing module 330 is configured to:
grouping records in the target logical form according to the target grouping data to obtain record groups, so that the target grouping data within each record group are the same;
in each record packet, the order of records is ordered according to the target ordering data.
On the basis of the above embodiment, the device further comprises a hash function configuration module. The hash function configuration module is configured to:
determining the hash values of the hash function according to a preset number of logical forms;
configuring a mapping relation between the target grouping data and the hash values;
accordingly, the window function processing module 330 is configured to:
determining the hash value of the target grouping data according to the mapping relation.
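One possible reading of this configuration step is that the preset number of logical forms bounds the range of hash values, and that the mapping from grouping values to hash values is built once and then looked up. The sketch below follows that reading; `build_mapping`, `lookup_hash`, `num_forms` and the CRC32 hash are hypothetical names and choices, not part of the patent.

```python
import zlib


def build_mapping(group_values, num_forms):
    """Precompute grouping value -> hash value, with hash values limited to range(num_forms)."""
    if num_forms < 1:
        raise ValueError("num_forms must be at least 1")
    # Assumption: CRC32 modulo the preset number of logical forms stands in
    # for the unspecified hash function.
    return {
        value: zlib.crc32(str(value).encode("utf-8")) % num_forms
        for value in set(group_values)
    }


def lookup_hash(mapping, group_value):
    """Resolve the hash value through the configured mapping instead of rehashing."""
    return mapping[group_value]


# Four distinct grouping values mapped onto three logical forms.
mapping = build_mapping(["sales", "ops", "hr", "sales", "legal"], num_forms=3)
```

With fewer logical forms than distinct grouping values, at least two grouping values necessarily share a hash value, which is the situation the claims describe.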
On the basis of the above embodiment, the window function processing module 330 is configured to:
if the window instruction type carried by the window function call instruction is the second type, respectively acquiring statistical results of a plurality of target logical forms;
determining a window processing result according to the target statistical results and the second type.
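For the second instruction type, the per-form statistics and their combination might look like the sketch below. It assumes that the second type denotes an aggregate such as a per-group sum; that interpretation, the helper names `partial_sums` and `combine`, and the sample data are assumptions for illustration.

```python
from collections import Counter


def partial_sums(form_records, group_field, value_field):
    """Per-group sums computed inside a single logical form."""
    sums = Counter()
    for record in form_records:
        sums[record[group_field]] += record[value_field]
    return sums


def combine(per_form_results):
    """Merge the statistics of all logical forms into one result."""
    total = Counter()
    for partial in per_form_results:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


# Hypothetical logical forms, keyed by hash value.
forms = {
    0: [{"dept": "sales", "salary": 300}, {"dept": "sales", "salary": 100}],
    1: [{"dept": "ops", "salary": 200}, {"dept": "hr", "salary": 250}],
}
result = combine(partial_sums(rows, "dept", "salary") for rows in forms.values())
```

Each logical form can be aggregated independently, so the per-form step could in principle run in parallel before the merge.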
In the data processing device provided by the invention, the field acquisition module 310 acquires, in response to a window function call instruction, the target field contained in the window function call instruction; the logical form determination module 320 determines a logical form according to the hash value of the target field; and the window function processing module 330 performs window function processing on the logical form. In the prior art, a physical window is configured for each distinct field content, so window allocation is unreasonable and window function processing is inefficient. By contrast, one logical form can contain the contents of a plurality of target fields with the same hash value, so a physical window need not be established for the content of each target field; windows are divided more reasonably, and window function processing efficiency is improved.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention, and as shown in fig. 4, the server includes a processor 40, a memory 41, an input device 42 and an output device 44; the number of processors 40 in the server may be one or more, one processor 40 being taken as an example in fig. 4; the processor 40, memory 41, input device 42 and output device 44 in the server may be connected by a bus or other means, for example by a bus connection in fig. 4.
The memory 41 is a computer readable storage medium, and may be used to store software programs, computer executable programs, and modules, such as the program instructions/modules corresponding to the data processing method in an embodiment of the present invention (e.g., the field acquisition module 310, the logical form determination module 320, and the window function processing module 330 in the data processing device). The processor 40 executes the various functional applications and data processing of the server, i.e., implements the data processing method described above, by running the software programs, instructions, and modules stored in the memory 41.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 41 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 41 may further comprise memory located remotely from processor 40, which may be connected to a server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the server. The output device 44 may include a display device such as a display screen.
The instructions, when executed, cause the processor 40 to:
respond to a window function call instruction by acquiring a target field contained in the window function call instruction;
determine a logical form according to the hash value of the target field;
perform window function processing on the logical form.
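The three steps above can be strung together in a minimal end-to-end sketch. The dictionary layout of the call instruction (`partition_by`, `function`), the function name `process_window_call`, the CRC32 hash and the per-group count are all assumptions chosen for illustration; the patent does not define these details.

```python
import zlib
from collections import defaultdict


def process_window_call(call, records, num_forms=2):
    """Run the three steps on an in-memory list of dict records."""
    # Step 1: acquire the target (grouping) field from the call instruction.
    group_field = call["partition_by"]

    # Step 2: determine the logical form of each record from the hash value.
    forms = defaultdict(list)
    for record in records:
        key = str(record[group_field]).encode("utf-8")
        forms[zlib.crc32(key) % num_forms].append(record)

    # Step 3: perform the window function (here a per-group count) on each form.
    output = []
    for form_records in forms.values():
        counts = defaultdict(int)
        for record in form_records:
            counts[record[group_field]] += 1
        for record in form_records:
            output.append({**record, "group_count": counts[record[group_field]]})
    return output


call = {"function": "count", "partition_by": "dept"}
rows = [{"dept": "sales"}, {"dept": "ops"}, {"dept": "sales"}]
out = process_window_call(call, rows)
```

The per-form counts are complete because all records with the same grouping value are routed to the same logical form in step 2.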
On the basis of the above embodiment, the processor is arranged to determine the logical form by:
the target field being a grouping field, acquiring target grouping data according to the original data and the grouping field, the target grouping data being the content of the grouping field in any one record of the original data;
determining a hash value of the target grouping data according to the hash function;
determining, according to the hash value, a target logical form to which the target grouping data belongs, wherein the target logical form comprises at least one piece of grouping data with the same hash value;
accordingly, the processor is configured to perform window function processing on the logical form by:
obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form.
On the basis of the above embodiment, the processor is arranged to obtain the window processing result by:
if the window instruction type carried by the window function call instruction is the first type, acquiring a sorting field contained in the window function call instruction;
determining target sorting data according to the original data, the target grouping data and the sorting field, the target sorting data being the content of the sorting field in the records of the original data that contain the target grouping data;
in the target logical form, sorting according to the target sorting data and the target grouping data.
On the basis of the above embodiment, the processor is arranged to sort by:
grouping the records in the target logical form according to the target grouping data to obtain record groups, so that the target sorting data of each record group is the same;
in each record group, ordering the records according to the target sorting data.
On the basis of the above embodiment, the processor is arranged to perform the hash function setting by:
determining the hash values of the hash function according to a preset number of logical forms;
configuring a mapping relation between the target grouping data and the hash values;
accordingly, the processor is arranged to determine the hash value of the target grouping data by:
determining the hash value of the target grouping data according to the mapping relation.
On the basis of the above embodiment, the processor is arranged to obtain the window processing result by:
if the window instruction type carried by the window function call instruction is the second type, respectively acquiring statistical results of a plurality of target logical forms;
determining a window processing result according to the target statistical results and the second type.
Example V
A fifth embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the data processing method provided by any of the embodiments of the present invention, the method comprising:
responding to a window function call instruction, and acquiring a target field contained in the window function call instruction;
determining a logical form according to the hash value of the target field;
performing window function processing on the logical form.
On the basis of the above embodiment, determining the logical form according to the hash value of the target field includes:
the target field being a grouping field, acquiring target grouping data according to the original data and the grouping field, the target grouping data being the content of the grouping field in any one record of the original data;
determining a hash value of the target grouping data according to the hash function;
determining, according to the hash value, a target logical form to which the target grouping data belongs, wherein the target logical form comprises at least one piece of grouping data with the same hash value;
correspondingly, performing window function processing on the logical form includes:
obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form.
On the basis of the above embodiment, obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form includes:
if the window instruction type carried by the window function call instruction is the first type, acquiring a sorting field contained in the window function call instruction;
determining target sorting data according to the original data, the target grouping data and the sorting field, the target sorting data being the content of the sorting field in the records of the original data that contain the target grouping data;
in the target logical form, sorting according to the target sorting data and the target grouping data.
On the basis of the above embodiment, sorting according to the target sorting data and the target grouping data includes:
grouping the records in the target logical form according to the target grouping data to obtain record groups, so that the target sorting data of each record group is the same;
in each record group, ordering the records according to the target sorting data.
On the basis of the above embodiment, before determining the hash value of the target grouping data according to the hash function, the method further includes:
determining the hash values of the hash function according to a preset number of logical forms;
configuring a mapping relation between the target grouping data and the hash values;
accordingly, determining the hash value of the target grouping data according to the hash function includes:
determining the hash value of the target grouping data according to the mapping relation.
On the basis of the above embodiment, obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form includes:
if the window instruction type carried by the window function call instruction is the second type, respectively acquiring statistical results of a plurality of target logical forms;
determining a window processing result according to the target statistical results and the second type.
Of course, the computer-executable instructions of the storage medium provided in the embodiments of the present invention are not limited to the above method operations, and may also perform related operations in the data processing method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method of the embodiments of the present invention.
It should be noted that, in the embodiment of the data processing device, the units and modules included are divided only according to functional logic; the division is not limited to the one described above, so long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for distinguishing them from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (8)

1. A method of data processing, comprising:
responding to a window function call instruction, and acquiring a target field contained in the window function call instruction;
determining a logical form according to the hash value of the target field;
performing window function processing on the logical form;
the target field is a grouping field, and the determining the logical form according to the hash value of the target field includes:
acquiring target grouping data according to the original data and the grouping field, wherein the target grouping data is the content of the grouping field in any one record of the original data;
determining a hash value of the target grouping data according to a hash function;
determining, according to the hash value, a target logical form to which the target grouping data belongs, wherein the target logical form comprises at least one piece of grouping data with the same hash value;
correspondingly, the performing window function processing on the logical form comprises:
obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form;
before the hash value of the target grouping data is determined according to the hash function, the method further comprises:
determining the hash values of the hash function according to a preset number of logical forms, wherein the preset number of logical forms is smaller than the number of distinct values of the target grouping data.
2. The method of claim 1, wherein obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form comprises:
if the window instruction type carried by the window function call instruction is the first type, acquiring a sorting field contained in the window function call instruction;
determining target sorting data according to the original data, the target grouping data and the sorting field, wherein the target sorting data is the content of the sorting field in the records of the original data that contain the target grouping data;
in the target logical form, sorting according to the target sorting data and the target grouping data.
3. The method of claim 2, wherein said sorting according to said target sorting data and said target grouping data comprises:
grouping the records in the target logical form according to the target grouping data to obtain record groups, so that the target sorting data of each record group is the same;
in each record group, ordering the records according to the target sorting data.
4. The method of claim 1, further comprising, prior to determining the hash value of the target grouping data according to a hash function:
configuring a mapping relation between the target grouping data and the hash values;
correspondingly, the determining the hash value of the target grouping data according to the hash function includes:
determining the hash value of the target grouping data according to the mapping relation.
5. The method according to claim 1, wherein obtaining the window processing result according to the window instruction type carried by the window function call instruction and the target logical form includes:
if the window instruction type carried by the window function call instruction is the second type, respectively acquiring statistical results of a plurality of target logical forms;
determining a window processing result according to the target statistical results and the second type.
6. A server comprising a processor and a memory for storing instructions that when executed cause the processor to:
responding to a window function call instruction, and acquiring a target field contained in the window function call instruction;
determining a logical form according to the hash value of the target field;
performing window function processing on the logical form;
the target field is a grouping field, and the determining the logical form according to the hash value of the target field includes:
acquiring target grouping data according to the original data and the grouping field, wherein the target grouping data is the content of the grouping field in any one record of the original data;
determining a hash value of the target grouping data according to a hash function;
determining, according to the hash value, a target logical form to which the target grouping data belongs, wherein the target logical form comprises at least one piece of grouping data with the same hash value;
correspondingly, the performing window function processing on the logical form comprises:
obtaining a window processing result according to the window instruction type carried by the window function call instruction and the target logical form;
before the hash value of the target grouping data is determined according to the hash function, the processor further performs:
determining the hash values of the hash function according to a preset number of logical forms, wherein the preset number of logical forms is smaller than the number of distinct values of the target grouping data.
7. The server according to claim 6, wherein the processor is configured to obtain the window processing result by:
if the window instruction type carried by the window function call instruction is the first type, acquiring a sorting field contained in the window function call instruction;
determining target sorting data according to the original data, the target grouping data and the sorting field, wherein the target sorting data is the content of the sorting field in the records of the original data that contain the target grouping data;
in the target logical form, sorting according to the target sorting data and the target grouping data.
8. A storage medium containing computer executable instructions for performing the data processing method of any of claims 1-5 when executed by a computer processor.
CN202110496553.0A 2021-05-07 2021-05-07 Data processing method, device, equipment and storage medium Active CN113204602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496553.0A CN113204602B (en) 2021-05-07 2021-05-07 Data processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113204602A (en) 2021-08-03
CN113204602B (en) 2023-08-01

Family

ID=77030211


Country Status (1)

Country Link
CN (1) CN113204602B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102171680A (en) * 2008-10-05 2011-08-31 微软公司 Efficient large-scale filtering and/or sorting for querying of column based data encoded structures
CN103294831A (en) * 2013-06-27 2013-09-11 中国人民大学 Multidimensional-array-based grouping aggregation calculating method in column storage database
CN103605756A (en) * 2013-11-22 2014-02-26 北京国双科技有限公司 Data processing method and data processing device for on-line analysis processing
CN105930388A (en) * 2016-04-14 2016-09-07 中国人民大学 OLAP grouping aggregation method based on function dependency relationship
CN107992516A (en) * 2017-10-27 2018-05-04 平安科技(深圳)有限公司 Electronic device, the method for data query and storage medium
CN108376169A (en) * 2018-02-26 2018-08-07 众安信息技术服务有限公司 A kind of data processing method and device for on-line analytical processing
CN109299133A (en) * 2017-07-24 2019-02-01 迅讯科技(北京)有限公司 Data query method, computer system and non-transitory computer-readable medium
CN110781183A (en) * 2019-09-10 2020-02-11 中国平安财产保险股份有限公司 Method and device for processing incremental data in Hive database and computer equipment
CN110895479A (en) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 Data processing method, device and equipment
WO2020233146A1 (en) * 2019-05-23 2020-11-26 创新先进技术有限公司 Data operation record storage method, system and apparatus, and device
CN112035571A (en) * 2020-08-19 2020-12-04 深圳乐信软件技术有限公司 Data synchronization method, device, equipment and storage medium
CN112307062A (en) * 2020-09-18 2021-02-02 苏宁云计算有限公司 Database aggregation query method, device and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
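Hive's sharp tool: powerful and practical window functions; 大数据分享与学习; WeChat Official Accounts Platform; full text *
IM~2: an improved optimization technique for MIN/MAX window functions; 宋光旋; 赵大鹏; 王晓玲; Journal of East China Normal University (Natural Science) (No. 01); full text *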



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant