CN111708809B - Associated query method, device, equipment and storage medium based on data inclination - Google Patents

Info

Publication number
CN111708809B
Authority
CN
China
Prior art keywords
data
data set
query request
association
tilt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010581205.9A
Other languages
Chinese (zh)
Other versions
CN111708809A (en)
Inventor
李慎刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010581205.9A
Publication of CN111708809A
Application granted
Publication of CN111708809B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of big data and discloses a data-inclination-based association query method, apparatus, device and storage medium, which are used for reducing the probability of failure when querying associated data. The data-inclination-based association query method comprises the following steps: reading first table data and second table data and counting the corresponding data volumes; obtaining, according to the first table data and the second table data, either a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set, or a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set; and determining a first target data set or a second target data set based on the plurality of inclined data sets, the plurality of non-inclined data sets and the association query request, and transmitting the first target data set or the second target data set to the target terminal.

Description

Associated query method, device, equipment and storage medium based on data inclination
Technical Field
The present invention relates to the field of big data, and in particular, to a data inclination-based association query method, apparatus, device, and storage medium.
Background
Currently, large-scale in-memory computing platforms such as Spark are widely used to process large amounts of data from different applications and data sources. Spark is a fast, general-purpose computing engine designed for large-scale data processing, and it can be used to perform a variety of operations, including SQL queries, text processing, and machine learning. When handling streaming data, the basic principle of the Spark computing engine is to divide the data into small time slices and to process each slice in a manner similar to batch processing.
In the prior art, when an SQL query such as a left association (left join), a full connection (full outer join) or an equivalent connection (equi-join) is performed on two tables with Spark and data inclination (data skew) occurs, some hot-spot data must be scattered across other nodes according to the specific distribution of the data. Analyzing and handling the data inclination in this way takes a great deal of time, which results in low efficiency and a high failure rate for data association queries.
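To make the hot-spot problem concrete, the following is a minimal sketch in plain Python (not Spark) of how a single frequent join key overloads one partition when rows are routed by key. The modulo partitioner, the partition count of 4 and the sample keys are illustrative assumptions and are not taken from the patent.

```python
def partition_sizes(keys, num_partitions):
    """Count rows per partition when rows are routed by join key.

    `key % num_partitions` is a stand-in for a hash partitioner (assumption).
    """
    sizes = [0] * num_partitions
    for key in keys:
        sizes[key % num_partitions] += 1
    return sizes


# Hypothetical join-key columns: in the first one, key 0 is a hot spot.
inclined_keys = [0] * 9000 + list(range(1, 1001))
uniform_keys = list(range(10000))

inclined_sizes = partition_sizes(inclined_keys, 4)
uniform_sizes = partition_sizes(uniform_keys, 4)
print(inclined_sizes)  # [9250, 250, 250, 250]
print(uniform_sizes)   # [2500, 2500, 2500, 2500]
```

Both inputs hold 10,000 rows, but in the inclined case one partition receives 9,250 of them, so the node that owns it does most of the join work.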
Disclosure of Invention
The main purpose of the invention is to solve the problems of low query efficiency and high query failure rate caused by data inclination when a data association query is performed.
The first aspect of the invention provides a data-inclination-based association query method, which comprises the following steps: acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, and counting the data volume of the first table data and the data volume of the second table data to obtain a first data volume and a second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request; when at least one of the first data volume and the second data volume is larger than an inclination threshold value, judging whether the first data volume is larger than the second data volume or the second data volume is larger than the first data volume; if the first data volume is larger than the second data volume, obtaining a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set according to the first table data and the second table data; determining a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to the target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set; if the second data volume is larger than the first data volume, obtaining a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set according to the first table data and the second table data; and determining a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set.
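The branch selection in the first aspect can be sketched as follows. This is only an illustration: the function name `choose_branch` and the returned labels are hypothetical, and the "direct" branch corresponds to the case described in the sixth implementation manner, in which neither data volume exceeds the inclination threshold value. The text does not say what happens when both volumes are equal and above the threshold, so the sketch reports that case as unspecified.

```python
def choose_branch(first_volume, second_volume, inclination_threshold):
    """Return which processing branch applies to the two data volumes."""
    if max(first_volume, second_volume) <= inclination_threshold:
        # Neither table exceeds the threshold: connect the tables directly.
        return "direct"
    if first_volume > second_volume:
        # Build the first/second non-inclined and inclined data sets.
        return "first_larger"
    if second_volume > first_volume:
        # Build the third/fourth non-inclined and inclined data sets.
        return "second_larger"
    # Equal volumes above the threshold are not addressed by the text.
    return "unspecified"


print(choose_branch(500, 20, 100))  # first_larger
```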
Optionally, in a first implementation manner of the first aspect of the present invention, acquiring the association query request of the target terminal, reading the first table data and the second table data based on the association query request, and counting the data volume of the first table data and the data volume of the second table data to obtain the first data volume and the second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request, includes: acquiring the association query request of the target terminal, reading the first table data and the second table data based on the association query request, dividing the first table data into a plurality of first column data, and dividing the second table data into a plurality of second column data; performing data processing on the plurality of first column data to obtain a plurality of first sub-table data, and counting the data volume of each first sub-table data to obtain a plurality of first sub-table data volumes; performing data processing on the plurality of second column data to obtain a plurality of second sub-table data, and counting the data volume of each second sub-table data to obtain a plurality of second sub-table data volumes; adding the plurality of first sub-table data volumes together to obtain the first data volume; and adding the plurality of second sub-table data volumes together to obtain the second data volume.
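The column-wise counting described above can be sketched as follows. The text does not define the "data processing" step or the unit of data volume, so this sketch assumes that each column is treated as one sub-table and that volume is measured as the UTF-8 byte length of each value's string form; the function names and the sample table are hypothetical.

```python
def split_into_columns(rows):
    """Divide table data (a list of dicts) into per-column value lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sub_table_volumes(rows):
    """Data volume of each sub-table, assumed here to be bytes of str(value)."""
    return {name: sum(len(str(v).encode("utf-8")) for v in values)
            for name, values in split_into_columns(rows).items()}


def table_volume(rows):
    """Add the sub-table data volumes to obtain the table's data volume."""
    return sum(sub_table_volumes(rows).values())


# Hypothetical first table data.
first_table = [{"id": 1, "city": "Shenzhen"}, {"id": 22, "city": "Beijing"}]
print(sub_table_volumes(first_table))  # {'id': 3, 'city': 15}
print(table_volume(first_table))       # 18
```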
Optionally, in a second implementation manner of the first aspect of the present invention, if the first data volume is larger than the second data volume, obtaining the first non-inclined data set, the first inclined data set, the second non-inclined data set and the second inclined data set according to the first table data and the second table data includes: if the first data volume is larger than the second data volume, processing the first table data into first marked data, and performing left association on the second table data and the first marked data to obtain a first result set, wherein the first result set comprises a plurality of first small data identifiers; extracting, from the first result set, the data set whose first small data identifier is a null value, and re-adding a first small data identifier that is not a null value, to obtain the first non-inclined data set; extracting, from the first result set, the data set whose first small data identifier is not a null value, to obtain the first inclined data set; adding a first big data identifier to the second table data to obtain first table identifier data, and performing left association on the first table identifier data and the first marked data to obtain a second result set, wherein the second result set comprises a plurality of first small list data; extracting, from the second result set, the data set whose first small list data is a null value, and deleting the corresponding first small list data, to obtain the second non-inclined data set; and extracting, from the second result set, the data set whose first small list data is not a null value, and deleting the corresponding first small list data, to obtain the second inclined data set.
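The general idea of splitting a table into an inclined part and a non-inclined part by left-associating it with marked data and testing the resulting identifier for a null value can be sketched as follows. This is a simplified illustration rather than the exact procedure of this implementation manner: the text does not spell out how the marked data is built, so the sketch assumes it is the set of join keys that occur more than a hypothetical `hot_count` times in the larger table, and all function names and sample rows are invented for the example.

```python
from collections import Counter


def build_marked_keys(rows, key, hot_count):
    """Assumed form of the marked data: keys seen more than hot_count times."""
    counts = Counter(row[key] for row in rows)
    return {k for k, n in counts.items() if n > hot_count}


def left_associate(rows, key, marked_keys):
    """Emulate a left association: unmatched rows get a null (None) identifier."""
    return [{**row, "identifier": 1 if row[key] in marked_keys else None}
            for row in rows]


def split_result_set(result_set):
    """Split on the identifier and drop it, giving (non_inclined, inclined)."""
    def strip(row):
        return {k: v for k, v in row.items() if k != "identifier"}
    non_inclined = [strip(r) for r in result_set if r["identifier"] is None]
    inclined = [strip(r) for r in result_set if r["identifier"] is not None]
    return non_inclined, inclined


# Hypothetical tables: key "a" is the hot spot in the larger table.
larger = [{"k": "a", "v": i} for i in range(5)] + [{"k": "b", "v": 5}, {"k": "c", "v": 6}]
smaller = [{"k": "a", "w": 10}, {"k": "b", "w": 11}, {"k": "d", "w": 12}]

marked = build_marked_keys(larger, "k", hot_count=2)
small_non_inclined, small_inclined = split_result_set(left_associate(smaller, "k", marked))
large_non_inclined, large_inclined = split_result_set(left_associate(larger, "k", marked))
```

Because the inclined and non-inclined parts share no join keys, each pair can later be connected separately and the results combined.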
Optionally, in a third implementation manner of the first aspect of the present invention, determining the first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to the target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set, includes: fully connecting the first non-inclined data set with the second non-inclined data set to obtain a first initial full data set, and fully connecting the first inclined data set with the second inclined data set to obtain a second initial full data set; combining the first initial full data set with the second initial full data set to obtain a first full data set; when the association query request is the equivalent connection query request, extracting, from the first full data set, the data set whose first small data identifier is not a null value and whose first big data identifier is not a null value, and deleting the plurality of corresponding first small data identifiers and the plurality of corresponding first big data identifiers, to obtain the first equivalent connection data set; when the association query request is the left association query request, extracting, from the first full data set, the data set whose first big data identifier is not a null value, and deleting the plurality of corresponding first small data identifiers and the plurality of corresponding first big data identifiers, to obtain the first left association data set; and when the association query request is the full connection query request, extracting the first full data set and deleting the plurality of first small data identifiers and the plurality of first big data identifiers, to obtain the first full connection data set.
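The way one full connection plus two identifiers can serve all three request types can be sketched as follows. This is a minimal illustration under stated assumptions: rows are `(key, value)` tuples, `small_id` and `big_id` stand in for the first small data identifier and the first big data identifier, the nested-loop `full_connect` helper replaces a real distributed join, and all names and sample rows are hypothetical.

```python
def full_connect(small_rows, big_rows):
    """Full connection of (key, value) rows, tagging each side with an identifier."""
    joined, matched = [], set()
    for key, small_value in small_rows:
        hits = [i for i, (big_key, _) in enumerate(big_rows) if big_key == key]
        matched.update(hits)
        for i in hits:
            joined.append({"key": key, "small": small_value, "big": big_rows[i][1],
                           "small_id": 1, "big_id": 1})
        if not hits:
            joined.append({"key": key, "small": small_value, "big": None,
                           "small_id": 1, "big_id": None})
    for i, (key, big_value) in enumerate(big_rows):
        if i not in matched:
            joined.append({"key": key, "small": None, "big": big_value,
                           "small_id": None, "big_id": 1})
    return joined


def target_data_set(full_data_set, request):
    """Filter on the identifiers by request type, then delete the identifiers."""
    if request == "equivalent":
        kept = [r for r in full_data_set
                if r["small_id"] is not None and r["big_id"] is not None]
    elif request == "left":
        kept = [r for r in full_data_set if r["big_id"] is not None]
    elif request == "full":
        kept = list(full_data_set)
    else:
        raise ValueError(f"unknown request type: {request}")
    return [(r["key"], r["small"], r["big"]) for r in kept]


# Hypothetical inputs, already split into non-inclined and inclined parts.
first_non_inclined = [("b", "s-b"), ("d", "s-d")]
second_non_inclined = [("b", "B-b"), ("c", "B-c")]
first_inclined = [("a", "s-a")]
second_inclined = [("a", "B-a1"), ("a", "B-a2")]

# Combine the two initial full data sets into the first full data set.
first_full_data_set = (full_connect(first_non_inclined, second_non_inclined)
                       + full_connect(first_inclined, second_inclined))

equivalent = target_data_set(first_full_data_set, "equivalent")
left = target_data_set(first_full_data_set, "left")
full = target_data_set(first_full_data_set, "full")
```

With these inputs the equivalent connection keeps the three matched rows, the left association additionally keeps the unmatched row for key "c" from the side carrying the big data identifier, and the full connection keeps all five rows.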
Optionally, in a fourth implementation manner of the first aspect of the present invention, if the second data volume is larger than the first data volume, obtaining the third non-inclined data set, the third inclined data set, the fourth non-inclined data set and the fourth inclined data set according to the first table data and the second table data includes: if the second data volume is larger than the first data volume, processing the second table data into second marked data, and performing left association on the first table data and the second marked data to obtain a third result set, wherein the third result set comprises a plurality of second small data identifiers; extracting, from the third result set, the data set whose second small data identifier is a null value, and re-adding a second small data identifier that is not a null value, to obtain the third non-inclined data set; extracting, from the third result set, the data set whose second small data identifier is not a null value, to obtain the third inclined data set; adding a second big data identifier to the first table data to obtain second table identifier data, and performing left association on the second table identifier data and the second marked data to obtain a fourth result set, wherein the fourth result set comprises a plurality of second small list data; extracting, from the fourth result set, the data set whose second small list data is a null value, and deleting the corresponding second small list data, to obtain the fourth non-inclined data set; and extracting, from the fourth result set, the data set whose second small list data is not a null value, and deleting the corresponding second small list data, to obtain the fourth inclined data set.
Optionally, in a fifth implementation manner of the first aspect of the present invention, determining the second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set, includes: fully connecting the third non-inclined data set with the fourth non-inclined data set to obtain a third initial full data set, and fully connecting the third inclined data set with the fourth inclined data set to obtain a fourth initial full data set; combining the third initial full data set with the fourth initial full data set to obtain a second full data set; when the association query request is the equivalent connection query request, extracting, from the second full data set, the data set whose second small data identifier is not a null value and whose second big data identifier is not a null value, and deleting the plurality of corresponding second small data identifiers and the plurality of corresponding second big data identifiers, to obtain the second equivalent connection data set; when the association query request is the left association query request, extracting, from the second full data set, the data set whose second big data identifier is not a null value, and deleting the plurality of corresponding second small data identifiers and the plurality of corresponding second big data identifiers, to obtain the second left association data set; and when the association query request is the full connection query request, extracting the second full data set and deleting the plurality of second small data identifiers and the plurality of second big data identifiers, to obtain the second full connection data set.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after determining the second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set, the data-inclination-based association query method further includes: when the first data volume is smaller than or equal to the inclination threshold value and the second data volume is smaller than or equal to the inclination threshold value, correspondingly connecting the first table data and the second table data according to the association query request to obtain a third target data set, wherein the third target data set is a third equivalent connection data set, a third left association data set or a third full connection data set.
The second aspect of the present invention provides a data-inclination-based association query apparatus, including: a data acquisition module, used for acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, and counting the data volume of the first table data and the data volume of the second table data to obtain a first data volume and a second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request; a judging module, used for judging whether the first data volume is larger than the second data volume or the second data volume is larger than the first data volume when at least one of the first data volume and the second data volume is larger than an inclination threshold value; a first data set extraction module, used for obtaining a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set according to the first table data and the second table data if the first data volume is larger than the second data volume; a first association module, used for determining a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to the target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set; a second data set extraction module, used for obtaining a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set according to the first table data and the second table data if the second data volume is larger than the first data volume; and a second association module, used for determining a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set.
Optionally, in a first implementation manner of the second aspect of the present invention, the data acquisition module is specifically configured to: acquire an association query request of a target terminal, read first table data and second table data based on the association query request, divide the first table data into a plurality of first column data, and divide the second table data into a plurality of second column data; perform data processing on the plurality of first column data to obtain a plurality of first sub-table data, and count the data volume of each first sub-table data to obtain a plurality of first sub-table data volumes; perform data processing on the plurality of second column data to obtain a plurality of second sub-table data, and count the data volume of each second sub-table data to obtain a plurality of second sub-table data volumes; add the plurality of first sub-table data volumes together to obtain the first data volume; and add the plurality of second sub-table data volumes together to obtain the second data volume.
Optionally, in a second implementation manner of the second aspect of the present invention, the first data set extraction module is specifically configured to: if the first data volume is larger than the second data volume, process the first table data into first marked data, and perform left association on the second table data and the first marked data to obtain a first result set, wherein the first result set comprises a plurality of first small data identifiers; extract, from the first result set, the data set whose first small data identifier is a null value, and re-add a first small data identifier that is not a null value, to obtain the first non-inclined data set; extract, from the first result set, the data set whose first small data identifier is not a null value, to obtain the first inclined data set; add a first big data identifier to the second table data to obtain first table identifier data, and perform left association on the first table identifier data and the first marked data to obtain a second result set, wherein the second result set comprises a plurality of first small list data; extract, from the second result set, the data set whose first small list data is a null value, and delete the corresponding first small list data, to obtain the second non-inclined data set; and extract, from the second result set, the data set whose first small list data is not a null value, and delete the corresponding first small list data, to obtain the second inclined data set.
Optionally, in a third implementation manner of the second aspect of the present invention, the first association module is specifically configured to: fully connect the first non-inclined data set with the second non-inclined data set to obtain a first initial full data set, and fully connect the first inclined data set with the second inclined data set to obtain a second initial full data set; combine the first initial full data set with the second initial full data set to obtain a first full data set; when the association query request is the equivalent connection query request, extract, from the first full data set, the data set whose first small data identifier is not a null value and whose first big data identifier is not a null value, and delete the plurality of corresponding first small data identifiers and the plurality of corresponding first big data identifiers, to obtain the first equivalent connection data set; when the association query request is the left association query request, extract, from the first full data set, the data set whose first big data identifier is not a null value, and delete the plurality of corresponding first small data identifiers and the plurality of corresponding first big data identifiers, to obtain the first left association data set; and when the association query request is the full connection query request, extract the first full data set and delete the plurality of first small data identifiers and the plurality of first big data identifiers, to obtain the first full connection data set.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the second data set extraction module is specifically configured to: if the second data volume is larger than the first data volume, process the second table data into second marked data, and perform left association on the first table data and the second marked data to obtain a third result set, wherein the third result set comprises a plurality of second small data identifiers; extract, from the third result set, the data set whose second small data identifier is a null value, and re-add a second small data identifier that is not a null value, to obtain the third non-inclined data set; extract, from the third result set, the data set whose second small data identifier is not a null value, to obtain the third inclined data set; add a second big data identifier to the first table data to obtain second table identifier data, and perform left association on the second table identifier data and the second marked data to obtain a fourth result set, wherein the fourth result set comprises a plurality of second small list data; extract, from the fourth result set, the data set whose second small list data is a null value, and delete the corresponding second small list data, to obtain the fourth non-inclined data set; and extract, from the fourth result set, the data set whose second small list data is not a null value, and delete the corresponding second small list data, to obtain the fourth inclined data set.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the second association module is specifically configured to: fully connect the third non-inclined data set with the fourth non-inclined data set to obtain a third initial full data set, and fully connect the third inclined data set with the fourth inclined data set to obtain a fourth initial full data set; combine the third initial full data set with the fourth initial full data set to obtain a second full data set; when the association query request is the equivalent connection query request, extract, from the second full data set, the data set whose second small data identifier is not a null value and whose second big data identifier is not a null value, and delete the plurality of corresponding second small data identifiers and the plurality of corresponding second big data identifiers, to obtain the second equivalent connection data set; when the association query request is the left association query request, extract, from the second full data set, the data set whose second big data identifier is not a null value, and delete the plurality of corresponding second small data identifiers and the plurality of corresponding second big data identifiers, to obtain the second left association data set; and when the association query request is the full connection query request, extract the second full data set and delete the plurality of second small data identifiers and the plurality of second big data identifiers, to obtain the second full connection data set.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the data-inclination-based association query apparatus further includes: a third association module, used for correspondingly connecting the first table data and the second table data according to the association query request when the first data volume is smaller than or equal to the inclination threshold value and the second data volume is smaller than or equal to the inclination threshold value, so as to obtain a third target data set, wherein the third target data set is a third equivalent connection data set, a third left association data set or a third full connection data set.
A third aspect of the present invention provides a data-inclination-based association query device, comprising: a memory and at least one processor, the memory having instructions stored therein, and the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the data-inclination-based association query device to perform the data-inclination-based association query method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the data-inclination-based association query method described above.
In the technical scheme provided by the invention, an association query request of a target terminal is acquired, first table data and second table data are read based on the association query request, and the data volume of the first table data and the data volume of the second table data are counted to obtain a first data volume and a second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request; when at least one of the first data volume and the second data volume is larger than an inclination threshold value, it is judged whether the first data volume is larger than the second data volume or the second data volume is larger than the first data volume; if the first data volume is larger than the second data volume, a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set are obtained according to the first table data and the second table data; a first target data set is determined according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and the first target data set is transmitted to the target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set; if the second data volume is larger than the first data volume, a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set are obtained according to the first table data and the second table data; and a second target data set is determined according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and the second target data set is transmitted to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set. In the embodiment of the invention, a plurality of inclined data sets, a plurality of non-inclined data sets and data identifiers are extracted from the first table data and the second table data according to the first data volume and the second data volume, and the target data set is obtained based on the plurality of inclined data sets, the plurality of non-inclined data sets and the data identifiers, so that the efficiency of querying associated data is improved and the probability of a failed query of associated data is reduced.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a related query method based on data skew in an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a related query method based on data skew in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a related query device based on data skew in an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a related query device based on data skew in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a related query device based on data tilting in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an associated query method, device, equipment and storage medium based on data inclination.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below, referring to fig. 1, and one embodiment of a data tilt-based association query method in an embodiment of the present invention includes:
101. acquiring an association query request of a target terminal, reading first form data and second form data based on the association query request, and counting the data volume of the first form data and the data volume of the second form data to obtain the first data volume and the second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request;
The server obtains an equivalent connection query request, a left association query request or a full connection query request from the target terminal, reads the first form data and the second form data according to that request, and counts the first data volume corresponding to the first form data and the second data volume corresponding to the second form data.
The association inquiry request of the target terminal is a left association (left join) request, an equal-value connection (inner join) request, or a full-connection (full join) request. The left association query request can be understood as taking the left table of the two specified tables as a main table, namely a first table, and reserving all data of the first table and data of a second table partially conforming to the association condition; equivalent connection may be understood as retaining data in the first and second tables that are equal in field; full connection is understood to be the union that retains left and right associated data.
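The three request types can be illustrated with a minimal sketch. The function name `join`, the dict-based row representation and the sample rows are assumptions made for illustration only and are not part of the described method.

```python
def join(left, right, key, how):
    """Join two lists of dict rows on `key`; `how` is 'inner', 'left' or 'full'."""
    if how not in ("inner", "left", "full"):
        raise ValueError(f"unknown join type: {how}")
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    left_cols = {c for row in left for c in row if c != key}
    right_cols = {c for row in right for c in row if c != key}

    result, matched_keys = [], set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            result.extend({**lrow, **rrow} for rrow in matches)
        elif how in ("left", "full"):
            # Left row without a partner: pad the right-hand columns with None.
            result.append({**lrow, **{c: None for c in right_cols}})
    if how == "full":
        # Right rows without a partner: pad the left-hand columns with None.
        for rrow in right:
            if rrow[key] not in matched_keys:
                result.append({**{c: None for c in left_cols}, **rrow})
    return result


# Hypothetical sample rows.
first = [{"user_id": "u1", "enterprise_id": "E1"},
         {"user_id": "u2", "enterprise_id": "E1"}]
second = [{"user_id": "u1", "age": 18},
          {"user_id": "u3", "age": 20}]

inner_rows = join(first, second, "user_id", "inner")  # equivalent connection
left_rows = join(first, second, "user_id", "left")    # left association
full_rows = join(first, second, "user_id", "full")    # full connection
```

In this sketch the equivalent connection keeps only `u1`, the left association additionally keeps `u2` with an empty age, and the full connection also keeps `u3` with an empty enterprise.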
The server acquires the first form data and the second form data that need to be associated, and counts the data volume of the first form data and the data volume of the second form data by number of bytes or by number of records to obtain the first data volume and the second data volume.
For example, the first table data and the second table data obtained by parsing the association query request are the A table data and the B table data, as shown in Table 1 and Table 2 below:

Table 1: A table data, i.e. first form data

| user_id | enterprise_id |
| --- | --- |
| Zhangsan | E1 |
| Lisi | E1 |
| Wangwu | E1 |

Table 2: B table data, i.e. second form data

| user_id | Age |
| --- | --- |
| Zhangsan | 18 |
| Lisi | 19 |
| Zhaoliu | 20 |

In the above tables, user_id is a person's name, enterprise_id is an enterprise name, and Age is the person's age. The server separately counts the data of the user_id column and the data of the enterprise_id column in the A table data to obtain the byte numbers or the record numbers of the two sub-tables, and adds the two byte numbers (or the two record numbers) to obtain the A data volume; the server separately counts the data of the user_id column and the data of the Age column in the B table data to obtain the byte numbers or the record numbers of the two sub-tables, and adds the two byte numbers (or the two record numbers) to obtain the B data volume.
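The per-column counting described above can be sketched as follows. This is only an illustration: the helper names `column_volume` and `table_volume` are hypothetical, and measuring a value's size as the UTF-8 length of its string form is an assumption, since the text does not say how bytes are counted.

```python
def column_volume(values, unit):
    """Volume of one column sub-table, in records or in bytes."""
    if unit == "records":
        return len(values)
    if unit == "bytes":
        # Assumption: a value's size is the UTF-8 length of its string form.
        return sum(len(str(v).encode("utf-8")) for v in values)
    raise ValueError(f"unknown unit: {unit}")


def table_volume(rows, unit="bytes"):
    """Split the rows into columns, measure each column, and add the results."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return sum(column_volume(values, unit) for values in columns.values())


table_a = [  # rows of Table 1
    {"user_id": "Zhangsan", "enterprise_id": "E1"},
    {"user_id": "Lisi", "enterprise_id": "E1"},
    {"user_id": "Wangwu", "enterprise_id": "E1"},
]
table_b = [  # rows of Table 2
    {"user_id": "Zhangsan", "Age": 18},
    {"user_id": "Lisi", "Age": 19},
    {"user_id": "Zhaoliu", "Age": 20},
]

a_volume = table_volume(table_a)  # 18 bytes of names + 6 bytes of "E1" = 24
b_volume = table_volume(table_b)  # 19 bytes of names + 6 bytes of ages = 25
```

Under these assumptions the A data volume is 24 bytes and the B data volume is 25 bytes; counted in records, each table yields 3 + 3 = 6 because the two column sub-tables are counted separately and then added.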
It can be understood that the execution subject of the present invention may be a data-tilting-based association query device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
102. When at least one of the first data amount and the second data amount is larger than the inclination threshold value, judging whether the first data amount is larger than the second data amount or the second data amount is larger than the first data amount;
When either the first data amount or the second data amount is larger than the inclination threshold, the server judges whether the first data amount is larger than the second data amount or the second data amount is larger than the first data amount. If the first data amount is larger than the second data amount, the first form data is processed, and the corresponding data association is then carried out using the first form data, the second form data and the processed first form data; if the second data amount is larger than the first data amount, the second form data is processed, and the corresponding data association is then carried out using the first form data, the second form data and the processed second form data. When neither the first data amount nor the second data amount is larger than the inclination threshold, the server first broadcasts the form data with the smaller data amount to each node of the association thread, and then directly performs the left association, equivalent connection or full connection on the first form data and the second form data according to the association query request.
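The branching in this step can be summarized in a short sketch. The function name and the returned labels are hypothetical; the text does not say what happens when both amounts exceed the threshold and are equal, so that case is only flagged here rather than resolved.

```python
def choose_branch(first_amount, second_amount, threshold):
    """Pick the processing branch for two data amounts and a threshold."""
    if first_amount <= threshold and second_amount <= threshold:
        # Neither table exceeds the threshold: broadcast the smaller one.
        return "broadcast_and_join_directly"
    if first_amount > second_amount:
        return "process_first_table"
    if second_amount > first_amount:
        return "process_second_table"
    # Equal amounts above the threshold are not covered by the description.
    return "unspecified"


# Illustrative figures; amounts and threshold must share one unit.
branch_a = choose_branch(12, 11, 10)
branch_b = choose_branch(210, 220, 200)
branch_c = choose_branch(150, 170, 200)
```

The only requirement on the inputs is that both amounts and the threshold are expressed in the same unit, either bytes or records.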
103. If the first data volume is larger than the second data volume, obtaining a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set according to the first table data and the second table data;
If the server determines that the first data amount is greater than the second data amount, the server extracts a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set based on the first table data and the second table data.
For example, assume that the inclination threshold is measured in record rows, the inclination threshold is 10kw, the first data amount is 12kw, and the second data amount is 11kw. Since the first data amount is larger than the second data amount, the server processes the first table data and then obtains the first non-inclined data set, the first inclined data set, the second non-inclined data set and the second inclined data set according to the first table data, the second table data and the processed first table data.
The inclination threshold may be measured in bytes or in the number of record rows.
104. Determining a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to a target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set;
The server determines a first equivalent connection data set, a first left association data set or a first full connection data set according to the association query request, the first non-inclined data set, the first inclined data set, the second non-inclined data set and the second inclined data set, and transmits the first equivalent connection data set, the first left association data set or the first full connection data set to the target terminal.
The server fully connects the first non-inclined data set and the second non-inclined data set to obtain a first transition data set; the server fully connects the first inclined data set and the second inclined data set to obtain a second transition data set; the server then merges the first transition data set and the second transition data set to obtain a first complete transition data set. The server performs the association corresponding to the association query request on the first transition data set and the second transition data set to obtain the first target data set. If the association query request is an equivalent connection query request, the server retains part of the data in the first complete transition data set and deletes the rest, thereby obtaining the first equivalent connection data set; if the association query request is a left association query request, the server retains part of the data in the first complete transition data set and deletes the rest, thereby obtaining the first left association data set; if the association query request is a full connection query request, the server retains the data part of the first complete transition data set and deletes the identifier part, thereby obtaining the first full connection data set. For which data is retained and which is deleted, refer to step 204.
105. If the second data volume is larger than the first data volume, obtaining a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set according to the first table data and the second table data;
If the server determines that the second data amount is greater than the first data amount, the server extracts a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set based on the first table data and the second table data.
For example, assume that the inclination threshold is measured in bytes, the inclination threshold is 200M, the first data amount is 210M, and the second data amount is 220M. Since the second data amount is larger than the first data amount, the server processes the second table data and then obtains the third non-inclined data set, the third inclined data set, the fourth non-inclined data set and the fourth inclined data set from the first table data, the second table data and the processed second table data.
The inclination threshold may be measured in bytes or in the number of record rows.
106. And determining a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set.
The server determines a second equivalent connection data set, a second left association data set or a second full connection data set according to the association query request, the third non-inclined data set, the third inclined data set, the fourth non-inclined data set and the fourth inclined data set, and transmits the second equivalent connection data set, the second left association data set or the second full connection data set to the target terminal.
The server fully connects the third non-inclined data set and the fourth non-inclined data set to obtain a third transition data set; the server fully connects the third inclined data set and the fourth inclined data set to obtain a fourth transition data set; the server then merges the third transition data set and the fourth transition data set to obtain a second complete transition data set. The server performs the association corresponding to the association query request on the third transition data set and the fourth transition data set to obtain the second target data set. If the association query request is an equivalent connection query request, the server retains part of the data in the second complete transition data set and deletes the rest, thereby obtaining the second equivalent connection data set; if the association query request is a left association query request, the server retains part of the data in the second complete transition data set and deletes the rest, thereby obtaining the second left association data set; if the association query request is a full connection query request, the server retains the data part of the second complete transition data set and deletes the identifier part, thereby obtaining the second full connection data set. For which data in the second complete transition data set is retained and which is deleted for the left association query, the equivalent connection query and the full connection query, refer to the description of step 206.
It should be noted that the present invention also relates to blockchain technology, and the first target data set and the second target data set may be stored in a blockchain.
In the embodiment of the invention, the plurality of inclined data sets, the plurality of non-inclined data sets and the data identification are extracted from the first table data and the second table data according to the first table data volume and the second table data volume, and the target data set is acquired based on the plurality of inclined data sets, the plurality of non-inclined data sets and the data identification, so that the efficiency of inquiring the associated data is improved, and the probability of failed inquiring the associated data is reduced.
Referring to fig. 2, another embodiment of the related query method based on data tilting in an embodiment of the present invention includes:
201. Acquiring an association query request of a target terminal, reading first form data and second form data based on the association query request, and counting the data volume of the first form data and the data volume of the second form data to obtain the first data volume and the second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request;
The server obtains an equivalent connection query request, a left association query request or a full connection query request from the target terminal, reads the first form data and the second form data according to that request, and counts the first data volume corresponding to the first form data and the second data volume corresponding to the second form data.
Specifically, the server uses a group by function to divide the first table data into a plurality of first column data, each of which is a complete column; the server processes each first column data to obtain a plurality of independent first sub-table data; the server counts the data volume of each independent first sub-table data to obtain a plurality of first sub-table data volumes, and adds them to obtain the first data volume. Likewise, the server uses a group by function to divide the second table data into a plurality of second column data, each of which is a complete column; the server processes each second column data to obtain a plurality of independent second sub-table data; the server counts the data volume of each independent second sub-table data to obtain a plurality of second sub-table data volumes, and adds them to obtain the second data volume.
202. When at least one of the first data amount and the second data amount is larger than the inclination threshold value, judging whether the first data amount is larger than the second data amount or the second data amount is larger than the first data amount;
When either the first data amount or the second data amount is larger than the inclination threshold, the server judges whether the first data amount is larger than the second data amount or the second data amount is larger than the first data amount. If the first data amount is larger than the second data amount, the first form data is processed, and the corresponding data association is then carried out using the first form data, the second form data and the processed first form data; if the second data amount is larger than the first data amount, the second form data is processed, and the corresponding data association is then carried out using the first form data, the second form data and the processed second form data. When neither the first data amount nor the second data amount is larger than the inclination threshold, the server first broadcasts the form data with the smaller data amount to each node of the association thread, and then directly performs the left association, equivalent connection or full connection on the first form data and the second form data according to the association query request.
203. If the first data volume is larger than the second data volume, obtaining a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set according to the first table data and the second table data;
If the server determines that the first data amount is greater than the second data amount, the server extracts a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set based on the first table data and the second table data.
Specifically, if the server determines that the first data volume is larger than the second data volume, the server processes the first form data into first mark data, and left-associates the second form data with the first mark data to obtain a first result set comprising a plurality of first small data identifiers; the server extracts from the first result set the data whose first small data identifier is null and re-adds a non-null first small data identifier to it, obtaining the first non-inclined data set; the server extracts from the first result set the data whose first small data identifier is not null, obtaining the first inclined data set; the server then adds a first big data identifier to the second form data to obtain first form identifier data, and left-associates the first form identifier data with the first mark data to obtain a second result set comprising a plurality of first small table column data; the server extracts from the second result set the data whose first small table column data is null and deletes the corresponding first small table column data, obtaining the second non-inclined data set; and the server extracts from the second result set the data whose first small table column data is not null and deletes the corresponding first small table column data, obtaining the second inclined data set.
204. Determining a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to a target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full connection data set;
The server determines a first equivalent connection data set, a first left association data set or a first full connection data set according to the association query request, the first non-inclined data set, the first inclined data set, the second non-inclined data set and the second inclined data set, and transmits the first equivalent connection data set, the first left association data set or the first full connection data set to the target terminal.
Specifically, the server fully connects the first non-inclined data set with the second non-inclined data set, and fully connects the first inclined data set with the second inclined data set, to obtain a first initial full data set and a second initial full data set; the server then takes the union of the first initial full data set and the second initial full data set to obtain a first full data set. When the association query request is an equivalent connection query request, the server extracts from the first full data set the data whose first small data identifier and first big data identifier are both not null, and deletes the corresponding first small data identifiers and first big data identifiers, thereby obtaining the first equivalent connection data set; when the association query request is a left association query request, the server extracts from the first full data set the data whose first big data identifier is not null, and deletes the corresponding first small data identifiers and first big data identifiers, thereby obtaining the first left association data set; when the association query request is a full connection query request, the server takes the whole first full data set and deletes the first small data identifiers and the first big data identifiers in it, thereby obtaining the first full connection data set.
205. If the second data volume is larger than the first data volume, obtaining a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set according to the first table data and the second table data;
If the server determines that the second data amount is greater than the first data amount, the server extracts a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set based on the first table data and the second table data.
Specifically, if the server determines that the second data amount is greater than the first data amount, the server processes the second table data into second mark data, and left-associates the first table data with the second mark data to obtain a third result set comprising a plurality of second small data identifiers; the server extracts from the third result set the data whose second small data identifier is null and re-adds a non-null second small data identifier to it, obtaining the third non-inclined data set; the server extracts from the third result set the data whose second small data identifier is not null, obtaining the third inclined data set; the server then adds a second big data identifier to the first form data to obtain second form identifier data, and left-associates the second form identifier data with the second mark data to obtain a fourth result set comprising a plurality of second small table column data; the server extracts from the fourth result set the data whose second small table column data is null and deletes the corresponding second small table column data, obtaining the fourth non-inclined data set; and the server extracts from the fourth result set the data whose second small table column data is not null and deletes the corresponding second small table column data, obtaining the fourth inclined data set.
For ease of understanding, step 205 is specifically described below in connection with the actual application scenario:
After the first table data is broadcast, the second mark data is obtained, as shown in Table 3 below:

Table 3: second mark data

| user_id | enterprise_id | 1-smaller_mark |
| --- | --- | --- |
| Zhangsan | E1 | 1 |
| Wangwu | E1 | 1 |

The server left-associates the first table data with the second mark data to obtain the third result set shown in Table 4 below:

Table 4: third result set

| user_id | Age | enterprise_id | 1-smaller_mark |
| --- | --- | --- | --- |
| Zhangsan | 18 | E1 | 1 |
| Lisi | 19 | Null | Null |
| Zhaoliu | 20 | Null | Null |

The values in the 1-smaller_mark column are the second small data identifiers. The server extracts from the third result set the data whose second small data identifier is Null and re-adds a non-null second small data identifier to it; the resulting third non-inclined data set is shown in Table 5 below:

Table 5: third non-inclined data set

| user_id | Age | enterprise_id | 1-smaller_mark |
| --- | --- | --- | --- |
| Lisi | 19 | Null | 1 |
| Zhaoliu | 20 | Null | 1 |

The server extracts from the third result set the data whose second small data identifier is not Null to obtain the third inclined data set, as shown in Table 6 below:

Table 6: third inclined data set

| user_id | Age | enterprise_id | 1-smaller_mark |
| --- | --- | --- | --- |
| Zhangsan | 18 | E1 | 1 |

The server adds a second big data identifier to the first form data to obtain second form identifier data, and left-associates the second form identifier data with the second mark data to obtain the fourth result set shown in Table 7 below:

Table 7: fourth result set

| user_id | enterprise_id | bigger_mark | 2-smaller_mark |
| --- | --- | --- | --- |
| Zhangsan | E1 | 1 | 1 |
| Lisi | E1 | 1 | Null |
| Wangwu | E1 | 1 | 1 |

Here, the values in the bigger_mark column are the second big data identifiers, and the values in the 2-smaller_mark column are the second small table column data. The server extracts from the fourth result set the data whose second small table column data is Null and deletes the corresponding second small table column data, obtaining the fourth non-inclined data set shown in Table 8 below:

Table 8: fourth non-inclined data set

| user_id | enterprise_id | bigger_mark |
| --- | --- | --- |
| Lisi | E1 | 1 |

The server extracts from the fourth result set the data whose second small table column data is not Null and deletes the corresponding second small table column data, obtaining the fourth inclined data set shown in Table 9 below:

Table 9: fourth inclined data set

| user_id | enterprise_id | bigger_mark |
| --- | --- | --- |
| Zhangsan | E1 | 1 |
| Wangwu | E1 | 1 |

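The worked example of step 205 can be replayed with a minimal sketch that rebuilds Tables 4 to 9 from Tables 1 to 3. The helpers `left_join` and `drop_column` are hypothetical names, rows are modelled as dicts with `None` standing for Null, and the sketch assumes the join key is unique in the mark data.

```python
def left_join(left, right, key):
    """Left-join dict rows on `key`; missing right-hand columns become None."""
    right_by_key = {row[key]: row for row in right}  # assumes unique keys
    right_cols = [c for c in right[0] if c != key]
    joined = []
    for lrow in left:
        rrow = right_by_key.get(lrow[key])
        extra = {c: (rrow[c] if rrow else None) for c in right_cols}
        joined.append({**lrow, **extra})
    return joined


def drop_column(row, column):
    return {k: v for k, v in row.items() if k != column}


table_a = [{"user_id": n, "enterprise_id": "E1"}
           for n in ("Zhangsan", "Lisi", "Wangwu")]            # Table 1
table_b = [{"user_id": "Zhangsan", "Age": 18},
           {"user_id": "Lisi", "Age": 19},
           {"user_id": "Zhaoliu", "Age": 20}]                  # Table 2
mark_data = [{"user_id": n, "enterprise_id": "E1", "1-smaller_mark": 1}
             for n in ("Zhangsan", "Wangwu")]                  # Table 3

# Table 4: the table carrying the Age column, left-associated with the mark data.
third_result = left_join(table_b, mark_data, "user_id")
# Table 5: rows with a null identifier, with a non-null identifier re-added.
third_non_inclined = [{**r, "1-smaller_mark": 1}
                      for r in third_result if r["1-smaller_mark"] is None]
# Table 6: rows whose identifier is not null.
third_inclined = [r for r in third_result if r["1-smaller_mark"] is not None]

# Table 7: add bigger_mark, then left-associate with the mark column only.
marked_a = [{**r, "bigger_mark": 1} for r in table_a]
small_marks = [{"user_id": r["user_id"], "2-smaller_mark": r["1-smaller_mark"]}
               for r in mark_data]
fourth_result = left_join(marked_a, small_marks, "user_id")
# Tables 8 and 9: split on the 2-smaller_mark column, then delete that column.
fourth_non_inclined = [drop_column(r, "2-smaller_mark")
                       for r in fourth_result if r["2-smaller_mark"] is None]
fourth_inclined = [drop_column(r, "2-smaller_mark")
                   for r in fourth_result if r["2-smaller_mark"] is not None]
```

The split always uses the identifier column added by the left association: a Null identifier means the key had no partner in the mark data, so the row goes to the non-inclined set.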
206. Determining a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set;
The server determines a second equivalent connection data set, a second left association data set or a second full connection data set according to the association query request, the third non-inclined data set, the third inclined data set, the fourth non-inclined data set and the fourth inclined data set, and transmits the second equivalent connection data set, the second left association data set or the second full connection data set to the target terminal.
Specifically, the server fully connects the third non-inclined data set with the fourth non-inclined data set, and fully connects the third inclined data set with the fourth inclined data set, to obtain a third initial full data set and a fourth initial full data set; the server takes the union of the third initial full data set and the fourth initial full data set to obtain a second full data set. When the association query request is an equivalent connection query request, the server extracts from the second full data set the data whose second small data identifier and second big data identifier are both not null, and deletes the corresponding second small data identifiers and second big data identifiers, thereby obtaining the second equivalent connection data set; when the association query request is a left association query request, the server extracts from the second full data set the data whose second big data identifier is not null, and deletes the corresponding second small data identifiers and second big data identifiers, thereby obtaining the second left association data set; and when the association query request is a full connection query request, the server takes the whole second full data set and deletes the second small data identifiers and the second big data identifiers in it, thereby obtaining the second full connection data set.
For ease of understanding, step 206 is specifically described below in connection with the actual application scenario:
The third initial full data set obtained after the server fully connects the third non-inclined data set with the fourth non-inclined data set is shown in Table 10 below:

Table 10: third initial full data set

| user_id | enterprise_id | Age | bigger_mark | 1-smaller_mark |
| --- | --- | --- | --- | --- |
| Lisi | E1 | 19 | 1 | 1 |
| Zhaoliu | Null | 20 | Null | 1 |

The fourth initial full data set obtained after the server fully connects the third inclined data set with the fourth inclined data set is shown in Table 11 below:

Table 11: fourth initial full data set

| user_id | enterprise_id | Age | bigger_mark | 1-smaller_mark |
| --- | --- | --- | --- | --- |
| Zhangsan | E1 | 18 | 1 | 1 |
| Wangwu | E1 | Null | 1 | Null |

The server takes the union of the third initial full data set and the fourth initial full data set to obtain the second full data set shown in Table 12 below:

Table 12: second full data set

| user_id | enterprise_id | Age | bigger_mark | 1-smaller_mark |
| --- | --- | --- | --- | --- |
| Zhangsan | E1 | 18 | 1 | 1 |
| Lisi | E1 | 19 | 1 | 1 |
| Wangwu | E1 | Null | 1 | Null |
| Zhaoliu | Null | 20 | Null | 1 |

When the association query request is an equivalent connection query request, the second equivalent connection data set extracted by the server is shown in Table 13 below:

Table 13: second equivalent connection data set

| user_id | enterprise_id | Age |
| --- | --- | --- |
| Zhangsan | E1 | 18 |
| Lisi | E1 | 19 |

When the association query request is a left association query request, the second left association data set extracted by the server is shown in Table 14 below:

Table 14: second left association data set

| user_id | enterprise_id | Age |
| --- | --- | --- |
| Zhangsan | E1 | 18 |
| Lisi | E1 | 19 |
| Wangwu | E1 | Null |

When the association query request is a full connection query request, the second full connection data set extracted by the server is shown in Table 15 below:

Table 15: second full connection data set

| user_id | enterprise_id | Age |
| --- | --- | --- |
| Zhangsan | E1 | 18 |
| Lisi | E1 | 19 |
| Wangwu | E1 | Null |
| Zhaoliu | Null | 20 |

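The filtering rule of step 206 can be sketched as a single function. The name `select_target` and the request labels are hypothetical. The input rows are chosen so that the outputs agree with Tables 13 to 15; in particular the Zhaoliu row is taken to be (Null, 20, Null, 1), which is an assumption derived from Table 15.

```python
SMALL, BIG = "1-smaller_mark", "bigger_mark"


def select_target(full_rows, request):
    """Filter a full data set by request type, then delete both identifiers."""
    if request == "inner":    # equivalent connection: both identifiers non-null
        kept = [r for r in full_rows
                if r[SMALL] is not None and r[BIG] is not None]
    elif request == "left":   # left association: big identifier non-null
        kept = [r for r in full_rows if r[BIG] is not None]
    elif request == "full":   # full connection: keep every row
        kept = list(full_rows)
    else:
        raise ValueError(f"unknown request type: {request}")
    return [{k: v for k, v in r.items() if k not in (SMALL, BIG)} for r in kept]


second_full = [
    {"user_id": "Zhangsan", "enterprise_id": "E1", "Age": 18, BIG: 1, SMALL: 1},
    {"user_id": "Lisi", "enterprise_id": "E1", "Age": 19, BIG: 1, SMALL: 1},
    {"user_id": "Wangwu", "enterprise_id": "E1", "Age": None, BIG: 1, SMALL: None},
    {"user_id": "Zhaoliu", "enterprise_id": None, "Age": 20, BIG: None, SMALL: 1},
]

equivalent_set = select_target(second_full, "inner")  # rows of Table 13
left_set = select_target(second_full, "left")         # rows of Table 14
full_set = select_target(second_full, "full")         # rows of Table 15
```

Because the identifiers record which side each row came from, all three request types are served from one full data set, and only the row filter differs.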
207. When the first data volume is smaller than or equal to the inclination threshold value and the second data volume is smaller than or equal to the inclination threshold value, corresponding connection is carried out on the first table data and the second table data according to the association query request, and a third target data set is obtained, wherein the third target data set is a third equivalent connection data set, a third left association data set or a third full connection data set.
When the first data volume is less than or equal to the inclination threshold and the second data volume is less than or equal to the inclination threshold, the server broadcasts whichever of the first table data and the second table data has the smaller data volume to each node of the associated thread, and then directly performs the left association, full connection or equivalent connection on the first table data and the second table data according to the association query request.
For example, if the tilt threshold is 200M, the first data amount is 150M and the second data amount is 170M, the server first broadcasts the first table data to each node of the associated thread, and then performs the corresponding data association on the first table data and the second table data according to the association query request.
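The branching between step 207 and the skew-handling steps can be sketched as a small decision function. This is only an illustration in Python: the function name `choose_strategy`, the returned labels, and the tie-breaking when both volumes are equal are assumptions, since the text shown here does not specify them.

```python
def choose_strategy(first_amount, second_amount, tilt_threshold):
    """Return which branch applies; all amounts share one unit (e.g. MB)."""
    if first_amount <= tilt_threshold and second_amount <= tilt_threshold:
        # Neither table exceeds the threshold: broadcast the smaller table
        # and join directly. On a tie the first table is chosen (assumption).
        smaller = "first" if first_amount <= second_amount else "second"
        return ("direct_join", smaller)
    if first_amount > second_amount:
        return ("split_skew", "first_is_larger")
    if second_amount > first_amount:
        return ("split_skew", "second_is_larger")
    # Equal volumes above the threshold are not covered by the text shown here.
    return ("unspecified", None)
```

For the example above, `choose_strategy(150, 170, 200)` selects the direct join with the first table as the one to broadcast.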
In the embodiment of the invention, a plurality of inclined data sets, a plurality of non-inclined data sets and data identifiers are extracted from the first table data and the second table data according to the first data volume and the second data volume, and the target data set is obtained based on the plurality of inclined data sets, the plurality of non-inclined data sets and the data identifiers, which improves the efficiency of association queries and reduces the probability that an association query fails.
The data-tilt-based association query method in the embodiment of the present invention has been described above; the data-tilt-based association query device in the embodiment of the present invention is described below. Referring to fig. 3, one embodiment of the data-tilt-based association query device in the embodiment of the present invention includes:
The data acquisition module 301 is configured to acquire an association query request of a target terminal, read the first table data and the second table data based on the association query request, and count a data amount of the first table data and a data amount of the second table data to obtain a first data amount and a second data amount, where the association query request is an equivalent connection query request, a left association query request or a full connection query request;
A judging module 302, configured to judge whether the first data amount is greater than the second data amount or the second data amount is greater than the first data amount when at least one of the first data amount and the second data amount is greater than the tilt threshold;
The first data set extraction module 303 is configured to obtain a first non-oblique data set, a first oblique data set, a second non-oblique data set, and a second oblique data set according to the first table data and the second table data if the first data amount is greater than the second data amount;
A first association module 304, configured to determine a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set, and the association query request, and transmit the first target data set to the target terminal, where the first target data set is a first equal-value connection data set, a first left-associated data set, or a first full-connection data set;
the second data set extracting module 305 is configured to obtain a third non-oblique data set, a third oblique data set, a fourth non-oblique data set, and a fourth oblique data set according to the first table data and the second table data if the second data amount is greater than the first data amount;
the second association module 306 is configured to determine a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set, and the association query request, and transmit the second target data set to the target terminal, where the second target data set is a second equal-value connection data set, a second left-associated data set, or a second full-connection data set.
In the embodiment of the invention, a plurality of inclined data sets, a plurality of non-inclined data sets and data identifiers are extracted from the first table data and the second table data according to the first data volume and the second data volume, and the target data set is obtained based on the plurality of inclined data sets, the plurality of non-inclined data sets and the data identifiers, which improves the efficiency of association queries and reduces the probability that an association query fails.
Referring to fig. 4, another embodiment of the data-tilt-based association query device in the embodiment of the present invention includes:
The data acquisition module 301 is configured to acquire an association query request of a target terminal, read the first table data and the second table data based on the association query request, and count a data amount of the first table data and a data amount of the second table data to obtain a first data amount and a second data amount, where the association query request is an equivalent connection query request, a left association query request or a full connection query request;
A judging module 302, configured to judge whether the first data amount is greater than the second data amount or the second data amount is greater than the first data amount when at least one of the first data amount and the second data amount is greater than the tilt threshold;
The first data set extraction module 303 is configured to obtain a first non-oblique data set, a first oblique data set, a second non-oblique data set, and a second oblique data set according to the first table data and the second table data if the first data amount is greater than the second data amount;
A first association module 304, configured to determine a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set, and the association query request, and transmit the first target data set to the target terminal, where the first target data set is a first equal-value connection data set, a first left-associated data set, or a first full-connection data set;
the second data set extracting module 305 is configured to obtain a third non-oblique data set, a third oblique data set, a fourth non-oblique data set, and a fourth oblique data set according to the first table data and the second table data if the second data amount is greater than the first data amount;
the second association module 306 is configured to determine a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set, and the association query request, and transmit the second target data set to the target terminal, where the second target data set is a second equal-value connection data set, a second left-associated data set, or a second full-connection data set.
Optionally, the data acquisition module 301 may be further specifically configured to:
Acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, dividing the first table data into a plurality of first column data, and dividing the second table data into a plurality of second column data;
Performing data processing on the plurality of first column data to obtain a plurality of first sub-table data, and counting the data quantity of the plurality of first sub-table data to obtain a plurality of first sub-table data quantity;
Performing data processing on the plurality of second column data to obtain a plurality of second sub-table data, and counting the data quantity of the plurality of second sub-table data to obtain a plurality of second sub-table data quantity;
Adding each first sub-table data volume of the plurality of first sub-table data volumes to obtain a first data volume;
And adding each second sub-table data volume in the plurality of second sub-table data volumes to obtain a second data volume.
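The per-column counting and summation performed by the data acquisition module can be illustrated as follows. This Python sketch assumes that the "data quantity" of a sub-table is the total UTF-8 byte length of its non-null values; the text does not define the measure or the "data processing" applied to each column, so both the measure and the helper names `column_volumes` and `table_volume` are hypothetical.

```python
def column_volumes(rows, columns):
    """Estimate the volume of each column as the UTF-8 byte length of its values."""
    return {
        col: sum(len(str(r[col]).encode("utf-8")) for r in rows if r[col] is not None)
        for col in columns
    }


def table_volume(rows, columns):
    """Add up the per-column (sub-table) volumes to get the table's data volume."""
    return sum(column_volumes(rows, columns).values())
```

Applying `table_volume` to the first table data and to the second table data yields the first data volume and the second data volume that are later compared with the inclination threshold.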
Optionally, the first data set extraction module 303 may be further specifically configured to:
If the first data volume is larger than the second data volume, processing the first form data into first marking data, and carrying out left association on the second form data and the first marking data to obtain a first result set, wherein the first result set comprises a plurality of first small data identifiers;
Extracting a data set with a first small data identifier as a null value from the first result set, and re-adding the first small data identifier to obtain a first non-inclined data set;
extracting a data set with a first small data identifier not being a null value from a first result set to obtain a first inclined data set;
adding a first big data identifier for the second table data to obtain first table identifier data, and performing left association on the first table identifier data and the first mark data to obtain a second result set, wherein the second result set comprises a plurality of first small table list data;
extracting a data set with the first small list data as a null value from the second result set, and deleting the corresponding first small list data to obtain a second non-inclined data set;
And extracting a data set with the first small list data not being a null value from the second result set, and deleting the corresponding first small list data to obtain a second inclined data set.
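The split into a non-inclined data set and an inclined data set via a left association with the mark data can be sketched in Python as below. The content of the mark data is an assumption: here it is taken to be a mapping from each skewed join key to the identifier value 1. The column name `small_mark` and both helper functions are likewise hypothetical.

```python
def left_join_marker(rows, marker, key):
    """Left-associate rows with mark data; unmatched rows get a null identifier."""
    return [dict(r, small_mark=marker.get(r[key])) for r in rows]


def split_by_marker(result_set):
    """Split a result set into (non_inclined, inclined) by the identifier column."""
    # Rows whose identifier is null are non-inclined; the identifier is re-added.
    non_inclined = [dict(r, small_mark=1) for r in result_set
                    if r["small_mark"] is None]
    # Rows whose identifier is not null belong to the inclined data set.
    inclined = [r for r in result_set if r["small_mark"] is not None]
    return non_inclined, inclined
```

The same two helpers could be reused for the third and fourth data sets when the second data volume is the larger one, with the roles of the two tables swapped.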
Optionally, the first association module 304 may be further specifically configured to:
Fully connecting the first non-inclined data set with the second non-inclined data set to obtain a first initial full data set, and fully connecting the first inclined data set with the second inclined data set to obtain a second initial full data set;
Combining the first initial full data set with the second initial full data set to obtain a first full data set;
when the association query request is the equivalent connection query request, extracting a data set with a first small data identifier which is not null and a data set with a first large data identifier which is not null from the first all-data set, and deleting a plurality of corresponding first small data identifiers and a plurality of corresponding first large data identifiers to obtain a first equivalent connection data set;
when the association query request is the left association query request, extracting a data set with the first big data identifier not being a null value from the first all-data set, and deleting a plurality of corresponding first small data identifiers and a plurality of corresponding first big data identifiers to obtain a first left association data set;
And when the association query request is the full-connection query request, extracting the first full-data set, deleting the plurality of first small data identifiers and the plurality of first big data identifiers, and obtaining the first full-connection data set.
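The first two steps of the first association module, fully connecting the non-inclined pair and the inclined pair separately and then merging the results, can be sketched as follows. This is a plain-Python illustration rather than the engine the embodiment would run on; the join key `enterprise_id`, the marker column names, the sample rows and the helper `full_outer_join` are assumptions, and join keys are assumed to be non-null.

```python
from collections import defaultdict


def full_outer_join(left, right, key, left_cols, right_cols):
    """Full connection on `key`; columns missing on one side become None."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    joined, matched_keys = [], set()
    for lrow in left:
        partners = index.get(lrow[key], [])
        if partners:
            matched_keys.add(lrow[key])
            joined.extend({**rrow, **lrow} for rrow in partners)
        else:
            joined.append({**dict.fromkeys(right_cols), **lrow})
    for rrow in right:
        if rrow[key] not in matched_keys:
            joined.append({**dict.fromkeys(left_cols), **rrow})
    return joined


# Hypothetical partitions; each side carries its own identifier column.
big_non_inclined = [{"enterprise_id": "E2", "user_id": "Wangwu", "big_mark": 1}]
small_non_inclined = [{"enterprise_id": "E3", "age": 20, "small_mark": 1}]
big_inclined = [{"enterprise_id": "E1", "user_id": "Zhangsan", "big_mark": 1}]
small_inclined = [{"enterprise_id": "E1", "age": 18, "small_mark": 1}]

L_COLS, R_COLS = ["user_id", "big_mark"], ["age", "small_mark"]
first_full_dataset = (
    full_outer_join(big_non_inclined, small_non_inclined, "enterprise_id", L_COLS, R_COLS)
    + full_outer_join(big_inclined, small_inclined, "enterprise_id", L_COLS, R_COLS)
)
```

Because every row keeps both identifier columns, with a null on the side that had no match, the later extraction for each request type reduces to a null check on those columns.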
Optionally, the second data set extraction module 305 may be further specifically configured to:
If the second data quantity is larger than the first data quantity, processing the second form data into second marked data, and carrying out left association on the first form data and the second marked data to obtain a third result set, wherein the third result set comprises a plurality of second small data identifiers;
Extracting a data set with the second small data identifier as a null value from the third result set, and re-adding the second small data identifier to obtain a third non-inclined data set;
extracting a data set with the second small data identifier not being a null value from the third result set to obtain a third inclined data set;
adding a second big data identifier to the first form data to obtain second form identifier data, and performing left association on the second form identifier data and second mark data to obtain a fourth result set, wherein the fourth result set comprises a plurality of second small form list data;
Extracting a data set with the second small list data as a null value from the fourth result set, and deleting the corresponding second small list data to obtain a fourth non-inclined data set;
And extracting a data set with the second small list data not being a null value from the fourth result set, and deleting the corresponding second small list data to obtain a fourth inclined data set.
Optionally, the second association module 306 may be further specifically configured to:
Fully connecting the third non-inclined data set with the fourth non-inclined data set to obtain a third initial full data set, and fully connecting the third inclined data set with the fourth inclined data set to obtain a fourth initial full data set;
Combining the third initial full data set with the fourth initial full data set to obtain a second full data set;
when the association query request is the equivalent connection query request, extracting a data set with a second small data identifier which is not null and a data set with a second large data identifier which is not null from the second full data set, and deleting a plurality of corresponding second small data identifiers and a plurality of corresponding second large data identifiers to obtain a second equivalent connection data set;
when the association query request is the left association query request, extracting a data set with second big data identifiers not being null from the second full data set, and deleting a plurality of corresponding second small data identifiers and a plurality of corresponding second big data identifiers to obtain a second left association data set;
And when the association query request is the full-connection query request, extracting the second full-data set, deleting the plurality of second small data identifiers and the plurality of second large data identifiers, and obtaining a second full-connection data set.
Optionally, the data-tilting-based association query device further includes:
and the third association module 307 is configured to, when the first data amount is less than or equal to the tilt threshold and the second data amount is less than or equal to the tilt threshold, correspondingly connect the first table data and the second table data according to the association query request, and obtain a third target data set, where the third target data set is a third equivalent connection data set, a third left association data set, or a third full connection data set.
In the embodiment of the invention, a plurality of inclined data sets, a plurality of non-inclined data sets and data identifiers are extracted from the first table data and the second table data according to the first data volume and the second data volume, and the target data set is obtained based on the plurality of inclined data sets, the plurality of non-inclined data sets and the data identifiers, which improves the efficiency of association queries and reduces the probability that an association query fails.
Fig. 3 and fig. 4 above describe the data-tilt-based association query device in the embodiment of the present invention in detail from the perspective of modularized functional entities; the data-tilt-based association query device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a data-tilt-based association query device 500 according to an embodiment of the present invention. The data-tilt-based association query device 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transitory or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the data-tilt-based association query device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the data-tilt-based association query device 500, the series of instruction operations in the storage medium 530.
The data-tilt-based association query device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device structure illustrated in fig. 5 does not constitute a limitation of the data-tilt-based association query device, which may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components.
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of a data tilt-based associative query method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The data tilt-based association query method is characterized by comprising the following steps of:
Acquiring an association query request of a target terminal, reading first form data and second form data based on the association query request, and counting the data volume of the first form data and the data volume of the second form data to obtain a first data volume and a second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full connection query request;
when at least one of the first data amount and the second data amount is larger than a tilting threshold value, judging whether the first data amount is larger than the second data amount or the second data amount is larger than the first data amount;
If the first data volume is larger than the second data volume, a first non-inclined data set, a first inclined data set, a second non-inclined data set and a second inclined data set are obtained according to the first table data and the second table data;
Determining a first target data set according to the first non-inclined data set, the first inclined data set, the second non-inclined data set, the second inclined data set and the association query request, and transmitting the first target data set to the target terminal, wherein the first target data set is a first equal-value connection data set, a first left association data set or a first full-connection data set;
if the second data volume is larger than the first data volume, a third non-inclined data set, a third inclined data set, a fourth non-inclined data set and a fourth inclined data set are obtained according to the first table data and the second table data;
And determining a second target data set according to the third non-inclined data set, the third inclined data set, the fourth non-inclined data set, the fourth inclined data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full connection data set.
2. The data-tilt-based association query method according to claim 1, wherein the acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, and counting the data volume of the first table data and the data volume of the second table data to obtain a first data volume and a second data volume, the association query request being an equivalent connection query request, a left association query request or a full connection query request, comprises:
acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, dividing the first table data into a plurality of first column data, and dividing the second table data into a plurality of second column data;
Performing data processing on the plurality of first column data to obtain a plurality of first sub-table data, and counting the data quantity of the plurality of first sub-table data to obtain a plurality of first sub-table data quantity;
performing data processing on the plurality of second column data to obtain a plurality of second sub-table data, and counting the data quantity of the plurality of second sub-table data to obtain a plurality of second sub-table data quantity;
adding each first sub-table data volume in the plurality of first sub-table data volumes to obtain a first data volume;
and adding each second sub-table data volume in the plurality of second sub-table data volumes to obtain a second data volume.
3. The data-tilt-based associative query method of claim 1, wherein the deriving a first non-tilt data set, a first tilt data set, a second non-tilt data set, and a second tilt data set from the first table data and the second table data if the first data amount is greater than the second data amount comprises:
If the first data volume is larger than the second data volume, processing the first table data into first marked data, and carrying out left association on the second table data and the first marked data to obtain a first result set, wherein the first result set comprises a plurality of first small data identifiers;
Extracting a data set with a first small data identifier as a null value from the first result set, and re-adding the first small data identifier which is not the null value to obtain a first non-inclined data set;
Extracting a data set with a first small data identifier which is not null value from the first result set to obtain a first inclined data set;
Adding a first big data identifier to the second table data to obtain first table identifier data, and performing left association on the first table identifier data and the first mark data to obtain a second result set, wherein the second result set comprises a plurality of first small table list data;
Extracting a data set with the first small list data as a null value from the second result set, and deleting the corresponding first small list data to obtain a second non-inclined data set;
and extracting a data set with the first small list data not being a null value from the second result set, and deleting the corresponding first small list data to obtain a second inclined data set.
4. The data-tilt-based association query method of claim 1, wherein the determining a first target data set from the first non-tilt data set, the first tilt data set, the second non-tilt data set, the second tilt data set, and the association query request and transmitting the first target data set to the target terminal, the first target data set being a first equal-value connection data set, a first left-association data set, or a first full-connection data set comprises:
Fully connecting the first non-inclined data set with the second non-inclined data set to obtain a first initial full data set, and fully connecting the first inclined data set with the second inclined data set to obtain a second initial full data set;
Combining the first initial full data set with the second initial full data set to obtain a first full data set;
when the association query request is the equivalent connection query request, extracting a data set with a first small data identifier which is not null and a data set with a first large data identifier which is not null from the first all-data set, and deleting a plurality of corresponding first small data identifiers and a plurality of corresponding first large data identifiers to obtain a first equivalent connection data set;
when the association query request is the left association query request, extracting a data set with the first big data identifier not being a null value from the first all-data set, and deleting a plurality of corresponding first small data identifiers and a plurality of corresponding first big data identifiers to obtain a first left association data set;
And when the association query request is the full-connection query request, extracting the first full-data set, deleting the plurality of first small data identifiers and the plurality of first big data identifiers, and obtaining the first full-connection data set.
5. The data-tilt-based associative query method of claim 1, wherein if the second amount of data is greater than the first amount of data, deriving a third non-tilt data set, a third tilt data set, a fourth non-tilt data set, and a fourth tilt data set from the first table data and the second table data comprises:
if the second data volume is larger than the first data volume, processing the second table data into second marked data, and performing left association on the first table data and the second marked data to obtain a third result set, wherein the third result set comprises a plurality of second small data identifiers;
extracting a data set with the second small data identifier as a null value from the third result set, and re-adding the second small data identifier which is not the null value to obtain a third non-inclined data set;
Extracting a data set with a second small data identifier which is not null value from the third result set to obtain a third inclined data set;
Adding a second big data identifier to the first form data to obtain second form identifier data, and performing left association on the second form identifier data and the second marker data to obtain a fourth result set, wherein the fourth result set comprises a plurality of second small list data;
extracting a data set with the second small list data as a null value from the fourth result set, and deleting the corresponding second small list data to obtain a fourth non-inclined data set;
And extracting a data set with the second small list data not being a null value from the fourth result set, and deleting the corresponding second small list data to obtain a fourth inclined data set.
6. The data-tilt-based association query method of claim 1, wherein determining a second target data set from the third non-tilt data set, the third tilt data set, the fourth non-tilt data set, the fourth tilt data set, and the association query request and transmitting the second target data set to the target terminal, the second target data set being a second equal-value connection data set, a second left-association data set, or a second full-connection data set comprises:
Fully connecting the third non-inclined data set with the fourth non-inclined data set to obtain a third initial full data set, and fully connecting the third inclined data set with the fourth inclined data set to obtain a fourth initial full data set;
Combining the third initial full data set with the fourth initial full data set to obtain a second full data set;
when the association query request is the equivalent connection query request, extracting a data set with a second small data identifier which is not null and a data set with a second large data identifier which is not null from the second full data set, and deleting a plurality of corresponding second small data identifiers and a plurality of corresponding second large data identifiers to obtain a second equivalent connection data set;
when the association query request is the left association query request, extracting a data set with second big data identifiers not being null from the second full data set, and deleting a plurality of corresponding second small data identifiers and a plurality of corresponding second big data identifiers to obtain a second left association data set;
And when the association query request is the full-connection query request, extracting the second full-data set, deleting the plurality of second small data identifiers and the plurality of second large data identifiers, and obtaining a second full-connection data set.
7. The data-tilt-based associative query method of any of claims 1-6, wherein after determining a second target data set from the third non-tilt data set, the third tilt data set, the fourth non-tilt data set, the fourth tilt data set, and the associative query request and transmitting the second target data set to the target terminal, the second target data set is a second equal-value connection data set, a second left-associated data set, or a second full-connection data set, the data-tilt-based associative query method further comprises:
And when the first data volume is smaller than or equal to the inclination threshold value and the second data volume is smaller than or equal to the inclination threshold value, correspondingly connecting the first table data and the second table data according to the association query request to obtain a third target data set, wherein the third target data set is a third equivalent connection data set, a third left association data set or a third full connection data set.
8. A data-tilt-based associative query apparatus, the data-tilt-based associative query apparatus comprising:
the data acquisition module is used for acquiring an association query request of a target terminal, reading first table data and second table data based on the association query request, and counting the data volume of the first table data and the data volume of the second table data to obtain a first data volume and a second data volume, wherein the association query request is an equivalent connection query request, a left association query request or a full-connection query request;
the judging module is used for judging, when at least one of the first data volume and the second data volume is larger than a tilt threshold, whether the first data volume is larger than the second data volume or the second data volume is larger than the first data volume;
the first data set extraction module is used for obtaining a first non-tilt data set, a first tilt data set, a second non-tilt data set and a second tilt data set according to the first table data and the second table data if the first data volume is larger than the second data volume;
the first association module is used for determining a first target data set according to the first non-tilt data set, the first tilt data set, the second non-tilt data set, the second tilt data set and the association query request, and transmitting the first target data set to the target terminal, wherein the first target data set is a first equivalent connection data set, a first left association data set or a first full-connection data set;
the second data set extraction module is used for obtaining a third non-tilt data set, a third tilt data set, a fourth non-tilt data set and a fourth tilt data set according to the first table data and the second table data if the second data volume is larger than the first data volume;
and the second association module is used for determining a second target data set according to the third non-tilt data set, the third tilt data set, the fourth non-tilt data set, the fourth tilt data set and the association query request, and transmitting the second target data set to the target terminal, wherein the second target data set is a second equivalent connection data set, a second left association data set or a second full-connection data set.
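The decision made by the judging module can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names `Branch` and `judge` are hypothetical, the data volume is taken to be a row count, and the `DIRECT` branch reflects the case described in claim 7 rather than a module of this apparatus. The claim text does not say what happens when both volumes are equal and above the threshold, so the sketch raises an error in that case instead of guessing.

```python
from enum import Enum


class Branch(Enum):
    DIRECT = "direct"                # both tables at or below the threshold (claim 7)
    FIRST_LARGER = "first_larger"    # handled by the first extraction/association modules
    SECOND_LARGER = "second_larger"  # handled by the second extraction/association modules


def judge(first_volume: int, second_volume: int, tilt_threshold: int) -> Branch:
    """Pick the processing branch from the two data volumes and the tilt threshold."""
    if first_volume <= tilt_threshold and second_volume <= tilt_threshold:
        return Branch.DIRECT
    if first_volume > second_volume:
        return Branch.FIRST_LARGER
    if second_volume > first_volume:
        return Branch.SECOND_LARGER
    # Equal volumes above the threshold are not covered by the claim wording.
    raise ValueError("equal data volumes above the tilt threshold are unspecified")


# Made-up volumes and threshold, for illustration only.
branch_a = judge(first_volume=5_000_000, second_volume=20_000, tilt_threshold=1_000_000)
branch_b = judge(first_volume=300, second_volume=9_000_000, tilt_threshold=1_000_000)
branch_c = judge(first_volume=300, second_volume=400, tilt_threshold=1_000_000)
```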
9. A data-tilt-based associative query apparatus, the data-tilt-based associative query apparatus comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the data-tilt-based associative query apparatus to perform the data-tilt-based associative query method of any of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data-tilt-based associative query method according to any of claims 1-7.
CN202010581205.9A 2020-06-23 2020-06-23 Associated query method, device, equipment and storage medium based on data inclination Active CN111708809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581205.9A CN111708809B (en) 2020-06-23 2020-06-23 Associated query method, device, equipment and storage medium based on data inclination

Publications (2)

Publication Number Publication Date
CN111708809A (en) 2020-09-25
CN111708809B (en) 2024-05-03

Family

ID=72542378

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202010581205.9A Active CN111708809B (en) 2020-06-23 2020-06-23 Associated query method, device, equipment and storage medium based on data inclination

Country Status (1)

Country Link
CN (1) CN111708809B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095413A (en) * 2015-07-09 2015-11-25 北京京东尚科信息技术有限公司 Method and apparatus for solving data skew
CN108268586A (en) * 2017-09-22 2018-07-10 广东神马搜索科技有限公司 Across the data processing method of more tables of data, device, medium and computing device
CN111241111A (en) * 2020-02-12 2020-06-05 网易(杭州)网络有限公司 Data query method and device, data comparison method and device, medium and equipment

Also Published As

Publication number Publication date
CN111708809A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
Manzoor et al. Fast memory-efficient anomaly detection in streaming heterogeneous graphs
US20200301961A1 (en) Image retrieval method and apparatus, system, server, and storage medium
CN102782643B (en) Use the indexed search of Bloom filter
CN106326475B (en) Efficient static hash table implementation method and system
WO2018177275A1 (en) Method and apparatus for integrating multi-data source user information
EP3292481B1 (en) Method, system and computer program product for performing numeric searches
CN104794123A (en) Method and device for establishing NoSQL database index for semi-structured data
CN110597852A (en) Data processing method, device, terminal and storage medium
WO2008050107A1 (en) Fuzzy database matching
CN111580965A (en) Data request processing method and system
CN107766529A (en) A kind of mass data storage means for sewage treatment industry
CN111858678A (en) Redis-based key value deletion method, computer device, apparatus and storage medium
US20140280929A1 (en) Multi-tier message correlation
CN111858659A (en) Data query method, device and equipment based on row key salt value and storage medium
CN104424316A (en) Data storage method, data searching method, related device and system
CN111708809B (en) Associated query method, device, equipment and storage medium based on data inclination
CN116126997B (en) Document deduplication storage method, system, device and storage medium
CN116126864A (en) Index construction method, data query method and related equipment
US8281000B1 (en) Variable-length nonce generation
US11501020B2 (en) Method for anonymizing personal information in big data and combining anonymized data
CN112148925B (en) User identification association query method, device, equipment and readable storage medium
CN114860806A (en) Data query method and device of block chain, computer equipment and storage medium
CN116263770A (en) Method, device, terminal equipment and medium for storing business data based on database
CN111368294B (en) Virus file identification method and device, storage medium and electronic device
CN114398373A (en) File data storage and reading method and device applied to database storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant