CN111221885A - Method and system for calculating data ranking - Google Patents

Info

Publication number
CN111221885A
CN111221885A CN202010010013.2A CN202010010013A CN111221885A CN 111221885 A CN111221885 A CN 111221885A CN 202010010013 A CN202010010013 A CN 202010010013A CN 111221885 A CN111221885 A CN 111221885A
Authority
CN
China
Prior art keywords
field
sorted
field value
fields
ranking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010010013.2A
Other languages
Chinese (zh)
Inventor
陈中演
褚振华
张磊
聂鑫伟
李策
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202010010013.2A
Publication of CN111221885A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for obtaining a data ranking, comprising the following steps: (1) determining, based on a main field and a first sorting field of the data in a first table, a second sorting field in a second table and the number of main fields corresponding to each field value of the second sorting field; (2) sorting the second sorting field based on each of its field values; (3) determining a rank for each field value of the sorted second sorting field based on the sorted second sorting field and the number of main fields corresponding to each of its field values; and (4) establishing a correspondence between the rank of each field value of the sorted second sorting field and each field value of the main field, to obtain a rank for each field value of the main field.

Description

Method and system for calculating data ranking
Technical Field
The invention relates to the field of big data, in particular to a method and a system for calculating data ranking.
Background
In the field of big data, there is a need to rank massive amounts of data. For example, China UnionPay Co., Ltd. holds data on billions of bank cards and needs to sort the cards by number of transactions or by transaction amount in order to obtain the specific rank of each bank card among all bank cards.
To address this need, there are two existing approaches in the big data domain. One is to sort all data directly with the MR (MapReduce) engine; the other is to sort all data with the MR engine together with the TotalOrderPartitioner module. However, massive data often suffers from severe data skew, which can make both approaches inapplicable. With the first approach, all data enters a single reduce module during sorting, so the amount of data in that reduce module becomes too large and the sorting task easily fails. With the second approach, split points for the data to be sorted are obtained by sampling, and the data is distributed to the corresponding reduce modules according to those split points. Because of data skew, however, a large amount of duplicate data may enter the same reduce module, making the sorting task run for a long time (e.g., more than ten hours).
Disclosure of Invention
In one aspect, embodiments of the present invention provide a method for obtaining a data ranking, comprising the steps of: (1) determining, based on a main field and a first sorting field of the data in a first table, a second sorting field in a second table and the number of main fields corresponding to each field value of the second sorting field; (2) sorting the second sorting field based on each of its field values; (3) determining a rank for each field value of the sorted second sorting field based on the sorted second sorting field and the number of main fields corresponding to each of its field values; and (4) establishing a correspondence between the rank of each field value of the sorted second sorting field and each field value of the main field, to obtain a rank for each field value of the main field.
Embodiments of the present invention also provide a system for obtaining a data ranking, comprising: a deskew module for determining, based on a main field and a first sorting field of the data in a first table, a second sorting field in a second table and the number of main fields corresponding to each field value of the second sorting field; a sorting module for sorting the second sorting field based on each of its field values; a ranking module for determining a rank for each field value of the sorted second sorting field based on the sorted second sorting field and the number of main fields corresponding to each of its field values; and a fusion module for establishing a correspondence between the rank of each field value of the sorted second sorting field and each field value of the main field, to obtain a rank for each field value of the main field.
In another aspect, an embodiment of the present invention further provides a method for obtaining a data ranking, comprising the steps of: (1) determining, based on a main field, a first sorting field, and a first partition field of the data in a first table, a second sorting field in a second table, a second partition field corresponding to each field value of the second sorting field, and the number of main fields corresponding to each field value of the second sorting field under each distinct field value of the second partition field; (2) sorting the second partition field and the second sorting field based on each field value of the second partition field and each field value of the second sorting field; (3) determining, under each distinct field value of the second partition field, a rank for each field value of the sorted second sorting field based on the sorted second partition field, the sorted second sorting field, and the number of main fields corresponding to each field value of the sorted second sorting field; and (4) establishing, under each distinct field value of the second partition field, a correspondence between the rank of each field value of the sorted second sorting field and each field value of the main field, to obtain a rank for each field value of the main field under each distinct field value of the second partition field.
Embodiments of the present invention also provide a system for obtaining a data ranking, comprising: a deskew module for determining, based on a main field, a first sorting field, and a first partition field of the data in a first table, a second sorting field in a second table, a second partition field corresponding to each field value of the second sorting field, and the number of main fields corresponding to each field value of the second sorting field under each distinct field value of the second partition field; a sorting module for sorting the second partition field and the second sorting field based on each field value of the second partition field and each field value of the second sorting field; a ranking module for determining, under each distinct field value of the second partition field, a rank for each field value of the sorted second sorting field based on the sorted second partition field, the sorted second sorting field, and the number of main fields corresponding to each field value of the sorted second sorting field; and a fusion module for establishing, under each distinct field value of the second partition field, a correspondence between the rank of each field value of the sorted second sorting field and each field value of the main field, to obtain a rank for each field value of the main field under each distinct field value of the second partition field.
In yet another aspect, embodiments of the present invention provide a computer-readable medium having computer-readable instructions stored thereon, which, when executed by a computer, are capable of performing the method according to the various embodiments.
Embodiments of the invention can rank the massive data in big data accurately and successfully, and can markedly speed up the computation. For example, for big data about bank cards, embodiments of the invention not only complete the ranking successfully but also shorten the running time from tens of hours to four hours.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the invention are described below so that its technical means, and the above and other objects, features, and advantages, can be understood more clearly.
Drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated in the drawings by way of example, and not by way of limitation.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 illustrates a flow diagram for computing a ranking based on sorting results according to one embodiment of the invention.
FIG. 2 illustrates a flow diagram for computing a ranking based on sorting results according to another embodiment of the invention.
FIG. 3 illustrates a diagram of a system for computing rankings of big data that has a data skew problem, according to yet another embodiment of the invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The following detailed description of embodiments of the invention refers to the accompanying drawings.
In the art, big data refers to a data set so large that it cannot be managed and processed with conventional software tools within a given time frame. Herein, big data about bank cards is used as the running example.
In the art, data skew refers to a large share of the data being concentrated, as repeated values, in a small part of the value range. For example, suppose a big data set contains billions of records, each with a value between 1 and 10,000,000. If one hundred million records have the value 1 while every other value occurs far fewer than one hundred million times, the data set is considered to have a data skew problem.
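As a rough illustration of this definition, the following Python sketch measures how strongly a data set is concentrated on its single most frequent value. The function name `skew_ratio` and the scaled-down sample data are assumptions made for illustration; the source does not define a numeric measure of skew.

```python
from collections import Counter


def skew_ratio(values):
    """Return the share of records taken up by the most frequent value."""
    counts = Counter(values)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(values)


# Hypothetical, scaled-down data: 100 records with value 1,
# plus one record for each value from 2 to 11.
values = [1] * 100 + list(range(2, 12))
ratio = skew_ratio(values)
print(f"share of the most frequent value: {ratio:.2f}")
```

A ratio close to 1 means that almost all records share one value, which is the situation that overloads a single reduce module.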
In the art, big data may be stored in one or more tables, typically Hive tables. In some embodiments of the present invention, the table (i.e., the first table) may include a plurality of fields, such as a main field, a sorting field, a partition field, and the like. In other embodiments, a second table may be generated based on that table, the second table including, for example, a sorting field containing distinct field values, a main field count field (holding the number of main fields), a partition field, a rank field, and so on. In still other embodiments, a third table may also be generated based on the first and second tables, the third table including, for example, a main field, a partition field, a rank field, and the like. Each field represents a column of data in the table, and each field value represents one row in that column.
The technical solution provided by the invention can be used to process big data about bank cards. This big data is referred to herein simply as raw data. In some embodiments, in the table storing the raw data, the bank card number may be stored in the main field, the transaction amount or the number of transactions of each card may be stored in the sorting field, and the card type (e.g., debit card, credit card) may be stored in the partition field. Of course, the technical solution of the present invention may also be applied to other types of big data, in which case the fields store different contents. For example, in scenarios where merchants need to be ranked, merchant numbers may be stored in the main field and merchant types in the partition field.
In one aspect, the present invention provides a method for ranking the individual records in big data, the method comprising the following steps:
(1) Determining the number of main fields corresponding to each field value of the sorting field, based on the main field and the sorting field of the data.
In some embodiments of this step, the MR engine may be used to aggregate the sorting field of the first table storing the raw data and to count the number of main fields corresponding to each field value of the sorting field. A second table may then be generated that includes the sorting field and a main field count field.
Taking the big data about bank cards as an example, assume that the first table holds information on one hundred bank cards (i.e., one hundred records, one per row of the first table), and that all bank cards must be ranked by number of transactions, whose actual value lies between 1 and 5. In the first table, the bank card number may be stored as the field value of the main field, and the number of transactions as the field value of the sorting field. The generated second table therefore has a sorting field and a main field count field. The second table contains only five records, one per row, corresponding to the five distinct transaction counts, and each transaction count is the field value of the sorting field in its row. Each record also contains the number of bank cards with that transaction count (i.e., the number of main fields), counted from the first table and stored in the main field count field.
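The aggregation in this step can be sketched in plain Python, standing in for the MR engine. The function name `build_second_table`, the tuple layout, and the six sample cards are hypothetical and chosen only to keep the example small.

```python
from collections import Counter


def build_second_table(first_table):
    """Collapse (main_field, sort_field) rows into (sort_value, main_count) rows.

    Each distinct sorting-field value appears exactly once in the result,
    however many cards share it.
    """
    counts = Counter(sort_value for _, sort_value in first_table)
    return list(counts.items())


# Hypothetical first table: (card number, number of transactions).
first_table = [("A", 5), ("B", 5), ("C", 5), ("D", 4), ("E", 4), ("F", 1)]
second_table = build_second_table(first_table)
print(second_table)
```

Because the second table has one row per distinct sorting-field value, its size depends on the number of distinct values rather than on the number of cards, which is what removes the skew before sorting.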
(2) Sorting the sorting field based on each of its field values.
This step sorts the sorting field in the second table by the magnitude of its field values. In some embodiments, the sorting may be implemented with the SPARK engine. The sorting field may be sorted in descending order or in ascending order.
Taking the big data about bank cards as an example, assume that the sorting field is sorted in descending order of field value. The first record after sorting is then the one with 5 transactions, and the last is the one with 1 transaction.
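A minimal sketch of this step, using Python's built-in sort in place of the SPARK engine. The function name `sort_second_table` and the sample rows are assumptions for illustration.

```python
def sort_second_table(second_table, descending=True):
    """Sort (sort_value, main_count) rows by the sorting-field value."""
    return sorted(second_table, key=lambda row: row[0], reverse=descending)


# Hypothetical second table: (number of transactions, number of cards).
second_table = [(1, 1), (5, 3), (4, 2)]
sorted_table = sort_second_table(second_table)
print(sorted_table)
```

Only the deduplicated rows are sorted here, so the cost of the sort no longer grows with the number of cards that share a value.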
(3) Determining a rank for each field value of the sorted sorting field based on the sorted sorting field and the number of main fields corresponding to each of its field values.
In some embodiments of this step, because the sorting field has already been sorted, an exact rank can be obtained for each of its field values. The rank of each record is stored in the rank field of the second table. The first record has rank 1; the rank of the second record is the sum of the rank of the first record and the field value of the first record's main field count field; and so on.
Taking the big data about bank cards as an example, assume that the field value of the main field count field of the first record is 3. This means that there are 3 bank cards with 5 transactions, so the rank of the second record is 1 + 3 = 4; that is, all bank cards with 4 transactions are ranked fourth.
FIG. 1 shows a flow chart of how the ranking is calculated. Assume that the current row in the second table is row i and that its field values are A(i)_1 and A(i)_2, where A(i)_1 is the field value of row i in the sorting field and A(i)_2 is the field value of row i in the main field count field. The exact rank of row i is Rank_id(i).
As shown in FIG. 1, when row i is read, if i equals 1, this is the first row, and because the sorting field has been sorted in descending order, the rank of the first row is 1. If row i is not the first row, its rank is the rank of the previous row plus the field value of the main field count field of the previous row, that is, Rank_id(i) = Rank_id(i-1) + A(i-1)_2. In other words, the rank of every field value in the sorting field is obtained by iteratively adding the previous row's main field count to the previous row's rank.
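The iteration of FIG. 1 can be sketched as a single pass over the sorted second table. The function name `assign_ranks` and the sample rows are hypothetical; the rule applied is the one stated in the text, where the first row gets rank 1 and every later row gets the previous row's rank plus the previous row's main field count.

```python
def assign_ranks(sorted_rows):
    """Attach a rank to each (sort_value, main_count) row of a sorted table."""
    ranked = []
    rank = 1  # the first row is always ranked 1
    for sort_value, main_count in sorted_rows:
        ranked.append((sort_value, main_count, rank))
        # The next row's rank is this row's rank plus this row's count.
        rank += main_count
    return ranked


# Hypothetical sorted second table: (number of transactions, number of cards).
ranked = assign_ranks([(5, 3), (4, 2), (1, 1)])
print(ranked)
```

With three cards at 5 transactions, the cards at 4 transactions rank fourth (1 + 3), and the card at 1 transaction ranks sixth (4 + 2).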
(4) Establishing a correspondence between the rank of each field value of the sorting field and each field value of the main field, to obtain a rank for each field value of the main field.
After the above three steps, the rank of each field value of the sorting field in the second table is known. In this step, that rank is associated with the field values of the main field in the first table; that is, a correspondence is established between the field values of the rank field in the second table and the field values of the main field in the first table. The correspondence gives the true and exact rank of each field value of the main field within the big data. In some embodiments, the correspondence may be stored in the first table or in a newly generated third table.
Taking the big data about bank cards as an example, assume that the three bank cards with 5 transactions have card numbers A, B, and C; all three are ranked 1. Assume that there are two bank cards with 4 transactions, with card numbers D and E; both are ranked 4. In this way, the rank of each bank card among all bank cards is obtained.
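The correspondence in this step amounts to a lookup from sorting-field value to rank, applied to every row of the first table. The sketch below uses a Python dictionary in place of a table join; the function name `rank_main_fields` and the tuple layouts are assumptions for illustration.

```python
def rank_main_fields(first_table, ranked_second_table):
    """Map each main-field value to the rank of its sorting-field value."""
    rank_by_sort_value = {
        sort_value: rank for sort_value, _count, rank in ranked_second_table
    }
    return {main: rank_by_sort_value[sort_value] for main, sort_value in first_table}


# Hypothetical inputs matching the example in the text.
first_table = [("A", 5), ("B", 5), ("C", 5), ("D", 4), ("E", 4)]
ranked_second_table = [(5, 3, 1), (4, 2, 4)]
card_ranks = rank_main_fields(first_table, ranked_second_table)
print(card_ranks)
```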
In another aspect, the present invention provides a method for ranking the individual records in big data, the method comprising the following steps:
(1) Determining, based on the main field, the sorting field, and the partition field of the data, the number of main fields corresponding to each field value of the sorting field under each distinct field value of the partition field.
In some embodiments of this step, the MR engine may be used to aggregate the sorting field of the first table storing the raw data and to count, for each field value of the sorting field, the number of main fields and the corresponding field values of the partition field. A second table may then be generated that includes a sorting field, a main field count field, and a partition field.
Taking the big data about bank cards as an example, assume that the first table holds information on one hundred bank cards (i.e., one hundred records, one per row of the first table), and that the bank cards of each card type must be ranked by number of transactions, whose actual value lies between 1 and 5, the card types being debit card and credit card. In the first table, the bank card number may be stored as the field value of the main field, the number of transactions as the field value of the sorting field, and the card type (also referred to as the "card attribute") as the field value of the partition field. The generated second table therefore has a sorting field, a main field count field, and a partition field.
In some embodiments, the generated second table may include several records. Bank cards with the same number of transactions may have different card types, so for such a transaction count two records are required, one per card type, each with its own main field count. For example, suppose the first table contains seventeen records for bank cards with 2 transactions (ten debit cards and seven credit cards). In the second table, the record for debit cards with 2 transactions then has a main field count of 10 (meaning that 10 bank cards are debit cards with 2 transactions), and the record for credit cards with 2 transactions has a main field count of 7 (meaning that 7 bank cards are credit cards with 2 transactions).
Assume that the transaction counts occurring for the debit card type are 1, 2, 3, and 4, and those occurring for the credit card type are 2, 3, 4, and 5. The second table then contains eight records: four for the debit card type, with 1, 2, 3, and 4 transactions respectively, and four for the credit card type, with 2, 3, 4, and 5 transactions respectively.
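The partitioned aggregation can be sketched by counting on the pair (partition value, sorting value) instead of on the sorting value alone. The function name `build_partitioned_second_table`, the tuple layout, and the generated card numbers are hypothetical.

```python
from collections import Counter


def build_partitioned_second_table(first_table):
    """Collapse (main, sort_value, partition) rows into
    (partition, sort_value, main_count) rows."""
    counts = Counter(
        (partition, sort_value) for _main, sort_value, partition in first_table
    )
    return [(p, s, n) for (p, s), n in counts.items()]


# Hypothetical first table: ten debit cards and seven credit cards,
# all with 2 transactions, as in the example above.
first_table = [(f"D{i}", 2, "debit") for i in range(10)]
first_table += [(f"C{i}", 2, "credit") for i in range(7)]
second_table = build_partitioned_second_table(first_table)
print(second_table)
```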
(2) Sorting the partition field and the sorting field based on each field value of the partition field and each field value of the sorting field.
Based on the generated second table, this step sorts the distinct field values of the partition field and sorts the field values of the sorting field by magnitude. In some embodiments, the sorting may be implemented with the SPARK engine. The sorting field may be sorted in descending order or in ascending order.
Taking the big data about bank cards as an example, assume that the sorting field is sorted in descending order of field value, that the transaction counts for the debit card type are 1, 2, 3, and 4, and that those for the credit card type are 2, 3, 4, and 5. Then, within the debit card type, the first record is the one with 4 transactions and the last is the one with 1 transaction, while within the credit card type, the first record is the one with 5 transactions and the last is the one with 2 transactions.
In some embodiments, the order between the records belonging to the credit card type and those belonging to the debit card type is determined by the field values of the partition field. For example, if the field value of the partition field is "1" for the credit card type and "2" for the debit card type, the numerical values of the characters "1" and "2" may be compared and the records sorted accordingly. As another example, if the field value is "credit card" for the credit card type and "debit card" for the debit card type, the ASCII codes of the character strings "credit card" and "debit card" may be compared and the records sorted accordingly.
Assuming that the field value of the partition field is "1" for the credit card type and "2" for the debit card type, the sorted second table consists of two parts: the upper part is the debit card partition and the lower part is the credit card partition. Within each partition, the field values of the sorting field are arranged in descending order of transaction count.
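A two-key sort reproduces the layout just described. The sketch below assumes that both the partition field and the sorting field are sorted in descending order, which is one reading of the example (partition "2" above partition "1"); the function name `sort_partitioned` and the sample rows are hypothetical.

```python
def sort_partitioned(rows):
    """Sort (partition, sort_value, main_count) rows by partition value,
    then by sorting-field value, both in descending order (an assumption)."""
    return sorted(rows, key=lambda row: (row[0], row[1]), reverse=True)


# Hypothetical second table: partition "1" = credit card, "2" = debit card.
rows = [("2", 1, 4), ("1", 5, 3), ("2", 4, 2), ("1", 2, 7)]
sorted_rows = sort_partitioned(rows)
print(sorted_rows)
```

The debit card rows (partition "2") come first, followed by the credit card rows (partition "1"), and each partition is in descending order of transaction count.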
(3) Determining, within each partition, a rank for each field value of the sorting field based on the sorted partition field, the sorted sorting field, and the number of main fields corresponding to each field value of the sorting field.
In some embodiments of this step, because the sorting field and the partition field have already been sorted, an exact rank can be obtained for each field value of the sorting field within each partition. The rank of each record is stored in the rank field of the second table. Within a given partition, the first record has rank 1; the rank of the second record is the sum of the rank of the first record and the field value of the first record's main field count field; and so on.
Taking the big data about bank cards as an example, assume that within one partition the field value of the main field count field of the first record is 3 and the transaction counts lie between 2 and 5. This means that there are 3 bank cards with 5 transactions in that partition, so the rank of the second record is 1 + 3 = 4; that is, all bank cards with 4 transactions in that partition are ranked fourth.
FIG. 2 shows a flow chart of how the ranking is calculated. Assume that the current row in the second table is row i and that its field values are A(i)_1, A(i)_2, and A(i)_3, where A(i)_1 is the field value of row i in the partition field, A(i)_2 is the field value of row i in the sorting field, and A(i)_3 is the field value of row i in the main field count field. The exact rank of row i is Rank_id(i).
As shown in FIG. 2, when row i is read, if i equals 1 (meaning this is the first row of the entire table) or A(i-1)_1 is not equal to A(i)_1 (meaning this is the first row of a new partition), then row i is ranked 1, because the sorting field has been sorted in descending order.
In more detail, if i is not equal to 1, a further check is made as to whether A(i-1)_1 equals A(i)_1. If not, row i-1 and row i belong to different partitions, so row i is the first row of its partition and its rank is 1. If so, the rank of row i is the rank of the previous row plus the field value of the main field count field of the previous row, that is, Rank_id(i) = Rank_id(i-1) + A(i-1)_3.
In other words, for the field values of the sorting field that share the same partition field value, the ranks are obtained by iteratively adding the previous row's main field count to the previous row's rank.
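The flow of FIG. 2 can be sketched as a single pass that restarts the rank at 1 whenever the partition value changes. The function name `assign_partitioned_ranks` and the sample rows are hypothetical; the rule applied is the one described in the text.

```python
def assign_partitioned_ranks(sorted_rows):
    """Attach a per-partition rank to sorted (partition, sort_value, count) rows."""
    ranked = []
    prev_partition, prev_rank, prev_count = None, 0, 0
    for i, (partition, sort_value, count) in enumerate(sorted_rows):
        if i == 0 or partition != prev_partition:
            rank = 1  # first row of the table or of a new partition
        else:
            rank = prev_rank + prev_count  # previous rank plus previous count
        ranked.append((partition, sort_value, count, rank))
        prev_partition, prev_rank, prev_count = partition, rank, count
    return ranked


# Hypothetical sorted second table: partition "2" = debit, "1" = credit.
sorted_rows = [("2", 4, 2), ("2", 1, 4), ("1", 5, 3), ("1", 4, 2), ("1", 2, 7)]
ranked = assign_partitioned_ranks(sorted_rows)
print(ranked)
```

In the credit card partition, three cards at 5 transactions push the cards at 4 transactions to rank 4, and those two cards push the cards at 2 transactions to rank 6; the debit card partition is ranked independently.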
(4) Establishing a correspondence among the rank of each field value of the sorting field, each field value of the partition field, and each field value of the main field, to obtain a rank for each field value of the main field within its partition.
After the above three steps, the rank of each field value of the sorting field within each partition of the second table is known. In this step, that rank is associated with the field values of the main field in the first table; that is, a correspondence is established between the field values of the rank field in the second table and the field values of the main field in the first table. The correspondence gives the true and exact rank of each field value of the main field within its partition of the big data. In some embodiments, the correspondence may be stored in the first table or in a newly generated third table.
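With partitions, the lookup key becomes the pair (partition value, sorting value). The sketch below again uses a dictionary in place of a table join; the function name `rank_main_fields_by_partition` and the sample data are assumptions for illustration.

```python
def rank_main_fields_by_partition(first_table, ranked_rows):
    """Map each main-field value to (partition, rank within that partition)."""
    rank_lookup = {
        (partition, sort_value): rank
        for partition, sort_value, _count, rank in ranked_rows
    }
    return {
        main: (partition, rank_lookup[(partition, sort_value)])
        for main, sort_value, partition in first_table
    }


# Hypothetical first table: (card number, number of transactions, card type).
first_table = [
    ("A", 5, "credit"),
    ("B", 5, "credit"),
    ("D", 4, "credit"),
    ("X", 4, "debit"),
]
# Hypothetical ranked second table: (card type, transactions, count, rank).
ranked_rows = [("debit", 4, 1, 1), ("credit", 5, 2, 1), ("credit", 4, 1, 3)]
card_ranks = rank_main_fields_by_partition(first_table, ranked_rows)
print(card_ranks)
```

Cards D and X both have 4 transactions but receive different ranks, because each is ranked only against cards of its own type.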
FIG. 3 shows a diagram of a system for computing rankings of big data that has a data skew problem. The system comprises a deskew module, a fast sorting module, a ranking calculation module, and a fusion module. The system is suited to big data with data skew and can obtain the rank of each record in the big data. In some embodiments, the big data may contain data such as card attribute and transaction amount, and the system may rank each record according to card attribute and transaction amount. In some embodiments, the big data may contain data such as merchant category and number of transactions, and the system may rank each record according to merchant category and number of transactions. The resulting ranking within the big data can be used for subsequent data analysis. In some embodiments, a tag system may be built on the ranking. For example, a per-mille (thousandth-quantile) tag for the number of transactions may be generated based on the ranking.
Program code for carrying out operations of aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.
It should be noted that although several devices and sub-devices for software testing are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the devices described above may be embodied in one device. Conversely, the features and functions of one device described above may be further divided and embodied by a plurality of devices.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The division into aspects is for convenience of presentation only and does not mean that features in those aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (23)

1. A method for obtaining a data ranking, comprising the steps of:
(1) determining, based on a main field and a first sorting field of the data in a first table, a second sorting field in a second table and the number of main fields corresponding to each field value of the second sorting field;
(2) sorting the second sorting field based on each field value of the second sorting field;
(3) determining a ranking of each field value of the sorted second sorting field based on the sorted second sorting field and the number of main fields corresponding to each field value of the sorted second sorting field; and
(4) establishing a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field to obtain a ranking of each field value of the main field.
2. The method of claim 1, wherein the step (4) further comprises:
creating a third table; and
establishing, in the third table, a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field to obtain a ranking of each field value of the main field in the third table.
3. The method of claim 1, wherein the step (4) further comprises:
establishing a correspondence of the ranking of each field value of the sorted second sorted fields to each field value of the main field in the first table to obtain a ranking of each field value of the main field in the first table.
4. The method of claim 1, wherein the sorting operation comprises:
arranging each field value of the second sorting field in descending order; or
arranging each field value of the second sorting field in ascending order.
5. The method according to claim 1, wherein said step (1) is implemented by an MR engine and/or said step (2) is implemented by a Spark engine.
6. The method of claim 1, wherein the step (3) further comprises:
iteratively performing the following process in the second table:
for any field value of the second sorting field in the second table, adding the number of main fields corresponding to the field value immediately preceding said any field value to the ranking of said any field value, wherein said any field value excludes the first field value of the second sorting field.
7. A system for obtaining a data ranking, comprising:
a deskew module for determining, based on a main field and a first sorting field of the data in a first table, a second sorting field in a second table and the number of main fields corresponding to each field value of the second sorting field;
a sorting module for sorting the second sorting field based on each field value of the second sorting field;
a ranking module for determining a ranking of each field value of the sorted second sorting field based on the sorted second sorting field and the number of main fields corresponding to each field value of the sorted second sorting field; and
a fusion module for establishing a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field to obtain a ranking of each field value of the main field.
8. The system of claim 7, wherein the fusion module further comprises:
means for creating a third table; and
means for establishing a correspondence of the rank of each field value of the sorted second sorted fields to each field value of the main field in the third table to obtain a rank of each field value of the main field in the third table.
9. The system of claim 7, wherein the fusion module further comprises:
means for establishing a correspondence in the first table of the ranking of each field value of the sorted second sorted fields to each field value of the main field to obtain a ranking of each field value in the main field in the first table.
10. The system of claim 7, wherein the sorting module is capable of sorting each field value of the second sorting field in descending order, or the sorting module is capable of sorting each field value of the second sorting field in ascending order.
11. The system of claim 7, wherein the ranking module comprises:
means for iteratively performing the following in the second table:
for any field value of the second sorting field in the second table, adding the number of main fields corresponding to the field value immediately preceding said any field value to the ranking of said any field value, wherein said any field value excludes the first field value of the second sorting field.
12. A method for obtaining a data ranking, comprising the steps of:
(1) determining a second sorting field in a second table, a second partition field corresponding to each field value of the second sorting field, and the number of main fields corresponding to each field value of the second sorting field in the case of having different field values of the second partition field, based on the main field, the first sorting field, and the first partition field of the data in the first table;
(2) sorting the second partition field and the second sort field based on each field value of the second partition field and each field value of the second sort field;
(3) determining a ranking of each field value of the sorted second sorting field in the case of different field values of the second partition field, based on the sorted second partition field, the sorted second sorting field, and the number of main fields corresponding to each field value of the sorted second sorting field; and
(4) establishing a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain a ranking of each field value of the main field in the case of different field values of the second partition field.
13. The method of claim 12, wherein the step (4) further comprises:
creating a third table; and
establishing, in the third table, a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain, in the third table, a ranking of each field value of the main field in the case of different field values of the second partition field.
14. The method of claim 12, wherein the step (4) further comprises:
establishing, in the first table, a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain, in the first table, a ranking of each field value of the main field in the case of different field values of the second partition field.
15. The method of claim 12, wherein the sorting operation comprises:
arranging each field value of the second sorting field having the same field value of the second partition field in descending order; or
arranging each field value of the second sorting field having the same field value of the second partition field in ascending order.
16. The method according to claim 12, wherein said step (1) is implemented by an MR engine and/or said step (2) is implemented by a Spark engine.
17. The method of claim 12, wherein the step (3) further comprises:
iteratively performing the following process in the second table:
for any field value of the second sorting field in the second table having the same field value of the second partition field, adding the number of main fields corresponding to the field value immediately preceding said any field value to the ranking of said any field value, wherein said any field value excludes the first field value of the second sorting field.
18. A system for obtaining a data ranking, comprising:
a deskew module for determining, based on a main field, a first sorting field, and a first partition field of the data in a first table, a second sorting field in a second table, a second partition field corresponding to each field value of the second sorting field, and the number of main fields corresponding to each field value of the second sorting field in the case of different field values of the second partition field;
a sorting module for sorting the second partition field and the second sorting field based on each field value of the second partition field and each field value of the second sorting field;
a ranking module for determining a ranking of each field value of the sorted second sorting field in the case of different field values of the second partition field, based on the sorted second partition field, the sorted second sorting field, and the number of main fields corresponding to each field value of the sorted second sorting field; and
a fusion module for establishing a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain a ranking of each field value of the main field in the case of different field values of the second partition field.
19. The system of claim 18, wherein the fusion module further comprises:
means for creating a third table; and
means for establishing, in the third table, a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain, in the third table, a ranking of each field value of the main field in the case of different field values of the second partition field.
20. The system of claim 18, wherein the fusion module further comprises:
means for establishing, in the first table, a correspondence between the ranking of each field value of the sorted second sorting field and each field value of the main field in the case of different field values of the second partition field, to obtain, in the first table, a ranking of each field value of the main field in the case of different field values of the second partition field.
21. The system of claim 18, wherein the sorting module is capable of sorting each field value of the second sorting field having the same field value of the second partition field in descending order, or the sorting module is capable of sorting each field value of the second sorting field having the same field value of the second partition field in ascending order.
22. The system of claim 18, wherein the ranking module comprises:
means for iteratively performing the following in the second table:
for any field value of the second sorting field in the second table having the same field value of the second partition field, adding the number of main fields corresponding to the field value immediately preceding said any field value to the ranking of said any field value, wherein said any field value excludes the first field value of the second sorting field.
23. A computer readable medium having computer readable instructions stored thereon which, when executed by a computer, are capable of performing the method of any one of claims 1-6 or 12-18.
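The partitioned method of claims 12 to 17 can be sketched in Python as follows. This is a hedged, single-process illustration rather than the MR or Spark implementation named in claim 16. The function name `rank_within_partitions`, the row layout, and the sample data are assumptions introduced for this example. The iterative step of claim 17 is read here as: within one partition value, the first sort value receives rank 1 and each later sort value receives the preceding value's rank plus the preceding value's count.

```python
from collections import Counter
from itertools import groupby


def rank_within_partitions(rows, descending=True):
    """Rank rows by sort value separately inside each partition value.

    rows: list of (main_value, partition_value, sort_value) triples,
    standing in for the "first table" (layout assumed for this sketch).
    Returns (main_value, partition_value, sort_value, rank) in input order.
    """
    # Step 1: distinct (partition, sort value) pairs with the number of
    # main-field entries that carry each pair.
    counts = Counter((part, value) for _, part, value in rows)

    # Step 2: order by sort value first, then stably by partition, so each
    # partition's values stay in the requested direction.
    by_value = sorted(counts, key=lambda key: key[1], reverse=descending)
    ordered = sorted(by_value, key=lambda key: key[0])

    # Step 3: restart at rank 1 for every partition; each later value gets
    # the preceding value's rank plus the preceding value's count.
    rank_of = {}
    for _, group in groupby(ordered, key=lambda key: key[0]):
        next_rank = 1
        for key in group:
            rank_of[key] = next_rank
            next_rank += counts[key]

    # Step 4: map the ranks back to the main-field values.
    return [(main, part, value, rank_of[(part, value)])
            for main, part, value in rows]


if __name__ == "__main__":
    sample = [
        ("a", "east", 90), ("b", "east", 85), ("c", "east", 90),
        ("d", "west", 70), ("e", "west", 95),
    ]
    for row in rank_within_partitions(sample):
        print(row)
```

With the sample data, "a" and "c" both rank 1 in the "east" partition and "b" ranks 3, while in the "west" partition "e" ranks 1 and "d" ranks 2. The sketch assumes that partition values are mutually comparable, as are sort values.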
CN202010010013.2A 2020-01-06 2020-01-06 Method and system for calculating data ranking Pending CN111221885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010010013.2A CN111221885A (en) 2020-01-06 2020-01-06 Method and system for calculating data ranking

Publications (1)

Publication Number Publication Date
CN111221885A (en) 2020-06-02

Family

ID=70828145

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202010010013.2A Pending CN111221885A (en) 2020-01-06 2020-01-06 Method and system for calculating data ranking

Country Status (1)

Country Link
CN (1) CN111221885A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5210870A (en) * 1990-03-27 1993-05-11 International Business Machines Database sort and merge apparatus with multiple memory arrays having alternating access
CN106384253A (en) * 2016-09-30 2017-02-08 中国银联股份有限公司 Consumption behavior analysis method in bankcard transaction and consumption behavior analysis device thereof
CN107577531A (en) * 2016-07-05 2018-01-12 阿里巴巴集团控股有限公司 Load-balancing method and device
CN110008382A (en) * 2018-12-26 2019-07-12 阿里巴巴集团控股有限公司 A kind of method, system and the equipment of determining TopN data
CN110297957A (en) * 2019-05-20 2019-10-01 菜鸟智能物流控股有限公司 Data processing method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
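何洪英 et al.: "A quick sorting method for determining rankings" (定名次快速排序方法) *
姚晔: "Excel applications: extracting the calculation range of a subtotal formula" (Excel应用提取分类汇总公式的计算区域) *
张均香: "Sorting tied rankings in sports competitions" (体育比赛中并列名次的排序) *
潘大志 et al.: "Construction and implementation of a practical sorting algorithm" (一个实用排序算法的构造与实现) *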

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination