CN111881181A - Data statistics method, device and equipment based on distributed database - Google Patents

Data statistics method, device and equipment based on distributed database

Info

Publication number
CN111881181A
Authority
CN
China
Prior art keywords
data
database
sorting
retrieval
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010709976.1A
Other languages
Chinese (zh)
Other versions
CN111881181B (en
Inventor
刘霞
曹鹏飞
邢耘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010709976.1A priority Critical patent/CN111881181B/en
Publication of CN111881181A publication Critical patent/CN111881181A/en
Application granted granted Critical
Publication of CN111881181B publication Critical patent/CN111881181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a data statistics method, apparatus, and device based on a distributed database. The method comprises: receiving a data statistics request, where the data statistics request comprises a sorting field and an index field value corresponding to an index field, the data statistics request is used for counting data corresponding to the same index field value in at least two first databases, and the first database is a distributed database; obtaining retrieval data corresponding to the index field value in the first database; and sending the retrieval data to a second database so that the second database merges the retrieval data based on the sorting field, where the second database is used for sorting the retrieval data according to the sorting field. With this method, data held across the distributed databases can be counted, data statistics efficiency is improved, and the consumption of time and resources is reduced.

Description

Data statistics method, device and equipment based on distributed database
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a data statistics method, a data statistics device and data statistics equipment based on a distributed database.
Background
A database is a collection of data that is stored in a computer and organized and managed according to a certain data structure. As society develops, services and the data they generate keep growing, which places higher demands on the data throughput of the database; storing all the data in a single database table severely limits the database's processing capacity. Storing the data in a distributed database system effectively solves this problem.
A distributed database system comprises a plurality of smaller databases, each storing all or part of the data in the system. This improves the concurrent processing capacity of the database and avoids the read and write penalties of storing too much data in a single database table.
However, a distributed database system assigns data to the corresponding distributed database based on a preset routing field; that is, no field other than the routing field has a strong association with the database in which the data is stored. When a user needs to count data by a field other than the routing field, part of the data to be counted is often stored in each distributed database, so the data must first be counted in each distributed database and the per-database results must then be counted again. This lengthens the operation flow, lowers the efficiency of the statistics, and consumes more time and resources.
Disclosure of Invention
An object of the embodiments of the present specification is to provide a method, an apparatus, and a device for data statistics based on distributed databases, so as to solve a technical problem of how to conveniently and quickly implement statistics on data in multiple distributed databases.
In order to solve the above technical problem, an embodiment of the present specification provides a data statistics method based on a distributed database, including:
receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to an index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;
obtaining retrieval data corresponding to the index field value in the first database;
sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
An embodiment of this specification further provides a data statistics apparatus based on a distributed database, including:
the request receiving module is used for receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to an index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;
a data obtaining module, configured to obtain, in the first database, retrieval data corresponding to the index field value;
the data sending module is used for sending the retrieval data to a second database so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
The embodiment of the present specification further provides a first database management system, including a memory and a processor; the memory is configured to store computer program instructions; the processor is configured to execute the computer program instructions to implement the steps of: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to an index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; obtaining retrieval data corresponding to the index field value in the first database; sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
In order to solve the above technical problem, an embodiment of the present specification further provides a data statistics method based on a distributed database, including:
receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;
sorting the retrieval data according to the sorting field;
merging the retrieval data with the same sorting field value;
and feeding back the combined retrieval data.
An embodiment of this specification further provides a data statistics apparatus based on a distributed database, including:
the data receiving module is used for receiving the retrieval data and the sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;
the data sorting module is used for sorting the retrieval data according to the sorting field;
the data merging module is used for merging the retrieval data with the same sorting field value;
and the data feedback module is used for feeding back the merged retrieval data.
The embodiment of the present specification further provides a second database management system, which includes a memory and a processor; the memory is to store computer program instructions; the processor is configured to execute the computer program instructions to implement the steps of: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; sorting the retrieval data according to the sorting field; merging the retrieval data with the same sorting field value; and feeding back the combined retrieval data.
As can be seen from the technical solutions provided above, after the retrieval data is obtained from the distributed database according to the index field, the embodiments of the present specification send the retrieval data to the second database, which sorts and merges it, thereby completing the statistics on the data in the distributed database. This overcomes two limitations, namely that each distributed database has a small storage capacity and that the data for a specific field cannot be obtained directly from any single database. It improves the efficiency of statistical operations on the distributed database and reduces the consumption of time and resources.
Drawings
To illustrate the embodiments of the present specification and the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some of the embodiments in the specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a data statistics system according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a distributed database based data statistics method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a distributed database based data statistics method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a distributed database based data statistics method according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a data statistics apparatus based on a distributed database according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a data statistics apparatus based on a distributed database according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a first database management system in accordance with an embodiment of the present disclosure;
FIG. 8 is a block diagram of a second database management system in accordance with an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort shall fall within the protection scope of the present specification.
For a better understanding of the inventive concepts of the present application, a data statistics system of the embodiments of the present specification will be described first. As shown in fig. 1, the data statistics system 100 includes a first database 110 and a second database 120.
The first database 110 is a database for storing service data. When a user needs to run a query, a data statistics request is sent to the first database 110 to obtain the service data stored in it. The service data corresponds to a plurality of fields. When querying service data, the user can specify a plurality of query fields in advance and obtain the corresponding service data according to those fields. Specifically, the first database 110 includes a first database management system, which calls and processes the data in the first database 110 based on various instructions.
In some embodiments, the first database 110 may be a distributed database. That is, there are at least two first databases 110, each storing part of the service data that needs to be stored. When the first database 110 is a distributed database, it has a fixed routing field: when service data is stored, it is allocated to the corresponding first database 110 according to the routing field. When a user needs to count data across all the first databases 110 by a field other than the routing field, the corresponding service data often cannot be obtained directly from any single first database 110.
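The effect of a fixed routing field can be sketched as follows. The source does not say how the routing field maps to a database, so the hash-and-modulo rule, the `route` helper, and the `account`, `date`, and `amount` field names are all assumptions made for illustration.

```python
import zlib


def route(record: dict, routing_field: str, shard_count: int) -> int:
    """Pick a shard index from the routing field value (assumed hash-mod rule)."""
    key = str(record[routing_field]).encode("utf-8")
    return zlib.crc32(key) % shard_count


# Three in-memory lists stand in for three first databases.
shards = [[] for _ in range(3)]
records = [
    {"account": "A-001", "date": "d1", "amount": 200},
    {"account": "B-002", "date": "d1", "amount": 300},
    {"account": "A-001", "date": "d2", "amount": 50},
]
for record in records:
    shards[route(record, "account", len(shards))].append(record)
```

Records with the same `account` always land in the same shard, but nothing ties a given `date` to one shard, which is why a count by date has to consult every first database.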
The second database 120 may be used to sort the data it holds. Specifically, the data may be sorted according to the field value of a specified field. The second database 120 may also merge data that share a field value for a preset field, thereby producing statistics on the service data. Specifically, the second database 120 includes a second database management system, which calls and processes the data in the second database 120 based on various instructions.
When the first database 110 is a distributed database, the second database 120 need not be one. Specifically, the second database 120 may be a non-relational database. Unlike a relational database, a non-relational database does not impose relations among the data it stores. For example, a non-relational database may store data directly by key value, placing it straight into a collection without partitioning it according to its format. Non-relational databases offer fast reads and writes and high flexibility.
The first database 110 may have an interaction with the client 130. The client 130 may be a terminal device used by a user, such as a smart phone, a personal computer, a server, an industrial personal computer, or a wearable device. The user may send a data statistics request to the first database through the client 130 to obtain corresponding data.
It should be noted that, in practical applications, interaction with the first database is not limited to a user client. When a program on a server triggers an instruction to obtain statistical data, the server may generate a data statistics request and send it to the first database directly, without user operation. The device that sends the request to the first database is not particularly limited, and details are not repeated here.
Based on the data statistical system, a data statistical method based on a distributed database according to an embodiment of the present specification is described with reference to fig. 2 of the present specification. The execution subject of the method is the data statistical system. The specific implementation steps of the method are as follows.
S210: the client sends a data statistics request to the first database.
The data statistics request is a request to perform statistics on the corresponding data. When the first database is a distributed database, the service data is spread across different first databases. If the service data corresponds to a plurality of fields and the field targeted by the statistics has no strong association with the database identifier, the data often has to be obtained from each first database separately. Accordingly, the data statistics request may be a request sent to all the first databases.
The data statistics request comprises a sorting field and an index field value corresponding to an index field. The index field is used for preliminary screening of the data in the first database. For example, when a user needs the data of a certain day, the date may be used as the index field and a specific date as the index field value to obtain the retrieval data from the first database.
The sorting field may be used to categorize the retrieval data. For example, when a user needs the data generated by different devices, the device identification number may be used as the sorting field, so that after sorting the data with the same device identification number are arranged consecutively, which makes the data easier to merge.
In some embodiments, the data statistics request may further include a statistics field. After the data targeted by the user is acquired, the field values of the statistics field in that data can be counted to complete the data statistics.
To illustrate with a specific example, suppose banking service data is stored in the first database and bank staff need to determine, from that data, the transaction amount of each branch outlet on a certain day; that is, the transaction amounts of the different outlets on a fixed date need to be counted separately. A data statistics request containing the fixed date, the outlet, and the transaction amount may be sent to the first database, where the date is the index field, the fixed date is the index field value, the outlet is the sorting field, and the transaction amount is the statistics field.
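As a rough illustration, the request in the banking example could be modelled like this. The `DataStatisticsRequest` class, its attribute names, and the date string are hypothetical; the source describes only which fields the request carries, not its concrete format. Here `outlet` stands for the branch whose transactions are being totalled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataStatisticsRequest:
    """Hypothetical request shape; every name here is illustrative."""
    index_field: str                  # field used for preliminary screening
    index_value: str                  # value the index field must equal
    sort_field: str                   # field used to group the retrieved records
    stat_field: Optional[str] = None  # optional field whose values are aggregated


# Transaction amount per branch outlet on one fixed date.
request = DataStatisticsRequest(
    index_field="date",
    index_value="2020-07-22",  # placeholder date, not taken from the source
    sort_field="outlet",
    stat_field="amount",
)
```

Making the statistics field optional mirrors the text, which introduces it only for some embodiments.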
S220: the first database retrieves the retrieved data corresponding to the index field value.
When the data statistics request includes an index field value corresponding to an index field, the first database may compare the index field of each piece of service data against that value and screen out the service data whose field value equals the index field value. For example, when the index field is the modification date of the service data and the index field value is a fixed date, the first database may compare the modification dates of the service data and take the service data modified on that fixed date as the retrieval data.
In practical applications, there may be one index field value or multiple index field values corresponding to the index field, which is not limited herein.
In some embodiments, when the first database obtains the retrieval data, the first database management system may assign a data obtaining task containing the index field value to the container corresponding to each first database, and the containers execute the task to fetch the retrieval data with that index field value from the first database.
In practical application, the data statistics request may only include one index field, that is, only one index field value is used for retrieval; the data statistics request may also include a plurality of index fields, and each index field corresponds to a corresponding index field value.
Specifically, the first database management system may obtain the retrieval data in the first database based on a preset instruction.
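A minimal sketch of step S220, assuming each first database can be represented as an in-memory list of dictionaries. The `retrieve` function and the sample field names are illustrative and are not part of the described system.

```python
from typing import Any

Record = dict[str, Any]


def retrieve(shards: list[list[Record]], index_field: str, index_value: Any) -> list[Record]:
    """Collect, from every shard, the records whose index field equals the requested value."""
    retrieved: list[Record] = []
    for shard in shards:  # every first database has to be consulted
        for record in shard:
            if record.get(index_field) == index_value:
                retrieved.append(record)
    return retrieved


shards = [
    [{"date": "d1", "outlet": 1, "amount": 200}, {"date": "d2", "outlet": 1, "amount": 50}],
    [{"date": "d1", "outlet": 2, "amount": 300}],
]
hits = retrieve(shards, "date", "d1")
```

In the described system each container would run this filter against its own first database; the nested loop above only simulates that fan-out in a single process.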
S230: the first database sends the retrieval data to the second database.
After the first database obtains the retrieval data, sorting and merging it within the first database would be constrained by the fact that the first database can only retrieve by the routing field; when the sorting field differs from the routing field, sorting and merging the retrieval data in the first database is therefore difficult. Moreover, because the first database has to record all the data in the same form before operating on it, the subsequent processing cannot be completed when the volume of retrieval data is large. The first database therefore needs to send the retrieval data to the second database.
Because the first databases are distributed databases, at least two first databases each send retrieval data to the second database; the second database can support multiple simultaneous writes.
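The simultaneous writes can be simulated with a thread per first database and a thread-safe queue standing in for the second database's intake. This is only a sketch: the `send` function, the `staging` queue, and the sample records are assumptions, and a real second database would provide its own concurrent write path.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

staging: Queue = Queue()  # thread-safe stand-in for the second database's intake


def send(shard_records: list) -> int:
    """Write one first database's retrieval data to the shared intake."""
    for record in shard_records:
        staging.put(record)
    return len(shard_records)


# Retrieval data held by two first databases.
shard_results = [
    [("outlet 1", 200)],
    [("outlet 2", 300), ("outlet 1", 50)],
]
with ThreadPoolExecutor(max_workers=len(shard_results)) as pool:
    sent = list(pool.map(send, shard_results))

received = [staging.get() for _ in range(sum(sent))]
```

The arrival order in `received` is not deterministic across first databases, which is one reason the second database sorts the data before merging it.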
S240: the second database sorts the retrieved data according to a sort field.
After receiving the sorting field, the second database may sort the retrieval data according to it. One specific way is to determine in advance an arrangement order for the field values of the sorting field, and then arrange the retrieval data in that order according to each record's sorting field value.
To illustrate with a specific example, assume that the sorting field is the outlet number and that the other fields of the retrieval data are collectively treated as the statistical result. Suppose the retrieval data are "outlet number 1, statistical result A", "outlet number 3, statistical result B", "outlet number 2, statistical result C", "outlet number 2, statistical result D", "outlet number 1, statistical result E", and "outlet number 3, statistical result F", and the preset arrangement order of the outlet numbers is 1, 2, 3. The sorted retrieval data may then be "outlet number 1, statistical result A", "outlet number 1, statistical result E", "outlet number 2, statistical result C", "outlet number 2, statistical result D", "outlet number 3, statistical result B", and "outlet number 3, statistical result F".
It should be noted that, because the retrieval data will be merged according to the sorting field in a subsequent step, multiple records with the same sorting field value can simply be placed one after another in the order in which they arrive. They do not need to be sorted further among themselves, which avoids unnecessary consumption of time and resources.
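The example above can be reproduced with a stable sort, which keeps records that share a sorting field value in their arrival order and so needs no secondary ordering. The tuple layout and the `preset_order` mapping are illustrative choices, not part of the source.

```python
# (sorting field value, statistical result) pairs in arrival order.
records = [(1, "A"), (3, "B"), (2, "C"), (2, "D"), (1, "E"), (3, "F")]

# Preset arrangement order of the sorting field values: 1, then 2, then 3.
preset_order = {1: 0, 2: 1, 3: 2}

# Python's sorted() is stable, so ties keep their original relative order.
ordered = sorted(records, key=lambda record: preset_order[record[0]])

print(ordered)
# [(1, 'A'), (1, 'E'), (2, 'C'), (2, 'D'), (3, 'B'), (3, 'F')]
```

Looking the key up in `preset_order` rather than comparing raw values lets the arrangement order be any predetermined sequence, not only numeric order.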
Generally, the index field and the sorting field are different fields, so that after the retrieval data corresponding to the index field value is obtained, it is merged by sorting field value to obtain the statistical data. If the index field and the sorting field are the same field in a practical application, then the retrieval data found for the index field value has only one sorting field value; it does not need to be sorted by the sorting field, and the merged retrieval data can be obtained by merging all the retrieval data directly.
In some embodiments, the second database stores data by key value, that is, only one piece of data exists in the database for any given key value, which is what allows data to be retrieved from the database. The retrieval data therefore needs to be merged before it is stored in the second database. Specifically, after the retrieval data is received, it may first be placed in a temporary table, where it waits to be merged.
S250: the second database merges the retrieved data having the same sorting field value.
Because data in a non-relational database is generally queried by a fixed key value, when the sorting field is used as the key, retrieval data with the same sorting field value needs to be merged. Merging both ensures that the data stored in the second database meets this requirement and performs the data statistics.
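Why merging has to come before storage can be seen with a plain dictionary used as a stand-in for a key-value store. This is an analogy under the assumption that a second write to an existing key replaces the earlier value; the source states only that one piece of data exists per key.

```python
# (sorting field value, amount) pairs to be stored under the sorting field value.
records = [(1, 200), (1, 5000), (2, 300)]

# Writing records one by one: the second record for key 1 replaces the first.
naive_store = {}
for key, amount in records:
    naive_store[key] = amount

# Merging records that share a key first, then storing one value per key.
merged_store = {}
for key, amount in records:
    merged_store[key] = merged_store.get(key, 0) + amount

print(naive_store)   # {1: 5000, 2: 300}
print(merged_store)  # {1: 5200, 2: 300}
```

The naive store silently loses the first record for key 1, whereas the merged store keeps one record per key that reflects all the input.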
In some embodiments, the retrieval data may be merged as follows: divide the retrieval data into at least one retrieval data set, where the number of records in each set does not exceed a preset set capacity; merge the retrieval data that share a field value for the index field within each set; and then merge the sets with one another. If adjacent retrieval data sets contain retrieval data with the same field value for the index field, that retrieval data is merged as well.
To illustrate with a specific example, assume the preset set capacity is 500. The retrieval data is divided into sets of 500 records, and the records within each set are merged. If adjacent sets contain retrieval data with the same index field value, then the last merged record of the earlier set necessarily has the same index field value as the first merged record of the later set, and these two records are merged directly; otherwise, the two sets are simply concatenated.
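The set-by-set merge, including the boundary check between adjacent sets, can be sketched as below. Several things are assumptions of this sketch: the records are `(key, value)` tuples already sorted by key, the key is whichever field the records were grouped on, merging means summing the values, and the capacity is 2 rather than 500 to keep the example small.

```python
from itertools import groupby
from operator import itemgetter


def merge_run(records: list) -> list:
    """Merge consecutive records that share a key by summing their values."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(records, key=itemgetter(0))
    ]


def merge_in_sets(sorted_records: list, capacity: int) -> list:
    """Merge within fixed-size sets, then stitch adjacent sets together."""
    merged: list = []
    for start in range(0, len(sorted_records), capacity):
        current = merge_run(sorted_records[start:start + capacity])
        # Only the last record of the earlier set and the first record of
        # the later set can share a key, because the input is sorted.
        if merged and current and merged[-1][0] == current[0][0]:
            key, value = merged.pop()
            current[0] = (key, value + current[0][1])
        merged.extend(current)
    return merged


data = [(1, 10), (1, 20), (2, 5), (2, 5), (2, 5), (3, 1)]
result = merge_in_sets(data, capacity=2)
print(result)  # [(1, 30), (2, 15), (3, 1)]
```

Key 2 spans the second and third sets here, so its partial sums 10 and 5 are combined at the boundary, which is the case the example in the text describes.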
In some embodiments, the data statistics request includes a statistics field, which is different from the sorting field. When merging the retrieval data, the records may be divided into a plurality of retrieval data categories, each holding the records with the same sorting field value; the field values of the statistics field within each category are accumulated, and the accumulated value is used as the statistics field value for that category.
To illustrate with a specific example, assume that the sorting field is the outlet number and the statistics field is the transaction amount, and that the retrieval data category for outlet number 1 contains the records "outlet number 1, transaction amount 200", "outlet number 1, transaction amount 0", "outlet number 1, transaction amount 5000", "outlet number 1, transaction amount 10000", "outlet number 1, transaction amount 5000", and "outlet number 1, transaction amount 100". Because transaction amounts can be merged directly by accumulation, the record obtained by merging is "outlet number 1, transaction amount 20300".
Merging is not limited to accumulating the field values of the statistics field. For example, the statistics field values of the individual records may instead be collected, and all of them associated with the sorting field value. In practical applications, the manner of merging the retrieval data may be chosen according to the corresponding requirements; it is not limited to the above examples, and details are not repeated here.
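Both merging strategies, accumulating the statistics field and collecting its values, fit behind one grouping helper. The `merge_by` function and its `combine` parameter are hypothetical names introduced for this sketch; the amounts are those of the example above.

```python
from collections import defaultdict
from typing import Callable

# (sorting field value, statistics field value) pairs for one category.
records = [(1, 200), (1, 0), (1, 5000), (1, 10000), (1, 5000), (1, 100)]


def merge_by(records: list, combine: Callable[[list], object]) -> dict:
    """Group values by sorting field value, then reduce each group with `combine`."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: combine(values) for key, values in groups.items()}


totals = merge_by(records, sum)      # accumulate the statistics field
collected = merge_by(records, list)  # keep every value under the key

print(totals)     # {1: 20300}
print(collected)  # {1: [200, 0, 5000, 10000, 5000, 100]}
```

Passing the reduction as a parameter reflects the text's point that the merge strategy is chosen per requirement rather than fixed.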
If the search data has other fields besides the index field, the sorting field and the statistic field, when the search data is merged, the other fields may be deleted, or the other fields may be merged under the condition that the other fields may be merged. The specific processing manner may be adjusted according to the requirements of the actual situation, and is not limited to the above example, which is not described herein again.
S260: the second database stores the merged retrieval data.
After the second database merges the retrieval data, the statistics over the statistical field are complete for each class of retrieval data sharing the same sorting field value, and the merged retrieval data may be stored provided that the requirements of the second database are met.
S270: the second database feeds back the merged retrieval data to the client.
After obtaining the merged retrieval data, the second database feeds it back to the client, thereby answering the client's data statistics request and meeting the user's statistical requirement.
According to the data statistics method above, after the retrieval data is obtained from the distributed database according to the index field, it is sent to the second database, which sorts and merges it, thereby completing the statistics on the data in the distributed database. The method overcomes the drawbacks that the storage capacity of the distributed database is small and that data for a specific field cannot be obtained directly from a single database; it improves the efficiency of statistical operations on the distributed database and reduces the consumption of time and resources.
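The overall flow summarized above can be sketched end to end as follows. The sketch assumes that each first database is an in-memory list of dictionaries and that the second database is represented by a plain function; the names `retrieve` and `sort_and_merge`, the `date` index field, and the sample rows are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def retrieve(shards, index_field, index_value):
    """Collect, from every first database, the rows matching the index field value."""
    return [row for shard in shards for row in shard if row[index_field] == index_value]


def sort_and_merge(rows, sort_field, stat_field):
    """Stand-in for the second database: sort by sort_field, then merge equal keys."""
    ordered = sorted(rows, key=itemgetter(sort_field))
    return [
        {sort_field: key, stat_field: sum(r[stat_field] for r in group)}
        for key, group in groupby(ordered, key=itemgetter(sort_field))
    ]


shards = [
    [{"date": "2020-07-01", "website_number": 2, "transaction_amount": 300},
     {"date": "2020-07-02", "website_number": 1, "transaction_amount": 999}],
    [{"date": "2020-07-01", "website_number": 1, "transaction_amount": 200},
     {"date": "2020-07-01", "website_number": 2, "transaction_amount": 700}],
]
rows = retrieve(shards, "date", "2020-07-01")
result = sort_and_merge(rows, "website_number", "transaction_amount")
print(result)
```

Note that `itertools.groupby` only groups adjacent items, which is why the rows are sorted by the sorting field before they are merged.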
Based on the method corresponding to fig. 2, an embodiment of the present specification further provides a data statistics method based on a distributed database, as shown in fig. 3, an execution subject of the method is the first database management system, and the method includes the following specific steps.
S310: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database.
For the description of this step, reference may be specifically made to the description in step S210, and details are not described here.
S320: retrieving, in the first database, retrieval data corresponding to the index field value.
For the description of this step, reference may be made to the description in step S220, and details are not described here.
S330: sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
For the description of this step, reference may be specifically made to the descriptions in steps S230, S240, S250, S260, and S270, and details are not repeated here.
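The three steps performed by the first database management system might be sketched as below. This is an illustration under several assumptions: the first databases are in-memory lists, a thread pool stands in for whatever mechanism actually runs the per-database retrieval tasks, and `send_to_second_db_stub` is a hypothetical placeholder for the transfer to the second database.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_from_shard(shard, index_field, index_value):
    """Retrieval task for one first database."""
    return [row for row in shard if row[index_field] == index_value]


def handle_statistics_request(request, shards, send_to_second_db):
    # Receive the request: it carries the sorting field and the index field value.
    index_field = request["index_field"]
    index_value = request["index_value"]
    # Retrieve matching rows, one task per first database, run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(
            lambda shard: fetch_from_shard(shard, index_field, index_value), shards
        )
        retrieved = [row for partial in partials for row in partial]
    # Hand the retrieval data and the sorting field to the second database.
    return send_to_second_db(retrieved, request["sort_field"])


def send_to_second_db_stub(rows, sort_field):
    # Hypothetical stand-in: a real system would transmit the rows instead.
    return {"sort_field": sort_field, "rows": rows}


request = {"index_field": "date", "index_value": "2020-07-01",
           "sort_field": "website_number"}
shards = [
    [{"date": "2020-07-01", "website_number": 2, "transaction_amount": 300}],
    [{"date": "2020-07-01", "website_number": 1, "transaction_amount": 200},
     {"date": "2020-07-02", "website_number": 1, "transaction_amount": 999}],
]
sent = handle_statistics_request(request, shards, send_to_second_db_stub)
print(sent)
```

The first database management system does no sorting or merging itself; it only filters by the index field value and forwards the result.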
Based on the method corresponding to fig. 2, an embodiment of the present specification further provides a data statistics method based on a distributed database, as shown in fig. 4, an execution subject of the method is the second database management system, and the method includes the following specific steps.
S410: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database.
For the description of this step, reference may be specifically made to the descriptions in steps S210, S220, and S230, which are not described herein again.
S420: sorting the retrieval data according to the sorting field.
For the description of this step, reference may be made to the description in step S240, and details are not described here.
S430: the retrieved data having the same sorting field value is merged.
For the description of this step, reference may be specifically made to the description in step S250, and details are not described here.
S440: feeding back the merged retrieval data.
For the description of this step, reference may be specifically made to the descriptions in steps S260 and S270, and details are not described here.
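On the second database side, sorting first means that records with the same sorting field value become adjacent, so the merge can be done in a single pass. The sketch below assumes accumulation as the merging manner and uses hypothetical names throughout.

```python
def sort_then_merge(rows, sort_field, stat_field):
    """Sort the received rows, then merge adjacent rows with equal sort_field values."""
    # Sort the retrieval data according to the sorting field.
    ordered = sorted(rows, key=lambda row: row[sort_field])
    merged = []
    for row in ordered:
        if merged and merged[-1][sort_field] == row[sort_field]:
            # Same sorting-field value as the previous record: accumulate into it.
            merged[-1][stat_field] += row[stat_field]
        else:
            # New sorting-field value: start a new merged record.
            merged.append({sort_field: row[sort_field], stat_field: row[stat_field]})
    # The merged list is what would be stored and fed back to the client.
    return merged


received = [
    {"website_number": 2, "transaction_amount": 300},
    {"website_number": 1, "transaction_amount": 200},
    {"website_number": 2, "transaction_amount": 700},
    {"website_number": 1, "transaction_amount": 100},
]
feedback = sort_then_merge(received, "website_number", "transaction_amount")
print(feedback)
```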
Based on the method corresponding to fig. 3, an embodiment of the present disclosure further provides a data statistics apparatus based on a distributed database, as shown in fig. 5, where the apparatus is disposed in the first database management system, and the apparatus includes the following modules.
A request receiving module 510, configured to receive a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;
a data obtaining module 520, configured to obtain, in the first database, retrieval data corresponding to the index field value;
a data sending module 530, configured to send the retrieved data to a second database, so that the second database merges the retrieved data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
Based on the method corresponding to fig. 4, the embodiment of the present specification further provides a data statistics apparatus based on a distributed database, as shown in fig. 6, the apparatus is disposed in the second database management system, and the apparatus includes the following modules.
A data receiving module 610, configured to receive the retrieval data and the sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;
a data sorting module 620, configured to sort the retrieved data according to the sorting field;
a data merging module 630, configured to merge the retrieved data with the same sorting field value;
and a data feedback module 640, configured to feed back the merged retrieval data.
A first database management system according to an embodiment of the present disclosure is described based on the method corresponding to fig. 3, as shown in fig. 7. The first database management system may include a memory and a processor.
In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid state disk, a U disk, or the like. The memory may be used to store computer instructions.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The processor may execute the computer instructions to perform the following steps: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; obtaining retrieval data corresponding to the index field value in the first database; sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
A second database management system according to an embodiment of the present disclosure is introduced based on the method corresponding to fig. 4, as shown in fig. 8. The second database management system may include a memory and a processor.
In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid state disk, a U disk, or the like. The memory may be used to store computer instructions.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The processor may execute the computer instructions to perform the steps of: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; sorting the retrieval data according to the sorting field; merging the retrieval data with the same sorting field value; and feeding back the combined retrieval data.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized with hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly programming the method flow in one of the hardware description languages above and programming it into an integrated circuit.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
This specification can be used in numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims (10)

1. A data statistical method based on a distributed database is characterized by comprising the following steps:
receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;
obtaining retrieval data corresponding to the index field value in the first database;
sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
2. The method of claim 1, wherein the second database is a non-relational database.
3. The method of claim 1, wherein said retrieving, in said first database, retrieved data corresponding to said index field value comprises:
distributing data acquisition tasks to containers corresponding to the first databases; the data acquisition task comprises an index field value; the container is used for executing tasks to call data in the first database;
and executing the data acquisition task by utilizing the container to acquire the retrieval data with the index field value.
4. A data statistics device based on a distributed database, comprising:
the request receiving module is used for receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;
a data obtaining module, configured to obtain, in the first database, retrieval data corresponding to the index field value;
the data sending module is used for sending the retrieval data to a second database so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
5. A first database management system comprising a memory and a processor;
the memory to store computer program instructions;
the processor to execute the computer program instructions to implement the steps of: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; obtaining retrieval data corresponding to the index field value in the first database; sending the retrieval data to a second database to cause the second database to merge the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.
6. A data statistical method based on a distributed database is characterized by comprising the following steps:
receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;
sorting the retrieval data according to the sorting field;
merging the retrieval data with the same sorting field value;
and feeding back the combined retrieval data.
7. The method of claim 6, wherein the data statistics request further comprises a statistics field; the merging of the retrieved data having the same sorting field value includes:
dividing the retrieval data into retrieval data categories corresponding to different sorting field values;
obtaining a statistical field value of a statistical field corresponding to the retrieval data in the retrieval data category;
accumulating the statistical field value to obtain a statistical field accumulated value corresponding to the retrieval data category;
and combining the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as a statistical field value corresponding to the statistical field of the combined retrieval data.
8. The method of claim 6, wherein said merging retrieved data having the same sorting field value comprises:
dividing the retrieval data into at least one retrieval data set; the quantity of the retrieved data in the retrieved data set is not more than the preset set capacity;
merging the retrieval data with the same field value corresponding to the index field in each retrieval data set respectively;
merging the retrieval data sets; if the retrieval data corresponding to the index field and having the same field value exists in the adjacent retrieval data sets, merging the retrieval data having the same field value.
9. A data statistics device based on a distributed database, comprising:
the data receiving module is used for receiving the retrieval data and the sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;
the data sorting module is used for sorting the retrieval data according to the sorting field;
the data merging module is used for merging the retrieval data with the same sorting field value;
and the data feedback module is used for feeding back the merged retrieval data.
10. A second database management system comprising a memory and a processor;
the memory is to store computer program instructions;
the processor is configured to execute the computer program instructions to implement the steps of: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value acquired in a first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; sorting the retrieval data according to the sorting field; merging the retrieval data with the same sorting field value; and feeding back the combined retrieval data.
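The set-based merging recited in claim 8 can be illustrated with the following sketch. It assumes that the retrieval data is already sorted by the field used for merging, that merging means accumulation, and that the set capacity is at least 1; the field used for comparison is passed in as `key_field`, and all names are hypothetical.

```python
def merge_sorted_run(rows, key_field, stat_field):
    """Merge adjacent rows with equal key_field values inside one retrieval data set."""
    merged = []
    for row in rows:
        if merged and merged[-1][key_field] == row[key_field]:
            merged[-1][stat_field] += row[stat_field]
        else:
            merged.append(dict(row))  # copy so the input rows are not modified
    return merged


def chunked_merge(sorted_rows, key_field, stat_field, capacity):
    """Split into sets of at most `capacity` rows, merge each set, then join the sets."""
    chunks = [sorted_rows[i:i + capacity] for i in range(0, len(sorted_rows), capacity)]
    partials = [merge_sorted_run(chunk, key_field, stat_field) for chunk in chunks]
    result = []
    for partial in partials:
        # If the same value sits on both sides of a set boundary, merge across it.
        if result and partial and result[-1][key_field] == partial[0][key_field]:
            result[-1][stat_field] += partial[0][stat_field]
            partial = partial[1:]
        result.extend(partial)
    return result


sorted_rows = [
    {"website_number": key, "transaction_amount": amount}
    for key, amount in [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
]
combined = chunked_merge(sorted_rows, "website_number", "transaction_amount", capacity=2)
print(combined)
```

Because each set is merged independently, the per-set work can be bounded by the preset capacity, and only the records at set boundaries need a second comparison when the sets are joined.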
CN202010709976.1A 2020-07-22 2020-07-22 Data statistics method, device and equipment based on distributed database Active CN111881181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010709976.1A CN111881181B (en) 2020-07-22 2020-07-22 Data statistics method, device and equipment based on distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010709976.1A CN111881181B (en) 2020-07-22 2020-07-22 Data statistics method, device and equipment based on distributed database

Publications (2)

Publication Number Publication Date
CN111881181A true CN111881181A (en) 2020-11-03
CN111881181B CN111881181B (en) 2024-03-01

Family

ID=73155950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010709976.1A Active CN111881181B (en) 2020-07-22 2020-07-22 Data statistics method, device and equipment based on distributed database

Country Status (1)

Country Link
CN (1) CN111881181B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090934A (en) * 2014-06-26 2014-10-08 山东金质信息技术有限公司 Standard service platform distributed parallel computing database and retrieval method thereof
CN111125157A (en) * 2018-10-31 2020-05-08 北京国双科技有限公司 Query data processing method and device, storage medium and processor
CN111368006A (en) * 2020-03-31 2020-07-03 中国工商银行股份有限公司 Mass data strip conditional centralized extraction system and method

Also Published As

Publication number Publication date
CN111881181B (en) 2024-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant