CN111881181B - Data statistics method, device and equipment based on distributed database

Info

Publication number: CN111881181B
Application number: CN202010709976.1A
Authority: CN
Inventors: 刘霞; 曹鹏飞; 邢耘
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2024-03-01
Anticipated expiration: 2040-07-22
Also published as: CN111881181A

Abstract

The embodiments of this specification provide a data statistics method, device and equipment based on a distributed database. The method comprises: receiving a data statistics request, where the request comprises a sorting field and an index field value corresponding to an index field, and is used to compile statistics on data that corresponds to the same index field value in at least two first databases, the first databases being distributed databases; acquiring, in the first databases, retrieval data corresponding to the index field value; and sending the retrieval data to a second database, so that the second database sorts the retrieval data according to the sorting field and merges it based on the sorting field. With this method, statistics can be compiled across the distributed databases, which improves the efficiency of data statistics and reduces the time and resources consumed.

Description

Data statistics method, device and equipment based on distributed database

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a data statistics method, device and equipment based on a distributed database.

Background

A database is a collection of data stored in a computer and organized according to a certain data structure for storage and management. As society develops, the volume of business and of the corresponding data keeps growing, which places high demands on the data throughput of the database; storing all data in a single database table greatly limits the database's processing capacity. Using a distributed database system to store the data effectively addresses this problem.

The distributed database system includes a plurality of smaller databases, and each database stores all or a portion of the data in the distributed database system. Therefore, the concurrency processing capability of the database is improved, and the situation that too much data stored in a single database table is unfavorable for reading and writing is avoided.

However, a distributed database system stores each piece of data in a particular distributed database according to a predetermined routing field; that is, fields other than the routing field have no strong correlation with the database in which the data is stored. When a user needs statistics on a field other than the routing field, part of the data to be counted is usually stored in every distributed database. Statistics must first be compiled in each distributed database, and the per-database results must then be aggregated again. This lengthens the operation flow, lowers statistical efficiency, and consumes more time and resources when compiling statistics over the distributed databases.

Disclosure of Invention

The embodiment of the specification aims to provide a data statistics method, device and equipment based on a distributed database, so as to solve the technical problem of how to conveniently and rapidly realize statistics of data in a plurality of distributed databases.

In order to solve the above technical problems, an embodiment of the present disclosure provides a data statistics method based on a distributed database, including:

receiving a data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;

acquiring search data corresponding to the index field value in the first database;

transmitting the search data to a second database, so that the second database merges the search data based on the sorting fields; the second database is used for sorting the retrieval data according to the sorting field.

The embodiment of the specification also provides a data statistics device based on a distributed database, which comprises:

the request receiving module is used for receiving the data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;

A data acquisition module, configured to acquire search data corresponding to the index field value in the first database;

the data sending module is used for sending the search data to a second database so that the second database can merge the search data based on the ordering field; the second database is used for sorting the retrieval data according to the sorting field.

The embodiment of the specification also provides a first database management system, which comprises a memory and a processor; the memory is used for storing computer program instructions; the processor is configured to execute the computer program instructions to implement the steps of: receiving a data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; acquiring search data corresponding to the index field value in the first database; transmitting the search data to a second database, so that the second database merges the search data based on the sorting fields; the second database is used for sorting the retrieval data according to the sorting field.

In order to solve the above technical problem, an embodiment of the present disclosure further provides a data statistics method based on a distributed database, including:

receiving retrieval data and an ordering field; the search data comprises data with the same index field value obtained in the first database according to the data statistics request; the data statistics request comprises the index field value and the ordering field; the first database is a distributed database;

sorting the retrieved data according to the sorting field;

merging the retrieved data having the same sorting field value;

and feeding back the combined retrieval data.

The embodiment of the specification also provides a data statistics device based on a distributed database, which comprises:

the data receiving module is used for receiving the retrieval data and the sorting field; the retrieval data comprises data with the same index field value obtained in the first database according to the data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;

the data sorting module is used for sorting the retrieval data according to the sorting field;

the data merging module is used for merging the retrieval data with the same sorting field value;

and the data feedback module is used for feeding back the combined retrieval data.

The embodiment of the specification also provides a second database management system, which comprises a memory and a processor; the memory is used for storing computer program instructions; the processor is configured to execute the computer program instructions to implement the steps of: receiving retrieval data and an ordering field; the search data comprises data with the same index field value obtained in the first database according to the data statistics request; the data statistics request comprises the index field value and the ordering field; the first database is a distributed database; sorting the retrieved data according to the sorting field; merging the retrieved data having the same rank field value; and feeding back the combined retrieval data.

As can be seen from the technical solutions provided above, in the embodiments of this specification, after the retrieval data is retrieved from the distributed databases according to the index field, it is sent to the second database, which sorts and merges it, thereby completing the statistics on the data in the distributed databases. The method overcomes the drawbacks that each distributed database has a small storage capacity and that the data for a specific field cannot be obtained directly from any single database, improves the efficiency of statistical operations on distributed databases, and reduces the time and resources consumed.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present description, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a data statistics system according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a data statistics method based on a distributed database according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of a data statistics method based on a distributed database according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a data statistics method based on a distributed database according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of a data statistics apparatus based on a distributed database according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of a data statistics apparatus based on a distributed database according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of a first database management system according to an embodiment of the present disclosure;

FIG. 8 is a block diagram of a second database management system according to an embodiment of the present disclosure.

Detailed Description

The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

For a better understanding of the inventive concepts of the present application, a data statistics system according to an embodiment of the present disclosure is first described. As shown in FIG. 1, the data statistics system 100 includes a first database 110 and a second database 120.

The first database 110 is a database for storing service data. When a user has a query requirement, a data statistics request is sent to the first database 110 to acquire service data from it. The service data has a plurality of fields. When querying service data, the user can specify one or more query fields in advance, and the corresponding service data is acquired according to those query fields. Specifically, the first database 110 includes a first database management system, which retrieves and processes the data in the first database 110 based on various instructions.

In some embodiments, the first database 110 may be a distributed database. That is, there are at least two first databases 110, each storing part of the service data to be stored. In this case, the first databases 110 correspond to a fixed routing field: when service data is stored, it is distributed to the corresponding first database 110 according to the routing field. When the user needs to aggregate the data in all the first databases 110 according to a field other than the routing field, the corresponding service data cannot be obtained directly from any single first database 110.

The second database 120 may be used to order the data in the databases. Specifically, the data may be ordered according to a field value corresponding to the specified field. The second database 120 may also combine the data with the same field value based on the preset field to implement statistics on the service data. Specifically, the second database 120 includes a second database 120 management system, and the second database 120 management system performs operations such as calling and processing on data in the second database 120 based on various instructions.

In the case where the first database 110 is a distributed database, the second database 120 need not be a distributed database. Specifically, the second database 120 may be a non-relational database. A non-relational database differs from a relational database in that no relationships are maintained between the data stored in it. For example, a non-relational database may store data directly by key value, placing the data directly into a collection at storage time without dividing it according to its format. Non-relational databases are characterized by high read and write speeds and strong flexibility.

The first database 110 may have interactions with the client 130. The client 130 may be a terminal device such as a smart phone, a personal computer, a server, an industrial personal computer, or a wearable device used by a user. The user may send a data statistics request to the first database through the client 130 to obtain corresponding data.

It should be noted that, in practical applications, the embodiments of this specification are not limited to using a user client to interact with the first database. When a program on a server triggers an instruction to acquire statistical data, the server may generate the data statistics request and send it to the first database directly, without user operation. The device that sends the request to the first database is not particularly limited and is not described further here.

Based on the data statistics system, a data statistics method based on a distributed database according to an embodiment of the present disclosure is described with reference to fig. 2 of the present disclosure. The execution subject of the method is the data statistics system. The specific implementation steps of the method are as follows.

S210: the client sends a data statistics request to the first database.

The data statistics request is a request to compile statistics on the corresponding data. When the first database is a distributed database, the service data is stored across different first databases. When the service data has many fields and the field on which statistics are required has no strong correlation with the database identifier, the user usually has to acquire data from each of the first databases separately. Accordingly, the data statistics request may be a request sent to all of the first databases.

The data statistics request comprises a sorting field and an index field value corresponding to an index field. The index field is used for preliminary screening of the data in the first database. For example, when a user needs to acquire the data for a certain day, the date may be used as the index field and the specific date as the index field value, so as to acquire the retrieval data from the first database.

The sorting field may be used to categorize the retrieval data. For example, when a user needs to acquire the data generated by different devices, the device identification number can be used as the sorting field, so that after sorting, the data corresponding to the same device identification number is arranged consecutively and can be merged conveniently.

In some embodiments, the data statistics request may further include a statistics field. After the data targeted by the user is obtained, the field values of the data corresponding to the statistics fields may be counted to complete the statistics for the data.

As a specific example, suppose banking data is stored in the first database and a bank employee needs to determine, from that data, the transaction amount of each outlet (node) on a given day; that is, the transaction amounts of the different outlets on a fixed date need to be totalled separately. A data statistics request can be sent to the first database that includes the fixed date, the outlet and the transaction amount: the date is the index field, the fixed date is the index field value, the outlet is the sorting field, and the transaction amount is the statistics field.
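The contents of such a request can be pictured as a small record. The following is a minimal sketch only; the class name `DataStatisticsRequest`, the field names `date`, `outlet` and `amount`, and the date value are hypothetical stand-ins chosen for the banking example and are not prescribed by this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataStatisticsRequest:
    """Hypothetical container for what a data statistics request carries."""
    index_field: str                        # field used to pre-screen rows in each first database
    index_value: str                        # value the index field must equal
    sorting_field: str                      # field used to order and group the retrieval data
    statistics_field: Optional[str] = None  # optional field whose values are aggregated


# Banking example: per-outlet transaction amounts on one fixed date.
request = DataStatisticsRequest(
    index_field="date",
    index_value="2020-07-22",  # illustrative value only
    sorting_field="outlet",
    statistics_field="amount",
)
```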

S220: the first database obtains retrieval data corresponding to the index field value.

When the data statistics request includes an index field value corresponding to an index field, the value that each piece of service data in the first database holds for the index field can be compared against it, and the service data whose field value equals the index field value is selected. For example, when the index field is the modification date of the service data and the index field value is a fixed date, the first database can compare the modification dates of the service data it stores and select the service data modified on that fixed date as the retrieval data.

In practical applications, the index field may have one index field value or may have a plurality of index field values, which is not limited.

In some embodiments, when the first databases acquire the search data, the first database management system may allocate a data acquisition task including an index field value to containers corresponding to each first database, and execute the data acquisition task by using the containers to call the search data having the index field value in the first database.

In practical application, the data statistics request may only include one index field, that is, only one index field value is used for searching; the data statistics request may also include a plurality of index fields, where each index field corresponds to a corresponding index field value.

Specifically, the first database management system may acquire the search data in the first database based on a preset instruction.
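The retrieval in step S220 can be sketched as one filtering task per first database. In the sketch below, in-memory lists stand in for the first databases and worker threads stand in for the containers; the names `retrieve` and `retrieve_from_all` and the sample rows are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical first databases (shards), each holding part of the rows.
shard_a = [
    {"date": "2020-07-22", "outlet": 1, "amount": 200},
    {"date": "2020-07-21", "outlet": 2, "amount": 50},
]
shard_b = [
    {"date": "2020-07-22", "outlet": 2, "amount": 300},
    {"date": "2020-07-22", "outlet": 1, "amount": 100},
]


def retrieve(shard, index_field, index_values):
    """Return the rows of one shard whose index field holds a requested value."""
    wanted = set(index_values)
    return [row for row in shard if row[index_field] in wanted]


def retrieve_from_all(shards, index_field, index_values):
    """Run one retrieval task per shard; a thread stands in for a container."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        parts = pool.map(lambda s: retrieve(s, index_field, index_values), shards)
        return [row for part in parts for row in part]


retrieval_data = retrieve_from_all([shard_a, shard_b], "date", ["2020-07-22"])
```

Passing a list of index values reflects the case in which an index field has more than one index field value.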

S230: the first database sends the search data to the second database.

After the first database obtains the retrieval data, sorting and merging it directly in the first database is difficult: when the sorting field differs from the routing field, the first database can only perform retrieval according to the routing field. In addition, because the first database has to record all the data in the same form in order to operate on it, the subsequent processing of the retrieval data clearly cannot be completed there when the volume of retrieval data is large. The first database therefore needs to send the retrieval data to the second database.

Because the first databases are distributed databases, at least two first databases each send retrieval data to the second database; the second database can therefore support simultaneous writing of multiple pieces of data.
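A minimal sketch of several first databases writing to one second database at the same time is given below. The class `SecondDatabase` is a hypothetical in-memory stand-in guarded by a lock; an actual non-relational database would provide its own concurrency control.

```python
import threading


class SecondDatabase:
    """Hypothetical in-memory stand-in that accepts concurrent writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def write(self, rows):
        # The lock keeps simultaneous writers from interleaving their updates.
        with self._lock:
            self._rows.extend(rows)

    def rows(self):
        with self._lock:
            return list(self._rows)


# Retrieval data produced by three hypothetical first databases.
parts = [[(1, "A"), (3, "B")], [(2, "C"), (2, "D")], [(1, "E"), (3, "F")]]

second_db = SecondDatabase()
writers = [threading.Thread(target=second_db.write, args=(part,)) for part in parts]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()
```

The order in which the parts arrive is not fixed, which is one reason the second database sorts the retrieval data afterwards.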

S240: the second database orders the retrieved data according to the ordering field.

After receiving the sorting field, the second database can sort the retrieval data according to it. One specific approach is to determine in advance the order of the possible sorting field values and then arrange the retrieval data by each record's sorting field value in that order.

As a specific example, assume the sorting field is the outlet number and the other fields of the retrieval data are collectively treated as the statistical result. If the retrieval data is "outlet number 1, statistical result A", "outlet number 3, statistical result B", "outlet number 2, statistical result C", "outlet number 2, statistical result D", "outlet number 1, statistical result E", "outlet number 3, statistical result F", and the preset order of outlet numbers is 1, 2, 3, then the sorted retrieval data may be "outlet number 1, statistical result A", "outlet number 1, statistical result E", "outlet number 2, statistical result C", "outlet number 2, statistical result D", "outlet number 3, statistical result B", "outlet number 3, statistical result F".

It should be noted that, because the retrieval data is merged according to the sorting field in a later step, records that share the same sorting field value can simply be placed one after another in the order in which they were received; no further ordering among them is needed, which avoids unnecessary consumption of time and resources.
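The sorting example above can be reproduced with a stable sort keyed on a preset order. This is a sketch under the assumption that each record is a pair of (outlet number, statistical result); the variable names are illustrative.

```python
# Retrieval data from the example: (outlet number, statistical result).
retrieval_data = [(1, "A"), (3, "B"), (2, "C"), (2, "D"), (1, "E"), (3, "F")]

# Preset order of the sorting field values.
preset_order = [1, 2, 3]
rank = {value: position for position, value in enumerate(preset_order)}

# sorted() is stable, so records that share a sorting field value keep the
# order in which they were received; no extra ordering is applied to them.
sorted_data = sorted(retrieval_data, key=lambda record: rank[record[0]])
print(sorted_data)
# [(1, 'A'), (1, 'E'), (2, 'C'), (2, 'D'), (3, 'B'), (3, 'F')]
```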

Typically, the index field and the sorting field are different fields, so that after the retrieval data corresponding to the index field value is retrieved based on the index field, it is merged according to the sorting field to obtain the statistical data. If, in a practical application, the index field and the sorting field are the same, then the retrieval data obtained for the index field value has only one sorting field value; the statistical data can be obtained by merging all of the retrieval data, without sorting it according to the sorting field.

In some embodiments, the second database stores data by key value, so that only one piece of data exists in the database for any given key value, which is what allows the data in the database to be looked up. The retrieval data therefore needs to be merged before it is stored in the second database; that is, the retrieval data must be processed before storage. Specifically, after the retrieval data is received, it may be held in a temporary table in which the merging is carried out.

S250: the second database merges the retrieval data with the same sorting field value.

Because data in a non-relational database is generally queried by a fixed key value, when the sorting field serves as the key, retrieval data with the same sorting field value needs to be merged. Merging both ensures that the data stored in the second database meets this requirement and carries out the statistics on the data.

In some embodiments, the retrieval data may be merged as follows: the retrieval data is divided into at least one retrieval data set, where the number of records in each set does not exceed a preset set capacity; within each set, records having the same field value for the index field are merged; the sets are then merged with one another, and if adjacent sets contain records with the same field value for the index field, those records are merged as well.

As a specific example, assume the preset set capacity is 500. The retrieval data is divided into sets of 500 records each, and the records within each set are merged. If adjacent sets contain retrieval data with the same index field value, the last merged record of the earlier set and the first merged record of the later set necessarily have the same index field value, and these two records are merged directly; if the values differ, the two sets are simply joined.
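The set-by-set merging can be sketched as follows. Two assumptions are made here that the text does not fix: the records are already ordered so that records with equal keys are adjacent, and merging two records means adding their values. The function names `merge_run` and `merge_in_sets` are hypothetical.

```python
def merge_run(records):
    """Merge adjacent (key, value) records that share a key by adding values.

    Assumes the records are already ordered so that equal keys are adjacent.
    """
    merged = []
    for key, value in records:
        if merged and merged[-1][0] == key:
            merged[-1] = (key, merged[-1][1] + value)
        else:
            merged.append((key, value))
    return merged


def merge_in_sets(records, capacity=500):
    """Split records into sets of at most `capacity`, merge each, then join."""
    result = []
    for start in range(0, len(records), capacity):
        merged_set = merge_run(records[start:start + capacity])
        # A key can only straddle a boundary as the last merged record of the
        # earlier set and the first merged record of the later set.
        if result and merged_set and result[-1][0] == merged_set[0][0]:
            key, value = merged_set.pop(0)
            result[-1] = (key, result[-1][1] + value)
        result.extend(merged_set)
    return result


records = [(1, 10), (1, 20), (1, 30), (2, 5), (2, 5), (3, 1)]
print(merge_in_sets(records, capacity=2))
# [(1, 60), (2, 10), (3, 1)]
```

A capacity of 2 is used in the call only to force keys to straddle set boundaries; with the capacity of 500 from the example, the same input yields the same result in a single set.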

In some embodiments, the data statistics request includes a statistics field. The statistics field is different from the sorting field. When the retrieval data is merged, the records can be divided into retrieval data categories, each containing the records that share one sorting field value; the field values of the statistics field within each category are accumulated, and the accumulated value is used as the statistics field value of that category.

As a specific example, assume the sorting field is the outlet number and the statistics field is the transaction amount, and the retrieval data category for outlet number 1 contains the records "outlet number 1, transaction amount 200", "outlet number 1, transaction amount 0", "outlet number 1, transaction amount 5000", "outlet number 1, transaction amount 10000", "outlet number 1, transaction amount 5000", "outlet number 1, transaction amount 100". The transaction amounts can be merged directly by accumulation, so that the record obtained after merging is "outlet number 1, transaction amount 20300".
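The accumulation in this example can be checked with a short sketch. The function name `accumulate` and the dictionary field names `outlet` and `amount` are hypothetical.

```python
from collections import defaultdict

# The six records of the example for outlet number 1.
records = [
    {"outlet": 1, "amount": 200},
    {"outlet": 1, "amount": 0},
    {"outlet": 1, "amount": 5000},
    {"outlet": 1, "amount": 10000},
    {"outlet": 1, "amount": 5000},
    {"outlet": 1, "amount": 100},
]


def accumulate(rows, sorting_field, statistics_field):
    """Add up the statistics field for each distinct sorting field value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[sorting_field]] += row[statistics_field]
    return dict(totals)


print(accumulate(records, "outlet", "amount"))
# {1: 20300}
```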

The manner of merging the retrieval data is not limited to accumulating the field values of the statistics field. For example, the statistics field values of the individual records may be collected and all of them associated with the corresponding sorting field value. In practical applications, the manner of merging the retrieval data can be selected according to the requirements at hand; it is not limited to the above examples and is not described further here.
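One such alternative, in which the individual statistics field values are kept and grouped under their sorting field value rather than added together, might look like the sketch below. The function name `collect` and the field names are hypothetical.

```python
def collect(rows, sorting_field, statistics_field):
    """Group the statistics field values under each sorting field value."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[sorting_field], []).append(row[statistics_field])
    return grouped


rows = [
    {"outlet": 1, "amount": 200},
    {"outlet": 2, "amount": 300},
    {"outlet": 1, "amount": 100},
]
print(collect(rows, "outlet", "amount"))
# {1: [200, 100], 2: [300]}
```

This form still leaves exactly one entry per key, which fits a store that holds a single piece of data for each key value.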

If the search data has other fields besides the index field, the sorting field and the statistics field, the other fields may be deleted when the search data is combined, or the other fields may be combined when the other fields may be combined. The specific processing manner can be adjusted according to the actual situation, and is not limited to the above example, and will not be described in detail.

S260: the second database stores the merged retrieval data.

After the second database merges the retrieval data, the statistics field values of the records that share a sorting field value have been aggregated, and the merged retrieval data, which now meets the requirements of the second database, can be stored.

S270: the second database feeds back the merged retrieval data to the client.

After obtaining the merged retrieval data, the second database feeds it back to the client, thereby responding to the client's data statistics request and fulfilling the user's statistical requirement.

According to the data statistics method, after the retrieval data is obtained from the distributed databases according to the index field, it is sent to the second database, which sorts and merges it, thereby completing the statistics on the data in the distributed databases. The method overcomes the drawbacks that each distributed database has a small storage capacity and that the data for a specific field cannot be obtained directly from any single database, improves the efficiency of statistical operations on distributed databases, and reduces the time and resources consumed.
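Steps S220 to S250 can be strung together in a compact sketch. It assumes in-memory lists in place of the first databases, a single index field value, natural ordering of the sorting field values, and summation as the merge rule; the function name `run_statistics` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def run_statistics(shards, index_field, index_value, sorting_field, statistics_field):
    """Filter every shard, pool the rows, sort them, and merge equal keys."""
    # S220/S230: each shard keeps the rows matching the index field value,
    # and the matching rows are pooled as they would be in the second database.
    pooled = [row for shard in shards for row in shard
              if row[index_field] == index_value]
    # S240: order the pooled rows by the sorting field.
    pooled.sort(key=itemgetter(sorting_field))
    # S250: merge adjacent rows that share a sorting field value.
    return [
        {sorting_field: key,
         statistics_field: sum(row[statistics_field] for row in group)}
        for key, group in groupby(pooled, key=itemgetter(sorting_field))
    ]


shards = [
    [{"date": "2020-07-22", "outlet": 1, "amount": 200},
     {"date": "2020-07-21", "outlet": 2, "amount": 50}],
    [{"date": "2020-07-22", "outlet": 2, "amount": 300},
     {"date": "2020-07-22", "outlet": 1, "amount": 100}],
]
print(run_statistics(shards, "date", "2020-07-22", "outlet", "amount"))
# [{'outlet': 1, 'amount': 300}, {'outlet': 2, 'amount': 300}]
```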

Based on the method corresponding to fig. 2, the embodiment of the present disclosure further provides a data statistics method based on a distributed database, as shown in fig. 3, where an execution subject of the method is the first database management system, and the method includes the following specific steps.

S310: receiving a data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database.

The description of this step may refer specifically to the description in step S210, and will not be described herein.

S320: retrieval data corresponding to the index field value is obtained in the first database.

The description of this step may refer specifically to the description in step S220, and will not be described herein.

S330: transmitting the search data to a second database, so that the second database merges the search data based on the sorting fields; the second database is used for sorting the retrieval data according to the sorting field.

The description of this step may refer to the descriptions in steps S230, S240, S250, S260, and S270, and will not be described in detail herein.

Based on the method corresponding to fig. 2, the embodiment of the present disclosure further provides a data statistics method based on a distributed database, as shown in fig. 4, where an execution subject of the method is the second database management system, and the method includes the following specific steps.

S410: receiving retrieval data and an ordering field; the search data comprises data with the same index field value obtained in the first database according to the data statistics request; the data statistics request comprises the index field value and the ordering field; the first database is a distributed database.

The description of this step may refer to the descriptions in steps S210, S220, and S230, and will not be repeated here.

S420: sorting the retrieval data according to the sorting field.

The description of this step may refer to the description in step S240, and will not be repeated here.

S430: merging the retrieval data having the same sorting field value.

The description of this step may refer to the description in step S250, and will not be repeated here.

S440: feeding back the merged retrieval data.

The description of this step may refer to the descriptions in steps S260 and S270, and will not be repeated here.

Based on the method corresponding to fig. 3, the embodiment of the present disclosure further provides a data statistics device based on a distributed database, as shown in fig. 5, where the device is disposed in the first database management system, and the device includes the following modules.

A request receiving module 510, configured to receive a data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database;

a data acquisition module 520, configured to acquire, in the first database, search data corresponding to the index field value;

a data transmitting module 530, configured to transmit the search data to a second database, so that the second database merges the search data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.

Based on the method corresponding to fig. 4, the embodiment of the present disclosure further provides a data statistics device based on a distributed database, as shown in fig. 6, where the device is disposed in the second database management system, and the device includes the following modules.

A data receiving module 610, configured to receive retrieval data and a sorting field; the retrieval data comprises data with the same index field value obtained in the first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database;

a data sorting module 620, configured to sort the retrieval data according to the sorting field;

a data merging module 630, configured to merge the retrieval data having the same sorting field value;

a data feedback module 640, configured to feed back the merged retrieval data.
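
The claims also recite a merging variant in which the retrieval data is divided into sets of bounded capacity, each set is merged separately, and equal values that straddle adjacent sets are merged when the sets are combined. The sketch below is one possible reading of that variant, not the embodiment's implementation. The names `merge_rows` and `chunked_merge` are hypothetical, the input is assumed to be already sorted on the merge key, and the merge key is left as the parameter `key_field` because this sketch does not fix which field serves as the key.

```python
def merge_rows(rows, key_field, stat_field):
    """Merge rows sharing a key_field value, summing stat_field (first-seen order)."""
    merged = {}
    for row in rows:
        key = row[key_field]
        if key in merged:
            merged[key][stat_field] += row[stat_field]
        else:
            merged[key] = dict(row)  # copy so the input rows are not modified
    return list(merged.values())


def chunked_merge(sorted_rows, key_field, stat_field, capacity):
    """Merge rows in sets of at most `capacity` rows, then combine the sets.

    Assumes sorted_rows is sorted on key_field, so equal keys that span two
    sets sit at the end of one set and the start of the next.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    sets = [
        sorted_rows[i:i + capacity] for i in range(0, len(sorted_rows), capacity)
    ]
    result = []
    for data_set in sets:
        for row in merge_rows(data_set, key_field, stat_field):
            if result and result[-1][key_field] == row[key_field]:
                # Same key at the boundary of adjacent sets: merge across sets.
                result[-1][stat_field] += row[stat_field]
            else:
                result.append(row)
    return result


# Hypothetical sorted retrieval data.
rows = [
    {"date": "d1", "amount": 1},
    {"date": "d1", "amount": 2},
    {"date": "d2", "amount": 3},
    {"date": "d2", "amount": 4},
    {"date": "d2", "amount": 5},
    {"date": "d3", "amount": 6},
]
result = chunked_merge(rows, key_field="date", stat_field="amount", capacity=2)
```

Bounding the set size limits how many rows are held for merging at one time, at the cost of one additional pass that reconciles the set boundaries.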

Based on the method corresponding to fig. 3, as shown in fig. 7, a first database management system according to an embodiment of the present disclosure is described. The first database management system may include a memory and a processor.

In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid-state drive, or a USB flash drive. The memory may be used to store computer instructions.

In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller.

The processor may execute the computer instructions to implement the following steps: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to an index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; acquiring retrieval data corresponding to the index field value in the first database; transmitting the retrieval data to a second database, so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field.

Based on the method corresponding to fig. 4, as shown in fig. 8, a second database management system according to an embodiment of the present disclosure is described. The second database management system may include a memory and a processor.

The processor may execute the computer instructions to implement the following steps: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value obtained in the first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; sorting the retrieval data according to the sorting field; merging the retrieval data having the same sorting field value; and feeding back the merged retrieval data.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by simply programming the method flow into an integrated circuit using the hardware description languages described above.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present specification may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present specification, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present specification.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The specification is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Although the present specification has been described by way of example, it will be appreciated by those skilled in the art that there are many variations and modifications to the specification without departing from the spirit of the specification, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the specification.

Claims

1. A distributed database-based data statistics method, comprising:

receiving a data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; the data statistics request also comprises a statistics field;

acquiring retrieval data corresponding to the index field value in the first database;

transmitting the retrieval data to a second database, so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field; the second database stores data according to the key value; the second database merging the retrieval data based on the sorting field comprises: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; and merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data.

2. The method of claim 1, wherein the second database is a non-relational database.

3. The method of claim 1, wherein the obtaining, in the first database, the retrieved data corresponding to the index field value comprises:

distributing data acquisition tasks to containers corresponding to the first databases; the data acquisition task comprises an index field value; the container is used for executing tasks to call data in the first database;

the data acquisition task is performed with the container to acquire retrieval data having the index field value.

4. A distributed database-based data statistics apparatus, comprising:

the request receiving module is used for receiving the data statistics request; the data statistics request comprises an ordering field and an index field value corresponding to the index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; the data statistics request also comprises a statistics field;

the data acquisition module is used for acquiring retrieval data corresponding to the index field value in the first database;

the data sending module is used for sending the retrieval data to a second database, so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field; the second database stores data according to the key value; the second database merging the retrieval data based on the sorting field comprises: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; and merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data.

5. A first database management system comprising a memory and a processor;

the memory is used for storing computer program instructions;

the processor is configured to execute the computer program instructions to implement the following steps: receiving a data statistics request; the data statistics request comprises a sorting field and an index field value corresponding to an index field; the data statistics request is used for counting data corresponding to the same index field value in at least two first databases; the first database is a distributed database; the data statistics request also comprises a statistical field; acquiring retrieval data corresponding to the index field value in the first database; transmitting the retrieval data to a second database, so that the second database merges the retrieval data based on the sorting field; the second database is used for sorting the retrieval data according to the sorting field; the second database stores data according to the key value; the second database merging the retrieval data based on the sorting field comprises: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; and merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data.

6. A distributed database-based data statistics method, comprising:

receiving retrieval data and an ordering field; the search data comprises data with the same index field value obtained in the first database according to the data statistics request; the data statistics request comprises the index field value and the ordering field; the first database is a distributed database; the data statistics request also comprises a statistics field;

sorting the retrieved data according to the sorting field;

merging the retrieval data having the same sorting field value, including: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data;

and feeding back the combined retrieval data.

7. The method of claim 6, wherein merging the retrieval data having the same sorting field value comprises:

dividing the retrieval data into at least one retrieval data set; the number of pieces of retrieval data in each retrieval data set is not more than a preset set capacity;

combining the search data having the same field value corresponding to the index field in each search data set, respectively;

merging the search data sets; and if the adjacent search data sets have the search data with the same field value corresponding to the index field, merging the search data with the same field value.

8. A distributed database-based data statistics apparatus, comprising:

the data receiving module is used for receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value obtained in the first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; the data statistics request also comprises a statistical field;

the data sorting module is used for sorting the retrieval data according to the sorting field;

the data merging module is used for merging the retrieval data having the same sorting field value, including: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data;

and the data feedback module is used for feeding back the merged retrieval data.

9. A second database management system comprising a memory and a processor;

the memory is used for storing computer program instructions;

the processor is configured to execute the computer program instructions to implement the following steps: receiving retrieval data and a sorting field; the retrieval data comprises data with the same index field value obtained in the first database according to a data statistics request; the data statistics request comprises the index field value and the sorting field; the first database is a distributed database; the data statistics request also comprises a statistical field; sorting the retrieval data according to the sorting field; merging the retrieval data having the same sorting field value, including: dividing the retrieval data into retrieval data categories corresponding to different sorting field values; acquiring a statistical field value of the statistical field corresponding to the retrieval data in the retrieval data category; accumulating the statistical field values corresponding to the retrieval data category; merging the retrieval data in the retrieval data category, and taking the accumulated value of the statistical field as the statistical field value corresponding to the statistical field of the merged retrieval data; and feeding back the merged retrieval data.