CN114466069A

CN114466069A - Data acquisition system

Info

Publication number: CN114466069A
Application number: CN202111617804.2A
Authority: CN
Inventors: 杨主决; 向校民; 王金土
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-05-10

Abstract

The embodiment of the invention relates to a data acquisition system, which comprises: each edge server in the edge server group is used for receiving a user request; the first proxy server is used for performing first classification processing on the user request to acquire first summarized data; and sending the first summarized data to a second proxy server; the second proxy server is used for performing second classification processing on the first summarized data, acquiring second summarized data and respectively sending the second summarized data to each data processing center server; and each data processing center server is respectively used for performing third classification processing on the second summarized data to acquire third summarized data when the second summarized data are determined to be data to be processed in the jurisdiction range of the data processing center server. The problems of overlarge load and low timeliness caused by the fact that the data processing center server classifies and processes all massive and complex data at the same time are solved.

Description

Data acquisition system

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a data acquisition system and a bandwidth acquisition method.

Background

With the rapid development of the internet and the popularization of 5G, a Content Delivery Network (CDN) scheduling system needs to carry a larger and larger amount of bandwidth. This presents the following challenges and problems:

(1) the bandwidth demand of most clients in a given area is unpredictable, and most clients experience sudden bursts of bandwidth demand;

(2) the bandwidth of CDN nodes cannot be acquired in real time, so switching cannot be performed in time when a node's bandwidth utilization exceeds a threshold;

(3) the way bandwidth data is collected is unsound, which causes serious data loss and affects the sound planning of client resources.

The CDN system automates resource scheduling decisions; when to start scheduling and where to schedule to depend entirely on real-time bandwidth data. The bandwidth volume is large, which also makes real-time scheduling very challenging.

Therefore, there is an urgent need for a bandwidth acquisition method that can accurately and completely acquire, in real time, the bandwidth used by clients and machines when there are many edge nodes and the bandwidth volume is large, so that the CDN scheduling system can effectively schedule bandwidth resources when a customer's bandwidth demand bursts or resources fail, thereby ensuring the customer's overall service quality.

Disclosure of Invention

The application provides a data acquisition system and a bandwidth acquisition method, which aim to solve the technical problems in the prior art.

In a first aspect, the present application provides a data acquisition system, comprising a plurality of data acquisition channels, wherein each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is in communication connection with at least one data processing center server;

each edge server in the edge server group is used for receiving a user request;

the first proxy server is used for performing first classification processing on the user request to acquire first summarized data; sending the first summarized data to a second proxy server;

the second proxy server is used for performing second classification processing on the first summarized data, acquiring second summarized data and respectively sending the second summarized data to each data processing center server;

and each data processing center server in the at least one data processing center server is respectively used for performing third classification processing on the second summarized data to obtain third summarized data.

Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:

The data acquisition system provided by the embodiment of the application adopts the divide-and-conquer idea: the task of classifying massive, complex collected data is split into a plurality of small modules, that is, the data acquisition system is divided into a plurality of data acquisition channels that execute data acquisition work in parallel. Within each channel, the first proxy server executes the first-level classification processing, the second proxy server then executes the second-level classification processing, and finally the second summarized data is transmitted to the data processing center server for processing. In this way all work is subdivided across different channels and executed by different modules, which reduces the processing load on the data processing center server and avoids the excessive load and poor timeliness that arise when a single data processing center server classifies and processes all of the massive, complex data at once. The real-time bandwidth used by clients and machines can thus be collected accurately and completely in real time even when there are many edge nodes and the bandwidth volume is large, and the CDN scheduling system can effectively schedule bandwidth resources when a customer's bandwidth demand bursts or resources fail, ensuring the customer's overall service quality. Moreover, the edge server group, the first proxy server, the second proxy server and the data processing center server use streaming transmission, so that data is not written to intermediate files on disk and loss in transit is reduced.

Drawings

Fig. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the principle of the divide-and-conquer concept provided by the present invention;

Fig. 3 is a block diagram of a data reporting process provided by the present invention;

Fig. 4 is a schematic structural diagram, provided by the present invention, in which each data processing center server dispatches real-time data to different channels according to the data header in the protocol and processes the data in parallel according to service rules;

Fig. 5 is an overall system architecture diagram, provided by the present invention, including the configuration center management server, the data acquisition system, etc.;

Fig. 6 is a block diagram of a process in which a plurality of data acquisition paths respectively establish communication connections with a plurality of data processing centers (servers).

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.

In view of the technical problems mentioned in the background art, an embodiment of the present application provides a data acquisition system, which is specifically shown in fig. 1. Before describing the modules included in the data acquisition system in this embodiment, a configuration management center server that establishes a communication connection with the data acquisition system is described first. The configuration management center server comprises a client domain name module, an asset management module, a client resource planning module, a configuration issuing module and the like.

The client domain name module mainly maintains the client domain name information and the client package information for the clients' CDN acceleration within the company.

The asset management module maintains node information, machine room information and machine resource information, and divides acceleration regions by geographic location and operator: all edge machines are assigned to home regions according to geographic location and operator; a regional node autonomous machine is created for each region; each edge server is thereby logically associated with the regional node autonomous machine of its region.

The client resource planning module maintains the list of resources planned for each client domain name across the acceleration regions.

The configuration issuing module issues the machine resource information of the planned clients to each edge machine, so that each edge machine knows which client domain names it serves and which regional node autonomous machine it reports to; issues to each regional node autonomous machine the list of edge servers that it controls; and issues the list of regional node autonomous machines to all the data processing centers.

Therefore, each edge server knows to which regional node autonomous machine its statistical data must be reported, and each regional node autonomous machine can determine to which data processing center the data it has aggregated must be reported. In this way, multiple data acquisition channels can be configured for the data acquisition system. Each data acquisition channel may include an edge server group, a first proxy server, a second proxy server, and a data processing center server. The first proxy server is a proxy server in the edge server group and is used for classifying the user requests collected by each edge server in the edge server group. The second proxy server is the proxy server corresponding to the regional node autonomous machine and establishes direct communication connections with the first proxy server and the data processing center server, so that data is not written to disk in transit and intermediate loss is reduced.

Specifically, fig. 1 is a schematic structural diagram of the data acquisition system. The system includes a plurality of data acquisition channels; each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is in communication connection with at least one data processing center server. Fig. 1 shows 3 data acquisition channels and two data processing center servers.

Each edge server in the edge server group is used for receiving a user request;

each data processing center server in the at least one data processing center server is respectively used for determining whether the second summarized data is data to be processed within its own jurisdiction;

and, when the second summarized data is determined to be data to be processed within its own jurisdiction, performing third classification processing on the second summarized data to obtain third summarized data.

In the embodiment of the application, the concept of divide-and-conquer is adopted, and the data acquisition work is divided into a plurality of data acquisition channels for acquisition. The specific idea of divide and conquer can be seen in fig. 2. That is, a parent problem is split into a plurality of child problems, a plurality of child results are obtained after corresponding processing, and finally the child results are summarized to obtain a parent problem result.

Each data acquisition channel includes a plurality of modules that respectively perform different tasks (equivalent to processing the respective sub-problems) and classify the collected data. As a result, the collected data is already largely classified by the time it reaches the data processing center, which facilitates subsequent processing and reduces its burden. Finally, each data processing center server in the at least one data processing center server identifies the classified content, takes the part it is responsible for, and obtains a corresponding processing result. Aggregating all the processing results yields the parent problem result, that is, scheduling indexes of different dimensions. These matters are described one by one below.

Specifically, an edge server is an edge node of the CDN, that is, the node through which users access content under CDN acceleration. When a user accesses it, the edge server receives the user access request and records log data.

In an alternative example, the user request may include, but is not limited to, the following field information: request time, a first IP, a second IP, access domain name, access traffic and number of requests, wherein the first IP is the IP of the visitor client, and the second IP is the IP that the visitor client accesses.

The edge server node records the data in log form.

An agent service (namely, the first proxy server) is deployed on the edge server. After receiving the configuration sent by the configuration center, the edge server knows which regional node autonomous machine manages it, and then establishes a long-lived connection with the corresponding regional node autonomous machine service through the first proxy server.

The first proxy server is used for performing first classification processing on the user request to acquire first summarized data. The first summarized data is then sent to a second proxy server.

In one specific example, the first proxy server may periodically count the log data and then perform the categorization. For example, log data is counted every 5 minutes.

One embodiment of the specific log data is as follows:

# timestamp | VIP | visitor IP | customer domain name | traffic (B) | number of requests
2021-07-05 19:00:00 61.134.42.12 14.220.30.197 www.ctyun.cn 52770118 10
2021-07-05 19:00:00 61.134.42.12 10.125.30.101 www.ctyun.cn 33388 10
2021-07-05 19:00:00 61.134.42.12 5.220.18.105 www.ctyun.cn 515 10
2021-07-05 19:00:00 61.134.42.12 222.193.25.156 www.ctyun.cn 108551 10
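As an illustration only, a log line of this shape could be parsed into a typed record as sketched below. The sketch assumes whitespace-separated fields in the order given by the header comment; `LogRecord` and `parse_log_line` are hypothetical names that do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime  # request time
    vip: str             # second IP: the address the visitor client accesses
    visitor_ip: str      # first IP: the address of the visitor client
    domain: str          # accessed customer domain name
    traffic_bytes: int   # access traffic in bytes
    requests: int        # number of requests


def parse_log_line(line: str) -> LogRecord:
    """Parse one whitespace-separated log line (field order is an assumption)."""
    date, time, vip, visitor_ip, domain, traffic, requests = line.split()
    return LogRecord(
        timestamp=datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"),
        vip=vip,
        visitor_ip=visitor_ip,
        domain=domain,
        traffic_bytes=int(traffic),
        requests=int(requests),
    )


record = parse_log_line(
    "2021-07-05 19:00:00 61.134.42.12 14.220.30.197 www.ctyun.cn 52770118 10"
)
```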

The first proxy server collects the log data every 5 minutes, and the data collection depends on the configuration issued by the configuration center. In an alternative embodiment, the configuration rules may include, but are not limited to, the following:

determining the area to which the client of the visitor belongs according to the first IP; and according to the second IP, determining the edge server group sending the user request from the edge server groups respectively corresponding to the plurality of data acquisition channels.

In a specific example, the first IP is an IP corresponding to a client used by the user, and the second IP is an upstream IP corresponding to an access request of the user.

Therefore, the first proxy server can determine the area to which the client of the user (guest) belongs according to the first IP, that is, determine the area in which the user is currently located.

Specifically, the visitor IP, i.e., the first IP, may be looked up in an IP database and, according to geographic location and operator, matched to the corresponding area. Examples of matching results are:

dianxin_fujian, yidong_fujian.
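The lookup from visitor IP to region could be sketched as follows. The IP database here is a hypothetical in-memory prefix table: the prefixes are reserved documentation ranges, and their assignment to regions is invented purely for illustration; `IP_REGIONS` and `region_of` are assumed names.

```python
import ipaddress

# Hypothetical IP database: network prefix -> "<operator>_<province>" region.
# The prefixes are documentation ranges, not real operator allocations.
IP_REGIONS = [
    (ipaddress.ip_network("198.51.100.0/24"), "dianxin_fujian"),
    (ipaddress.ip_network("203.0.113.0/24"), "yidong_fujian"),
]


def region_of(ip: str, default: str = "unknown") -> str:
    """Return the region whose prefix contains the given visitor IP."""
    address = ipaddress.ip_address(ip)
    for network, region in IP_REGIONS:
        if address in network:
            return region
    return default
```

A production lookup would more likely use a sorted or trie-based structure over a real IP database; the linear scan is kept only for readability.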

and determining the edge server group sending the user request from the edge server groups respectively corresponding to the plurality of data acquisition channels according to the second IP.

The edge server currently accepting the user's access request may fail at the next moment, in which case the subsequent interaction with the user must be completed by other edge servers in the edge server group. Therefore, the first proxy server does not match the second IP to an individual edge server, but matches it directly to the corresponding edge server group.

And then summarizing the user request according to a first summary specification consisting of request time, an edge server group sending the user request, a region to which the client of the visitor belongs, an access domain name, access flow and request times, and acquiring first summary data.

As an example above, the format of the summarized first summarized data may be seen in the following examples:

# timestamp | cluster name | region to which visitor belongs | client domain name | traffic (B) | number of requests
2021-07-05 19:00:00 dianxin_cluster1 dianxin_fujian www.ctyun.cn 2000000 10
2021-07-05 19:00:00 dianxin_cluster1 dianxin_beijing www.ctyun.cn 2800000 50
2021-07-05 19:00:00 dianxin_cluster1 dianxin_fujian www.ctyun1.cn 10000000 100
2021-07-05 19:00:00 dianxin_cluster1 dianxin_fujian www.ctyun1.cn 35000000 120
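The first classification processing can be sketched as a group-by over the first summary specification, summing traffic and request counts per key. This is a minimal sketch, not the patented implementation: `VIP_TO_CLUSTER` and `VISITOR_REGION` are hypothetical lookup tables standing in for the configuration issued by the configuration center, and the sample rows are invented.

```python
from collections import defaultdict

# Hypothetical lookups standing in for the issued configuration.
VIP_TO_CLUSTER = {"61.134.42.12": "dianxin_cluster1"}
VISITOR_REGION = {
    "198.51.100.7": "dianxin_fujian",
    "203.0.113.9": "dianxin_beijing",
}


def first_summary(rows):
    """Aggregate raw rows by (time, edge server group, visitor region, domain).

    Each row is (timestamp, vip, visitor_ip, domain, traffic_bytes, requests).
    Returns {key: (total_traffic_bytes, total_requests)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for timestamp, vip, visitor_ip, domain, traffic, requests in rows:
        key = (timestamp, VIP_TO_CLUSTER[vip], VISITOR_REGION[visitor_ip], domain)
        totals[key][0] += traffic
        totals[key][1] += requests
    return {key: tuple(value) for key, value in totals.items()}


rows = [
    ("2021-07-05 19:00:00", "61.134.42.12", "198.51.100.7", "www.ctyun.cn", 1500000, 6),
    ("2021-07-05 19:00:00", "61.134.42.12", "198.51.100.7", "www.ctyun.cn", 500000, 4),
    ("2021-07-05 19:00:00", "61.134.42.12", "203.0.113.9", "www.ctyun.cn", 2800000, 50),
]
summary = first_summary(rows)
```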

An agent service (the second proxy server) is also deployed on each regional node autonomous machine to perform preliminary processing of the data. If every edge cache server were connected directly to the receiving servers, those servers could not bear the load; summarization is therefore distributed through this agent.

That is, in each data collection channel, the first proxy server sends the first summary data to the second proxy server. And the second proxy server is used for performing second classification processing on the first summarized data to obtain second summarized data.

In an optional example, the second proxy server provides a data receiving port through which the first proxy server establishes a communication connection with it. After the first proxy server has aggregated the data, the data is reported directly to the second proxy server in the regional center without being written to disk.

The second proxy server is specifically configured to extract the following fields in each user request from the first summary data to form a second summary specification: requesting time, an edge server group sending a user request, a region to which a client of a visitor belongs, and an access domain name;

and according to the second summary specification, performing simplification and classification processing on the first summary data to obtain second summary data.

And then distributing the second summarized data to each data processing center server.

Specifically, as described above, the second summarized data includes the region to which the visitor belongs, and different data processing center servers may manage the data of different regions. When a data processing center server identifies the visitor's region as falling within its own jurisdiction, it determines that the data is its own to process and then summarizes the second summarized data. Of course, whether the second summarized data is to be processed by the server itself may also be identified in other ways, for example by the client domain name in the second summarized data or by the server cluster. The specific identification information can be set according to the actual situation and is not limited here. If a data processing center server finds that the second summarized data is not data that it needs to process, it may delete (discard) the data to avoid occupying internal resources.

The second summarized data is distributed to every data processing center server so that no data goes unreported. Otherwise, for example, a data processing center might need to process a certain part of the data, but the second proxy server might route it incorrectly and send it only to other data processing center servers. Of course, in an implementable manner, provided that no data can be missed, the second proxy server may also determine the destination data processing center server directly, instead of distributing the data to every data processing center server.
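The jurisdiction check on a data processing center server could look roughly like the sketch below, which keeps records whose visitor region the server manages and discards the rest. The record layout (dictionaries with a `region` key) and the name `select_in_jurisdiction` are assumptions for illustration.

```python
def select_in_jurisdiction(records, jurisdiction):
    """Split off the records this server must process.

    records: iterable of dicts carrying a "region" key (assumed layout).
    jurisdiction: set of region names managed by this server.
    Returns (kept_records, number_discarded).
    """
    kept, discarded = [], 0
    for record in records:
        if record["region"] in jurisdiction:
            kept.append(record)
        else:
            discarded += 1  # not ours: drop it to free internal resources
    return kept, discarded


records = [
    {"domain": "www.ctyun.cn", "region": "dianxin_fujian", "traffic": 2000000},
    {"domain": "www.ctyun.cn", "region": "dianxin_beijing", "traffic": 2800000},
]
kept, discarded = select_in_jurisdiction(records, {"dianxin_fujian"})
```

Keying the check on the client domain name or the server cluster instead would only change which field is compared.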

In an optional example, the data processing center server also provides a data port for the second proxy server to report data, and may simultaneously support data transmission modes such as TCP and HTTP to report data. See in particular fig. 3. Fig. 3 is a block diagram illustrating a process for reporting data.

Referring to fig. 3, when data is reported it must be abstracted into a data stream; although the reporting modes differ, a uniform data protocol is used, with the following format:

data header | timestamp | data primary key (multiple fields supported) | data value …

Data one:

DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_fujian dx_cluster1 3750000 100
DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_beijing dx_cluster1 3750000 100
DOMAIN_BANDWIDTH 1625484000 www.ctyun1.cn dx_fujian dx_cluster1 3750000 100
DOMAIN_BANDWIDTH 1625484000 www.ctyun2.cn dx_fujian dx_cluster1 3750000 100

Data two:

CLUSTER_BANDWIDTH 1625484000 dx_cluster1 3750000 100
CLUSTER_BANDWIDTH 1625484000 dx_cluster2 3750000 100
CLUSTER_BANDWIDTH 1625484000 yd_cluster2 3750000 100
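A parser and formatter for this uniform protocol might be sketched as below. The source does not state how many trailing fields are data values; the sketch assumes two, as in the sample lines, and treats everything between the timestamp and the values as the primary key. `Report`, `VALUE_COUNT`, `parse_report` and `format_report` are hypothetical names.

```python
from typing import NamedTuple, Tuple

VALUE_COUNT = 2  # assumption: each line ends with two numeric data values


class Report(NamedTuple):
    header: str             # data header, e.g. DOMAIN_BANDWIDTH
    timestamp: int          # Unix timestamp in seconds
    key: Tuple[str, ...]    # primary key, one or more fields
    values: Tuple[int, ...]  # data values


def parse_report(line: str) -> Report:
    """Split a space-separated protocol line into header, time, key, values."""
    fields = line.split()
    if len(fields) < 3 + VALUE_COUNT:
        raise ValueError(f"too few fields: {line!r}")
    return Report(
        header=fields[0],
        timestamp=int(fields[1]),
        key=tuple(fields[2:-VALUE_COUNT]),
        values=tuple(int(v) for v in fields[-VALUE_COUNT:]),
    )


def format_report(report: Report) -> str:
    """Serialize a Report back into one protocol line."""
    parts = [report.header, str(report.timestamp), *report.key]
    parts.extend(str(v) for v in report.values)
    return " ".join(parts)
```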

Further optionally, after receiving the reported data, the data processing center distinguishes the data according to the data header, splits it, and then executes different processing for each part, so that data reporting is not coupled to the various kinds of data processing. For example, several kinds of data may be carried in one report, such as the bandwidth data and the request data of the same client reported at the same time. The data processing center server is also used for preprocessing the second summarized data.

In one possible embodiment, the data processing center server may use a chain of responsibility (FilterChain) to preprocess the data, including data cleaning and filtering.

In the chain of responsibility, the filtering rules are configurable: for example, rules that filter on the data header, the field length, or the bandwidth value range.
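A chain of responsibility for this preprocessing step could be sketched as follows. Only the name FilterChain comes from the source; its interface, the record layout (dictionaries with `header` and `bandwidth` keys) and the two example filters are assumptions.

```python
from typing import Callable, Iterable, Optional

Record = dict
Filter = Callable[[Record], Optional[Record]]


class FilterChain:
    """Pass a record through each filter in order; stop at the first rejection."""

    def __init__(self, filters: Iterable[Filter]):
        self._filters = list(filters)

    def apply(self, record: Record) -> Optional[Record]:
        for data_filter in self._filters:
            record = data_filter(record)
            if record is None:  # a filter rejected the record
                return None
        return record


def header_filter(allowed_headers) -> Filter:
    """Accept only records whose data header is in the allowed set."""
    def check(record: Record) -> Optional[Record]:
        return record if record["header"] in allowed_headers else None
    return check


def bandwidth_range_filter(low: int, high: int) -> Filter:
    """Accept only records whose bandwidth lies within [low, high]."""
    def check(record: Record) -> Optional[Record]:
        return record if low <= record["bandwidth"] <= high else None
    return check


chain = FilterChain([
    header_filter({"DOMAIN_BANDWIDTH", "CLUSTER_BANDWIDTH"}),
    bandwidth_range_filter(0, 10**12),
])
```

Because each rule is an independent callable, new rules such as a field-length check can be appended to the list without touching the existing ones.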

Optionally, after the data processing center server preprocesses the second summarized data, the data processing center server performs a third classification process on the second summarized data, which may specifically be implemented by the following steps:

classifying the second summarized data to obtain a plurality of fourth summarized data;

and according to the processing rule corresponding to the category of each fourth summarized data, processing the fourth summarized data in parallel, and respectively acquiring third summarized data corresponding to each fourth summarized data.

The data processing center server classifies the second summarized data, and obtains a plurality of fourth summarized data, which can be seen in the following manner:

extracting a data header from each piece of the second summarized data;

and classifying the second summarized data according to the information in the data header to acquire a plurality of fourth summarized data.

In a specific example, referring to fig. 4 in particular, fig. 4 shows a schematic structural diagram that a data processing center server dispatches real-time data to different channels according to a data header in a protocol, and processes the real-time data in parallel according to a business rule.

The flow is as follows: the processor records the data, caches it, and divides it into different channels (including smoothing, updating, analysis, snapshot, etc.); finally, the processed data is cached.

Optionally, in addition to the above operations, it must also be considered that, when the cache machines carry a large amount of data, events such as machines being brought online or taken offline, machine room cutovers, network fluctuations and service upgrades may cause transient data loss.
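Splitting by data header and processing each category in parallel could be sketched with a thread pool, as below. The processing rule shown (summing the first data value of each line) is a placeholder, since the source does not specify the per-category business rules; `RULES`, `split_by_header` and `process_in_parallel` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_header(lines):
    """Group protocol lines by their data header (the first field)."""
    groups = defaultdict(list)
    for line in lines:
        header, _, rest = line.partition(" ")
        groups[header].append(rest)
    return dict(groups)


def sum_bandwidth(rows):
    """Placeholder rule: sum the second-to-last field of every row."""
    return sum(int(row.split()[-2]) for row in rows)


# Hypothetical mapping from category (data header) to its processing rule.
RULES = {
    "DOMAIN_BANDWIDTH": sum_bandwidth,
    "CLUSTER_BANDWIDTH": sum_bandwidth,
}


def process_in_parallel(lines):
    """Run the rule for each category concurrently and collect the results."""
    groups = split_by_header(lines)
    with ThreadPoolExecutor() as pool:
        futures = {
            header: pool.submit(RULES[header], rows)
            for header, rows in groups.items()
        }
        return {header: future.result() for header, future in futures.items()}


lines = [
    "DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_fujian dx_cluster1 3750000 100",
    "DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_beijing dx_cluster1 3750000 100",
    "CLUSTER_BANDWIDTH 1625484000 dx_cluster1 3750000 100",
]
results = process_in_parallel(lines)
```

A header with no entry in `RULES` raises `KeyError` in this sketch; a real dispatcher would probably filter such lines out during preprocessing.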

Therefore, the data processing center server is also used for performing disaster tolerance processing and data smoothing processing on the third summarized data.

Specifically, unless a client bursts or a peak period occurs, bandwidth differs little between a given time point and the several points around it. Therefore, when data is missing at some time points, the bandwidth data of the nearest time point can be used for disaster tolerance processing.

In addition, noisy data is inevitably generated during data acquisition; in particular, sudden increases and decreases in bandwidth data cause noticeably frequent switching in resource scheduling. Therefore, the data processing center server may be further configured to perform denoising processing on the third summarized data.

Specifically, an exponential smoothing algorithm may be used for denoising, as follows:
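Filling a gap with the value from the nearest known time point could be sketched as follows. The function name `fill_missing`, the dictionary layout, and the tie-breaking choice (prefer the earlier point when two neighbours are equally close) are assumptions not stated in the source.

```python
def fill_missing(series, expected_timestamps):
    """Return a value for every expected timestamp.

    series: {timestamp: bandwidth} for the points actually received.
    Missing timestamps take the value of the nearest known timestamp;
    on a tie the earlier point wins (an arbitrary choice for this sketch).
    """
    if not series:
        raise ValueError("no known data points to fill from")
    known = sorted(series)
    filled = {}
    for ts in expected_timestamps:
        if ts in series:
            filled[ts] = series[ts]
        else:
            nearest = min(known, key=lambda k: abs(k - ts))
            filled[ts] = series[nearest]
    return filled


# The point at t=600 was lost; points arrive every 300 seconds.
received = {0: 100, 300: 110, 900: 130}
complete = fill_missing(received, range(0, 1200, 300))
```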

Given a smoothing coefficient α with value range (0, 1.0), the calculation formulas of quadratic exponential smoothing are:

$S'_t = \alpha x_t + (1-\alpha) S'_{t-1}$ (formula 1)

$S''_t = \alpha S'_t + (1-\alpha) S''_{t-1}$ (formula 2)

The calculation formula for the smoothed value T periods ahead is:

$Y_{t+T} = A_t - B_t T$ (formula 3)

Wherein:

$A_t = 2 S'_t - S''_t$ (formula 4)

The denoising process can be implemented in the above manner, and will not be described in detail here.
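A minimal sketch of quadratic (double) exponential smoothing follows. Formulas 1, 2 and 4 are taken from the text. The source does not define the slope term B, so the sketch assumes the textbook Brown form, in which the forecast is a_t + b_t·T with b_t = α/(1−α)·(S′_t − S″_t); initialising both smoothed series with the first observation is likewise an assumption.

```python
def double_exponential_smoothing(xs, alpha, horizon=1):
    """Forecast `horizon` periods ahead with Brown's double exponential smoothing.

    xs: observed bandwidth values in time order (non-empty).
    alpha: smoothing coefficient, strictly between 0 and 1.
    """
    if not xs:
        raise ValueError("xs must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")

    s1 = s2 = xs[0]  # assumed initialisation: first observation
    for x in xs[1:]:
        s1 = alpha * x + (1 - alpha) * s1   # formula 1
        s2 = alpha * s1 + (1 - alpha) * s2  # formula 2

    level = 2 * s1 - s2                      # formula 4
    slope = alpha / (1 - alpha) * (s1 - s2)  # textbook definition (assumption)
    return level + slope * horizon
```

With this sign convention a steadily rising series produces a forecast above the last smoothed level; if B is defined with the opposite sign, the subtraction in formula 3 gives the same value.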

Optionally, after the above operations are performed, the data processing center is further configured to classify the third summarized data according to a preset classification dimension, and obtain a scheduling index corresponding to the classification dimension.

In a specific example, scheduling indexes of different dimensions are finally generated. Representative index data are as follows:

Real-time bandwidth of each client in each acceleration region

www.ctyun.com dianxin_beijing timestamp bandwidth

www.ctyun.com dianxin_xiamen timestamp bandwidth

Real-time node bandwidth usage

Xiamen telecom machine room | timestamp | upper-limit bandwidth | redundant bandwidth

Beijing telecom machine room | timestamp | upper-limit bandwidth | redundant bandwidth

Xiamen mobile machine room | timestamp | upper-limit bandwidth | redundant bandwidth

Server cluster real-time bandwidth usage and redundancy

dianxin_xiamen_cluster1 | timestamp | upper-limit bandwidth | redundant bandwidth

yidong_xiamen_cluster1 | timestamp | upper-limit bandwidth | redundant bandwidth

Burst identification of clients in each acceleration region

www.ctyun.com dianxin_beijing timestamp 1 (1: burst, 0: not burst)

www.ctyun.com yidong_beijing timestamp 1 (1: burst, 0: not burst)

www.ctyun.com dianxin_xiamen timestamp 0 (1: burst, 0: not burst)

In addition, because the integrity and stability of data acquisition are very important for automated CDN scheduling, the stability of bandwidth acquisition on the edge servers and the regional node autonomous machines is also monitored.

For example, each edge server reports a heartbeat packet to its regional node autonomous machine every 10 s; the regional node autonomous machine collects and counts the heartbeat packets of the edge machines it controls, and reports the list of abnormal edge machines to the data processing center.

Similarly, each regional node autonomous machine reports a heartbeat packet to the data processing center every 10 s, and the data processing center is responsible for monitoring the regional node autonomous machines.

The heartbeat monitoring rule may be as follows: heartbeat packet data is received in real time, and a hash table is maintained to record the heartbeat packet data of each server.

The hash table is scanned periodically, every 10 s, to check the validity of each heartbeat packet, and expired heartbeat packets are removed from the hash table. A heartbeat packet is invalid if curr_timestamp - timestamp > max_times * timing_cycle holds,

where curr_timestamp represents the current timestamp;

timestamp represents the timestamp of the heartbeat packet;

timing_cycle represents the scan cycle time;

max_times represents the number of valid cycles.

The list of servers controlled by the current machine is then traversed and matched against the heartbeat packet data; if a server has no heartbeat packet data, its service is abnormal, a monitoring alarm is raised, and manual intervention and repair follow.
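The heartbeat rule can be sketched as a small monitor class built around a hash table. Reading the expiry condition as curr_timestamp − timestamp > max_times × timing_cycle is an interpretation of the text; the class name `HeartbeatMonitor`, its methods, and the default `max_times=3` are assumptions.

```python
class HeartbeatMonitor:
    """Track the latest heartbeat per server and report servers with none."""

    def __init__(self, servers, timing_cycle=10, max_times=3):
        self.servers = list(servers)       # servers controlled by this machine
        self.timing_cycle = timing_cycle   # scan cycle time in seconds
        self.max_times = max_times         # number of valid cycles (assumed 3)
        self.last_seen = {}                # hash table: server -> timestamp

    def receive(self, server, timestamp):
        """Record a heartbeat packet as it arrives."""
        self.last_seen[server] = timestamp

    def scan(self, curr_timestamp):
        """Drop expired heartbeats, then list servers with no valid heartbeat."""
        limit = self.max_times * self.timing_cycle
        expired = [
            server
            for server, timestamp in self.last_seen.items()
            if curr_timestamp - timestamp > limit
        ]
        for server in expired:
            del self.last_seen[server]
        return [s for s in self.servers if s not in self.last_seen]


monitor = HeartbeatMonitor(["edge-1", "edge-2", "edge-3"])
monitor.receive("edge-1", 100)
monitor.receive("edge-2", 125)
abnormal = monitor.scan(curr_timestamp=131)
```

In this run `edge-1` is reported because its last heartbeat is older than 30 s, and `edge-3` because it never sent one; the returned list is what would trigger the monitoring alarm.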

Fig. 5 is a diagram illustrating an overall system architecture including the configuration center management server, the functional modules in the data acquisition system, and other required components.

As shown in fig. 5:

Service access requests are transmitted through the load balancing equipment to the edge server groups corresponding to the areas to which the requests belong. After each edge server group has collected the data, the data is transmitted (through the first proxy server) to the regional node autonomous machine (that is, to the second proxy server). The regional node autonomous machine transmits the data to a real-time data processing center (data processing center server).

Fig. 6 shows a block flow diagram in which 3 data acquisition channels (from edge server to regional center, with the first proxy server and the second proxy server not shown) respectively establish communication connections with 2 data processing centers (servers). Fig. 6 is only a specific example, and the number of data acquisition channels, data processing center servers and the like can be set according to actual situations.

The data acquisition processes in fig. 5 and fig. 6 have been described in detail above, and are therefore not repeated here.

The data acquisition system provided by the embodiment of the invention adopts the divide-and-conquer idea: the task of classifying massive, complex collected data is split into a plurality of small modules, that is, the data acquisition system is divided into a plurality of data acquisition channels that execute data acquisition work in parallel. Within each channel, the first proxy server executes the first-level classification processing, the second proxy server then executes the second-level classification processing, and finally the second summarized data is transmitted to the data processing center server for processing. In this way all work is subdivided across different channels and executed by different modules, which reduces the processing load on the data processing center server and avoids the excessive load and poor timeliness that arise when a single data processing center server classifies all of the massive, complex data at once. The real-time bandwidth used by clients and machines can thus be collected accurately and completely in real time even when there are many edge nodes and the bandwidth volume is large, and the CDN scheduling system can effectively schedule bandwidth resources when a customer's bandwidth demand bursts or resources fail, ensuring the customer's overall service quality. Moreover, the edge server group, the first proxy server, the second proxy server and the data processing center server use streaming transmission, so that data is not written to intermediate files on disk and loss in transit is reduced.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data acquisition system, characterized in that the data acquisition system comprises a plurality of data acquisition channels, wherein each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is in communication connection with at least one data processing center server;

each edge server in the edge server group is used for receiving a user request;

the first proxy server is used for performing first classification processing on the user request to acquire first summarized data, and for sending the first summarized data to the second proxy server;

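the second proxy server is configured to perform second classification processing on the first summarized data, acquire second summarized data, and send the second summarized data to each data processing center server;

and each data processing center server is configured to perform third classification processing on the second summarized data to acquire third summarized data when it determines that the second summarized data is data to be processed within its own jurisdiction.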

2. The system of claim 1, wherein the user request comprises the following field information: request time, a first IP, a second IP, an access domain name, access flow and request times, wherein the first IP is the IP corresponding to the visitor client, and the second IP is the IP to be accessed by the visitor client;

the first proxy server is specifically configured to: determine, according to the first IP, the region to which the visitor client belongs;

determine, according to the second IP, the edge server group that sent the user request from among the edge server groups respectively corresponding to the plurality of data acquisition channels;

and summarize the user request according to a first summary specification formed by the request time, the edge server group that sent the user request, the region to which the visitor client belongs, the access domain name, the access flow and the request times, to acquire the first summarized data.

3. The system of claim 2, wherein the second proxy server is specifically configured to extract, from the first summarized data, the following fields of each user request, which constitute a second summary specification: the request time, the edge server group that sent the user request, the region to which the visitor client belongs, and the access domain name;

4. The system according to any one of claims 1 to 3, wherein the data processing center server is further configured to preprocess the second summarized data.

5. The system according to any one of claims 1 to 3, wherein the data processing center server is specifically configured to:

and process the fourth summarized data in parallel according to the processing rule corresponding to the category of each piece of fourth summarized data, and respectively acquire the third summarized data corresponding to each piece of fourth summarized data.

6. The system of claim 5, wherein the data processing center server is specifically configured to:

extract a header from each piece of second summarized data;

7. The system according to any one of claims 1 to 3 or 6, wherein the data processing center server is further configured to perform disaster tolerance processing and data smoothing processing on the third summarized data.

8. The system according to any one of claims 1-3 or 6, wherein the data processing center server is further configured to generate a scheduling index based on the third summarized data.

9. The system of claim 8, wherein the data processing center server is further configured to:

classify the third summarized data according to a preset classification dimension to obtain a scheduling index corresponding to the classification dimension.

10. The system according to any one of claims 1-3 or 6, wherein the data processing center server is further configured to discard the second summarized data when it is determined that the second summarized data does not belong to the data to be processed in its own jurisdiction.
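By way of non-limiting illustration only, the jurisdiction check and the per-dimension scheduling index recited above might be sketched in Python as follows. The sketch rests on assumptions that are not stated in the claims: jurisdiction is modeled as a set of regions, each record is a dictionary with `region`, `domain` and `flow` keys, the scheduling index is taken to be the total access flow per value of the chosen dimension, and the function names `filter_jurisdiction` and `scheduling_index` are hypothetical.

```python
from collections import defaultdict


def filter_jurisdiction(records, own_regions):
    """Keep only the records inside this center's jurisdiction (assumed to
    be a set of regions); all other records are discarded."""
    return [r for r in records if r["region"] in own_regions]


def scheduling_index(records, dimension):
    """Assumed scheduling index: total access flow per value of the chosen
    classification dimension (for example "region" or "domain")."""
    index = defaultdict(int)
    for r in records:
        index[r[dimension]] += r["flow"]
    return dict(index)
```

For example, a center responsible only for the hypothetical region "east" would call `filter_jurisdiction(records, {"east"})` first and then `scheduling_index(kept, "domain")` to obtain one total per access domain name.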