CN114466069A - Data acquisition system - Google Patents

Publication number
CN114466069A
Authority
CN
China
Prior art keywords
data
summarized
server
processing center
summarized data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111617804.2A
Other languages
Chinese (zh)
Inventor
杨主决
向校民
王金土
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202111617804.2A
Publication of CN114466069A
Legal status: Pending

Abstract

An embodiment of the invention relates to a data acquisition system in which: each edge server in an edge server group receives user requests; a first proxy server performs first classification processing on the user requests to obtain first summarized data and sends the first summarized data to a second proxy server; the second proxy server performs second classification processing on the first summarized data to obtain second summarized data and sends the second summarized data to each data processing center server; and each data processing center server performs third classification processing on the second summarized data to obtain third summarized data when it determines that the second summarized data is data to be processed within its own jurisdiction. This avoids the excessive load and poor timeliness that arise when a single data processing center server classifies and processes all of the massive, complex data at once.

Description

Data acquisition system
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a data acquisition system and a bandwidth acquisition method.
Background
With the rapid development of the internet and the popularization of 5G, a Content Delivery Network (CDN) scheduling system needs to carry a larger and larger amount of bandwidth. This presents the following challenges and problems:
(1) the bandwidth that customers in a given region will use is largely unpredictable, and most customers experience sudden bursts in bandwidth demand;
(2) the bandwidth of CDN nodes cannot be acquired in real time, so traffic cannot be switched in time when a node's bandwidth utilization exceeds its threshold;
(3) the way bandwidth data is collected is unreasonable, which causes serious data loss and hinders sound planning of customer resources.
The CDN system automates resource scheduling decisions; when to start scheduling and where to schedule depend entirely on real-time bandwidth data. Because the volume of bandwidth data is large, real-time scheduling is also a major challenge.
Therefore, there is an urgent need for a bandwidth acquisition method that can collect the bandwidth used by customers and machines accurately, completely and in real time, even when there are many edge nodes and the bandwidth volume is large. The CDN scheduling system can then schedule bandwidth resources effectively when a customer's bandwidth demand bursts or resources fail, ensuring the overall service quality for customers.
Disclosure of Invention
The application provides a data acquisition system and a bandwidth acquisition method, which aim to solve the technical problems in the prior art.
In a first aspect, the present application provides a data acquisition system comprising a plurality of data acquisition channels, wherein each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is communicatively connected to at least one data processing center server;
each edge server in the edge server group is used for receiving a user request;
the first proxy server is used for performing first classification processing on the user request to acquire first summarized data; sending the first summarized data to a second proxy server;
the second proxy server is used for performing second classification processing on the first summarized data, acquiring second summarized data and respectively sending the second summarized data to each data processing center server;
and each data processing center server in the at least one data processing center server is respectively used for performing third classification processing on the second summarized data to obtain third summarized data.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
The data acquisition system provided by the embodiments of the application applies the divide-and-conquer idea: after massive, complex data is collected, the classification work is split into several small modules, and the system is divided into a plurality of data acquisition channels that perform data acquisition in parallel. Within each channel, the first proxy server performs the first level of classification, the second proxy server performs the second level, and the second summarized data is finally passed to the data processing center server for processing. All of the work is thereby subdivided across different channels and carried out by different modules. This reduces the processing load on the data processing center server and avoids the excessive load and poor timeliness that arise when a single data processing center server classifies and processes all of the massive, complex data at once. The real-time bandwidth used by customers and machines can be collected accurately and completely even when there are many edge nodes and the bandwidth volume is large, so the CDN scheduling system can schedule bandwidth resources effectively when a customer's bandwidth demand bursts or resources fail, ensuring the overall service quality for customers. Moreover, the edge server group, the first proxy server, the second proxy server and the data processing center server use streaming transmission, so data is not written to intermediate files along the way, which reduces loss in transit.
Drawings
Fig. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the divide-and-conquer principle provided by the present invention;
Fig. 3 is a block diagram of the data reporting process provided by the present invention;
Fig. 4 is a schematic structural diagram in which each data processing center server provided by the present invention dispatches real-time data to different channels according to the data header in the protocol and processes it in parallel according to business rules;
Fig. 5 is an overall system architecture diagram, provided by the present invention, including the configuration management center server, the data acquisition system and other components;
Fig. 6 is a block diagram of the process by which a plurality of data acquisition channels establish communication connections with a plurality of data processing centers (servers), respectively.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
In view of the technical problems mentioned in the background art, an embodiment of the present application provides a data acquisition system, which is specifically shown in fig. 1. Before describing the modules included in the data acquisition system in this embodiment, a configuration management center server that establishes a communication connection with the data acquisition system is described first. The configuration management center server comprises a client domain name module, an asset management module, a client resource planning module, a configuration issuing module and the like.
The customer domain name module mainly maintains the domain name information and package information of the customers that use the company's CDN acceleration.
The asset management module maintains node information, machine room information and machine resource information, and divides acceleration regions by geographic location and operator: all edge machines are assigned to home regions according to geographic location and operator, and a regional node autonomous machine is created for each region, so that each edge server is logically associated with the regional node autonomous machine of its region.
The customer resource planning module maintains the list of resources planned for each customer domain name in the various acceleration regions.
The configuration issuing module issues the machine resource information planned for customers to each edge machine, so that each edge machine knows which customer domain names it serves and to which regional node autonomous machine it reports; issues to each regional node autonomous machine the list of edge servers it controls; and issues the list of regional node autonomous machines to all data processing centers.
Therefore, each edge server knows to which regional node autonomous machine its statistics must be reported, and each regional node autonomous machine can determine to which data processing center its statistics must be reported. In this way, multiple data acquisition channels can be configured for the data acquisition system. Each data acquisition channel may include an edge server group, a first proxy server, a second proxy server and a data processing center server. The first proxy server is the proxy service of the edge server group and classifies the user requests collected by each edge server in the group. The second proxy server is the proxy service of the regional node autonomous machine and establishes direct communication connections with the first proxy server and the data processing center server, so that data is not written to intermediate files and loss in transit is reduced.
Specifically, as shown in fig. 1, the data acquisition system includes a plurality of data acquisition channels. Each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is communicatively connected to at least one data processing center server. Fig. 1 shows three data acquisition channels and two data processing center servers.
Each edge server in the edge server group is used for receiving a user request;
the first proxy server is used for performing first classification processing on the user request to acquire first summarized data; sending the first summarized data to a second proxy server;
the second proxy server is used for performing second classification processing on the first summarized data, acquiring second summarized data and respectively sending the second summarized data to each data processing center server;
each data processing center server in the at least one data processing center server is used for determining whether the second summarized data is data to be processed within its own jurisdiction;
and, when the second summarized data is determined to be data to be processed within its own jurisdiction, performing third classification processing on the second summarized data to obtain third summarized data.
In the embodiment of the application, the concept of divide-and-conquer is adopted, and the data acquisition work is divided into a plurality of data acquisition channels for acquisition. The specific idea of divide and conquer can be seen in fig. 2. That is, a parent problem is split into a plurality of child problems, a plurality of child results are obtained after corresponding processing, and finally the child results are summarized to obtain a parent problem result.
Each data acquisition channel includes a plurality of modules that each perform a different task (equivalent to each handling a sub-problem) and classify the collected data. As a result, the collected data is already largely classified by the time it reaches the data processing center, which eases subsequent processing and reduces its load. Finally, each data processing center server in the at least one data processing center server identifies the classified content, takes the part it is responsible for, and produces the corresponding processing result. Aggregating all the processing results yields the result of the parent problem, that is, scheduling indexes of different dimensions. These matters are described one by one below.
Specifically, the edge server is an edge node of the CDN, that is, the node that users access when traffic is accelerated by the CDN. When a user accesses it, the edge server receives the user access request and records log data.
In an alternative example, the user request may include, but is not limited to, the following fields: request time, first IP, second IP, accessed domain name, access traffic and request count, where the first IP is the IP of the visitor's client and the second IP is the IP that the visitor's client accesses.
The edge server node records these data in the form of logs.
The proxy service (namely, the first proxy server) is deployed on the edge server. After receiving the configuration sent by the configuration center, the edge server knows which regional node autonomous machine manages it, and then establishes a long-lived connection with the service on that regional node autonomous machine through the first proxy server.
The first proxy server is used for performing first classification processing on the user request to acquire first summarized data. The first summarized data is then sent to a second proxy server.
In one specific example, the first proxy server may periodically count the log data and then perform the categorization. For example, log data is counted every 5 minutes.
One example of the log data is as follows:
# timestamp            VIP            visitor IP       customer domain name   traffic (B)   number of requests
2021-07-05 19:00:00    61.134.42.12   14.220.30.197    www.ctyun.cn           52770118      10
2021-07-05 19:00:00    61.134.42.12   10.125.30.101    www.ctyun.cn           33388         10
2021-07-05 19:00:00    61.134.42.12   5.220.18.105     www.ctyun.cn           515           10
2021-07-05 19:00:00    61.134.42.12   222.193.25.156   www.ctyun.cn           108551        10
The first proxy server collects the log data every 5 minutes, and the data collection depends on the configuration issued by the configuration center. In an alternative embodiment, the configuration rules may include, but are not limited to, the following:
determining the area to which the client of the visitor belongs according to the first IP; and according to the second IP, determining the edge server group sending the user request from the edge server groups respectively corresponding to the plurality of data acquisition channels.
In a specific example, the first IP is an IP corresponding to a client used by the user, and the second IP is an upstream IP corresponding to an access request of the user.
Therefore, the first proxy server can determine the area to which the client of the user (guest) belongs according to the first IP, that is, determine the area in which the user is currently located.
Specifically, the visitor IP (the first IP) may be looked up in an IP database, classified by geographic location and operator, and matched to the corresponding region. Example matching results are:
dianxin_fujian, yidong_fujian.
The edge server group that sent the user request is then determined, according to the second IP, from the edge server groups corresponding to the plurality of data acquisition channels.
The edge server currently handling the user's request may fail a moment later, in which case subsequent interaction with the user must be completed by another edge server in the same group. Therefore, the first proxy server does not match the second IP to an individual edge server but directly to the corresponding edge server group.
The user requests are then summarized according to a first summary specification consisting of the request time, the edge server group that sent the user request, the region to which the visitor's client belongs, the accessed domain name, the access traffic and the request count, yielding the first summarized data.
Continuing the example above, the format of the first summarized data is as follows:
# timestamp            cluster name       region of visitor   customer domain name   traffic (B)   number of requests
2021-07-05 19:00:00    dianxin_cluster1   dianxin_fujian      www.ctyun.cn           2000000       10
2021-07-05 19:00:00    dianxin_cluster1   dianxin_beijing     www.ctyun.cn           2800000       50
2021-07-05 19:00:00    dianxin_cluster1   dianxin_fujian      www.ctyun1.cn          10000000      100
2021-07-05 19:00:00    dianxin_cluster1   dianxin_fujian      www.ctyun1.cn          35000000      120
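The first classification step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two lookup tables, the region assigned to each sample IP, and the function name `first_summary` are hypothetical stand-ins for the IP database and the configuration issued by the configuration center.

```python
from collections import defaultdict

# Hypothetical lookup tables; in the described system these would come from
# the IP database and from the configuration issued by the configuration center.
REGION_BY_VISITOR_IP = {
    "14.220.30.197": "dianxin_fujian",
    "10.125.30.101": "dianxin_fujian",
    "5.220.18.105": "yidong_fujian",
}
CLUSTER_BY_VIP = {"61.134.42.12": "dianxin_cluster1"}


def first_summary(log_lines):
    """Aggregate raw log rows by (timestamp, cluster, region, domain)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [traffic_bytes, requests]
    for line in log_lines:
        date, clock, vip, visitor_ip, domain, traffic, requests = line.split()
        key = (
            f"{date} {clock}",
            CLUSTER_BY_VIP.get(vip, "unknown_cluster"),
            REGION_BY_VISITOR_IP.get(visitor_ip, "unknown_region"),
            domain,
        )
        totals[key][0] += int(traffic)
        totals[key][1] += int(requests)
    return {key: tuple(value) for key, value in totals.items()}


logs = [
    "2021-07-05 19:00:00 61.134.42.12 14.220.30.197 www.ctyun.cn 52770118 10",
    "2021-07-05 19:00:00 61.134.42.12 10.125.30.101 www.ctyun.cn 33388 10",
    "2021-07-05 19:00:00 61.134.42.12 5.220.18.105 www.ctyun.cn 515 10",
]
summary = first_summary(logs)
```

Matching on the cluster rather than on the individual edge server keeps the summary stable if one server in the group fails and another takes over.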
A proxy service is also deployed on each regional node autonomous machine to perform preliminary processing of the data. If every edge cache server connected directly to the regional-center server, that server could not carry the load, so distributed summarization is performed through a proxy (the second proxy server).
That is, in each data acquisition channel, the first proxy server sends the first summarized data to the second proxy server, and the second proxy server performs second classification processing on the first summarized data to obtain second summarized data.
In an optional example, the second proxy server provides a data receiving port through which the first proxy server establishes a communication connection with it. After the first proxy server has summarized its data, it reports the data directly to the second proxy server in the regional center without writing it to disk.
The second proxy server is specifically configured to extract the following fields in each user request from the first summary data to form a second summary specification: requesting time, an edge server group sending a user request, a region to which a client of a visitor belongs, and an access domain name;
and according to the second summary specification, performing simplification and classification processing on the first summary data to obtain second summary data.
And then distributing the second summarized data to each data processing center server.
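As a rough sketch of the second classification step, the snippet below merges first summarized rows that share the four fields of the second summary specification. The function name `second_summary` and the choice to sum traffic and request counts when merging are assumptions made for illustration; the source does not spell out the merge rule.

```python
from collections import defaultdict


def second_summary(rows):
    """Merge rows sharing (timestamp, cluster, region, domain).

    Assumption: merging means summing traffic and request counts.
    """
    merged = defaultdict(lambda: [0, 0])
    for timestamp, cluster, region, domain, traffic, requests in rows:
        key = (timestamp, cluster, region, domain)
        merged[key][0] += traffic
        merged[key][1] += requests
    return [key + tuple(value) for key, value in sorted(merged.items())]


first_rows = [
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_fujian", "www.ctyun.cn", 2000000, 10),
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_beijing", "www.ctyun.cn", 2800000, 50),
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_fujian", "www.ctyun1.cn", 10000000, 100),
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_fujian", "www.ctyun1.cn", 35000000, 120),
]
second_rows = second_summary(first_rows)
```

With the sample rows from the first summarized data, the two `www.ctyun1.cn` rows for `dianxin_fujian` collapse into one.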
Each data processing center server in the at least one data processing center server is used for determining whether the second summarized data is data to be processed within its own jurisdiction;
and, when the second summarized data is determined to be data to be processed within its own jurisdiction, performing third classification processing on the second summarized data to obtain third summarized data.
Specifically, as described above, the second summarized data includes the region to which the visitor belongs, and different data processing center servers may manage data of different regions. When a data processing center server identifies the visitor's region as falling within its own jurisdiction, it determines that the data is its own data to be processed and then summarizes the second summarized data. Whether the second summarized data is a server's own data to be processed may also be identified in other ways, for example by the customer domain name or the server cluster in the second summarized data. The specific identifying information can be set according to the actual situation and is not limited here. If a data processing center server finds that the second summarized data is not data it needs to process, it may delete (discard) the data to avoid occupying internal resources.
The second summarized data is distributed to every data processing center server so that no data goes unreported. Otherwise, for example, a data processing center might need a certain part of the data, but the second proxy server could send it by mistake to a different data processing center server. Of course, in an implementation where it can be guaranteed that no data is missed, the second proxy server may instead determine the target data processing center server directly and send the data only to that server, rather than distributing it to every data processing center server.
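The broadcast-then-filter behaviour can be illustrated with a small sketch. The center names and the `JURISDICTION` table are hypothetical; the region is used as the identifying field, which is only one of the options the text mentions.

```python
# Hypothetical jurisdiction table: the regions each data processing center handles.
JURISDICTION = {
    "center_a": {"dianxin_fujian", "yidong_fujian"},
    "center_b": {"dianxin_beijing"},
}


def third_stage_input(center, rows):
    """Keep the rows this center is responsible for; the rest are discarded."""
    regions = JURISDICTION[center]
    return [row for row in rows if row[2] in regions]


def broadcast(rows):
    """Deliver every row to every center and let each center filter its own part."""
    return {center: third_stage_input(center, rows) for center in JURISDICTION}


rows = [
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_fujian", "www.ctyun.cn", 2000000, 10),
    ("2021-07-05 19:00:00", "dianxin_cluster1", "dianxin_beijing", "www.ctyun.cn", 2800000, 50),
]
kept = broadcast(rows)
```

Broadcasting costs extra transfer but means a routing mistake at the second proxy server cannot cause a center to miss its data.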
In an optional example, the data processing center server also provides a data port for the second proxy server to report data, and may support reporting over transports such as TCP and HTTP at the same time. Fig. 3 is a block diagram of the data reporting process.
Referring to fig. 3, when data is reported it is abstracted into a data stream. Although the reporting modes differ, a uniform data protocol is used, with the following format:
data header   timestamp   data primary key (multiple fields supported)   data value …
Data one:
DOMAIN_BANDWIDTH   1625484000   www.ctyun.cn    dx_fujian    dx_cluster1   3750000   100
DOMAIN_BANDWIDTH   1625484000   www.ctyun.cn    dx_beijing   dx_cluster1   3750000   100
DOMAIN_BANDWIDTH   1625484000   www.ctyun1.cn   dx_fujian    dx_cluster1   3750000   100
DOMAIN_BANDWIDTH   1625484000   www.ctyun2.cn   dx_fujian    dx_cluster1   3750000   100
Data two:
CLUSTER_BANDWIDTH   1625484000   dx_cluster1   3750000   100
CLUSTER_BANDWIDTH   1625484000   dx_cluster2   3750000   100
CLUSTER_BANDWIDTH   1625484000   yd_cluster2   3750000   100
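A record in this protocol could be parsed and re-serialized along the following lines. The `Record` type, the function names, and the `KEY_FIELD_COUNT` table (how many primary-key fields follow each data header) are assumptions inferred from the two samples, not part of the described protocol.

```python
from dataclasses import dataclass

# Assumption: the number of primary-key fields per data header, inferred from the samples.
KEY_FIELD_COUNT = {"DOMAIN_BANDWIDTH": 3, "CLUSTER_BANDWIDTH": 1}


@dataclass(frozen=True)
class Record:
    header: str
    timestamp: int
    key: tuple
    values: tuple


def parse_record(line):
    """Split one whitespace-separated protocol line into its parts."""
    fields = line.split()
    header, timestamp = fields[0], int(fields[1])
    n_keys = KEY_FIELD_COUNT[header]
    key = tuple(fields[2:2 + n_keys])
    values = tuple(int(v) for v in fields[2 + n_keys:])
    return Record(header, timestamp, key, values)


def format_record(record):
    """Serialize a record back into a single protocol line."""
    parts = [record.header, str(record.timestamp), *record.key, *map(str, record.values)]
    return " ".join(parts)


line = "DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_fujian dx_cluster1 3750000 100"
record = parse_record(line)
```

Because the header is always the first field, a receiver can route a line before it knows anything else about its layout.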
Further optionally, to avoid coupling data reporting to the several kinds of data processing, the data processing center distinguishes the reported data by data header after receiving it, splits the data accordingly, and applies different processing to each part. For example, different kinds of data may be reported together, such as the bandwidth data and the request data of the same customer being reported at the same time. The data processing center server is also used for preprocessing the second summarized data.
In one possible embodiment, the data processing center server may use a chain of responsibility (FilterChain) to preprocess the data, including data cleaning and filtering.
The filtering rules in the chain of responsibility are configurable, for example rules on the data header, field length and bandwidth value range.
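A minimal sketch of such a configurable filter chain is shown below. The class and filter names are hypothetical, "field length" is interpreted here as the number of fields in a record, and the bandwidth value is assumed to be the second-to-last field as in the sample protocol lines.

```python
class FilterChain:
    """Runs a record through each filter in order; any rejection drops the record."""

    def __init__(self, filters):
        self.filters = list(filters)

    def accept(self, fields):
        return all(check(fields) for check in self.filters)


def header_filter(allowed_headers):
    return lambda fields: bool(fields) and fields[0] in allowed_headers


def field_count_filter(min_fields):
    # Assumption: "field length" means the number of fields in the record.
    return lambda fields: len(fields) >= min_fields


def bandwidth_range_filter(low, high):
    # Assumption: the bandwidth value is the second-to-last field.
    def check(fields):
        try:
            value = int(fields[-2])
        except (ValueError, IndexError):
            return False
        return low <= value <= high
    return check


chain = FilterChain([
    header_filter({"DOMAIN_BANDWIDTH", "CLUSTER_BANDWIDTH"}),
    field_count_filter(5),
    bandwidth_range_filter(0, 10**12),
])
good = "CLUSTER_BANDWIDTH 1625484000 dx_cluster1 3750000 100".split()
bad = "UNKNOWN_HEADER 1625484000 dx_cluster1 3750000 100".split()
```

Keeping each rule as a separate callable lets rules be added or removed from configuration without touching the others.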
Optionally, after the data processing center server preprocesses the second summarized data, the data processing center server performs a third classification process on the second summarized data, which may specifically be implemented by the following steps:
classifying the second summarized data to obtain a plurality of fourth summarized data;
and according to the processing rule corresponding to the category of each fourth summarized data, processing the fourth summarized data in parallel, and respectively acquiring third summarized data corresponding to each fourth summarized data.
The data processing center server classifies the second summarized data, and obtains a plurality of fourth summarized data, which can be seen in the following manner:
extracting a data header from each piece of the second summarized data;
and classifying the second summarized data according to the information in the data header to acquire a plurality of fourth summarized data.
In a specific example, fig. 4 shows a schematic structural diagram in which a data processing center server dispatches real-time data to different channels according to the data header in the protocol and processes the data in parallel according to business rules.
The flow is: the processor records the data, caches it, divides it into different channels (including smoothing, updating, analysis and snapshot channels), and finally caches the processed data.
Optionally, in addition to the above operations, it must also be considered that transient data loss can occur when the volume of data from cache machines is large, when machines are brought online or taken offline, and during machine room cutovers, network fluctuations, service upgrades and the like.
Therefore, the data processing center server is also used for performing disaster tolerance processing and data smoothing processing on the third summarized data.
Specifically, bandwidth at adjacent time points differs little except during customer bursts or peak periods. Therefore, when data is missing at some time points, the bandwidth data at the nearest available time point can be used for disaster tolerance processing.
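The header-based dispatch with parallel processing can be sketched as follows. The handler functions and the `HANDLERS` table are hypothetical placeholders for the business rules, and a thread pool is just one possible way to run the categories in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_header(lines):
    """Group protocol lines by their data header (the first field)."""
    groups = defaultdict(list)
    for line in lines:
        header, _, rest = line.partition(" ")
        groups[header].append(rest)
    return groups


def total_bandwidth(rows):
    # Hypothetical business rule: sum the bandwidth column (second-to-last field).
    return sum(int(row.split()[-2]) for row in rows)


# Hypothetical mapping from data header to its processing rule.
HANDLERS = {"DOMAIN_BANDWIDTH": total_bandwidth, "CLUSTER_BANDWIDTH": total_bandwidth}


def process_in_parallel(lines):
    """Process each header category concurrently and collect the results."""
    groups = split_by_header(lines)
    with ThreadPoolExecutor() as pool:
        futures = {
            header: pool.submit(HANDLERS[header], rows)
            for header, rows in groups.items()
            if header in HANDLERS
        }
        return {header: future.result() for header, future in futures.items()}


results = process_in_parallel([
    "DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_fujian dx_cluster1 3750000 100",
    "DOMAIN_BANDWIDTH 1625484000 www.ctyun.cn dx_beijing dx_cluster1 3750000 100",
    "CLUSTER_BANDWIDTH 1625484000 dx_cluster1 3750000 100",
])
```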
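Filling a missing point from the nearest available time point might look like the sketch below. The function name and the tie-breaking rule (prefer the earlier point when two are equally close) are assumptions.

```python
def fill_gaps(series, expected_timestamps):
    """Fill missing timestamps with the value at the nearest known timestamp.

    series maps timestamp -> bandwidth. On a tie, the earlier point is used
    (an arbitrary choice for this sketch).
    """
    known = sorted(series)
    if not known:
        return {}
    filled = {}
    for ts in expected_timestamps:
        if ts in series:
            filled[ts] = series[ts]
        else:
            nearest = min(known, key=lambda k: (abs(k - ts), k))
            filled[ts] = series[nearest]
    return filled


observed = {0: 100, 300: 110, 900: 130}
filled = fill_gaps(observed, [0, 300, 600, 900, 1200])
```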
In addition, noisy data is inevitably produced during data acquisition; in particular, sudden spikes and dips in bandwidth data cause noticeably frequent switching in resource scheduling. Therefore, the data processing center server may be further configured to denoise the third summarized data.
Specifically, an exponential smoothing algorithm may be used for denoising, as follows:
Given a smoothing coefficient α with value range (0, 1.0), the quadratic exponential smoothing formulas are:
$S'_t = \alpha x_t + (1-\alpha) S'_{t-1}$ (formula 1)
$S''_t = \alpha S'_t + (1-\alpha) S''_{t-1}$ (formula 2)
The smoothed value T periods ahead is calculated as:
$Y_{t+T} = A_t + B_t T$ (formula 3)
Wherein:
$A_t = 2 S'_t - S''_t$ (formula 4)
$B_t = \frac{\alpha}{1-\alpha}\,(S'_t - S''_t)$ (formula 5)
The denoising process can be implemented in the above manner, and will not be described in detail here.
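For illustration, quadratic (double) exponential smoothing can be sketched as below. The sketch uses the standard Brown form, in which the forecast is A_t + B_t·T with B_t = α/(1−α)·(S′_t − S″_t); treat those two expressions, the initialization of both smoothed series with the first observation, and the function name as assumptions of this example.

```python
def double_exponential_smoothing(values, alpha, horizon=1):
    """Return the forecast `horizon` periods after the last observation."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not values:
        raise ValueError("values must not be empty")
    # Assumption: both smoothed series start at the first observation.
    s1 = s2 = float(values[0])
    for x in values[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing pass (formula 1)
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing pass (formula 2)
    a = 2 * s1 - s2                          # level term A_t
    b = alpha / (1 - alpha) * (s1 - s2)      # trend term B_t (standard Brown form)
    return a + b * horizon


flat = double_exponential_smoothing([5, 5, 5], alpha=0.5)
rising = double_exponential_smoothing([0, 2], alpha=0.5)
```

A smaller α smooths more aggressively, which suppresses spikes but makes the result react more slowly to a real burst.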
Optionally, after the above operations, the data processing center is further configured to classify the third summarized data according to preset classification dimensions and obtain the scheduling index corresponding to each dimension.
In a specific example, scheduling indexes of different dimensions are finally generated. Representative index data include:
Real-time bandwidth of each customer in each acceleration region:
www.ctyun.com   dianxin_beijing   timestamp   bandwidth
www.ctyun.com   dianxin_xiamen    timestamp   bandwidth
Real-time node bandwidth usage:
Xiamen telecom machine room    timestamp   upper-limit bandwidth   redundant bandwidth
Beijing telecom machine room   timestamp   upper-limit bandwidth   redundant bandwidth
Xiamen mobile machine room     timestamp   upper-limit bandwidth   redundant bandwidth
Real-time bandwidth usage and redundancy of server clusters:
dianxin_xiamen_cluster1   timestamp   upper-limit bandwidth   redundant bandwidth
yidong_xiamen_cluster1    timestamp   upper-limit bandwidth   redundant bandwidth
Customer burst flags by acceleration region:
www.ctyun.com   dianxin_beijing   timestamp   1   (1: burst, 0: no burst)
www.ctyun.com   yidong_beijing    timestamp   1   (1: burst, 0: no burst)
www.ctyun.com   dianxin_xiamen    timestamp   0   (1: burst, 0: no burst)
In addition, because the completeness and stability of data acquisition are critical to automated CDN scheduling, the stability of bandwidth acquisition on the edge servers and the regional node autonomous machines is also monitored.
For example, each edge server reports a heartbeat packet to its regional node autonomous machine every 10 s. The regional node autonomous machine collects and counts the heartbeat packets of the edge machines it controls and reports the list of abnormal edge machines to the data processing center.
Similarly, each regional node autonomous machine reports a heartbeat packet to the data processing center every 10 s, and the data processing center is responsible for monitoring the regional node autonomous machines.
The heartbeat monitoring rule may be to receive heartbeat packets in real time and maintain a hash table recording the heartbeat data of each server.
The hash table is scanned periodically, every 10 s, to check the validity of each heartbeat packet, and an expired heartbeat packet is removed from the hash table. A heartbeat packet is invalid if curr_timestamp - timestamp > max_times * timing_cycle holds,
where curr_timestamp is the current timestamp,
timestamp is the timestamp of the heartbeat packet,
timing_cycle is the scan cycle time, and
max_times is the number of cycles for which a heartbeat remains valid.
The list of servers controlled by the current machine is then traversed and matched against the heartbeat data. If a server has no heartbeat data, its service is considered abnormal, a monitoring alarm is raised, and manual intervention and repair follow.
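The heartbeat rule above can be sketched with a dictionary acting as the hash table. The class name and the default of `max_times=3` are hypothetical; the 10 s cycle and the expiry condition follow the text.

```python
class HeartbeatMonitor:
    """Tracks the latest heartbeat per server and reports servers without one."""

    def __init__(self, managed_servers, timing_cycle=10, max_times=3):
        self.managed_servers = list(managed_servers)
        self.timing_cycle = timing_cycle  # scan cycle time in seconds
        self.max_times = max_times        # hypothetical number of valid cycles
        self.last_seen = {}               # hash table: server -> heartbeat timestamp

    def record(self, server, timestamp):
        self.last_seen[server] = timestamp

    def scan(self, curr_timestamp):
        """Drop expired heartbeats, then list managed servers with no heartbeat."""
        limit = self.max_times * self.timing_cycle
        for server, timestamp in list(self.last_seen.items()):
            if curr_timestamp - timestamp > limit:
                del self.last_seen[server]
        return sorted(s for s in self.managed_servers if s not in self.last_seen)


monitor = HeartbeatMonitor(["edge-a", "edge-b"])
monitor.record("edge-a", 100)
monitor.record("edge-b", 100)
monitor.record("edge-a", 125)
abnormal = monitor.scan(curr_timestamp=135)
```

In this run `edge-b` last reported 35 s ago, which exceeds 3 × 10 s, so it is the only server flagged.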
Fig. 5 is a diagram illustrating an overall system architecture including the configuration center management server, the functional modules in the data acquisition system, and other required components.
As shown in Fig. 5, the flow is as follows:
Service access requests are distributed by the load balancing device to the edge server groups corresponding to the regions to which the requests belong. After each edge server group collects the data, the data is transmitted, through the first proxy server, to the regional node autonomous machine (the second proxy server). The regional node autonomous machine then transmits the data to the real-time data processing center (the data processing center server).
Fig. 6 is a block flow diagram in which 3 data acquisition channels (each running from the edge servers to a regional center, with the first proxy server and the second proxy server not shown) each establish communication connections with 2 data processing center servers. Fig. 6 is only a specific example; the number of data acquisition channels, data processing center servers and so on can be set according to actual needs.
The data acquisition processes shown in Fig. 5 and Fig. 6 have both been described in detail above and are therefore not repeated here.
The data acquisition system provided by the embodiment of the invention applies the divide-and-conquer idea: complex, massive data is split into multiple small modules after acquisition, that is, the system is divided into multiple data acquisition channels that perform data acquisition in parallel. The first proxy server performs the first classification processing, the second proxy server then performs the second classification processing, and finally the second summarized data is transmitted to the data processing center server for processing. In this way all of the work is subdivided across different channels and executed by different modules, which reduces the processing load on the data processing center server and avoids the excessive load and poor timeliness that arise when a single data processing center server classifies all of the massive, complex data at once. The real-time bandwidth used by customers and machines can be collected accurately, completely and in real time even when there are many edge nodes and a large amount of bandwidth. The CDN scheduling system can therefore schedule bandwidth resources effectively when a customer's bandwidth demand surges or resources fail, which safeguards the customer's overall quality of service. Moreover, the edge server group, the first proxy server, the second proxy server and the data processing center server use streaming transmission between them, which avoids writing data files to disk and reduces loss in transit.
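The staged classification summarized above can be illustrated with a short Python sketch of a single data acquisition channel. It is a sketch under stated assumptions rather than the system's actual code: the field names follow the user request fields listed in the claims, the lookup tables `CLIENT_REGION` and `EDGE_GROUP` are hypothetical, summing traffic and request counts is one assumed reading of "summarizing", and defining a data center's jurisdiction by edge server group is also an assumption. The first stage is written as a generator to loosely mirror the streaming transmission between stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Set, Tuple


class Request(NamedTuple):
    request_time: int  # e.g. a timestamp truncated to the collection period
    first_ip: str      # IP of the visitor's client
    second_ip: str     # IP the client is accessing
    domain: str        # access domain name
    traffic: int       # access traffic in bytes
    count: int         # number of requests


# Hypothetical lookup tables; a real deployment would use an IP database.
CLIENT_REGION = {"1.1.1.1": "east", "2.2.2.2": "west"}
EDGE_GROUP = {"10.0.0.1": "group-a", "10.0.0.2": "group-b"}

Key = Tuple[int, str, str, str]  # (request_time, edge_group, region, domain)
Totals = Tuple[int, int]         # (traffic, count)


def first_classification(requests: Iterable[Request]) -> Iterator[Tuple[Key, Totals]]:
    """First proxy: resolve region and edge group, stream one record per request."""
    for r in requests:
        key = (r.request_time, EDGE_GROUP[r.second_ip],
               CLIENT_REGION[r.first_ip], r.domain)
        yield key, (r.traffic, r.count)


def second_classification(records: Iterable[Tuple[Key, Totals]]) -> Dict[Key, Totals]:
    """Second proxy: merge records that share the same summary key."""
    merged: Dict[Key, Totals] = defaultdict(lambda: (0, 0))
    for key, (traffic, count) in records:
        t, c = merged[key]
        merged[key] = (t + traffic, c + count)
    return dict(merged)


def data_center(summary: Dict[Key, Totals], jurisdiction: Set[str]) -> Dict[Key, Totals]:
    """Data center: keep records in its jurisdiction, discard the rest."""
    return {k: v for k, v in summary.items() if k[1] in jurisdiction}


requests = [
    Request(1000, "1.1.1.1", "10.0.0.1", "a.example.com", 500, 1),
    Request(1000, "1.1.1.1", "10.0.0.1", "a.example.com", 300, 2),
    Request(1000, "2.2.2.2", "10.0.0.2", "b.example.com", 200, 1),
]
second = second_classification(first_classification(requests))
print(data_center(second, {"group-a"}))
# {(1000, 'group-a', 'east', 'a.example.com'): (800, 3)}
```

Because each stage shrinks the data before forwarding it, the data processing center only handles already-merged records for its own jurisdiction, which is the load reduction the paragraph describes.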
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data acquisition system, characterized in that the data acquisition system comprises a plurality of data acquisition channels, wherein each data acquisition channel comprises an edge server group, a first proxy server and a second proxy server, and each data acquisition channel is in communication connection with at least one data processing center server;
each edge server in the edge server group is used for receiving a user request;
the first proxy server is used for performing first classification processing on the user request to acquire first summarized data; and sending the first summarized data to the second proxy server;
the second proxy server is configured to perform second classification processing on the first summarized data, acquire second summarized data, and send the second summarized data to each data processing center server;
each data processing center server in the at least one data processing center server is respectively used for determining whether the second summarized data is data to be processed in the jurisdiction range of the data processing center server;
and, when the second summarized data is determined to be data to be processed within the jurisdiction of that data processing center server, performing third classification processing on the second summarized data to acquire third summarized data.
2. The system of claim 1, wherein the user request comprises the following field information: request time, a first IP, a second IP, an access domain name, access traffic and number of requests, wherein the first IP is the IP of the visitor's client and the second IP is the IP to be accessed by the visitor's client;
the first proxy server is specifically configured to: determine, according to the first IP, the region to which the visitor's client belongs;
determine, according to the second IP, the edge server group that sent the user request from among the edge server groups respectively corresponding to the plurality of data acquisition channels;
and summarize the user request according to a first summary specification formed by the request time, the edge server group that sent the user request, the region to which the visitor's client belongs, the access domain name, the access traffic and the number of requests, to acquire the first summarized data.
3. The system of claim 2, wherein the second proxy server is specifically configured to extract, from the first summarized data, the following fields of each user request, which constitute a second summary specification: the request time, the edge server group that sent the user request, the region to which the visitor's client belongs, and the access domain name;
and, according to the second summary specification, perform simplification and classification processing on the first summarized data to acquire the second summarized data.
4. The system according to any one of claims 1 to 3, wherein the data processing center server is further configured to preprocess the second summarized data.
5. The system according to any one of claims 1 to 3, wherein the data processing center server is specifically configured to:
classifying the second summarized data to obtain a plurality of fourth summarized data;
and processing the fourth summarized data in parallel according to a processing rule corresponding to the category of each fourth summarized data, and respectively acquiring third summarized data corresponding to each fourth summarized data.
6. The system of claim 5, wherein the data processing center server is specifically configured to:
extracting a data header from each piece of the second summarized data;
and classifying the second summarized data according to the information in the data header to acquire a plurality of fourth summarized data.
7. The system according to any one of claims 1 to 3 or 6, wherein the data processing center server is further configured to perform disaster tolerance processing and data smoothing processing on the third summarized data.
8. The system according to any one of claims 1-3 or 6, wherein the data processing center server is further configured to generate a scheduling index based on the third summarized data.
9. The system of claim 8, wherein the data processing center server is further configured to:
and classifying the third summarized data according to a preset classification dimension to obtain a scheduling index corresponding to the classification dimension.
10. The system according to any one of claims 1-3 or 6, wherein the data processing center server is further configured to discard the second summarized data when it is determined that the second summarized data does not belong to the data to be processed in its own jurisdiction.
CN202111617804.2A 2021-12-27 2021-12-27 Data acquisition system Pending CN114466069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111617804.2A CN114466069A (en) 2021-12-27 2021-12-27 Data acquisition system

Publications (1)

Publication Number Publication Date
CN114466069A (en) 2022-05-10

Family

ID=81407311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111617804.2A Pending CN114466069A (en) 2021-12-27 2021-12-27 Data acquisition system

Country Status (1)

Country Link
CN (1) CN114466069A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905324A (en) * 2023-02-21 2023-04-04 中科迅联智慧网络科技(北京)有限公司 Intelligent matching method and system applied to correlation of various data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102474700A (en) * 2009-08-05 2012-05-23 弗里塞恩公司 Method and system for filtering of network traffic
CN106027272A (en) * 2016-04-26 2016-10-12 乐视控股(北京)有限公司 CDN (Content Delivery Network) node server traffic time deduction method and system
CN107465526A (en) * 2016-06-03 2017-12-12 德科仕通信(上海)有限公司 Internet video CDN server mass monitoring system and method
CN108347465A (en) * 2017-01-23 2018-07-31 阿里巴巴集团控股有限公司 A kind of method and device of selection network data center
CN110401647A (en) * 2019-07-16 2019-11-01 广东申立信息工程股份有限公司 A kind of IDC Information Security Management System
WO2021217470A1 (en) * 2020-04-29 2021-11-04 Citrix Systems, Inc. Computer resource allocation based on categorizing computing processes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VADYM KAPTUR: "Method of adaptive complex Internet content filtering", 《2019 INTERNATIONAL CONFERENCE ON INFORMATION AND TELECOMMUNICATION TECHNOLOGIES AND RADIO ELECTRONICS (UKRMICO)》, 12 August 2020 (2020-08-12) *
许建明, 杨璐, 刘云玲: "面向数据中心设计的多种请求分发策略", 计算机工程与设计, no. 07, 28 July 2003 (2003-07-28) *
马云龙;梅峥;郭子明;王恒;张昊;阎博;: "电力调度系统广域分布式代理关键技术", 电力系统及其自动化学报, no. 03, 15 March 2018 (2018-03-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination