CN116938926A - CDN-fused data processing method, device and system and electronic equipment - Google Patents

Info

Publication number
CN116938926A
Authority
CN
China
Prior art keywords
data
task
pulling
configuration information
cdn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210369424.XA
Other languages
Chinese (zh)
Inventor
黄佑榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210369424.XA
Publication of CN116938926A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/14 Charging, metering or billing arrangements for data wireline or wireless communications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M15/00 Arrangements for metering, time-control or time indication; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • H04M15/70 Administration or customization aspects; Counter-checking correct charges

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the application provide a data processing method, apparatus, and system for a converged (fusion) CDN, and an electronic device, relating to the technical fields of big data and cloud computing. The method comprises: acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises the relevant information required to pull data of a target type from the corresponding CDN service provider; and, for each piece of configuration information, generating a corresponding first data pulling task in real time according to the relevant information in the configuration information and pushing the first data pulling task to a preset message queue, so that a distributed system pulls data according to a second data pulling task to be executed in the message queue and stores the pulled data in a database. The embodiments decouple task production from task consumption, integrate real-time pulling and processing of the data of each CDN service provider at massive data volumes, allow convenient horizontal scaling, and better meet practical application requirements.

Description

CDN-fused data processing method, device and system and electronic equipment
Technical Field
The application relates to the technical field of cloud technology, and in particular to a data processing method, apparatus, and system for a converged CDN, and an electronic device.
Background
When a CDN service user accesses a CDN service, the underlying layer may resolve requests to the services of multiple CDN service providers; this is known as a converged (fusion) CDN. The aim of a converged CDN is to integrate the resources of current high-quality cloud vendors and, through convergence technology, to manage the acceleration and further optimization of the data network in a unified way.
After a CDN service user has accessed the CDN, part of the traffic of that user's business is, in consideration of overall operating cost and benefit, switched to third-party service providers. The charging amount data that the CDN service user generates at each third-party CDN service provider must then be pulled back through each provider's data pulling protocol, so that it can be displayed to the CDN service user in real time and used for charging.
The existing conventional solution pulls the charging data of each CDN service provider by periodic batch polling: only after a specified period has elapsed does the charging-data pulling program start to parse each provider's interface protocol and pull data. As a result, development cost is high, processing efficiency is low, the data is insufficiently real-time, and there is a risk of data backlog.
Disclosure of Invention
Embodiments of the application provide a data processing method, apparatus, and system for a converged CDN, and an electronic device, which can solve the above problems in the prior art. The technical solution is as follows:
According to one aspect of the embodiments of the application, a data processing method for a converged CDN is provided, comprising:
acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises the relevant information required to pull data of a target type from the corresponding CDN service provider; and
for each piece of configuration information, generating a corresponding first data pulling task in real time according to the relevant information in the configuration information, and pushing the first data pulling task to a preset message queue, so that a distributed system pulls data according to a second data pulling task to be executed in the message queue and stores the pulled data in a database.
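The two steps above can be sketched as a small producer/consumer pipeline. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the preset message queue, a Python list stands in for the database, threads stand in for the working nodes of the distributed system, and the task fields and the `fake_fetch` function are hypothetical names that do not appear in the application.

```python
import queue
import threading


def produce_tasks(configs, task_queue):
    """Producer side: turn each provider configuration into one pull task."""
    for config in configs:
        # Hypothetical task shape: just enough to tell a worker what to pull.
        task_queue.put({"provider": config["provider"], "endpoint": config["endpoint"]})


def consume_tasks(task_queue, fetch, database, lock):
    """Consumer side: take pending tasks, pull the data, store the result."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel meaning "no more tasks"
            break
        rows = fetch(task)
        with lock:
            database.extend(rows)


def run_pipeline(configs, fetch, num_workers=3):
    task_queue = queue.Queue()  # stands in for the preset message queue
    database = []               # stands in for the database
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consume_tasks, args=(task_queue, fetch, database, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    produce_tasks(configs, task_queue)
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker, queued after all tasks
    for w in workers:
        w.join()
    return database


def fake_fetch(task):
    """Placeholder for a real provider call; returns one made-up record."""
    return [{"provider": task["provider"], "bytes": 100}]
```

Because the producer only writes to the queue and the workers only read from it, the number of workers can be changed without touching task generation, which is the decoupling the method relies on.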
As an optional embodiment, the configuration information further includes a field mapping rule for storing data of the target type in standardized form;
pushing the first data pulling task to the preset message queue comprises:
pushing the first data pulling task, together with its corresponding field mapping rule, to the preset message queue;
storing the pulled data in the database comprises:
standardizing the data pulled according to the second data pulling task in accordance with the field mapping rule corresponding to the second data pulling task, and storing the standardized data in the database.
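A field mapping rule of this kind can be sketched as a dictionary from provider-specific field names to standard field names. The field names (`dom`, `flux`, and so on), the task layout, and the choice to drop unmapped fields are all assumptions made for illustration; the application does not specify them.

```python
def normalize(record, field_mapping):
    """Rename provider-specific fields to standard names; drop unmapped fields."""
    return {
        standard: record[source]
        for source, standard in field_mapping.items()
        if source in record
    }


def store_pulled_data(task, pulled_records, database):
    """Normalize each pulled record with the rule carried by the task, then store it."""
    for record in pulled_records:
        database.append(normalize(record, task["field_mapping"]))


# Hypothetical task: the mapping rule travels with the task through the queue.
task = {
    "provider": "provider-A",
    "field_mapping": {"dom": "domain", "ts": "timestamp", "flux": "bytes"},
}
```

Carrying the rule inside the task means a working node needs no provider-specific code to store the result in the standard schema.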
As an optional embodiment, the distributed system includes a plurality of working nodes, and the distributed system pulling data according to the second data pulling task to be executed in the message queue comprises:
when second data pulling tasks to be executed exist in the message queue, for each such task, the plurality of working nodes in the distributed system compete for the task, and the working node that wins the competition pulls data according to that task.
As an optional embodiment, the working state of any working node of the distributed system is either a first state or a second state;
the working nodes in the distributed system competing for the second data pulling task to be executed, and the winning working node pulling data according to the second data pulling task, comprises:
the working nodes that are in the first state compete for the second data pulling task to be executed;
the working node that wins the competition updates its working state to the second state and pulls data according to the second data pulling task;
after the pulled data is stored in the database, the method further comprises:
the working node that won the competition updates its working state back to the first state.
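The state handling described above can be sketched as follows. The application names only a "first state" and a "second state"; reading them as idle and busy is an assumption, as are the `WorkerNode` class and the use of a non-blocking `get_nowait` on an in-process queue to model the competition (whichever node dequeues the task first wins).

```python
import queue
from enum import Enum


class State(Enum):
    FIRST = "idle"   # assumed meaning: free to compete for tasks
    SECOND = "busy"  # assumed meaning: currently executing a task


class WorkerNode:
    def __init__(self, name, fetch, database):
        self.name = name
        self.fetch = fetch
        self.database = database
        self.state = State.FIRST

    def compete_and_run(self, task_queue):
        """Try to win one pending task; return True if this node executed it."""
        if self.state is not State.FIRST:
            return False  # only nodes in the first state compete
        try:
            task = task_queue.get_nowait()  # the atomic dequeue decides the winner
        except queue.Empty:
            return False  # another node won, or nothing is pending
        self.state = State.SECOND
        try:
            self.database.extend(self.fetch(task))
        finally:
            self.state = State.FIRST  # return to the first state after storing
        return True
```

Resetting the state in a `finally` block is a design choice of this sketch: it keeps a node from staying in the second state forever if a pull raises an exception.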
As an optional embodiment, the relevant information includes a time granularity of the data to be pulled and task indication information;
generating the first data pulling task in real time according to the relevant information in the configuration information comprises:
generating, in real time and according to the task indication information, a first data pulling task corresponding to the time granularity, wherein the first data pulling task includes the time range of the data to be pulled.
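One way to derive the time range from the time granularity is to align the current time down to the granularity and take the most recent complete window. This alignment rule, the half-open `[start, end)` window, and the function names are assumptions for illustration; the application only states that the task carries a time range matching the granularity.

```python
from datetime import datetime, timedelta


def window_for(now, granularity):
    """Return the most recent complete [start, end) window aligned to the granularity."""
    epoch = datetime(1970, 1, 1)
    step = int(granularity.total_seconds())
    elapsed = int((now - epoch).total_seconds())
    end = epoch + timedelta(seconds=(elapsed // step) * step)
    return end - granularity, end


def make_task(indication, now, granularity):
    """Build a pull task from task indication info plus the computed time range."""
    start, end = window_for(now, granularity)
    return {**indication, "start": start.isoformat(), "end": end.isoformat()}
```

For example, with a 5-minute granularity and a current time of 10:07:30, the task covers 10:00:00 to 10:05:00.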
As an optional embodiment, the task indication information further includes at least one of: identification information of the service, the domain name of the CDN service user, identification information of the corresponding CDN service provider, and interaction protocol information for interacting with a server of the corresponding CDN service provider.
As an alternative embodiment, the interaction protocol information includes: communication protocol, data coding format and interactive interface.
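The task indication information and interaction protocol information can be pictured as two small records. The class names, field names, and the two supported encodings below are hypothetical; the application lists only the kinds of information involved (communication protocol, data coding format, interactive interface), not a concrete schema.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InteractionProtocol:
    communication_protocol: str  # e.g. "http" (illustrative value)
    data_encoding: str           # e.g. "json" or "csv" (illustrative values)
    interface: str               # e.g. a URL path or topic name


@dataclass(frozen=True)
class TaskIndication:
    # The text says "at least one of", so every field is optional here.
    service_id: Optional[str] = None
    user_domain: Optional[str] = None
    provider_id: Optional[str] = None
    protocol: Optional[InteractionProtocol] = None


def decode_payload(protocol, raw):
    """Decode a raw response body according to the configured data encoding."""
    if protocol.data_encoding == "json":
        return json.loads(raw)
    if protocol.data_encoding == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported encoding: {protocol.data_encoding}")
```

With the encoding recorded as configuration, supporting a provider that returns CSV instead of JSON is a configuration change rather than a new script.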
As an optional embodiment, the configuration information of any CDN service provider is obtained by:
displaying a configuration interface through a client in response to a configuration triggering operation performed at the client for the configuration information; and
receiving, through the configuration interface, a configuration operation for at least one item to be configured, and generating the configuration information based on the received configuration operation.
According to another aspect of an embodiment of the present application, there is provided a data processing apparatus for fusing CDNs, including:
the configuration information acquisition module is used for acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises relevant information required for pulling data of a target type of a corresponding CDN service provider;
the data processing module is used for generating corresponding first data pulling tasks in real time according to the related information in the configuration information for each configuration information, pushing the first data pulling tasks to a preset message queue, enabling the distributed system to pull data according to second data pulling tasks to be executed in the message queue, and storing the pulled data to the database.
As an optional embodiment, the configuration information further includes a field mapping rule for storing data of the target type in standardized form;
the data processing module comprises:
a pushing sub-module, configured to push the first data pulling task and its corresponding field mapping rule to the preset message queue; and
a rule storage sub-module, configured to standardize the data pulled according to the second data pulling task in accordance with the field mapping rule corresponding to the second data pulling task, and to store the standardized data in the database.
As an alternative embodiment, the distributed system includes a plurality of working nodes, and the working nodes of the distributed system are used for:
when second data pulling tasks to be executed exist in the message queue, competing for each such task and, if the competition is won, pulling data according to that task.
As an optional implementation manner, the working state of any working node of the distributed system is a first state or a second state;
the working node is used for:
if in the first state, competing for the second data pulling task to be executed;
if the competition is won, updating the working state to the second state and pulling data according to the second data pulling task;
storing the pulled data in the database, and then updating the working state back to the first state.
As an optional embodiment, the relevant information includes a time granularity of the data to be pulled and task indication information;
the data processing module comprises:
a task generation sub-module, configured to generate, in real time and according to the task indication information, a first data pulling task corresponding to the time granularity, wherein the first data pulling task includes the time range of the data to be pulled.
As an optional embodiment, the task indication information further includes at least one of: identification information of the service, the domain name of the CDN service user, identification information of the corresponding CDN service provider, and interaction protocol information for interacting with a server of the corresponding CDN service provider.
As an alternative embodiment, the interaction protocol information includes: communication protocol, data coding format and interactive interface.
As an optional implementation manner, the configuration information obtaining module is specifically configured to:
displaying a configuration interface through a client in response to a configuration triggering operation performed at the client for the configuration information; and
receiving, through the configuration interface, a configuration operation for at least one item to be configured, and generating the configuration information based on the received configuration operation.
According to another aspect of an embodiment of the present application, there is provided a data processing system for fusing CDNs, the system including:
a data processing apparatus, configured to acquire configuration information of at least one CDN service provider, wherein the configuration information comprises the relevant information required to pull data of a target type from the corresponding CDN service provider, and, for each piece of configuration information, to generate a corresponding first data pulling task in real time according to the relevant information in the configuration information and push the first data pulling task to a preset message queue; and
a distributed system, configured to pull data according to the second data pulling task to be executed in the message queue and to store the pulled data in the database.
According to another aspect of the embodiments of the present application, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the data processing method of the converged CDN described above.
According to another aspect of an embodiment of the present application, there is provided a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the data processing method of a converged CDN described above.
According to another aspect of an embodiment of the present application, there is provided a computer program product, including a computer program, which when executed by a processor implements the steps of the data processing method of the converged CDN described above.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
The configuration information contains the relevant information required to pull data of the target type from the corresponding CDN service provider. Because the configuration items in the configuration information can be updated flexibly, this approach is more flexible than the prior-art practice of packaging the logic directly into scripts. For each piece of configuration information, a corresponding first data pulling task is generated in real time according to the relevant information in the configuration information and pushed to the message queue in the producer role, while the distributed system consumes the tasks in the message queue in real time in the consumer role. This decouples task production from task consumption, integrates real-time pulling and processing of the data of each CDN service provider at massive data volumes, allows convenient horizontal scaling, and better meets practical application requirements.
Optionally, the method provided by the embodiments of the application can be applied to pulling the charging data of a large number of third-party CDN service providers: decoupling the production and consumption of pulling tasks makes it possible to pull and process third-party charging data in real time, in an integrated way, at massive data volumes. Using a distributed system to consume the tasks effectively addresses the insufficient real-time performance and data backlog of existing solutions.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a flow chart of a data processing method of a converged CDN in the prior art;
FIG. 2 is a flow chart of a data processing method of a converged CDN according to an embodiment of the present application;
FIG. 3 is a flow chart of a data processing method of a converged CDN according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a status update of a working node according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a configuration interface displayed by a client according to an embodiment of the present application;
FIG. 6 is a flow chart of a data processing method of a converged CDN according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a data processing system with a CDN according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer system according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus for a converged CDN according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing system with a converged CDN according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions intended to explain the technical solutions of the embodiments of the present application, and do not limit those technical solutions.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms related to the present application are described and explained:
1) Cloud technology is a general term for the network, information, integration, management-platform, and application technologies applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites, and portal websites, require large amounts of computing and storage resources. As the internet industry develops and its applications grow, each item may in future carry its own identifier, which must be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be achieved through cloud computing.
2) Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that, through functions such as cluster applications, grid technology, and distributed storage file systems, uses application software or application interfaces to make a large number of storage devices of various types in a network (storage devices are also called storage nodes) work together, and jointly provides data storage and service access functions to the outside.
3) A database (DB) can be regarded, in short, as an electronic filing cabinet: a place where electronic files are stored, in which a user can add, query, update, and delete data. A "database" is a collection of data that is stored together in a certain way, can be shared by multiple users, has as little redundancy as possible, and is independent of applications.
4) A content delivery network (CDN) distributes origin-site content to acceleration nodes throughout the country, reducing the latency with which users view content, improving the response speed and availability of websites, and addressing problems such as limited network bandwidth, heavy user traffic, and uneven site distribution. A CDN system can redirect a user's request in real time to the service node nearest to the user, based on combined information such as network traffic, the connection and load status of each node, the distance to the user, and response time. The aim is to let users obtain the content they need from a nearby node, relieve internet congestion, and improve the response speed of website access.
5) Converged (fusion) CDN: as cloud computing products are applied more deeply, most enterprises are gradually migrating many offline services to the cloud. When a CDN service user (for example, a live-streaming platform) accesses a CDN service, the underlying layer may resolve requests to the services of multiple CDN service providers. The purpose of a converged CDN is to integrate the resources of current high-quality cloud vendors; convergence technology makes it possible to manage the acceleration and further optimization of the data network in a unified way, while business exchange among service providers improves the scale and benefit shown in their financial reports.
FIG. 1 is a schematic flow chart of a prior-art data processing method for a converged CDN. As shown, the prior art mainly deploys crontab scheduled tasks on a single machine and develops a customized data pulling script for each CDN service provider; a script starts and performs batch processing only after a specified period has elapsed. Two mature schemes dominate: pulling once a day and pulling once an hour. When the specified period is reached, the scripts that pull each provider's charging data (FIG. 1 shows three pulling programs for three service providers A, B, and C) are triggered to run, completing one pull of charging data.
By analyzing the existing methods, the following problems can be found in the prior art:
1. High development cost: different services at different CDN service providers may use different charging-data pulling protocols, for example an HTTP interface, the Kafka protocol, or a text protocol. Each usually requires a separately developed script, so a large amount of customized script development may be needed for steps such as parsing the pulling protocol, processing and converting the data, and storing it in standardized form.
2. Low processing efficiency: existing charging-data pulling is generally triggered by a scheduled task configured on a single machine, and each provider's data pulling program can start only when the period arrives. Between triggers the system is effectively asleep, waiting, so new charging data cannot be pulled and processed in time.
3. Insufficient real-time performance and risk of data backlog: the charging amount data that a CDN service user generates after the start of period T can be pulled only after a complete time period, in period T+1. The CDN service user therefore sees its charging amount data one full period late, which is neither real-time nor efficient and makes for a poor user experience. Fixed-period execution is simple to implement, but if the data volume of one or more CDN service providers surges, the pull may not finish within one period. Processing depends mainly on the capacity of a single machine and cannot be scaled out by adding machines, so the data delay may exceed the scheduling period.
The application provides a data processing method and apparatus for a converged CDN, an electronic device, a computer-readable storage medium, and a computer program product, which aim to solve the above technical problems in the prior art. The main inventive concept is as follows: through configurable data-protocol parsing, pulling tasks are distributed via a big-data distributed message queue and processed by a distributed cluster, so that the target-type data of third-party CDN service providers is standardized and integrated in real time at massive data volumes.
FIG. 2 is a schematic flow chart of a data processing method for a converged CDN according to an embodiment of the application. In the prior art, a fixed data pulling protocol is built into a script, which is hard to update and costly to maintain. To overcome this, the application pre-configures, for each service, the configuration information required to pull data from the corresponding CDN service provider. Because the configuration information consists of discrete configuration items, it is easy to create a new configuration item or update an existing one, and a data pulling task can be generated quickly from the information recorded in the configuration information, without spending a large amount of time rewriting scripts as in the prior art. Based on the configuration information, new data pulling tasks are generated continuously at the minimum allowed time granularity and sent to a message queue. Through the broadcast characteristic of the message queue, N working nodes of a distributed system consume the data pulling tasks in their task-processing processes and, following the instructions in each task, pull, process, and store the data in real time. The target-type data of third-party CDN service providers is thereby pulled and integrated in real time, and when the number of tasks to be executed increases significantly, the distributed system can be scaled out horizontally.
The technical solutions of the embodiments of the present application, and the technical effects they produce, are described below through several exemplary embodiments. It should be noted that the following embodiments may refer to, draw on, or be combined with one another, and descriptions of identical terms, similar features, and similar implementation steps are not repeated across embodiments.
The embodiment of the application provides a data processing method for fusing CDNs, as shown in FIG. 3, which comprises the following steps:
s101, acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises relevant information required for pulling data of a target type of a corresponding CDN service provider.
When a CDN service user wants to obtain a CDN service aiming at a target domain name, the CDN service user needs to apply to a main CDN server, and after the CDN service is effective, all requests of the target domain name are transferred to a service node of a main CDN server. In some embodiments, CDN services may include storage services, live audio video services, on demand services, overseas delivery services, and the like. The main CDN service provider of the embodiment of the application can cut partial flow of CDN service to the third-party CDN service provider in proportion for providing service in consideration of the overall operation cost and benefit, so as to realize the fusion of CDN service. Since the main CDN server and the third-party CDN server need to settle accounts for the division of the CDN service, and the CDN service user also needs to obtain the CDN service rule, the main CDN server needs to determine the data pulling protocol with the third-party CDN server first, and pull the target type of data generated by the CDN service from the third-party CDN server according to the data pulling protocol.
For the method of obtaining the data pulling protocol of the third-party CDN server, the embodiment of the present application is not limited, for example, the third-party CDN server may autonomously edit the data pulling protocol through an interface for editing the data pulling protocol provided to the third-party CDN server, or the third-party CDN server may provide specific information of the data pulling protocol to the main CDN server and edit the data pulling protocol by the main CDN server.
The data pulling protocol includes rules to be followed by the main CDN server to pull the data from the third party CDN server, for example, may include communication protocol information, a coding format of data transmission, encryption and decryption rules, and the embodiment of the present application is not limited in detail.
In some embodiments, the data of the target type may be billing data related to the traffic split, and may also be traffic data generated by the CDN service, stored data, and other parameter data related to service quality.
In the prior art, the agreed data pulling protocol is integrated into a script, which is difficult to update and to maintain. To solve this problem, the application generates configuration information from the data pulling protocol. The configuration information comprises a plurality of configuration items; different configuration items hold different information, and all of them relate to pulling data of the target type from the CDN service provider. Because the embodiment of the application records the related information required for pulling in the form of configuration information, each configuration item can be quickly written, updated and deleted, which is more flexible than generating a pulling script directly from the data pulling protocol as in the prior art.
In addition, since the configuration information consists of discrete configuration items, changing some of the configuration items in one piece of configuration information produces new, different configuration information. The pulling tasks generated from them are likewise different and free of logical conflicts, so pulling tasks generated from configuration information are highly extensible. Specifically, if the configuration information includes three configuration items (domain name, service and CDN service provider), changing any one of them produces new configuration information, and the first data pulling task subsequently generated from it is entirely different. In the prior art, by contrast, generating a new data pulling task requires creating a new script, and updating a data pulling task requires editing the script again.
The related information of the embodiment of the application can include: at least one of identification information of a service, a domain name of a CDN service user, identification information of a corresponding CDN service provider, interaction protocol information for interacting with a server of the corresponding CDN service provider, and a floating coefficient.
The traffic required by the different services used by a CDN service user may differ. For example, in the live streaming field, the services selected by a live streaming platform acting as the CDN service user include high-speed, medium-speed and low-speed live streaming, and different services give viewers different experiences. Recording the identification information of the service therefore lays a foundation for accurately compiling the data of the target type.
The domain name of the CDN service user is the domain name under which the user provides its service; for example, if the CDN service user is a live streaming platform, the domain name is that of the platform. With the domain name recorded, once the CDN service takes effect, requests directed to the domain name are diverted in proportion to nodes of the third-party CDN service provider.
The identification information of the CDN service provider indicates which CDN service provider's data is to be pulled.
The interaction protocols used to interact with the servers of different CDN service providers may differ: some CDN service providers may support JSON over HTTP, while others may support the Kafka protocol.
The floating coefficient identifies the proportion by which the pulled data is floated. Taking billing data as the target type of data as an example, the purpose is to record the service cost of each third-party CDN service provider, and the profit margin of the main CDN service provider can be adjusted by setting the floating coefficient.
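As a purely illustrative sketch of the arithmetic, assuming the floating coefficient is applied as a simple multiplier on a pulled billing value (the embodiment does not state the exact formula, and the function and variable names below are hypothetical):

```python
from decimal import Decimal


def apply_floating_coefficient(pulled_value: Decimal, coefficient: Decimal) -> Decimal:
    """Scale a pulled value by the floating coefficient (assumed to be multiplicative)."""
    return pulled_value * coefficient


# Hypothetical numbers: a pulled billing value of 100 floated by a coefficient of 1.1.
adjusted = apply_floating_coefficient(Decimal("100"), Decimal("1.1"))
```

Decimal arithmetic is used here only to avoid binary floating-point rounding in monetary values; that choice is an assumption of this sketch rather than part of the described method.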
Combining the above, the configuration information supplies the related information later needed to pull data of the target type from the corresponding CDN service provider, for example the service identifier, the domain name of the CDN service user, the identification information of the CDN service provider and the interaction protocol information. With it, for the service used by a specific CDN service user, the data can be pulled successfully by interacting with the CDN service provider through that provider's interaction protocol information. That is, the first data pulling task generated according to the related information in the configuration information in the embodiment of the present application actually contains that related information.
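One possible way to picture configuration information made of discrete configuration items is a small immutable record. The sketch below is only an illustration: the class name, field names and example values are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PullConfig:
    """One piece of configuration information (all field names are illustrative)."""
    service_id: str                    # identification information of the service
    domain: str                        # domain name of the CDN service user
    provider_id: str                   # identification of the CDN service provider
    protocol: str                      # interaction protocol, e.g. "http-json" or "kafka"
    floating_coefficient: float = 1.0  # floating coefficient


base = PullConfig("tx_video", "www.xxx.com", "provider_a", "http-json", 1.1)

# Changing individual configuration items yields a new, independent configuration
# without touching the original one.
variant = replace(base, provider_id="provider_b", protocol="kafka")
```

Because each item is a separate field, a new pulling configuration is obtained by changing one value rather than by writing a new script.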
S102, for each piece of configuration information, generating a corresponding first data pulling task in real time according to the related information in the configuration information, and pushing the first data pulling task to a preset message queue, so that the distributed system pulls data according to a second data pulling task to be executed in the message queue and stores the pulled data in a database.
According to the embodiment of the application, the first data pulling task corresponding to a CDN service provider is generated in real time according to the related information in each piece of configuration information. That is, once the configuration information of a CDN service provider is obtained in step S101, first data pulling tasks corresponding to that provider can be generated continuously, and each first data pulling task specifies the time period of the target-type data to be pulled. Because first data pulling tasks are generated continuously, both the period at which tasks are generated and the length of the time period covered by each task can be indicated; the time period for the current task is then determined, according to the indicated period, by adding the indicated length to the time period covered by the previous task. For example, if one first data pulling task is generated every minute and the previous first data pulling task covers 10:10 to 10:11 on a certain day, then one minute after the previous task was generated, a new first data pulling task is generated that covers 10:11 to 10:12 on that day.
The embodiment of the application pushes the first data pulling tasks generated in real time to the message queue as a producer, for processing by downstream consumers. The distributed system, acting as a consumer, detects that a data pulling task to be executed (namely a second data pulling task) exists in the message queue; specifically, N working nodes in the distributed system compete to execute the second data pulling task, and the data pulled from the corresponding CDN service provider is stored in a database.
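The window arithmetic in the example above can be sketched as follows. This is a minimal illustration: the function name is hypothetical, and the calendar date is an arbitrary stand-in for "a certain day".

```python
from datetime import datetime, timedelta


def next_window(prev_end: datetime, length: timedelta) -> tuple[datetime, datetime]:
    """Return the (start, end) window for the next task.

    The new window starts where the previous one ended and lasts `length`.
    """
    return prev_end, prev_end + length


# The previous task covered 10:10 to 10:11; the date itself is arbitrary.
prev_end = datetime(2022, 4, 8, 10, 11)
start, end = next_window(prev_end, timedelta(minutes=1))
```

Chaining each window to the end of the previous one keeps consecutive tasks free of gaps and overlaps.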
Step S102 decouples task generation from task consumption through the message queue, so the distributed system can execute tasks statelessly; even when first data pulling tasks are generated faster than they are executed, the capacity to process massive data can readily be achieved by scaling the distributed system horizontally.
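The decoupling of task generation and task consumption can be illustrated with an in-process stand-in. The sketch below assumes Python's standard `queue.Queue` in place of the real message queue and threads in place of working nodes; the function name, the task names and the list used as a "database" are all hypothetical.

```python
import queue
import threading


def run_pipeline(num_tasks: int, num_workers: int) -> list[str]:
    """Produce tasks into a queue and let several workers consume them."""
    task_queue: queue.Queue = queue.Queue()
    database: list[str] = []          # stand-in for the real database
    lock = threading.Lock()

    def worker() -> None:
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more tasks for this worker
                break
            with lock:                # "pull" and "store" collapsed into one step
                database.append(f"data for {task}")

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for i in range(num_tasks):        # producer side: push tasks as they are generated
        task_queue.put(f"task-{i}")
    for _ in workers:                 # one sentinel per worker, queued after all tasks
        task_queue.put(None)
    for w in workers:
        w.join()
    return database


stored = run_pipeline(num_tasks=10, num_workers=3)
```

The producer never needs to know how many workers exist, so in this sketch adding capacity only means raising `num_workers`, which mirrors the horizontal scaling described above.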
It should be understood that the second data pulling task in the embodiment of the present application refers to a data pulling task to be executed in the message queue. In some embodiments, when obtaining tasks from the message queue, the distributed system may process the second data pulling tasks in the order in which they were pushed to the message queue, earliest first. If the distributed system consumes tasks faster than they are generated, then each time a newly generated first data pulling task is pushed to the message queue, the second data pulling task consumed by the distributed system is the task that was just pushed. If the distributed system consumes tasks more slowly than they are generated, the second data pulling task consumed by the distributed system is the not-yet-consumed first data pulling task that entered the message queue earliest.
In addition, since the data pulling tasks for every CDN service provider are collected in the message queue, priorities may be assigned to different CDN service providers; when the message queue contains no second data pulling task for the CDN service provider with the highest priority, the second data pulling task of the CDN service provider with the next highest priority is then executed.
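A priority rule of this kind could be sketched as below. The task dictionaries, provider names and the convention that a lower number means a higher priority are assumptions made for illustration only.

```python
def pick_next(pending: list[dict], priority: dict[str, int]) -> dict:
    """Pick the pending task whose provider has the highest priority.

    A lower number means a higher priority. `pending` is in queue order, and
    `min` returns the first minimum, so ties go to the earliest-queued task.
    """
    return min(pending, key=lambda task: priority[task["provider"]])


# Hypothetical queue contents, oldest first.
pending = [
    {"id": 1, "provider": "cdn_b"},
    {"id": 2, "provider": "cdn_a"},
    {"id": 3, "provider": "cdn_a"},
]
priority = {"cdn_a": 0, "cdn_b": 1}

chosen = pick_next(pending, priority)
```

When no task of the highest-priority provider is pending, the same call falls through to the next priority level.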
According to the data processing method for a fused CDN of the embodiment of the application, configuration information of at least one CDN service provider is obtained, the configuration information comprising the related information required for pulling data of the target type from the corresponding CDN service provider, and a corresponding first data pulling task is generated in real time according to that related information. Because updating a configuration item only concerns the conditions required for pulling, the workload and difficulty of updating configuration items are clearly lower than those of updating scripts in the prior art, so data pulling tasks are generated with higher flexibility. For each piece of configuration information, the first data pulling task is pushed to a message queue as a producer, and the distributed system consumes the tasks in the message queue in real time as a consumer. This decouples task production from task consumption, unifies the real-time pulling and processing of the relevant data of every CDN service provider under massive data volumes, allows convenient horizontal scaling, and better meets practical application requirements.
Because the distributed system stores the data pulled from different CDN service providers in the same way, while the rules the providers themselves use for data storage, such as field naming rules and data storage formats, differ, uniform data storage rules need to be set so that the data can be quickly retrieved later.
On the basis of the above embodiments, as an alternative embodiment, the configuration information further includes a field mapping rule for standardizing the storage of data of the target type.
Because the distributed system only needs to pull the data according to the second data pulling task, the pulled data may have different dimensions depending on the CDN service provider; if it were stored in the database unchanged, later data lookup would be severely hampered. The configuration information of the embodiment of the application therefore further comprises a field mapping rule for standardizing the storage of data of the target type. The field mapping rule may indicate in which target storage field of the database each dimension of the collected target-type data is stored. If the information of some dimension is not suitable for direct storage in the database, for example because it is large or because its format is inconsistent with the storage format of the database, the field mapping rule may further include format conversion logic for each dimension of the collected data.
Correspondingly, pushing the first data pulling task to a preset message queue comprises the following steps:
pushing the first data pulling task and the field mapping rule corresponding to the first data pulling task to a preset message queue.
By pushing the field mapping rule corresponding to each first data pulling task to the message queue together, the working node executing the first data pulling task can perform standardized processing on the data according to the corresponding field mapping rule after pulling the data.
In some embodiments, the embodiments of the present application may generate the first data pulling task according to the relevant information for pulling data in the configuration information and the field mapping rule for normalizing the stored data, so that the rule for pulling data is recorded in the first data pulling task, and the rule for subsequently storing data is also recorded in the first data pulling task, so that when a node of the distributed system performs a data pulling task, data pulling and data storage may be performed consecutively according to the data pulling task.
Further, the embodiment of the application stores the pulled data to a database, which comprises the following steps:
and carrying out standardization processing on the data pulled according to the second data pulling task according to the field mapping rule corresponding to the second data pulling task, and storing the standardized data into a database.
The standardized processing of the embodiment of the application is that the information of each dimension of the pulled data is stored in the corresponding field in the database in a standard data format. In some embodiments, the pulled data may include time information, source information (i.e., from which CDN server), specific values, and so forth. Accordingly, when storing data in the database, time information, source information, and specific values may be normalized and stored in corresponding fields.
Specifically, in one embodiment the code segment of the field mapping rule may be expressed as:
wherein "T", "Host", "value" and "country" respectively represent preset fields in the database; the embodiments of the present application do not specifically limit the number or naming of the fields in the database. Each row of data stored in the database contains these four fields. The "T" field stores the time range corresponding to the pulled data. The "Host" field stores the domain name of the CDN service user; recording the domain name lays a foundation for compiling CDN service information (e.g., cost calculation) for a specific domain name. The "value" field stores the pulled data for the corresponding time range, and the "country" field stores the country or region where the traffic occurred, such as China, Japan or Korea. For example, suppose the CDN service provider records that the traffic from China to the domain name "www.xxx.com" at 9:01 a.m. is L. When this traffic data is pulled, the "T" field stores "9:01 a.m.", the "value" field stores "L", the "Host" field stores "www.xxx.com", and the "country" field stores "China".
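The four-field layout described above can be illustrated with a small sketch. This is not the code segment of the embodiment: the mapping format and the keys of the raw record are hypothetical, and only the database field names "T", "Host", "value" and "country" come from the description.

```python
# Database field -> key in the provider's raw record (the raw keys are hypothetical).
FIELD_MAPPING = {
    "T": "time_range",
    "Host": "domain",
    "value": "traffic",
    "country": "region",
}


def normalize(raw: dict, mapping: dict[str, str]) -> dict:
    """Build one database row by copying each mapped dimension of the raw record."""
    return {db_field: raw[src_key] for db_field, src_key in mapping.items()}


raw_record = {
    "time_range": "09:01",
    "domain": "www.xxx.com",
    "traffic": "L",
    "region": "China",
}
row = normalize(raw_record, FIELD_MAPPING)
```

A different provider would only need a different mapping dictionary; the storage code itself would stay the same in this sketch.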
Based on the foregoing embodiments, as an optional embodiment, the distributed system includes a plurality of working nodes, and the distributed system pulls data according to a data pulling task to be executed in a message queue, including:
when the second data pulling task to be executed exists in the message queue, for each second data pulling task to be executed, a plurality of working nodes in the distributed system compete for the second data pulling task to be executed, and the successfully competing working nodes pull data according to the second data pulling task.
It should be noted that a consumer group (Consumer Group) is a scalable and fault-tolerant consumer mechanism provided by Kafka. A consumer group contains multiple consumers that share a common ID, namely the group ID, and all consumers in the group cooperate to consume all partitions of the subscribed topics. Each partition can be consumed by only one consumer within the same consumer group. In the distributed system of the application, each working node corresponds to a task processing process that consumes the Kafka queue, and all working nodes are in the same group, so with Kafka's consumer group mechanism each data pulling task can be obtained by only one task processing process (and its corresponding working node). This effectively prevents the same data pulling task from being executed repeatedly.
Competing for a data pulling task in the embodiments of the present application may consist of randomly determining one working node to perform the second data pulling task. In some embodiments, each working node generates a random number for a given second data pulling task, and the numbers are compared so that the working node with the largest (or smallest) number is determined as the node that performs the task. When at least two working nodes generate the same largest or smallest number, those working nodes generate random numbers again until a unique working node is determined. Whenever the executing node of a second data pulling task is determined, that task is removed from the message queue.
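The random-number competition with a tie-break redraw might look like the following sketch. The range of the random numbers, the choice of the largest number as the winner and the node names are illustrative assumptions.

```python
import random


def compete(node_ids: list[str], rng: random.Random) -> str:
    """Pick one node: every contender draws a number, the largest wins.

    Nodes tied for the largest number draw again until one node remains.
    """
    if not node_ids:
        raise ValueError("at least one working node is required")
    contenders = list(node_ids)
    while len(contenders) > 1:
        draws = {node: rng.randint(0, 9) for node in contenders}
        best = max(draws.values())
        contenders = [node for node, draw in draws.items() if draw == best]
    return contenders[0]


nodes = ["node-1", "node-2", "node-3"]
winner = compete(nodes, random.Random(0))
```

Only the tied nodes redraw, so each round can only keep or shrink the set of contenders.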
On the basis of the above embodiments, as an alternative embodiment, the working state of any working node of the distributed system is either the first state or the second state. The two states can be understood as the familiar idle and busy states of a node: in the first state the node is idle, i.e. not executing a data pulling task; in the second state the node is busy, i.e. executing a data pulling task, up to the point where the data has been stored in the database.
The working node in the distributed system competes for the second data pulling task to be executed, and the working node with successful competition pulls data according to the second data pulling task, which comprises the following steps:
S201, a working node in the first state in the distributed system competes for a second data pulling task to be executed;
S202, the successfully competing working node updates its working state to the second state and pulls data according to the second data pulling task to be executed;
after the working node stores the pulled data in the database, the method further comprises: the successfully competing working node updates its working state to the first state.
When an unexecuted second data pulling task exists in the message queue upstream of the distributed system, all idle nodes compete for it and only one idle node can succeed; the successful working node pulls the data, stores it in the database, and then updates its working state back to the first state.
Referring to fig. 4, which is a schematic diagram illustrating state updates of working nodes according to an embodiment of the present application. As shown in the drawing, the distributed system includes 4 working nodes and there are two second data pulling tasks in the message queue. At the initial time, working nodes 1-3 are in the first state and working node 4 is in the second state, so working nodes 1-3 compete for second data pulling task 1. Working node 1 is determined to execute second data pulling task 1; its working state is updated to the pulling state, and it pulls data from the corresponding CDN service provider according to second data pulling task 1. Once working node 1 has been determined to execute second data pulling task 1, the working nodes in the first state immediately compete for second data pulling task 2. By this time working node 4 has stored its data in the database and its working state has been updated to the first state, so working nodes 2-4 compete, and working node 3 is determined to have competed successfully. When working node 1 has stored its data in the database, its working state is updated to the first state again, and it can compete when a new data pulling task appears in the message queue.
In some embodiments, the second state of an embodiment of the present application may be further subdivided into a pulling state and a storing state. When an idle node competes successfully, its state is updated from the first state to the pulling state and it pulls data according to the second data pulling task it won; after the data is pulled successfully, the pulling state is updated to the storing state and the pulled data is stored in the database; after storage is completed, the storing state is updated to the first state. While a node is in the first state, it waits for tasks to enter the message queue.
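The state cycle of a working node can be sketched as a small state machine. The class and member names are hypothetical; only the order of the states follows the description above.

```python
from enum import Enum


class State(Enum):
    IDLE = "first state"        # not executing a task, may compete
    PULLING = "pulling state"   # second state, pulling data
    STORING = "storing state"   # second state, writing to the database


# The only allowed transition out of each state.
TRANSITIONS = {
    State.IDLE: State.PULLING,
    State.PULLING: State.STORING,
    State.STORING: State.IDLE,
}


class WorkerNode:
    def __init__(self) -> None:
        self.state = State.IDLE

    def advance(self) -> State:
        """Move to the next state in the idle -> pulling -> storing -> idle cycle."""
        self.state = TRANSITIONS[self.state]
        return self.state


node = WorkerNode()
visited = [node.advance() for _ in range(3)]
```

Keeping the transitions in one table makes it explicit that a node cannot skip the storing step or compete while it is still busy.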
On the basis of the above embodiments, as an optional embodiment, the related information further includes time granularity and task indication information of the pulled data.
The time granularity of the pull data indicates the generation frequency of the data pull task, and for example, if the time granularity is 30 seconds, it means that one data pull task is generated every 30 seconds.
The task indication information is used for indicating how to pull and which data needs to be pulled, for example, the task indication information may include a service identifier, a domain name of a CDN service user, identification information of a corresponding CDN service provider, interaction protocol information for interacting with a server of the corresponding CDN service provider, and the like.
The target type of data often changes with time; for example, traffic data differs between time periods, and so does the corresponding billing data. The task indication information therefore also includes time range information: once the time range of the data pulled by the previous task is known, the time range of the data to be pulled by the current task can be determined from the time granularity and is represented by a start time (timestamp) and an end time (timestamp).
Taking a time granularity of 1 minute as an example, one first data pulling task needs to be generated every minute. The start timestamp of the currently generated first data pulling task is the end timestamp of the previously generated one, and its end timestamp is 1 minute after its start timestamp. Specifically, if the previously generated first data pulling task has a start timestamp of 20:05 and an end timestamp of 20:06, then the currently generated first data pulling task has a start timestamp of 20:06 and an end timestamp of 20:07.
Generating a first data pulling task in real time according to the related information in the configuration information comprises: generating, in real time and according to the task indication information, a first data pulling task corresponding to the time granularity, wherein the first data pulling task comprises a time range of the data to be pulled, and the time range corresponds to the time granularity.
For example, if the task indication information includes identification information of a CDN service provider, the generated first data pulling task instructs the working node to obtain data of the target type from that CDN service provider; if the task indication information includes a domain name and service identification information, the first data pulling task instructs the working node to obtain data of the target type for that service of that domain name.
On the basis of the above embodiments, as an alternative embodiment, the interaction protocol information includes: communication protocol, data coding format and interactive interface.
Pulling data from a CDN service provider must follow the communication protocol specified by that provider, so the interaction protocol information is provided by the CDN service provider. The type of communication protocol is not particularly limited and may be, for example, the HTTP protocol or the Kafka protocol. The data encoding format specifies how the data pulled from the CDN service provider is encoded; only when the data encoding format is known can the pulled data be correctly decoded and then stored in the database. The interactive interface is the CDN service provider's interface for retrieving data of the target type.
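The role of the data encoding format can be shown with a short sketch. It assumes a JSON payload and a text encoding such as UTF-8; the class name, field names and the endpoint URL are hypothetical and do not come from the embodiment.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionProtocol:
    communication: str  # communication protocol, e.g. "http" or "kafka"
    encoding: str       # data encoding format, e.g. "utf-8"
    interface: str      # provider interface to call (hypothetical value below)


def decode_payload(payload: bytes, protocol: InteractionProtocol) -> dict:
    """Decode raw bytes with the configured encoding, then parse them as JSON."""
    return json.loads(payload.decode(protocol.encoding))


proto = InteractionProtocol("http", "utf-8", "https://provider.example/api/billing")
record = decode_payload('{"value": 42}'.encode("utf-8"), proto)
```

If the configured encoding did not match the bytes actually sent, decoding would fail or yield wrong characters, which is why the format has to be known before the data is stored.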
Based on the foregoing embodiments, as an optional embodiment, the configuration information of any CDN server in the embodiment of the present application is obtained by:
S301, in response to a configuration trigger operation for configuration information on a client, displaying a configuration interface through the client, wherein at least one item to be configured is displayed in the configuration interface.
The client in the embodiment of the application can be used to configure the configuration information of a CDN service provider. The party with permission to operate the client may be the main CDN service provider or a third-party CDN service provider; the embodiment of the application does not specifically limit this. When the user of the client is the main CDN service provider, it needs to determine in advance the related information required for pulling data from the third-party CDN service provider. When the user of the client is a third-party CDN service provider, the third-party CDN service provider configures the configuration information on the client and transmits it to the main CDN service provider through a preset interface; the main CDN service provider supplements the remaining configuration items on top of those configured by the third-party CDN service provider and stores the completed configuration information in a preset database. It should be understood that the items to be configured displayed in the configuration interface correspond to the configuration items in the configuration information, and that complete configuration information can be used to generate executable data pulling tasks.
S302, receiving configuration operation for the at least one item to be configured through a configuration interface, and generating configuration information based on the received configuration operation.
Fig. 5 is a schematic diagram of a configuration interface displayed by a client, where a CDN service user opens the client, and a main interface of the client displays various functions, including a control 501 for configuring configuration information, the CDN service user triggers the control in a preset manner, and the client switches from the main interface to the configuration interface. The configuration interface comprises a plurality of configuration items which need to be filled out by CDN service users, and each configuration item corresponds to related information needed by interaction with a certain third-party CDN service provider. The figure comprises identification information, communication protocols, data coding formats and the like of the third-party CDN service providers. After filling all configuration items, the CDN service user clicks a completion control 502 in the interface to submit configuration information.
It should be noted that, the configuration information may also be generated by the main CDN server through the client, and when operated by the main CDN server, more configuration items may be shown in the configuration page, for example, a service identifier, a domain name of a CDN service user, and the like.
It should be understood that, by providing the client with a configuration interface for generating configuration information, the embodiment of the application makes it possible to bring online billing data pulling tasks for a newly connected third-party CDN service provider. It should be understood that the embodiment of the present application may also modify and delete generated configuration information through the above client. Of course, stored configuration information may also be modified and deleted directly in the database that stores it. The embodiment of the present application is not particularly limited in this respect.
Referring to fig. 6, a flow chart of a data processing method of a converged CDN according to another embodiment of the present application is shown, and the flow chart includes:
S401, in response to a configuration trigger operation for configuration information on a client, displaying a configuration interface through the client, receiving a configuration operation for at least one item to be configured through the configuration interface, and generating configuration information based on the received configuration operation;
The main CDN service provider and the third-party CDN service provider determine the traffic splitting rule, and the main CDN service provider or the third-party CDN service provider generates configuration information through a preset client. In a specific example, the configuration information is [tx_video|www.aizhibo.com|closed_cdn_a|1.1|pull protocol configuration 1|60s], where tx_video is the name of the specific service purchased by the purchaser of the CDN service (a multimedia live streaming service), www.aizhibo.com is the purchaser's domain name, closed_cdn_a is the identifier of the third-party CDN service provider, 1.1 is the floating proportion of the billing data, pull protocol configuration 1 includes the related information required for pulling data of the target type from the corresponding CDN service provider and the field mapping rule for standardizing the storage of data of the target type, and 60s is the time granularity of the pulled data.
S402, for each piece of configuration information, generating a corresponding first data pulling task in real time according to the time granularity in the configuration information, and pushing the first data pulling task to a message queue.
Since the time granularity in the configuration information is 60s, this embodiment generates a first data pulling task every 60s, and a given first data pulling task may be expressed as:
tx_video|closed_cdn_a|www.aizhibo.com|pull protocol configuration 1|start timestamp|end timestamp.
In this embodiment, the rule for pulling data and the rule for standardizing stored data are written into the configuration information and the first data pulling task together, so that the downstream distributed system can complete two steps of data pulling and data storage according to the first data pulling task.
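Deriving a task expression of this shape from the pipe-separated configuration can be sketched as follows. The function name, the use of integer Unix timestamps and the example start value are assumptions for illustration; only the field order of the two strings follows the example in this embodiment.

```python
def build_task(config_line: str, start_ts: int) -> str:
    """Turn one pipe-separated configuration line into a task string.

    Config order: service|domain|provider|coefficient|protocol config|granularity
    Task order:   service|provider|domain|protocol config|start|end
    """
    service, domain, provider, _coefficient, protocol_cfg, granularity = config_line.split("|")
    seconds = int(granularity.rstrip("s"))   # "60s" -> 60
    end_ts = start_ts + seconds              # the window length equals the granularity
    return "|".join([service, provider, domain, protocol_cfg, str(start_ts), str(end_ts)])


config_line = "tx_video|www.aizhibo.com|closed_cdn_a|1.1|pull protocol configuration 1|60s"
task = build_task(config_line, start_ts=1_649_420_400)   # arbitrary example timestamp
```

Calling `build_task` again with the previous end timestamp as `start_ts` would yield the next task in the sequence.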
S403, a working node in the first state in the distributed system competes for a second data pulling task to be executed in the message queue;
S404, the successfully competing working node updates its working state to the second state and pulls data according to the second data pulling task to be executed;
S405, after the successfully competing working node has pulled the data, the data is stored in standardized form according to the field mapping rule in pull protocol configuration 1.
Specifically, pull protocol configuration 1 records the field mapping rule, under which the start timestamp and end timestamp recorded in the second data pulling task are written to the time (T) field of the database, the domain name www.aizhibo.com recorded in the second data pulling task is written to the Host field, the pulled billing data is written to the value field, and the region where the traffic was generated, such as "China", is written to the country field.
S406, the working node with successful competition updates the working state to the first state so as to compete again for the unprocessed second data pulling task in the message queue.
Referring to fig. 7, a schematic diagram of a data processing system of a converged CDN according to an embodiment of the present application is shown, where the system includes a configuration DB, a configuration parsing/real-time pull task generation module, a message queue, a task execution module, a data formatting processing module, and a data DB.
The embodiment of the application stores the configuration information and the pulled data in two separate databases: the configuration information is stored in the configuration DB, and the pulled data is stored in the data DB. The configuration information in this embodiment may consist of a service identifier, a domain name, a CDN service provider identifier, a floating coefficient, a pull protocol configuration, and a pull granularity.
The service identifier, the domain name, the CDN service provider identifier, the floating coefficient and the pull protocol configuration all belong to the task indication information. The service identifier is the name of the service used by the CDN service consumer, for example tx_video. The domain name is the domain name of the CDN service consumer, for example www.abc.com. The CDN service provider identifier identifies the CDN service provider, for example cloud_cdn_a. The floating coefficient is the ratio by which the pulled data is floated (adjusted), for example 1.1. The pull protocol configuration describes the protocol interaction used to pull data from the CDN service provider, including the communication protocol, the data encoding format and the field mapping; JSON over HTTP and the Kafka message queue protocol are supported. For example:
pull protocol configuration:
pulling granularity: the temporal granularity of the pulled data, for example: 30s, 60s, etc.
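The configuration items listed above can be pictured as a small record. The sketch below is illustrative only: the class and attribute names are hypothetical, the contents of the pull protocol configuration are placeholders rather than the configuration used in the embodiment, and the floating coefficient is assumed to be applied as a simple multiplier, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PullProtocolConfig:
    """Hypothetical shape of a pull protocol configuration."""
    communication_protocol: str  # e.g. "http" or "kafka"
    data_encoding: str           # e.g. "json"
    fields: Dict[str, str]       # provider field name -> standardized field name


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical record for one entry in the configuration DB."""
    service_id: str
    domain: str
    provider_id: str
    floating_coefficient: float
    pull_protocol: PullProtocolConfig
    granularity_s: int


def apply_floating(value: float, config: ProviderConfig) -> float:
    """Scale a pulled value by the floating coefficient (assumed multiplicative)."""
    return value * config.floating_coefficient


config = ProviderConfig(
    service_id="tx_video",
    domain="www.abc.com",
    provider_id="cloud_cdn_a",
    floating_coefficient=1.1,
    pull_protocol=PullProtocolConfig("http", "json", {"bytes": "value"}),
    granularity_s=60,
)
adjusted = apply_floating(100.0, config)
```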
The configuration parsing/real-time pull task generation module parses the configuration information and, at each time granularity, generates in real time a task for every service and every domain name of every service provider, then pushes the tasks to the Kafka task message queue in producer mode for processing by the downstream consumer cluster.
A generated task may be understood as a set of execution instructions at each time granularity, for example:
Task 1: tx_video|closed_cdn_a|domain name 1|pull protocol configuration 1|start UNIX timestamp|end UNIX timestamp
Task 2: tx_video|closed_cdn_a|domain name 2|pull protocol configuration 2|start UNIX timestamp|end UNIX timestamp
As can be seen from these examples, the two data pulling tasks pull the target type of data for two different CDN service consumers (domain name 1 and domain name 2) that use the same CDN service provider (closed_cdn_a) for the same service (tx_video).
The module only needs to analyze the configuration information, continuously generates a first data pulling task, pushes the first data pulling task into a message queue, and realizes the decoupling of task generation and task execution through producer and consumer modes.
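The producer side can be sketched as below. Python's in-process `queue.Queue` stands in for the Kafka task message queue, and the function name `produce`, the dictionary keys, and the rule of emitting a task only when the current time falls on a granularity boundary are assumptions for illustration, not details given in the embodiment.

```python
import queue
from typing import Dict, List


def produce(configs: List[Dict], now_ts: int, task_queue: "queue.Queue") -> int:
    """Push one task per configuration whose granularity boundary is now_ts.

    Returns the number of tasks pushed. Each task covers the window
    [now_ts - granularity, now_ts).
    """
    pushed = 0
    for cfg in configs:
        granularity = cfg["granularity_s"]
        if now_ts % granularity != 0:
            continue  # this configuration has no window ending at now_ts
        task = "|".join([
            cfg["service_id"], cfg["provider_id"], cfg["domain"],
            cfg["protocol_config"], str(now_ts - granularity), str(now_ts),
        ])
        task_queue.put(task)  # the producer never waits for task execution
        pushed += 1
    return pushed


configs = [
    {"service_id": "tx_video", "provider_id": "closed_cdn_a", "domain": "domain name 1",
     "protocol_config": "pull protocol configuration 1", "granularity_s": 60},
    {"service_id": "tx_video", "provider_id": "closed_cdn_a", "domain": "domain name 2",
     "protocol_config": "pull protocol configuration 2", "granularity_s": 30},
]
task_queue = queue.Queue()
pushed_at_60 = produce(configs, 60, task_queue)
pushed_at_90 = produce(configs, 90, task_queue)
```

Because the producer only reads configuration and writes to the queue, adding a new provider or domain is a configuration change rather than a code change.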
The task execution module is deployed on multiple machines in a distributed cluster and takes the form of task processes (also called working nodes) on the cluster that consume, parse and execute the queued tasks. Each working node has two main states. The first state, also called the idle or WAITING state, is the state in which the working node watches the message queue and waits for a task to arrive. The second state, also called the busy state, is the state in which the working node pulls data and stores it in the database.
In some embodiments, the second state is further subdivided into:
Parsing/pulling (RUNNING): the task is parsed and the data is pulled;
Storing (STORING): the resulting data is stored to the database.
When an execution process starts, it connects to the Kafka message queue. When the upstream module generates a task and pushes it to the message queue, the execution processes compete to consume it; because the processes belong to the same Kafka consumer group, only one process acquires the task. That process moves from the WAITING state to the RUNNING state and, after the data is pulled successfully, calls the data formatting and storage module to persist the pulled data to the database (STORING). When the whole flow completes, the process returns to WAITING and resumes watching the message queue for tasks.
Meanwhile, the task execution module is decoupled from the upstream task generation module by the message queue and executes tasks statelessly, so the capacity to process massive data can be increased conveniently through horizontal scaling, that is, by adding working nodes.
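The WAITING, RUNNING and STORING cycle can be sketched with threads competing on a shared queue. This is a simplified stand-in, not the embodiment's implementation: `queue.Queue` replaces the Kafka consumer group (both hand each task to exactly one consumer), a Python list replaces the database, and `pull`, `worker_loop` and `run_cluster` are hypothetical names.

```python
import queue
import threading
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    WAITING = "WAITING"  # first state: idle, watching the queue
    RUNNING = "RUNNING"  # second state: parsing the task and pulling data
    STORING = "STORING"  # second state: writing pulled data to the database


def pull(task: str) -> Dict[str, str]:
    """Hypothetical stand-in for pulling data from a CDN service provider."""
    return {"task": task, "value": "billing data for " + task}


def worker_loop(tasks: "queue.Queue", store: List[dict],
                store_lock: threading.Lock, history: List[State]) -> None:
    history.append(State.WAITING)
    while True:
        task = tasks.get()  # blocks while WAITING; each item goes to one worker only
        if task is None:    # sentinel: no more work for this worker
            return
        history.append(State.RUNNING)
        data = pull(task)
        history.append(State.STORING)
        with store_lock:
            store.append(data)
        history.append(State.WAITING)  # ready to compete for the next task


def run_cluster(task_list: List[str], n_workers: int) -> Tuple[List[dict], List[List[State]]]:
    tasks: "queue.Queue" = queue.Queue()
    store: List[dict] = []
    store_lock = threading.Lock()
    histories: List[List[State]] = [[] for _ in range(n_workers)]
    for item in task_list:
        tasks.put(item)
    for _ in range(n_workers):
        tasks.put(None)  # one stop sentinel per worker
    workers = [threading.Thread(target=worker_loop, args=(tasks, store, store_lock, h))
               for h in histories]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return store, histories


stored, histories = run_cluster(["task 1", "task 2", "task 3", "task 4"], n_workers=3)
```

Which worker handles which task is not deterministic, but every task is processed exactly once and every worker ends in the WAITING state.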
The data formatting and storage module mainly aligns data such as the billing data fields pulled from different providers to a common standard, so that the data can be stored in a unified way. For example, the field mapping rules recorded under Fields in the pull protocol configuration can be expressed as:
These rules identify the mapping between the field names returned by the service provider and the fields actually stored in the standardized database; processing by this module yields unified, standardized storage of the data.
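A field mapping of this kind can be sketched as a dictionary from provider field names to standardized field names. The provider field names and sample records below are invented for illustration; only the standardized names `T`, `host` and `value` follow the description.

```python
from typing import Any, Dict


def normalize(record: Dict[str, Any], field_mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename provider-specific fields to the standardized database fields."""
    return {standard: record[provider_field]
            for provider_field, standard in field_mapping.items()}


# Hypothetical mappings for two providers that name the same data differently.
MAPPING_PROVIDER_A = {"ts": "T", "domain": "host", "bytes": "value"}
MAPPING_PROVIDER_B = {"time": "T", "hostname": "host", "flux": "value"}

row_a = normalize({"ts": 60, "domain": "www.abc.com", "bytes": 10}, MAPPING_PROVIDER_A)
row_b = normalize({"time": 60, "hostname": "www.abc.com", "flux": 20}, MAPPING_PROVIDER_B)
```

Since the mapping lives in configuration, onboarding a provider with new field names requires a new mapping entry rather than new storage code.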
Based on the distributed big-data Kafka message queue, the embodiment of the application parses a configurable data protocol to generate, in real time, a pull task instruction for each CDN service provider and reports it to the task message queue. The task processing processes on the distributed cluster consume the tasks in the message queue, parse the task instructions, and pull, process and store the data in real time, thereby achieving standardized integration of the CDN service providers' billing data in real time and at massive scale.
Moreover, in the embodiment of the application, a billing data pulling task for a newly added third-party CDN service provider can be onboarded through configuration alone, and decoupling the tasks between distributed producers and consumers makes it possible to pull, process and integrate the billing data of third-party CDN service providers in real time at massive data volumes.
Fig. 8 is a schematic architecture diagram of a data processing system according to this embodiment of the present application, where the data processing system includes a terminal 110, a main CDN server 120, a third party CDN server 121, a data server 122, a distributed server 123, and a database 130.
The CDN service consumer in the embodiment of the present application, that is, the purchaser of the CDN service, applies for the CDN service for the target domain name to the main CDN service provider through the preset application program installed in the terminal 110. In some embodiments, the terminal 110 may be a mobile terminal such as a smart phone, a tablet computer, a laptop portable notebook computer, or a desktop computer, a projection computer, or the like, and the type of the terminal is not limited in the embodiments of the present application.
The main CDN server 120 is a server maintained by the main CDN service provider, the third party CDN server 121 is a server maintained by a third party CDN service provider, and the data server 122 and the distributed server 123 are servers that execute the data processing method of the converged CDN according to the embodiment of the present application. Each of these servers may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
The main CDN server determines the subcontracting rule with the third party CDN server in advance, determines related information of pulling the data (for example, charging data) of the target type from the third party CDN server, generates configuration information, and stores the configuration information in the database 130, where the related information includes time granularity of pulling the data and task indication information, and the task indication information is used for indicating how to pull and what data needs to be pulled specifically, for example, including a service identifier, a domain name of a CDN service user, identification information of a corresponding CDN server, interaction protocol information for interacting with a server of the corresponding CDN server, and so on. In the embodiment of the application, the number of the third-party CDN service providers can be multiple, and in the illustration, for convenience of expression, only 1 server corresponding to the third-party CDN service provider is shown.
After the terminal 110 sends a request for purchasing the CDN service to the main CDN server 120 through the application, the main CDN service provider may subcontract a portion of the CDN service to the third party CDN service provider according to the subcontracting rule, so that the main CDN service provider and the third party CDN service provider jointly serve the purchaser of the CDN service, and both may record billing data in real time. The main CDN service provider may record the configuration information for each CDN service provider in the database 130 in advance.
The data server 122 obtains configuration information of at least one CDN server (including a main CDN server and/or a third party CDN server) from the database 130, and for each configuration information, generates a corresponding first data pulling task in real time according to relevant information in the configuration information, and pushes the first data pulling task to a preset message queue.
The working nodes in the first state in the distributed server 123 compete for the second data pulling task to be executed, the working node with successful competition updates the working state to the second state, and pulls data according to the second data pulling task to be executed, and after the working node stores the pulled data in the database 130, the working state is updated to the first state.
The primary CDN server may obtain revenue from the CDN service buyers based on the billing data in the database 130 and pay the third party CDN server.
An embodiment of the present application provides a data processing apparatus for a converged CDN, as shown in fig. 9, where the data processing apparatus for a converged CDN may include: a configuration information acquisition module 901, and a data processing module 902, wherein,
a configuration information obtaining module 901, configured to obtain configuration information of at least one CDN server, where the configuration information includes related information required for pulling data of a target type of a corresponding CDN server;
The data processing module 902 is configured to generate, for each configuration information, a corresponding first data pulling task in real time according to relevant information in the configuration information, push the first data pulling task to a preset message queue, so that the distributed system pulls data according to a second data pulling task to be executed in the message queue, and store the pulled data in the database.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
The device of the embodiment of the application obtains configuration information of at least one CDN service provider, the configuration information including the related information required for pulling data of the target type from the corresponding CDN service provider. Because the configuration items in the configuration information can be updated flexibly, this approach is more flexible than the prior-art approach of directly packaging scripts. For each piece of configuration information, a corresponding first data pulling task is generated in real time according to the related information in the configuration information and pushed to a message queue in producer mode, and the distributed system consumes the tasks in the message queue in real time in consumer mode. Task production and consumption are thereby decoupled, the related data of each CDN service provider can be pulled and processed in real time at massive data volumes, and the system can be scaled horizontally with ease, which better meets practical application requirements.
As an alternative embodiment, the configuration information further includes field mapping rules for normalizing the data of the storage target type;
the data processing module comprises:
the pushing sub-module is used for pushing the first data pulling task and the field mapping rule corresponding to the first data pulling task to a preset message queue;
and the rule storage sub-module is used for carrying out standardization processing on the data pulled according to the second data pulling task according to the field mapping rule corresponding to the second data pulling task and storing the standardized data into the database.
As an alternative embodiment, the distributed system includes a plurality of working nodes, where the working nodes of the distributed system are configured to:
when the second data pulling task to be executed exists in the message queue, competing the second data pulling task to be executed for each second data pulling task to be executed, and if the competition is successful, pulling data according to the second data pulling task.
As an alternative embodiment, the working state of any working node of the distributed system is the first state or the second state;
the working node is used for:
if the working node is in the first state, competing for a second data pulling task to be executed;
If the competition is successful, the working state is updated to a second state, and the data is pulled according to a second data pulling task to be executed;
and storing the pulled data into a database, and updating the working state into a first state.
As an alternative embodiment, the related information includes time granularity and task indication information of the pulled data;
a data processing module, comprising:
and the task generation sub-module is used for generating a first data pulling task corresponding to the time granularity in real time according to the task indication information, wherein the first data pulling task comprises a time range of data to be pulled.
As an alternative embodiment, the task indication information further includes: at least one of the name of the service, the domain name of the CDN service user, the identification information of the corresponding CDN service provider and the interaction protocol information for interacting with the server of the corresponding CDN service provider.
As an alternative embodiment, the interaction protocol information includes: communication protocol, data coding format and interactive interface.
As an alternative embodiment, the configuration information obtaining module is specifically configured to:
in response to a configuration triggering operation for configuration information on a client, displaying a configuration interface through the client, wherein at least one item to be configured is displayed in the configuration interface;
And receiving configuration operation aiming at the at least one item to be configured through the configuration interface, and generating configuration information based on the received configuration operation.
The embodiment of the application also provides a data processing system integrated with the CDN, as shown in fig. 10, including a data processing apparatus 1001 and a distributed system 1002, specifically,
the data processing device 1001 is configured to obtain configuration information of at least one CDN server, where the configuration information includes related information required for pulling data of a target type of a corresponding CDN server; for each configuration information, generating a corresponding first data pulling task in real time according to the related information in the configuration information, and pushing the first data pulling task to a preset message queue;
the distributed system 1002 is configured to pull data according to a second data pulling task to be performed in the message queue, and store the pulled data to the database.
The embodiment of the application provides an electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the data processing method of the converged CDN. Compared with the related art, the following can be achieved: configuration information of at least one CDN service provider is obtained, the configuration information including the related information required for pulling data of the target type from the corresponding CDN service provider. Because the configuration items in the configuration information can be updated flexibly, this approach is more flexible than the prior-art approach of directly packaging scripts. For each piece of configuration information, a corresponding first data pulling task is generated in real time according to the related information in the configuration information and pushed to a message queue in producer mode, and the distributed system consumes the tasks in the message queue in real time in consumer mode. Task production and consumption are thereby decoupled, the related data of each CDN service provider can be pulled and processed in real time at massive data volumes, and the system can be scaled horizontally with ease, which better meets practical application requirements.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 11, the electronic device 4000 shown in fig. 11 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path to transfer information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
Memory 4003 may be, but is not limited to, ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, RAM (Random Access Memory ) or other type of dynamic storage device that can store information and instructions, EEPROM (Electrically Erasable Programmable Read Only Memory ), CD-ROM (Compact Disc Read Only Memory, compact disc Read Only Memory) or other optical disk storage, optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer.
The memory 4003 is used for storing a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute a computer program stored in the memory 4003 to realize the steps shown in the foregoing method embodiment.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon which, when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content. When the stored program is executed, configuration information of at least one CDN service provider is obtained, the configuration information including the related information required for pulling data of the target type from the corresponding CDN service provider. Because the configuration items in the configuration information can be updated flexibly, this approach is more flexible than the prior-art approach of directly packaging scripts. For each piece of configuration information, a corresponding first data pulling task is generated in real time according to the related information in the configuration information and pushed to a message queue in producer mode, and the distributed system consumes the tasks in the message queue in real time in consumer mode. Task production and consumption are thereby decoupled, the related data of each CDN service provider can be pulled and processed in real time at massive data volumes, and the system can be scaled horizontally with ease, which better meets practical application requirements.
The embodiment of the application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps and corresponding content of the foregoing method embodiments. When the program is executed, configuration information of at least one CDN service provider is obtained, the configuration information including the related information required for pulling data of the target type from the corresponding CDN service provider. Because the configuration items in the configuration information can be updated flexibly, this approach is more flexible than the prior-art approach of directly packaging scripts. For each piece of configuration information, a corresponding first data pulling task is generated in real time according to the related information in the configuration information and pushed to a message queue in producer mode, and the distributed system consumes the tasks in the message queue in real time in consumer mode. Task production and consumption are thereby decoupled, the related data of each CDN service provider can be pulled and processed in real time at massive data volumes, and the system can be scaled horizontally with ease, which better meets practical application requirements.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing describes only optional implementations for some implementation scenarios of the present application. It should be noted that other similar implementations that a person skilled in the art adopts on the basis of the technical ideas of the present application, without departing from those ideas, also fall within the protection scope of the embodiments of the present application.

Claims (12)

1. A data processing method for a converged content delivery network CDN, comprising:
acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises related information required for pulling data of a target type of a corresponding CDN service provider;
and for each configuration information, generating a corresponding first data pulling task in real time according to the related information in the configuration information, pushing the first data pulling task to a preset message queue, so that the distributed system pulls data according to a second data pulling task to be executed in the message queue, and storing the pulled data to a database.
2. The method of claim 1, wherein the configuration information further comprises field mapping rules for normalizing the data storing the target type;
pushing the first data pulling task to a preset message queue, including:
pushing the first data pulling task and a field mapping rule corresponding to the first data pulling task to a preset message queue;
the storing the pulled data in the database comprises the following steps:
and carrying out standardization processing on the data pulled according to the second data pulling task according to the field mapping rule corresponding to the second data pulling task, and storing the standardized data into the database.
3. The method according to claim 1 or 2, wherein the distributed system includes a plurality of working nodes, and wherein the distributed system pulls data according to a second data pulling task to be performed in the message queue, including:
when the second data pulling task to be executed exists in the message queue, for each second data pulling task to be executed, a plurality of working nodes in the distributed system compete for the second data pulling task to be executed, and the successfully competing working nodes pull data according to the second data pulling task.
4. A method according to claim 3, wherein the operational state of any operational node of the distributed system is either a first state or a second state;
the working nodes in the distributed system compete for the second data pulling task to be executed, and the successfully competing working nodes pull data according to the second data pulling task, including:
the working nodes in the first state compete for the second data pulling task to be executed in the distributed system;
the working node with successful competition updates the working state into the second state and pulls data according to the second data pulling task to be executed;
The step of storing the pulled data in a database further comprises the following steps:
and the working node with successful competition updates the working state into the first state.
5. The method of claim 1, wherein the related information includes time granularity and task indication information of pulled data;
the generating a first data pulling task in real time according to the related information in the configuration information comprises the following steps:
and generating a first data pulling task corresponding to the time granularity in real time according to the task indication information, wherein the first data pulling task comprises a time range of data to be pulled, and the time range corresponds to the time granularity.
6. The method of claim 5, wherein the task indication information further comprises: at least one of service identification information, domain name of CDN service user, identification information of corresponding CDN service provider, and interaction protocol information for interacting with server of the corresponding CDN service provider.
7. The method of claim 1 wherein the configuration information for any CDN server is obtained by:
responding to configuration triggering operation of configuration information of a client, and displaying a configuration interface through the client, wherein at least one item to be configured is displayed in the configuration interface;
And receiving configuration operation aiming at the at least one item to be configured through the configuration interface, and generating configuration information based on the received configuration operation.
8. A data processing apparatus for a converged CDN, comprising:
the configuration information acquisition module is used for acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises relevant information required for pulling data of a target type of a corresponding CDN service provider;
and the data processing module is used for generating corresponding first data pulling tasks in real time according to the related information in the configuration information for each configuration information, pushing the first data pulling tasks to a preset message queue, so that the distributed system pulls data according to the second data pulling tasks to be executed in the message queue, and storing the pulled data to a database.
9. A data processing system for a converged CDN, comprising:
the data processing device is used for acquiring configuration information of at least one CDN service provider, wherein the configuration information comprises relevant information required for pulling data of a target type of a corresponding CDN service provider; for each piece of configuration information, generating a corresponding first data pulling task in real time according to related information in the configuration information, and pushing the first data pulling task to a preset message queue;
And the distributed system is used for pulling data according to the second data pulling task to be executed in the message queue and storing the pulled data into a database.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the data processing method of the converged CDN of any one of claims 1 to 7.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the data processing method of a converged CDN as claimed in any one of claims 1 to 7.
12. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
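The flow recited in claims 8 and 9 (configuration information, task generation, message queue, distributed pulling, database storage) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: `ProviderConfig`, `PullTask`, `fake_pull`, and the `cdn_data` table are hypothetical, an in-process `queue.Queue` stands in for the preset message queue, worker threads stand in for the distributed system, an in-memory SQLite database stands in for the database, and the provider call is stubbed so that no network access occurs. The sketch also treats the pushed "first" task and the executed "second" task as the same object, which the claims do not require.

```python
import queue
import sqlite3
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical configuration for one CDN service provider."""
    provider: str
    data_type: str   # target data type, e.g. "bandwidth"
    endpoint: str    # placeholder URL, not a real provider API


@dataclass(frozen=True)
class PullTask:
    """Hypothetical data pulling task derived from one configuration."""
    provider: str
    data_type: str
    endpoint: str


def generate_tasks(configs, task_queue):
    """Create one pull task per configuration and push it to the queue."""
    for cfg in configs:
        task_queue.put(PullTask(cfg.provider, cfg.data_type, cfg.endpoint))


def fake_pull(task):
    """Stub for the provider call; returns a deterministic dummy value."""
    return len(task.provider) * 100


def worker(task_queue, results, lock):
    """Consume tasks until the queue is empty, collecting pulled rows."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        row = (task.provider, task.data_type, fake_pull(task))
        with lock:
            results.append(row)
        task_queue.task_done()


def run_pipeline(configs, num_workers=2):
    """Generate tasks, execute them on worker threads, store the results."""
    task_queue = queue.Queue()
    generate_tasks(configs, task_queue)

    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(task_queue, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Store the pulled rows; an in-memory database is used only for the demo.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cdn_data (provider TEXT, data_type TEXT, value INTEGER)"
    )
    db.executemany("INSERT INTO cdn_data VALUES (?, ?, ?)", results)
    db.commit()
    return db
```

Decoupling task generation from task execution through a queue means that supporting an additional CDN service provider would only require another configuration entry, while the workers remain unchanged; a real deployment would presumably replace the in-process queue with a networked message broker.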
CN202210369424.XA 2022-04-08 2022-04-08 CDN-fused data processing method, device and system and electronic equipment Pending CN116938926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369424.XA CN116938926A (en) 2022-04-08 2022-04-08 CDN-fused data processing method, device and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN116938926A true CN116938926A (en) 2023-10-24

Family

ID=88374391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369424.XA Pending CN116938926A (en) 2022-04-08 2022-04-08 CDN-fused data processing method, device and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN116938926A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination