CN110795677A - CDN node distribution method and device - Google Patents
- Publication number
- CN110795677A CN201911099119.8A
- Authority
- CN
- China
- Prior art keywords
- website
- target
- keywords
- hit
- target website
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Strategic Management (AREA)
- Library & Information Science (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The application provides a CDN node distribution method and a CDN node distribution device, wherein the method comprises the following steps: crawling target content of a target website through a crawler program; matching the target content with keywords in a pre-established first word stock, determining keywords hit by the target content, and determining the danger level of the target website according to the hit keywords; and distributing CDN nodes matched with the danger level for the target website. The method and the system can be used for classifying massive websites in the platform through the danger levels, so that the websites with different danger levels are respectively distributed to different nodes, and therefore when the websites with high danger levels are attacked, the risk that innocent websites are affected can be reduced, and the service stability is improved.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for allocating CDN nodes.
Background
The currently popular SaaS service model lets users add domain names themselves on the cloud platform of a CDN service provider and bring a site online quickly by modifying its DNS resolution. In the prior art, the cloud platform of a CDN service provider generally either allocates CDN nodes to a user's website automatically at random, or has staff review the website manually and then allocate nodes to it. With random allocation, the websites served by each node cluster differ in type and risk, which makes stable service difficult. Manual review is slow, and reviewing every website is impractical when the number of websites is large: it consumes a great deal of time and effort and increases labor cost.
Disclosure of Invention
An object of the present embodiment is to provide a method and an apparatus for allocating CDN nodes, so as to solve the foregoing technical problem.
In a first aspect, an embodiment of the present application provides a CDN node allocation method, including: crawling target content of a target website through a crawler program; matching the target content with keywords in a pre-established first word stock, determining keywords hit by the target content, and determining the danger level of the target website according to the hit keywords; and distributing CDN nodes matched with the danger level for the target website.
This scheme improves the precision with which node resources are allocated to the massive number of websites on the platform. Websites that are likely to be attacked are confined, as far as possible, to a limited set of node resources, which raises the probability of identifying the attacked website when an attack occurs and avoids affecting other low-risk websites. It also reduces the maintenance staff effort required and improves detection efficiency.
Optionally, each keyword in the first thesaurus is pre-configured with a preset risk level, and determining the risk level of the target website according to the hit keyword includes: and under the condition that the hit keywords are multiple, determining the highest risk level in preset risk levels corresponding to the multiple hit keywords, and taking the highest risk level as the risk level of the target website.
When the target content hits a plurality of keywords at the same time, the highest risk level in the keywords is taken as the risk level of the target website, so as to ensure the service quality of the website.
Optionally, each CDN node is preconfigured with a risk level adapted to the defense capability, and allocating a CDN node matched with the risk level to the target website includes: determining a target CDN node with the same risk level as the target website; and resolving the access request of the user to the target website to the IP of the target CDN node during domain name resolution.
The risk level of the CDN node is adapted to the defense capability of the CDN node, that is, the stronger the defense capability is, the higher the risk level is, for example, a node configured as a high risk may be allocated to a high risk website and may be used to resist an attack that may occur in the high risk website.
Optionally, each keyword in the first thesaurus is preconfigured with a preset industry type, and after determining the keyword hit by the target content, the method further includes: obtaining the number of times of hits corresponding to each keyword in the hit keywords; determining the keywords with the largest hit times, and taking the preset industry type corresponding to the keywords with the largest hit times as the industry type of the target website; distributing CDN nodes matched with the danger levels for the target website, wherein the CDN nodes comprise: and distributing CDN nodes matched with the danger level and the industry type of the target website for the target website.
After the massive number of websites on the platform are classified by industry, an attacked website can be located more quickly and more precisely. Websites of the same industry type can also be allocated nodes from the same node cluster, which standardizes node usage.
Optionally, crawling the target content of the target website by using a crawler program includes: querying a third-party filing database, using the domain name of the target website, to determine whether the domain name has been filed; and, in the case that the domain name has been filed, crawling the website title of the target website and the target content of the website home page through the crawler program.
The filing status indicates whether the website is genuinely legitimate. If the domain name of the target website has been filed, it is treated as a legitimate domain name, and the content of the target website can be crawled.
Optionally, the method further includes: after target content is obtained, matching the target content with violation keywords in a violation word library established in advance; and under the condition that any illegal keyword is not hit in the target content, matching the target content with a keyword in a pre-established first word stock.
The violation word bank contains keywords typical of illegal websites, which the cloud platform collects and dynamically maintains over long-term operation, for example: play, pornography, gambling, entertainment, etc. If the target content hits no violation keyword, the website is treated as a non-violating website and can be served.
In a second aspect, an embodiment of the present application provides an allocation apparatus for CDN nodes, including: the data crawling module is used for crawling the target content of the target website through a crawler program; the level determining module is used for matching the target content with keywords in a pre-established first word bank, determining keywords hit by the target content, and determining the danger level of the target website according to the hit keywords; and the node distribution module is used for distributing CDN nodes matched with the danger level for the target website.
Optionally, each keyword in the first thesaurus is preconfigured with a preset industry type, and the apparatus further includes: an industry determination module; the industry determining module is used for acquiring the number of times of hits corresponding to each keyword in the hit keywords; determining the keywords with the largest hit times, and taking the preset industry type corresponding to the keywords with the largest hit times as the industry type of the target website; the node allocation module is specifically configured to: and distributing CDN nodes matched with the danger level and the industry type of the target website for the target website.
In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method according to the first aspect is performed.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a CDN node allocation method provided in an embodiment of the present application;
Fig. 2 is a detailed schematic diagram of an allocation method of CDN nodes provided in the embodiment of the present application when applied;
Fig. 3 is a schematic diagram of a distribution device of CDN nodes according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
A Content Delivery Network (CDN) is an intelligent virtual Network built on the basis of the existing Network, and a user can obtain required Content nearby by using functional modules of load balancing, Content Delivery, scheduling and the like of a central platform by means of edge servers deployed in various places, so that Network congestion is reduced, and the access response speed and hit rate of the user are increased.
The embodiment of the application provides a CDN node allocation method, which can be applied to a server of a CDN service provider, wherein a cloud platform runs on the server, a user can be supported to independently add a domain name on the cloud platform, and quick access of a user website is realized by modifying DNS resolution. The method can automatically and intelligently classify massive websites rapidly, and different nodes are distributed to the websites according to classification results, so that the service quality of the nodes and the stability of a platform are improved. Referring to fig. 1, the method includes the following steps:
S101: Crawling the target content of the target website through a crawler program.
The target content is the content that the crawler program is configured in advance to crawl from the target website. In a particular embodiment, the target content includes the title of the website and the text content of the home page of the website. The title of a website is a condensed summary of a webpage: generally, the title of the home page is the formal name of the website, the title of an article page is the title of the article, and the title of a column's front page is the name of the column. The home page of a website is its portal page. It is usually written to make the website easy to understand and to guide the user to the other parts of the website, so home page content is generally regarded as directory-like content. Thus, the website title and the home page content can indicate the specific type of the website. Of course, in addition to the website title and the website home page, the target content may also be content in a history snapshot of the target website, or the content of links on some pages that jump to other websites (for example, a link in a website that has been hacked may jump to an illegal website).
S102: Matching the target content with keywords in a pre-established first word bank to determine the keywords hit by the target content.
S103: Determining the danger level of the target website according to the hit keywords.
The first word bank is a pre-established industry word bank, and each keyword in it is pre-configured with a preset danger level. Danger levels can be divided into three levels (low, medium and high), or divided in other ways. The danger level of the target website indicates the probability that the website will suffer DDoS and similar attacks. The keywords in the first word bank may be updated periodically.
A DDoS attack can hit many websites at the same time and leave its targets unusable, which not only disrupts normal use of the users' websites but also causes them heavy economic losses. The financial industry, for example, has a high probability of DDoS attack. Thus, if a website hits a finance-related keyword, it should be given a higher danger level.
In a specific implementation process, when a keyword is hit in target content, taking a preset danger level corresponding to the keyword as a danger level of a target website; when the target content simultaneously hits a plurality of keywords, determining the highest risk level in preset risk levels corresponding to the plurality of hit keywords, and taking the highest risk level in the keywords as the risk level of the target website.
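The highest-level rule of S102 and S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Danger` enum, the sample `FIRST_WORD_BANK` entries, and the case-insensitive substring matching are all assumptions made for the example.

```python
from enum import IntEnum


class Danger(IntEnum):
    """Three-level scale from the description; ordering lets max() pick the highest."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical first word bank: keyword -> preset danger level.
FIRST_WORD_BANK = {
    "news": Danger.LOW,
    "forum": Danger.MEDIUM,
    "loan": Danger.HIGH,
}


def hit_keywords(content, word_bank):
    """Return the word-bank keywords that occur in the content (substring match)."""
    text = content.lower()
    return [kw for kw in word_bank if kw in text]


def danger_level(content, word_bank=FIRST_WORD_BANK):
    """Highest preset level among the hit keywords, or None when nothing is hit."""
    hits = hit_keywords(content, word_bank)
    if not hits:
        return None
    return max(word_bank[kw] for kw in hits)
```

A single hit yields that keyword's level, and several hits yield the maximum, which matches the two cases described above.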
S104: Distributing CDN nodes matched with the danger level for the target website.
The risk level of the CDN node is adapted to the defense capability of the CDN node, that is, the stronger the defense capability is, the higher the risk level is, for example, a node configured as a high risk may be allocated to a high risk website and may be used to resist an attack that may occur in the high risk website.
When a target website hits a keyword with a low danger level, the target website is considered a low-risk website. For example, websites of news media and government agencies can be considered very low risk, so CDN nodes with a low danger level are allocated to them. Low-risk CDN nodes may be nodes with no anti-DDoS capability or only weak anti-DDoS capability. Since such low-risk websites are rarely subjected to DDoS attacks, the speed and number of the CDN nodes serving them can be prioritized without placing requirements on the nodes' defense capability. Conversely, websites in industries such as games, live video, and investment and financing belong to high-risk industries that are very vulnerable to DDoS attacks, so nodes with anti-DDoS capability can be allocated to these high-risk websites to resist the attacks that are very likely to occur, improving the service quality of the nodes.
In a specific implementation process, after determining the risk level of the target website, the server matches the target website with the target CDN nodes having the same risk level, and automatically adjusts DNS resolution of the target website according to a matching result. After the adjustment, the IP of the DNS resolution is changed, that is, the server resolves the access request of the user to the target website to the IP of the target CDN node when performing the DNS resolution, thereby implementing node allocation.
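The matching and DNS adjustment just described might look like the sketch below. The `CdnNode` type, the node list, and the in-memory `dns_table` standing in for the platform's DNS records are hypothetical; the IP addresses come from a documentation-only range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnNode:
    ip: str
    danger: str  # "low", "medium" or "high", pre-configured by operations staff


# Hypothetical node inventory.
NODES = [CdnNode("192.0.2.10", "low"), CdnNode("192.0.2.20", "high")]


def pick_node(site_danger, nodes):
    """Return the first node whose danger level equals the website's, else None."""
    for node in nodes:
        if node.danger == site_danger:
            return node
    return None


def assign(domain, site_danger, dns_table, nodes=NODES):
    """Point the domain at a matching node; dns_table models the resolution result."""
    node = pick_node(site_danger, nodes)
    if node is None:
        return False  # no matching node: leave the existing resolution untouched
    dns_table[domain] = node.ip
    return True
```

After `assign` succeeds, a lookup of the domain in `dns_table` returns the IP of the target CDN node, which models resolving user requests to that node.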
Optionally, each keyword in the first word bank is also preconfigured with a preset industry type, where the preset industry type may include music, movie, forum community, government agency, news media, and the like. If the target content hits only one keyword in the first word bank, the preset industry type corresponding to that keyword is taken as the industry type of the target website. If the target content hits a plurality of keywords at the same time, the number of hits for each hit keyword is obtained, the keyword with the largest number of hits is determined, and the preset industry type corresponding to that keyword is taken as the industry type of the target website. For example, suppose the target content hits three keywords, with "song" hit 1 time, "reporter" hit 3 times, and "movie" hit 1 time. "Reporter" has the largest number of hits, and since "reporter" is configured as belonging to the news industry, the industry type of the target website is determined to be the news industry.
After the industry type of the target website is determined, the server allocates CDN nodes which have the same danger level as the target website and belong to the same industry type to the target website, wherein the danger level and the industry type of each CDN node are configured in advance by cloud platform operation and maintenance personnel, so that the websites can be accurately associated with the nodes after being classified.
It should be noted that, in the case of multiple keywords being hit, if the number of hits of at least two keywords is the same, and the number of hits is the largest, or if the number of hits of each keyword being hit is the same, the industry type of the target website cannot be accurately determined, then the determination of the industry type of the website may be considered to be abandoned, that is, only the node allocation may be performed according to the determined risk level of the website.
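A small sketch of the hit-count rule, including the tie case in which the industry type is abandoned. The keyword-to-industry table and the substring counting are assumptions for illustration; the "song" / "reporter" / "movie" figures reuse the example above.

```python
# Hypothetical keyword -> preset industry type table.
INDUSTRY_BY_KEYWORD = {
    "song": "music",
    "reporter": "news media",
    "movie": "movie",
}


def industry_type(content, table=INDUSTRY_BY_KEYWORD):
    """Industry of the most-hit keyword, or None when undecidable.

    None is returned when no keyword is hit, or when two or more keywords
    share the largest hit count (the tie case), so that the caller falls
    back to allocating by danger level alone.
    """
    text = content.lower()
    counts = {kw: text.count(kw) for kw in table}
    counts = {kw: n for kw, n in counts.items() if n > 0}
    if not counts:
        return None
    top = max(counts.values())
    leaders = [kw for kw, n in counts.items() if n == top]
    if len(leaders) > 1:
        return None
    return table[leaders[0]]
```

With one "song", three "reporter" and one "movie" hits the function returns the news media industry; with one hit each of "song" and "movie" it returns `None`.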
This scheme improves the precision with which node resources are allocated to the massive number of websites on the platform. Websites that are likely to suffer DDoS attacks are confined, as far as possible, to a limited set of node resources, which raises the probability of identifying the attacked website when an attack occurs and avoids affecting other low-risk websites.
In an architecture of the CDN, a node cluster is formed by a plurality of nodes, and the plurality of nodes share resources with each other in the same node cluster, for example, five nodes form a node cluster. When the target website is initially accessed into the platform, the target website is allocated to a default node cluster, and all websites newly accessed into the platform are allocated to the node cluster. And then, a server where the cloud platform is located crawls target content on the target website, if the target content hits the keywords in the first word bank, the danger level and the industry type of the target content are obtained, the target website is automatically adjusted from the default node cluster to the corresponding matched node cluster, and the CDN node in the adjusted node cluster provides service for the target website. Then, the user's request for the target website will also be located to the adjusted CDN node.
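The move from the default node cluster to a matched cluster can be illustrated as below. The cluster names and the `CLUSTERS` table are hypothetical, and two behaviors are assumptions rather than statements from the description: a site with no keyword hit stays in the default cluster, and a `(None, danger)` entry represents a cluster chosen by danger level alone.

```python
DEFAULT_CLUSTER = "cluster-default"

# Hypothetical cluster table keyed by (industry type, danger level).
# An industry of None marks a cluster chosen by danger level alone.
CLUSTERS = {
    ("news media", "low"): "cluster-news-low",
    ("finance", "high"): "cluster-finance-high",
    (None, "high"): "cluster-generic-high",
}


def reassign(site_clusters, domain, danger, industry=None, clusters=CLUSTERS):
    """Move a site out of the default cluster once it has been classified."""
    # Every newly accessed site starts in the default cluster.
    current = site_clusters.setdefault(domain, DEFAULT_CLUSTER)
    if danger is None:
        return current  # assumption: no keyword hit, so the site stays put
    # Prefer an exact (industry, danger) match, then a danger-only cluster.
    target = clusters.get((industry, danger)) or clusters.get((None, danger))
    if target is not None:
        site_clusters[domain] = target
    return site_clusters[domain]
```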
In this embodiment, node clusters are divided by industry and danger level, and different node clusters may have different industry types and danger levels. For example, if a node cluster is configured as the news media class, it is assigned to websites belonging to the news media industry, so that websites of the same industry type use the same cluster and node usage is standardized. Each node cluster can contain roughly three to five CDN nodes, and the nodes in the same node cluster share the same industry type and danger level.
In the prior art, the websites served by each node cluster may differ in type and risk, which makes it difficult to provide stable service and to locate attacks quickly. After the massive number of websites on the platform are classified, only a few websites may be allocated to the same node cluster, so an attacked website can be located more quickly and more precisely, and the probability of identifying it is improved. Meanwhile, after the websites are classified by danger level, high-risk and low-risk websites are allocated to different node clusters, so that when a high-risk website is attacked, the risk of innocent websites being affected is reduced and the service stability of the platform is improved.
Optionally, step S101 specifically includes: querying a third-party filing database, using the domain name of the target website, to determine whether the domain name has been filed; and, in the case that the domain name has been filed, crawling the website title of the target website and the target content of the website home page through the crawler program.
The third-party filing database includes the ICP filing database of the Ministry of Industry and Information Technology (MIIT). Any website that provides internet information services must be filed according to law, so the filing status indicates whether the website is genuinely legitimate. If the domain name of the target website has not been filed, service to the target website is stopped. If the domain name has been filed, it is treated as a legitimate domain name, the target website can be crawled, and CDN nodes are allocated to it.
Further, after step S101, that is, after the target content of the target website has been crawled, the method further includes the following step: matching the target content with the violation keywords in the preset violation word bank. The violation word bank contains keywords typical of illegal websites, which the cloud platform collects and dynamically maintains over long-term operation, for example: play, pornography, gambling, entertainment, etc. If the target website hits any violation keyword, it is treated as a violating website, is promptly marked as violating, and service to it is stopped. If the target website hits no violation keyword, step S102 is performed.
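The violation screen that gates S102 can be sketched as a simple check. The word bank contents and the `"blocked"` / `"proceed"` labels are placeholders chosen for the example, not values from the application.

```python
# Hypothetical violation word bank; the real one is maintained by the platform.
VIOLATION_WORD_BANK = {"gambling", "pornography"}


def screen(content, violation_words=VIOLATION_WORD_BANK):
    """Return ("blocked", hits) if any violation keyword is hit, else ("proceed", [])."""
    text = content.lower()
    hits = sorted(kw for kw in violation_words if kw in text)
    if hits:
        return ("blocked", hits)  # mark the site and stop service
    return ("proceed", [])        # continue with step S102
```

Returning the hit keywords alongside the decision is a design choice for the sketch: it gives operations staff something concrete to check during the later manual review.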
In this embodiment, the server is provided with a violation word bank and an industry word bank, so that detection of the large number of websites on the cloud platform can be automated. The violation word bank has been built up gradually from the platform's years of accumulated data and experience in cooperation with network supervision operators; it is maintained manually and can be changed and edited according to the types of events that need attention in different periods. Both the violation word bank and the industry word bank can be dynamically updated and maintained according to changing social and political circumstances. The server repeatedly crawls the title and home page content of a target website at a preset scanning frequency. Websites that hit any violation keyword in the violation word bank are marked and their service is stopped, while the remaining non-violating websites are quickly allocated pre-configured node clusters according to industry type and danger level. After a website is marked as violating, it is submitted to operation and maintenance personnel for further checking. If they judge that the website is not in violation, the mark is canceled and the website is processed as a non-violating website.
In one embodiment, referring to fig. 2, a detailed step of the method when applied includes:
s201: a user adds a primary domain name and a secondary domain name needing service in the cloud platform, and modifies DNS resolution. When the DNS modification is successful, the process goes to S202.
After the user modifies the DNS analysis of the website, the server is triggered to perform DNS detection on the website, if the DNS modification of the user is determined to be successful, the step is switched to S202 to perform record detection, and if the DNS modification is not successful, the access to the website is suspended.
S202: and the server of the cloud platform inquires the ICP filing database of the Ministry of industry and communications whether the domain name of the website is filed or not. If the domain name is already filed, the process goes to S203, and if the domain name is not filed, the service to the website is directly stopped.
S203: the server crawls the title of the website and the target content on the home page through a crawler program.
S204: The server matches the target content against the keywords in the violation word bank. A violating website is processed in S205, and a non-violating website is processed in S206.
A website that hits any violation keyword in the violation word bank is judged to be a violating website, and a website that hits no keyword in the violation word bank is a non-violating website.
S205: The server promptly marks the domain name of the website as violating, stops service to the website, and pushes the marked domain name to operation and maintenance personnel.
The operation and maintenance personnel can further verify the pushed website. If they confirm that the website is actually a non-violating website, they can cancel the mark on the domain name and restart service to the website, and the process goes to S206.
S206: the server matches the target content of the website with the keywords in the first word bank, and determines the danger level and the industry type of the website according to the hit keywords.
S207: The server allocates to the website CDN nodes that have the same danger level and belong to the same industry type.
For a non-violating website, the server compares the danger level and industry type determined during matching against the nodes of the pre-configured clusters and allocates nodes directly, so that the node resources can be put to use.
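Steps S201 to S207 can be strung together as one control flow. The sketch below injects every external dependency as a callable, so `dns_ok`, `is_filed`, `crawl`, `classify` and `allocate`, as well as the returned status strings, are hypothetical names introduced for illustration; no real filing-database or DNS API is implied.

```python
def process_site(domain, dns_ok, is_filed, crawl, violation_words,
                 classify, allocate):
    """Run the S201-S207 checks in order and report where the site ended up."""
    if not dns_ok(domain):            # S201: DNS modification not yet effective
        return "access suspended"
    if not is_filed(domain):          # S202: no ICP filing found
        return "service stopped"
    content = crawl(domain)           # S203: title and home page text
    text = content.lower()
    if any(kw in text for kw in violation_words):  # S204
        return "flagged for review"   # S205: mark, stop service, push to staff
    danger, industry = classify(content)           # S206
    allocate(domain, danger, industry)             # S207
    return "allocated"
```

Because each stage returns early, a site that fails an earlier check is never crawled or classified, which mirrors the order of the steps above. The manual review branch of S205, in which staff clear a wrongly flagged site and send it on to S206, is left out of the sketch.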
Because a large CDN service provider may receive access of hundreds of new domain names every day, and websites enjoying platform services may have website content change and other phenomena, in order to comply with relevant national legal policy regulations, the platform regularly checks the website content and checks violation information in time, and the platform can timely and sensitively acquire websites accessed to the platform, quickly classify massive websites and distribute different node clusters, ensure service quality and platform stability, and is also beneficial to quickly positioning DDoS attack targets.
Furthermore, this embodiment can screen the compliance and content of websites newly connected to the platform: it filters websites through ICP filing and the violation lexicon, stops the service of violating websites in time, classifies the remaining non-violating websites by industry and risk, and quickly allocates different nodes to them. This forms a number of website clusters by industry, makes full use of the service capacity of the different nodes, shortens the time needed to verify an attack, and makes platform node allocation faster, more efficient, and more systematic.
The CDN node allocation method provided by the embodiments of the present application improves the website content detection workflow. Through the accumulation of data samples over long-term operation, it can markedly improve the flexibility of violating-website detection. By matching and comparing industry type and risk level, it achieves systematic and effective automatic allocation of node resources, so that different types of websites are assigned node clusters of different service types; for example, an industry known from the platform's long-term operating experience to carry higher risk can be assigned nodes with anti-DDoS capability in advance. The method suits scenarios in which the content of massive numbers of websites is checked on a schedule. It makes it easier for a CDN service provider to self-inspect all domain names on the platform and to update the resources used by each website dynamically and in time, which improves service stability and the probability of verifying DDoS attacks, reduces the cost of maintenance personnel, and improves detection efficiency.
Based on the same inventive concept, an embodiment of the present application further provides a CDN node allocation apparatus. Referring to FIG. 3, the apparatus includes:
a data crawling module 301, configured to crawl target content of the target website through a crawler program;
a level determining module 302, configured to match the target content with a keyword in a pre-established first lexicon, determine a keyword hit by the target content, and determine a risk level of the target website according to the hit keyword;
a node allocating module 303, configured to allocate a CDN node matched with the risk level to the target website.
Optionally, each keyword in the first lexicon is preconfigured with a preset risk level, and the level determining module 302 is specifically configured to: when multiple keywords are hit, determine the highest risk level among the preset risk levels corresponding to the hit keywords, and take that highest risk level as the risk level of the target website.
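The highest-level rule can be sketched as below. It assumes risk levels are integers where a larger number means higher risk; the source does not state how levels are encoded, and the names are hypothetical.

```python
from __future__ import annotations


def site_risk_level(hit_keywords: list[str], keyword_risk: dict[str, int]) -> int | None:
    """Return the highest preset risk level among the hit keywords.

    Returns None when no hit keyword has a configured level.
    """
    levels = [keyword_risk[k] for k in hit_keywords if k in keyword_risk]
    return max(levels) if levels else None
```

Taking the maximum means a single high-risk keyword is enough to place the website at the high-risk level, regardless of how many low-risk keywords it also hits.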
Optionally, each CDN node is preconfigured with a risk level that matches its defense capability, and the node allocating module 303 is specifically configured to: determine a target CDN node with the same risk level as the target website; and, during domain name resolution, resolve users' access requests for the target website to the IP of the target CDN node.
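One way to picture this step is as building the table of answers a DNS service would return. The sketch below is an assumption-laden illustration: the dictionaries, the function names, and the rule of taking the first matching node are all hypothetical, and a real platform would also weigh load and geography.

```python
from __future__ import annotations


def pick_target_node(site_risk: int, node_risk: dict[str, int]) -> str | None:
    """Return the IP of a node whose configured risk level equals the site's."""
    candidates = sorted(ip for ip, level in node_risk.items() if level == site_risk)
    # Taking the first candidate is an arbitrary choice made for this sketch.
    return candidates[0] if candidates else None


def build_dns_answers(site_risk: dict[str, int], node_risk: dict[str, int]) -> dict[str, str]:
    """Map each domain to the node IP that its lookups should resolve to."""
    answers = {}
    for domain, risk in site_risk.items():
        ip = pick_target_node(risk, node_risk)
        if ip is not None:
            answers[domain] = ip
    return answers
```

Domains with no node at their risk level are left out of the table in this sketch.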
Optionally, each keyword in the first lexicon is preconfigured with a preset industry type, and the apparatus further includes an industry determining module. The industry determining module is configured to: obtain the number of hits for each of the hit keywords; determine the keyword with the most hits; and take the preset industry type corresponding to that keyword as the industry type of the target website. The node allocating module 303 is specifically configured to allocate to the target website CDN nodes that match both the risk level and the industry type of the target website.
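The industry rule can be sketched with a hit counter. The sketch assumes case-insensitive, non-overlapping substring counting and hypothetical names; the source does not say how ties between keywords are broken, so here the first keyword counted wins.

```python
from __future__ import annotations

from collections import Counter


def count_hits(content: str, keywords: list[str]) -> Counter:
    """Count how many times each lexicon keyword occurs in the content."""
    text = content.casefold()
    counts: Counter = Counter()
    for keyword in keywords:
        n = text.count(keyword.casefold())
        if n:
            counts[keyword] = n
    return counts


def site_industry(hit_counts: Counter, keyword_industry: dict[str, str]) -> str | None:
    """Return the industry type of the keyword with the most hits."""
    if not hit_counts:
        return None
    top_keyword, _ = hit_counts.most_common(1)[0]
    return keyword_industry[top_keyword]
```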
Optionally, the data crawling module 301 is specifically configured to: use the domain name of the target website to query a third-party filing database for whether the domain name has been filed; and, if the domain name has been filed, crawl the website title of the target website and the target content of the website home page through a crawler program.
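The filing gate in front of the crawler can be sketched as below. The filing lookup and the page fetcher are passed in as callables because the source names neither the third-party database nor the crawler; `is_filed`, `fetch_home_page`, and `crawl_if_filed` are hypothetical names. Only the title extraction uses a real library, the standard `html.parser` module.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Callable


class _TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_if_filed(
    domain: str,
    is_filed: Callable[[str], bool],
    fetch_home_page: Callable[[str], str],
) -> dict[str, str] | None:
    """Crawl the title and home page only when the domain has been filed."""
    if not is_filed(domain):
        return None  # Unfiled domains are not crawled in this sketch.
    html = fetch_home_page(domain)
    parser = _TitleParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "home_page": html}
```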
Optionally, the apparatus further includes a violation matching module, configured to match the target content, once obtained, against the violation keywords in a pre-established violation lexicon. When the target content hits no violation keyword, the level determining module 302 matches the target content against the keywords in the pre-established first lexicon.
The implementation principle and technical effects of the CDN node allocation apparatus provided in the embodiments of the present application are the same as those of the foregoing method embodiments. For brevity, where the apparatus embodiment does not mention something, refer to the corresponding content in the foregoing method embodiments; it is not repeated here.
The embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the allocation method for CDN nodes provided in the foregoing embodiment of the present application is executed.
The embodiment of the present application further provides an electronic device, which includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the CDN node allocation method provided in the foregoing embodiments. The electronic device may further include a communication bus, through which the processor and the memory communicate with each other. The memory may include high-speed random access memory (used as a cache) and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The communication bus is a circuit that connects the described elements and carries transmissions between them. For example, the processor receives commands from other elements through the communication bus, decodes the received commands, and performs calculations or data processing according to the decoded commands.
The electronic device may be the server described in the method embodiments. A cloud platform runs on the electronic device; it lets users add domain names to the platform on their own and connect a website quickly by modifying its DNS resolution. The electronic device can classify the massive number of websites on the cloud platform by risk level and allocate different nodes to them according to the classification results, which improves the service quality of the nodes and the stability of the platform.
The electronic device may include, but is not limited to, a desktop computer, a server, a cluster of servers, and the like having data computing capabilities.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A CDN node allocation method, characterized by comprising:
crawling target content of a target website through a crawler program;
matching the target content with keywords in a pre-established first lexicon, determining the keywords hit by the target content, and determining the risk level of the target website according to the hit keywords;
and allocating to the target website a CDN node matched with the risk level.
2. The method of claim 1, wherein each keyword in the first thesaurus is pre-configured with a preset risk level, and wherein determining the risk level of the target website according to the hit keyword comprises:
when multiple keywords are hit, determining the highest risk level among the preset risk levels corresponding to the hit keywords, and taking the highest risk level as the risk level of the target website.
3. The method of claim 1, wherein each CDN node is preconfigured with a risk level that is adapted to defense capabilities, and wherein assigning the target website a CDN node that matches the risk level comprises:
determining a target CDN node with the same risk level as the target website;
and resolving the access request of the user to the target website to the IP of the target CDN node during domain name resolution.
4. The method of claim 1, wherein each keyword in the first thesaurus is pre-configured with a predetermined industry type, and after determining the keyword hit by the target content, the method further comprises:
obtaining the number of times of hits corresponding to each keyword in the hit keywords;
determining the keywords with the largest hit times, and taking the preset industry type corresponding to the keywords with the largest hit times as the industry type of the target website;
and wherein allocating to the target website a CDN node matched with the risk level comprises: allocating to the target website a CDN node matched with both the risk level and the industry type of the target website.
5. The method of claim 1, wherein crawling targeted content of targeted websites through a crawler program comprises:
querying a third-party filing database, using the domain name of the target website, for whether the domain name has been filed; and, if the domain name has been filed, crawling the website title of the target website and the target content of the website home page through a crawler program.
6. The method according to any one of claims 1-5, further comprising:
after target content is obtained, matching the target content with violation keywords in a violation word library established in advance;
and, when the target content hits no violation keyword, matching the target content with the keywords in the pre-established first lexicon.
7. An allocation apparatus for a CDN node, comprising:
a data crawling module, configured to crawl target content of a target website through a crawler program;
a level determining module, configured to match the target content with keywords in a pre-established first lexicon, determine the keywords hit by the target content, and determine the risk level of the target website according to the hit keywords;
and a node allocating module, configured to allocate to the target website a CDN node matched with the risk level.
8. The apparatus of claim 7, wherein each keyword in the first thesaurus is pre-configured with a predetermined industry type, the apparatus further comprising: an industry determination module;
the industry determining module is configured to: obtain the number of hits for each of the hit keywords; determine the keyword with the most hits; and take the preset industry type corresponding to that keyword as the industry type of the target website;
the node allocating module is specifically configured to allocate to the target website a CDN node matched with both the risk level and the industry type of the target website.
9. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, performs the method according to any one of claims 1-6.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911099119.8A CN110795677A (en) | 2019-11-12 | 2019-11-12 | CDN node distribution method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110795677A true CN110795677A (en) | 2020-02-14 |
Family
ID=69443988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911099119.8A Pending CN110795677A (en) | 2019-11-12 | 2019-11-12 | CDN node distribution method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110795677A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102253943A (en) * | 2010-05-21 | 2011-11-23 | 卓望数码技术(深圳)有限公司 | Webpage rating method and webpage rating system |
CN102487397A (en) * | 2010-12-02 | 2012-06-06 | 中兴通讯股份有限公司 | Method and node for storing and routing data on basis of node bottom layer security level |
CN102932380A (en) * | 2012-11-30 | 2013-02-13 | 网宿科技股份有限公司 | Distributed method and distributed system for preventing malicious attacks based on content distribution network |
CN107277160A (en) * | 2017-07-12 | 2017-10-20 | 北京潘达互娱科技有限公司 | A kind of content delivery network node switching method and device |
CN107707515A (en) * | 2017-02-15 | 2018-02-16 | 贵州白山云科技有限公司 | A kind of method and device that Intelligent Hybrid acceleration is carried out to different safety class resource |
CN108683685A (en) * | 2018-06-19 | 2018-10-19 | 三江学院 | A kind of cloud security CDN system and monitoring method for XSS attack |
CN109104445A (en) * | 2018-11-05 | 2018-12-28 | 北京京东尚科信息技术有限公司 | The anti-attack method and system of operation system based on block chain |
Non-Patent Citations (2)
Title |
---|
ADMIN: "高防CDN好在哪,高防CDN适用行业", 《HTTPS://WWW.IDCBEST.COM/IDCNEWS/11003272.HTML》 * |
DNS智能解析专家: "什么是高防CDN?", 《HTTPS://WWW.DNS.COM/SUPPORTS/1049.HTML》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131507A (en) * | 2020-09-25 | 2020-12-25 | 成都知道创宇信息技术有限公司 | Website content processing method, device, server and computer-readable storage medium |
CN113051372A (en) * | 2021-04-12 | 2021-06-29 | 平安国际智慧城市科技股份有限公司 | Material data processing method and device, computer equipment and storage medium |
CN113051372B (en) * | 2021-04-12 | 2024-05-07 | 平安国际智慧城市科技股份有限公司 | Material data processing method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200214 |