CN110460592A - URL analysis method, device, equipment and medium - Google Patents

Info

Publication number
CN110460592A
Authority
CN
China
Prior art keywords
url
behavior
library
keyword
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910687531.5A
Other languages
Chinese (zh)
Other versions
CN110460592B (en)
Inventor
李中帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGTONG TIANXIA NETWORK TECHNOLOGY Co.,Ltd.
Original Assignee
Hangzhou Jixun Huitong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jixun Huitong Technology Co Ltd
Priority to CN201910687531.5A
Publication of CN110460592A
Application granted
Publication of CN110460592B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227Filtering policies
    • H04L63/0236Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • H04L63/145Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a URL analysis method, relating to the technical field of network security, for solving the problem that existing URL behavior analysis is inaccurate. The method comprises the following steps: receiving URL data; filtering out URL data that poses a threat to obtain safe URL data; matching the safe URL data against the known URLs in a behavior library; when a safe URL is matched successfully, obtaining a behavior record and storing it in a behavior record library; when matching fails, performing keyword analysis on the unmatched URL and storing the analysis result in the behavior record library as the behavior record; and updating the behavior library according to the analysis result. The invention also discloses a URL analysis device, an electronic device and a computer storage medium. By filtering out threat URLs, performing behavior analysis through keyword comparison, and updating the behavior library in real time, the invention improves the accuracy of behavior analysis.

Description

URL analysis method, device, equipment and medium
Technical field
The present invention relates to the technical field of network security, and in particular to a URL analysis method, device, equipment and medium.
Background art
A traditional gateway provides Internet access but cannot inspect the websites that users visit, so its safety factor is low. As traffic grows, users frequently encounter dangerous websites such as phishing sites and sites carrying trojans or viruses, and may be attacked or infected when visiting them. Intelligent gateways with firewall functionality have therefore appeared; they do not affect the operating efficiency of the user's smart devices and can also protect every device connected to the same gateway. While providing Internet access, such a gateway records the user's browsing footprint, of which URLs are one part. By inspecting the URL data of the websites a user visits, the gateway can record possible threats, warn the user that a website is threatening, or block access to websites containing trojans, thereby protecting the user from most threats and attacks. Some intelligent gateways also analyze URL data so that users can review their own Internet behavior.
However, in URL analysis on current intelligent gateways, threat detection and Internet behavior analysis are implemented separately, or only one of the two functions is provided. During Internet behavior analysis, threatening, high-risk URL data, such as phishing sites and pages carrying trojans, is analyzed together with everything else, as if these sites were ordinary safe websites. In many cases these sites have already been intercepted and the user never actually browsed them, or the content of these dangerous URLs is also used to build the user behavior database, which makes the Internet behavior analysis inaccurate. In addition, the behavior library must be updated manually, which is inefficient, and URLs that are not in the behavior library are difficult to judge accurately.
Summary of the invention
To overcome the deficiencies of the prior art, a first object of the present invention is to provide a URL analysis method that performs threat analysis first and behavior analysis afterwards, using keyword analysis, and thereby obtains accurate URL behavior analysis results.
The first object of the present invention is achieved by the following technical solution:
A URL analysis method, comprising the following steps:
receiving URL data and storing the URL data in a URL database;
matching the URL data against a threat library, filtering out URL data that poses a threat to obtain safe URL data, and storing threat records in a threat record library;
matching the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when matching of the safe URL data fails, extracting target keywords for the unmatched URL, referred to as the unknown URL, performing keyword analysis against the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
updating the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result.
Further, updating the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result comprises the following steps:
adding the target keywords and their frequencies to the behavior library;
recalculating the weight of each keyword according to the updated keywords, and updating the behavior library according to the weights.
Further, the unknown URLs also include URLs of webpages crawled at random by a crawler.
Further, the behavior library comprises a URL library and a keyword library. The URL library contains known URLs and their corresponding behavior categories, and the keyword library contains the keywords corresponding to each behavior category. The keyword library is divided by weight into a judgment keyword library, whose keywords are above a preset weight, and other keyword libraries, whose keywords are below the preset weight; the other keyword libraries are further divided by weight, from high to low, into an uncertain keyword library and a non-judgment keyword library. The judgment keyword library stores one or more groups of judgment keywords and their frequencies for each behavior category, and the keyword analysis obtains the analysis result by matching against the judgment keyword library.
Further, obtaining the analysis result by matching against the judgment keyword library comprises the following steps:
selecting any behavior category, obtaining the judgment keywords corresponding to that behavior category and their frequencies, denoted the first frequencies, and constructing a first array from the first frequencies;
counting the target keywords in the unknown URL whose weight is above the preset weight and their frequencies, denoted the second frequencies, and constructing a second array from the second frequencies;
comparing the first array with the second array for similarity to obtain the similarity value between the unknown URL and that behavior category;
in the same way, calculating the similarity between the unknown URL and every behavior category to obtain the analysis result, the analysis result being the behavior category with the greatest similarity to the unknown URL data.
Further, counting the highest-weight keywords in the unknown URL and their frequencies comprises the following steps:
crawling the webpage of the unknown URL;
segmenting the content of the webpage into words to obtain all keywords in the webpage;
calculating the weights of all keywords;
selecting the keywords whose weight is above the preset weight to obtain the target keywords.
Further, the data in the URL database is pushed to a security platform, and the threat library is updated according to the result returned by the security platform, the returned result being newly added threat URLs.
The second object of the present invention is achieved by the following technical solution:
A URL analysis device, comprising:
an acquisition module, configured to receive URL data and store the URL data in a URL database;
a filtering module, configured to match the URL data against a threat library, filter out URL data that poses a threat to obtain safe URL data, and store threat records in a threat record library;
an analysis module, configured to match the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when matching of the safe URL data fails, extracting target keywords for the unmatched URL, referred to as the unknown URL, performing keyword analysis against the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
an update module, configured to update the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result.
A third object of the present invention is to provide an electronic device for carrying out the first object of the invention, comprising a processor, a storage medium and a computer program, the computer program being stored in the storage medium and implementing the above URL analysis method when executed by the processor.
A fourth object of the present invention is to provide a computer-readable storage medium for the first object of the invention, on which a computer program is stored, the computer program implementing the above URL analysis method when executed by a processor.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention filters out threat URLs to obtain safe URLs and performs behavior analysis only on the safe URLs, which avoids the inaccurate results caused by analyzing threat URLs together with the rest. Because threat library matches are recorded, threat records can be queried directly, and because behavior library matches are recorded, behavior records can be queried directly. For URLs not present in the behavior library, keyword analysis yields the corresponding behavior category, and the behavior library is updated in real time according to the analysis result, which further improves the accuracy of the analysis and removes the need to update the behavior library manually.
Brief description of the drawings
Fig. 1 is a flowchart of the URL analysis method of embodiment one;
Fig. 2 is a flowchart of the keyword analysis of embodiment three;
Fig. 3 is a structural block diagram of the URL analysis device of embodiment five;
Fig. 4 is a structural block diagram of the electronic device of embodiment six.
Detailed description
The present invention is described in more detail below with reference to the accompanying drawings. It should be noted that the following description is illustrative only and not restrictive. The different embodiments may be combined with one another to form further embodiments not shown in the following description.
Embodiment one
Embodiment one provides a URL analysis method that first records threat URLs and then analyzes the behavior category of safe URLs, thereby obtaining accurate threat records and behavior records. Keyword comparison supplements the behavior records, and the comparison results are used to update the behavior library in real time. In this way the behavior category of every URL can be obtained, and updating the behavior library from the keyword comparison results replaces the manual process of updating the URL behavior library.
Referring to Fig. 1, a URL analysis method comprises the following steps:
S110: receive URL data and store the URL data in a URL database;
The received URL data is usually collected on a gateway box and mainly includes the full path of the URL, the number of times the URL was accessed, the access time and so on. This data is pushed in real time to a specified topic, and Structured Streaming or a similar stream-processing engine receives the topic data in real time to obtain information such as the URLs the user accessed.
S120: match the URL data against the threat library, filter out URL data that poses a threat to obtain safe URL data, and store threat records in the threat record library;
When querying, historical threat records can be retrieved directly from the threat record library.
S130: match the safe URL data against the known URLs in the behavior library:
when the safe URL data is matched successfully, obtain a behavior record and store it in the behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when matching of the safe URL data fails, extract target keywords for the unmatched URL, referred to as the unknown URL, perform keyword analysis against the keywords in the behavior library, and store the analysis result in the behavior record library as the behavior record;
Unknown URLs also include URLs crawled at random by a crawler; adding randomly crawled URLs improves the efficiency of behavior library updates.
During matching, each safe URL is matched against the URLs in the behavior library. For example, if the behavior library contains the URL address "www.taobao.com" with the behavior category "shopping", then every safe URL that contains "www.taobao.com" has the behavior category "shopping". Before matching, the safe URL usually also needs to be normalized, for example by removing the URL protocol header.
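The matching rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `URL_LIBRARY`, `normalize` and `match_behavior` are hypothetical, the library is reduced to an in-memory dictionary, and lowercasing the host and dropping the query string are extra normalization choices that the source does not prescribe (it only mentions removing the protocol header).

```python
from urllib.parse import urlsplit

# Hypothetical in-memory URL library: known URL -> behavior category.
# The single entry mirrors the example in the text.
URL_LIBRARY = {"www.taobao.com": "shopping"}


def normalize(url: str) -> str:
    """Remove the protocol header; also lowercase the host and drop the
    query string (the last two steps are assumptions, not from the source)."""
    parts = urlsplit(url if "://" in url else "//" + url)
    return parts.netloc.lower() + parts.path


def match_behavior(url: str, library: dict = URL_LIBRARY):
    """Return the category of the first known URL contained in `url`,
    or None when the URL is unknown and needs keyword analysis."""
    normalized = normalize(url)
    for known, category in library.items():
        if known in normalized:
            return category
    return None
```

A call such as `match_behavior("http://www.taobao.com/list")` yields `"shopping"`, while a URL absent from the library yields `None` and would be passed on to keyword analysis.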
During keyword analysis, the keywords of the unknown URL's webpage are compared with the behavior library. For example, if the behavior library contains a group of keywords "along rich" and "preferential" for the behavior category "shopping", and the extracted target keywords are "along rich", "preferential" and "apple", then the behavior category of the unknown URL can be determined to be "shopping". The target keywords are extracted according to a configured condition, usually by selecting the keywords whose weight is above the preset weight; alternatively, the keywords can be sorted by weight and a preset number of the highest-weight keywords selected.
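Both selection rules mentioned above (a weight threshold, or a fixed number of top-ranked keywords) fit in one small helper. The function name `select_target_keywords`, its parameters and the weights used in the usage note are hypothetical and serve only to illustrate the rule.

```python
def select_target_keywords(weights, preset_weight=None, top_n=None):
    """Pick target keywords from a {keyword: weight} mapping.

    preset_weight: keep only keywords whose weight is above this value.
    top_n: keep only the top_n highest-weight keywords.
    Either rule may be used alone, or both may be combined.
    """
    # Rank keywords from highest to lowest weight.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    if preset_weight is not None:
        ranked = [(kw, w) for kw, w in ranked if w > preset_weight]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [kw for kw, _ in ranked]
```

With made-up weights `{"along rich": 0.4, "preferential": 0.3, "apple": 0.2, "the": 0.01}` and `preset_weight=0.1`, the helper keeps the first three keywords and drops the low-weight one.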
When querying, historical behavior records can be retrieved directly from the behavior record library.
S140: update the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result.
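Steps S120 to S140 can be read as one filtering and classification loop. The sketch below is a simplified illustration rather than the patented implementation: the function `analyze`, the substring matching, the plain Python containers standing in for the libraries, and the `classify_unknown` callback (a placeholder for the keyword analysis) are all assumptions. The S140 update is reduced here to adding the URL and its category, without the keyword and weight update.

```python
def analyze(urls, threat_library, url_library, classify_unknown):
    """Return (threat_records, behavior_records) for a batch of URLs.

    threat_library: set of known threat URLs (hypothetical structure)
    url_library: dict of known URL -> behavior category
    classify_unknown: callable standing in for keyword analysis
    """
    threat_records, behavior_records = [], []
    for url in urls:
        # S120: threat URLs are recorded and excluded from behavior analysis.
        if any(threat in url for threat in threat_library):
            threat_records.append(url)
            continue
        # S130: look the safe URL up in the behavior library.
        category = next(
            (c for known, c in url_library.items() if known in url), None
        )
        if category is None:
            # Unknown URL: fall back to keyword analysis.
            category = classify_unknown(url)
            # S140 (simplified): remember the newly classified URL.
            url_library[url] = category
        behavior_records.append((url, category))
    return threat_records, behavior_records
```

The point of the ordering is that a threat URL never reaches the behavior records, which is the source of the accuracy gain claimed in the text.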
Embodiment two
Embodiment two is an improvement on embodiment one and mainly explains the behavior library and the calculation of weights.
To facilitate keyword analysis and behavior category matching, the behavior library comprises a URL library and a keyword library. The URL library contains known URLs and their corresponding behavior categories, and the keyword library contains the keywords corresponding to each behavior category. The keyword library is divided by weight into a judgment keyword library, whose keywords are above a preset weight, and other keyword libraries, whose keywords are below the preset weight; the other keyword libraries are further divided by weight, from high to low, into an uncertain keyword library and a non-judgment keyword library. The judgment keyword library stores one or more groups of judgment keywords and their frequencies for each behavior category, and the keyword analysis obtains the analysis result by matching against the judgment keyword library.
When the behavior library is updated, weights are recalculated according to the newly added target keywords. Keywords in the judgment keyword library whose weight falls below the preset weight are moved to the uncertain keyword library; likewise, keywords in the uncertain keyword library whose weight decreases are moved to the non-judgment keyword library, and keywords whose weight rises above the preset weight are added to the judgment keyword library.
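The re-tiering after a weight update can be sketched as a single pass over the recalculated weights. The source names only one threshold, the preset weight; the second threshold `lower_weight`, which separates the uncertain tier from the non-judgment tier, is an assumption introduced for this illustration, as are the function and tier names.

```python
def tier_keywords(weights, preset_weight, lower_weight):
    """Split {keyword: weight} into the three keyword libraries.

    preset_weight: boundary of the judgment keyword library (from the text).
    lower_weight: boundary between the uncertain and non-judgment
                  libraries (hypothetical; not specified in the source).
    """
    tiers = {"judgment": {}, "uncertain": {}, "non_judgment": {}}
    for keyword, weight in weights.items():
        if weight > preset_weight:
            tier = "judgment"
        elif weight > lower_weight:
            tier = "uncertain"
        else:
            tier = "non_judgment"
        tiers[tier][keyword] = weight
    return tiers
```

Rebuilding the tiers from scratch after each weight recalculation has the same effect as moving individual keywords between libraries, and it avoids tracking which tier a keyword previously belonged to.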
Specifically, the weights can be calculated with the TF-IDF algorithm; any other suitable keyword weighting algorithm can also be used.
Taking TF-IDF as an example, TF is the frequency with which a word or phrase occurs in a document; here it is the frequency with which a keyword occurs in a webpage, for example the frequency of "preferential" in a webpage of the behavior category "shopping". The formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where i indexes the i-th word in the keyword library, j is the number of the webpage that the keyword belongs to, $n_{i,j}$ is the number of occurrences of keyword i in webpage j, and the denominator is the total keyword count of webpage j. For example, if "preferential" occurs 5 times in the shopping webpage numbered "1" and that webpage contains 100 keywords in total, the TF value of "preferential" is 5/100 = 0.05. The judgment keyword library stores the keywords and frequencies of the webpage numbered "1", together with the TF values of the keywords.
IDF is the inverse document frequency; here it expresses how important a keyword is for judging a behavior category. The formula is:
$$idf_{i} = \log_{10} \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of webpages in a behavior category and $|\{j : t_i \in d_j\}|$ is the number of webpages containing the keyword. For example, if the behavior library holds 100 "shopping" webpages and 10 of them contain the keyword "preferential", its IDF value is $\log_{10}(100/10) = 1$.
The TF-IDF value is the product of TF and IDF. In the example above, the TF and IDF of the keyword "preferential" are 0.05 and 1 respectively, so its TF-IDF value is 0.05.
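The worked example above can be checked with a few lines of Python. The function names are hypothetical, and the base-10 logarithm is an assumption inferred from the example, in which 100 webpages with 10 containing the keyword give an IDF of 1.

```python
import math


def tf(occurrences: int, total_keywords: int) -> float:
    """Term frequency: occurrences of the keyword / total keywords on the page."""
    return occurrences / total_keywords


def idf(total_pages: int, pages_with_keyword: int) -> float:
    """Inverse document frequency with a base-10 logarithm (assumed base)."""
    return math.log10(total_pages / pages_with_keyword)


def tf_idf(occurrences, total_keywords, total_pages, pages_with_keyword):
    """Keyword weight as the product of TF and IDF."""
    return tf(occurrences, total_keywords) * idf(total_pages, pages_with_keyword)
```

For the numbers in the text, `tf(5, 100)` gives the TF value 0.05, `idf(100, 10)` gives the IDF value 1, and `tf_idf(5, 100, 100, 10)` gives their product.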
The weight threshold is preset according to the actual situation. Keywords with higher weights serve as judgment keywords and are used for keyword judgment, which prevents low-weight keywords, such as common high-frequency words, from influencing the judgment of the behavior category.
Embodiment three
Embodiment three builds on embodiment one and/or embodiment two and mainly explains the detailed process of keyword analysis.
The keyword analysis comprises the following steps:
S210: select any behavior category, obtain the judgment keywords corresponding to that behavior category and their frequencies, denoted the first frequencies, and construct a first array from the first frequencies;
S220: count the target keywords in the unknown URL whose weight is above the preset weight and their frequencies, denoted the second frequencies, and construct a second array from the second frequencies;
Specifically, counting the highest-weight keywords in the unknown URL and their frequencies comprises the following steps:
crawling the webpage of the unknown URL;
segmenting the content of the webpage into words to obtain all keywords in the webpage;
calculating the weights of all keywords;
selecting the keywords whose weight is above the preset weight to obtain the target keywords.
S230: compare the first array with the second array for similarity to obtain the similarity value between the unknown URL and that behavior category;
In the same way, calculate the similarity between the unknown URL and every behavior category to obtain the analysis result, the analysis result being the behavior category with the greatest similarity to the unknown URL data.
Specifically, the similarity can be calculated with cosine similarity or with any other method that can compute similarity.
Taking cosine similarity as an example, the formula is:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where A and B denote the first array and the second array respectively. The closer the similarity result is to 1, the more similar the two groups of keywords are.
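Steps S210 to S230 can be sketched as follows. The source does not say how the two arrays are aligned, so this sketch assumes that both are built over the union of the two keyword sets, with a frequency of 0 for missing keywords. The function names and the frequencies in the usage note are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency arrays."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(target_freqs, judgment_library):
    """Return (category, similarity) of the most similar behavior category.

    target_freqs: {keyword: frequency} for the unknown URL (second frequencies)
    judgment_library: {category: {keyword: frequency}} (first frequencies)
    """
    best_category, best_score = None, -1.0
    for category, freqs in judgment_library.items():
        # Assumed alignment: one array slot per keyword in the union.
        vocabulary = sorted(set(freqs) | set(target_freqs))
        first = [freqs.get(kw, 0) for kw in vocabulary]
        second = [target_freqs.get(kw, 0) for kw in vocabulary]
        score = cosine_similarity(first, second)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score
```

With a library holding `{"along rich": 3, "preferential": 5}` for "shopping" and unrelated keywords for another category, an unknown URL with frequencies `{"along rich": 2, "preferential": 4, "apple": 1}` is assigned to "shopping", because its array shares no keywords with the other category.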
Embodiment four
Embodiment four builds on embodiment one and mainly explains the updating of the threat library.
Specifically, the data in the URL database is pushed to a security platform, and the threat library is updated according to the result returned by the security platform, the returned result being newly added threat URLs.
Threat event analysis is performed on the URLs in the URL database, or on URLs crawled at random by the crawler, in order to update the threat library. Threat detection can be carried out by sending the URL data to the security platform, and the full threat library is updated in real time according to the detection results. For websites that pose a serious threat, the website or the IP corresponding to the URL is pushed to the firewall system of the intelligent gateway box in order to block it.
Through the security platform, URLs can be subjected to further threat detection and the threat library updated with the detection results, which makes threat library matching more accurate.
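The update step reduces to merging the returned threat URLs into the threat library. In the sketch below the security platform is not modeled at all: `platform_result` simply stands for whatever list of newly added threat URLs the platform returns, and the function name and the use of a Python set as the threat library are assumptions.

```python
def update_threat_library(threat_library: set, platform_result) -> set:
    """Merge newly reported threat URLs into the threat library.

    Returns the URLs that were not already present, so a caller could,
    for example, forward only those to the gateway firewall.
    """
    newly_added = set(platform_result) - threat_library
    threat_library |= newly_added  # update the library in place
    return newly_added
```

Returning only the genuinely new entries keeps repeated platform responses idempotent: pushing the same result twice changes nothing on the second call.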
Embodiment five
Embodiment five discloses a device corresponding to the above URL analysis method; it is the virtual device structure of the above embodiments. Referring to Fig. 3, it comprises:
an acquisition module 310, configured to receive URL data and store the URL data in a URL database;
a filtering module 320, configured to match the URL data against a threat library, filter out URL data that poses a threat to obtain safe URL data, and store threat records in a threat record library;
an analysis module 330, configured to match the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when matching of the safe URL data fails, extracting target keywords for the unmatched URL, referred to as the unknown URL, performing keyword analysis against the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
an update module 340, configured to update the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result.
Preferably, updating the behavior library according to the target keywords, the unknown URL and its corresponding behavior category in the analysis result comprises the following steps:
adding the target keywords and their frequencies to the behavior library;
recalculating the weight of each keyword according to the updated keywords, and updating the behavior library according to the weights.
The unknown URLs also include URLs of webpages crawled at random by a crawler.
Preferably, the behavior library comprises a URL library and a keyword library. The URL library contains known URLs and their corresponding behavior categories, and the keyword library contains the keywords corresponding to each behavior category. The keyword library is divided by weight into a judgment keyword library, whose keywords are above a preset weight, and other keyword libraries, whose keywords are below the preset weight; the other keyword libraries are further divided by weight, from high to low, into an uncertain keyword library and a non-judgment keyword library. The judgment keyword library stores one or more groups of judgment keywords and their frequencies for each behavior category, and the keyword analysis obtains the analysis result by matching against the judgment keyword library.
Preferably, obtaining the analysis result by matching against the judgment keyword library comprises the following steps:
selecting any behavior category, obtaining the judgment keywords corresponding to that behavior category and their frequencies, denoted the first frequencies, and constructing a first array from the first frequencies;
counting the target keywords in the unknown URL whose weight is above the preset weight and their frequencies, denoted the second frequencies, and constructing a second array from the second frequencies;
comparing the first array with the second array for similarity to obtain the similarity value between the unknown URL and that behavior category;
in the same way, calculating the similarity between the unknown URL and every behavior category to obtain the analysis result, the analysis result being the behavior category with the greatest similarity to the unknown URL data.
Counting the highest-weight keywords in the unknown URL and their frequencies comprises the following steps:
crawling the webpage of the unknown URL;
segmenting the content of the webpage into words to obtain all keywords in the webpage;
calculating the weights of all keywords;
selecting the keywords whose weight is above the preset weight to obtain the target keywords.
Preferably, the data in the URL database is pushed to a security platform, and the threat library is updated according to the result returned by the security platform, the returned result being newly added threat URLs.
Embodiment six
Fig. 4 is a schematic structural diagram of an electronic device provided by embodiment six of the present invention. As shown in Fig. 4, the electronic device comprises a processor 410, a memory 420, an input device 430 and an output device 440. The electronic device may have one or more processors 410; Fig. 4 takes one processor 410 as an example. The processor 410, memory 420, input device 430 and output device 440 of the electronic device may be connected by a bus or in other ways; Fig. 4 takes a bus connection as an example.
As a computer-readable storage medium, the memory 420 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the URL analysis method in the embodiments of the present invention (for example, the acquisition module 310, filtering module 320, analysis module 330 and update module 340 of the URL analysis device). By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the electronic device, that is, implements the URL analysis method of embodiments one to four above.
The memory 420 may mainly comprise a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal, and so on. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, and such remote memory can be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 430 can be used to receive input such as user identity information and the preset weight. The output device 440 may include a display device such as a display screen.
Embodiment Seven
Embodiment Seven of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a URL analysis method comprising:
receiving URL data and storing the URL data in a URL database;
matching the URL data against a threat library, filtering out URL data that poses a threat to obtain safe URL data, and storing threat records in a threat record library;
matching the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when the safe URL data fails to match, extracting target keywords from the URL that failed to match (referred to as an unknown URL), performing keyword analysis based on the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
updating the behavior library according to the target keywords in the analysis result, the unknown URL, and its corresponding behavior category.
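The control flow of the steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `UrlAnalyzer`, its field names, and the placeholder `classify_unknown` rule are all hypothetical, and the keyword analysis of an unknown URL is reduced to a trivial substring check so that the example stays self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UrlAnalyzer:
    """Illustrative pipeline: store, filter threats, match, analyze, update."""

    threat_library: set[str]              # URLs known to be threats
    behavior_library: dict[str, str]      # known URL -> behavior category
    url_database: list[str] = field(default_factory=list)
    threat_records: list[str] = field(default_factory=list)
    behavior_records: list[tuple[str, str]] = field(default_factory=list)

    def classify_unknown(self, url: str) -> str:
        # Hypothetical stand-in for keyword analysis of an unknown URL.
        return "news" if "news" in url else "other"

    def process(self, url: str) -> str | None:
        # Step 1: receive the URL data and store it in the URL database.
        self.url_database.append(url)
        # Step 2: match against the threat library and record threats.
        if url in self.threat_library:
            self.threat_records.append(url)
            return None
        # Step 3: match the safe URL against known URLs in the behavior library.
        category = self.behavior_library.get(url)
        if category is None:
            # Step 4: unknown URL, so fall back to (placeholder) keyword analysis
            # and update the behavior library with the new URL and its category.
            category = self.classify_unknown(url)
            self.behavior_library[url] = category
        # Store the behavior record in the behavior record library.
        self.behavior_records.append((url, category))
        return category
```

Calling `process` on a threatening URL returns `None` and adds a threat record; calling it on an unknown URL classifies the URL and makes it a known URL for later lookups.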
Of course, in the storage medium containing computer-executable instructions provided by this embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the URL analysis method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause an electronic device (which may be a mobile phone, personal computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the URL analysis apparatus, the units and modules are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are used only to distinguish them from one another and are not intended to limit the protection scope of the present invention.
Those skilled in the art can make various other corresponding changes and modifications based on the technical solutions and concepts described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A URL analysis method, characterized by comprising the following steps:
receiving URL data and storing the URL data in a URL database;
matching the URL data against a threat library, filtering out URL data that poses a threat to obtain safe URL data, and storing threat records in a threat record library;
matching the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when the safe URL data fails to match, extracting target keywords from the URL that failed to match (referred to as an unknown URL), performing keyword analysis based on the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
updating the behavior library according to the target keywords in the analysis result, the unknown URL, and its corresponding behavior category.
2. The URL analysis method according to claim 1, characterized in that updating the behavior library according to the target keywords in the analysis result, the unknown URL, and its corresponding behavior category comprises the following steps:
adding the target keywords and their frequencies to the behavior library;
recalculating the weight of each keyword based on the updated keywords, and updating the behavior library according to the weights.
3. The URL analysis method according to claim 2, characterized in that the unknown URLs further include URLs crawled at random by a crawler.
4. The URL analysis method according to any one of claims 1 to 3, characterized in that the behavior library comprises a URL library and a keyword library; the URL library comprises known URLs and their corresponding behavior categories; the keyword library holds the keywords corresponding to each behavior category; the keyword library is divided by weight into a decision keyword library, whose keywords are above a preset weight, and other keyword libraries, whose keywords are below the preset weight; the other keyword libraries are further divided, from high to low weight, into an undecided keyword library and a non-decision keyword library; the decision keyword library holds, for each behavior category, one or more groups of decision keywords and their frequencies; and the keyword analysis obtains the analysis result by matching against the decision keyword library.
5. The URL analysis method according to claim 4, characterized in that the keyword analysis obtaining the analysis result by matching against the decision keyword library comprises the following steps:
arbitrarily selecting a behavior category, obtaining the decision keywords corresponding to that behavior category and the frequencies of those decision keywords, denoted the first frequencies, and constructing a first array from the first frequencies;
counting the target keywords in the unknown URL whose weight is higher than the preset weight and the frequencies of those target keywords, denoted the second frequencies, and constructing a second array from the second frequencies;
comparing the first array and the second array for similarity to obtain a similarity value between the unknown URL and the behavior category;
in the same way, calculating the similarity between the unknown URL and every behavior category to obtain the analysis result, the analysis result being the behavior category with the greatest similarity to the unknown URL data.
6. The URL analysis method according to claim 5, characterized in that counting the highest-weight keywords in the unknown URL and the frequencies of those keywords comprises the following steps:
crawling the webpage of the unknown URL;
segmenting the content of the webpage into words to obtain all keywords in the webpage;
calculating the weights of all the keywords;
filtering out the keywords whose weight is higher than the preset weight to obtain the target keywords.
7. The URL analysis method according to claim 1, characterized in that the data in the URL database is pushed to a security platform, and the threat library is updated according to the result returned by the security platform, the returned result being newly added threat URLs.
8. A URL analysis apparatus, characterized by comprising:
an acquisition module for receiving URL data and storing the URL data in a URL database;
a filtering module for matching the URL data against a threat library, filtering out URL data that poses a threat to obtain safe URL data, and storing threat records in a threat record library;
an analysis module for matching the safe URL data against the known URLs in a behavior library:
when the safe URL data is matched successfully, obtaining a behavior record and storing it in a behavior record library, the behavior record being the behavior category corresponding to the safe URL;
when the safe URL data fails to match, extracting target keywords from the URL that failed to match (referred to as an unknown URL), performing keyword analysis based on the keywords in the behavior library, and storing the analysis result in the behavior record library as the behavior record;
an update module for updating the behavior library according to the target keywords in the analysis result, the unknown URL, and its corresponding behavior category.
9. An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium, characterized in that the computer program, when executed by the processor, implements the URL analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the URL analysis method according to any one of claims 1 to 7.
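The keyword analysis recited in claims 5 and 6 can be illustrated with a short Python sketch. Two points are assumptions rather than anything stated in the claims: the claims do not name a similarity measure, so cosine similarity is used here as one plausible choice, and the claims do not say how keyword weight is calculated, so the raw occurrence count stands in for it. The function names `target_keywords`, `cosine_similarity`, and `classify` are hypothetical, and the page is assumed to have been crawled and segmented into tokens already.

```python
from __future__ import annotations

import math
from collections import Counter


def target_keywords(tokens: list[str], preset_weight: int) -> dict[str, int]:
    """Keep keywords whose weight exceeds the preset weight.

    Assumption: the weight of a keyword is simply its occurrence count.
    """
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n > preset_weight}


def cosine_similarity(first: dict[str, int], second: dict[str, int]) -> float:
    """Compare two keyword-frequency maps as aligned frequency arrays."""
    keys = sorted(set(first) | set(second))
    a = [first.get(k, 0) for k in keys]   # first array (category frequencies)
    b = [second.get(k, 0) for k in keys]  # second array (unknown-URL frequencies)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    target: dict[str, int],
    decision_library: dict[str, dict[str, int]],
) -> tuple[str, float]:
    """Return the behavior category most similar to the unknown URL."""
    scores = {
        category: cosine_similarity(freqs, target)
        for category, freqs in decision_library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, with a decision library of `{"shopping": {"cart": 5, "price": 3}, "news": {"report": 4, "editor": 2}}`, a page whose tokens are dominated by "cart" and "price" is assigned to "shopping". Note that `classify` raises `ValueError` on an empty library, which a real system would need to handle.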
CN201910687531.5A 2019-07-26 2019-07-26 URL analysis method, device, equipment and medium Active CN110460592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910687531.5A CN110460592B (en) 2019-07-26 2019-07-26 URL analysis method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910687531.5A CN110460592B (en) 2019-07-26 2019-07-26 URL analysis method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110460592A true CN110460592A (en) 2019-11-15
CN110460592B CN110460592B (en) 2021-03-26

Family

ID=68483814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910687531.5A Active CN110460592B (en) 2019-07-26 2019-07-26 URL analysis method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110460592B (en)

Also Published As

Publication number Publication date
CN110460592B (en) 2021-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210302

Address after: Room 402, Jinhua Network Economy Center Building, 398 Silian Road, Wucheng District, Jinhua City, Zhejiang Province

Applicant after: GUANGTONG TIANXIA NETWORK TECHNOLOGY Co.,Ltd.

Address before: 310051 room 2503, area a, building 1, No. 57, jianger Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Hangzhou Jixun Huitong Technology Co.,Ltd.

GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20230817

Granted publication date: 20210326