CN110191124B - Web front-end development data-based website identification method and device and storage equipment - Google Patents
- Publication number
- CN110191124B (application CN201910458634.4A)
- Authority
- CN
- China
- Prior art keywords
- feature
- sub
- features
- website
- development data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
Abstract
The embodiments of the invention disclose a website identification method, a website identification device and a storage device based on web front-end development data. They address two problems of traditional website identification: a blacklist mechanism produces missed reports and therefore leaves potential safety hazards, while a whitelist mechanism can identify only most known websites and cannot accurately identify newly appearing websites, which easily causes false alarms. The method comprises: collecting front-end development data of safe website pages; extracting the page front-end development data features and forming an information base; extracting the page front-end development data features of a website with unknown attributes and matching them against the features in the information base; and, if any one feature matches, determining that the website attribute is safe.
Description
Technical Field
The embodiment of the invention relates to the field of network security, in particular to a website authentication method and device based on web front-end development data and storage equipment.
Background
In security research work, it is often necessary to identify the security and reputation of a website. Conventional secure website identification works in two ways. One is a blacklist mechanism, which filters out websites with low-reputation characteristics but can produce missed reports and so leaves potential safety hazards. The other is a whitelist mechanism, which can identify only most known high-reputation websites, handles newly appearing high-reputation websites poorly, is insufficiently accurate, and easily causes false alarms.
Disclosure of Invention
In view of these problems, the embodiments of the invention provide a website identification method, a website identification device and a storage device based on web front-end development data. They address the problems that, in traditional website identification, a blacklist mechanism produces missed reports and leaves potential safety hazards, while a whitelist mechanism can identify only most known websites and cannot accurately identify newly appearing websites, which easily causes false alarms.
The embodiment of the invention discloses a website identification method based on web front-end development data, which comprises the following steps:
collecting front-end development data of a safe website page; extracting the front-end development data characteristics of the page and forming an information base; extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base; if any one of the features is matched, the website attribute is safe.
Further, the page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is adaptation code; the framework technical feature comprises a plurality of sub-features and is keywords in the framework and/or the keywords together with their number; the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages; the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color number feature and/or a resolution; the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
Further, extracting the page front-end development data features of the website with unknown attributes and matching them against the features in the information base specifically comprises: setting a threshold $T_n$ for each feature in the information base, and setting a weight $T_{na_n}$ for each sub-feature of each feature in the information base; extracting any page front-end development data feature of the website with unknown attributes, matching its sub-features against the corresponding sub-features in the database, and calculating the feature matching degree $S_n$; if $S_n \ge T_n$, the feature matches the corresponding feature in the information base and the website attribute is safe; if $S_n < T_n$, continuing to select other features for matching; wherein $S_n = T_{na_1} + T_{na_2} + \dots + T_{na_n}$, and if a sub-feature matches, $T_{na_n}$ is the weight set for that sub-feature in the information base, while if it does not match, the weight of the corresponding sub-feature is taken as zero.
Further, if the matching degrees $S_n$ of all features are less than their thresholds $T_n$, the unknown website attribute is identified as follows: setting a statistical value $P$, where $P = 1 - (1 - S_1/t_1) \times (1 - S_2/t_2) \times (1 - S_3/t_3) \times \dots \times (1 - S_n/t_n)$; setting a preset statistical value; if the statistical value $P$ meets the preset value, the website attribute is safe; otherwise the website attribute is unsafe.
Further, if reputation evaluation of the website is also required, reputation level values $R_n$ for different reputations are set; a reputation weight $S_{nb_n}$ is set for each feature in the information base; the feature matching degree $S_n$ is calculated for all features, and the reputation value $X$ of the website is calculated, where $X = S_1 \times S_{1b_1} + S_2 \times S_{2b_2} + \dots + S_n \times S_{nb_n}$; comparing the reputation value $X$ with the reputation level values $R_n$ gives the reputation of the website.
The embodiment of the invention discloses a website identification device based on web front-end development data, which comprises a memory and a processor, wherein the memory is used for storing a plurality of instructions, and the processor is used for loading the instructions stored in the memory to execute:
collecting front-end development data of a safe website page; extracting the front-end development data characteristics of the page and forming an information base; extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base; if any one of the features is matched, the website attribute is safe.
Further, the processor is also configured to load instructions stored in the memory to perform:
the page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is adaptation code; the framework technical feature comprises a plurality of sub-features and is keywords in the framework and/or the keywords together with their number; the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages; the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color number feature and/or a resolution; the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
Further, the processor is also configured to load instructions stored in the memory to perform:
extracting the page front-end development data features of the website with unknown attributes and matching them against the features in the information base, specifically: setting a threshold $T_n$ for each feature in the information base, and setting a weight $T_{na_n}$ for each sub-feature of each feature in the information base; extracting any page front-end development data feature of the website with unknown attributes, matching its sub-features against the corresponding sub-features in the database, and calculating the feature matching degree $S_n$; if $S_n \ge T_n$, the feature matches the corresponding feature in the information base and the website attribute is safe; if $S_n < T_n$, continuing to select other features for matching; wherein $S_n = T_{na_1} + T_{na_2} + \dots + T_{na_n}$, and if a sub-feature matches, $T_{na_n}$ is the weight set for that sub-feature in the information base, while if it does not match, the weight of the corresponding sub-feature is taken as zero.
Further, the processor is also configured to load instructions stored in the memory to perform:
if the matching degrees $S_n$ of all features are less than their thresholds $T_n$, the unknown website attribute is identified as follows: setting a statistical value $P$, where $P = 1 - (1 - S_1/t_1) \times (1 - S_2/t_2) \times (1 - S_3/t_3) \times \dots \times (1 - S_n/t_n)$; setting a preset statistical value; if the statistical value $P$ meets the preset value, the website attribute is safe; otherwise the website attribute is unsafe.
Further, the processor is also configured to load instructions stored in the memory to perform:
if reputation evaluation of the website is also required, reputation level values $R_n$ for different reputations are set; a reputation weight $S_{nb_n}$ is set for each feature in the information base; the feature matching degree $S_n$ is calculated for all features, and the reputation value $X$ of the website is calculated, where $X = S_1 \times S_{1b_1} + S_2 \times S_{2b_2} + \dots + S_n \times S_{nb_n}$; comparing the reputation value $X$ with the reputation level values $R_n$ gives the reputation of the website.
The embodiment of the invention also discloses a website identification device based on web front-end development data, comprising:
a data collection module: collecting front-end development data of a safe website page;
a feature extraction module: extracting the front-end development data characteristics of the page and forming an information base;
a matching module: extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base;
an authentication module: if any one feature is matched, the website attribute is safe;
the page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is adaptation code; the framework technical feature comprises a plurality of sub-features and is keywords in the framework and/or the keywords together with their number; the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages; the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color number feature and/or a resolution; the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
The embodiment of the invention provides a storage device, wherein a plurality of instructions are stored in the storage device, and the instructions are suitable for being loaded by a processor and executing the steps of the website authentication method based on the web front-end development data provided by the embodiment of the invention.
Compared with the prior art, the website identification method, the website identification device and the storage equipment based on the web front-end development data provided by the embodiment of the invention at least realize the following beneficial effects:
collecting front-end development data of safe website pages; extracting the page front-end development data features and forming an information base; extracting the page front-end development data features of a website with unknown attributes and matching them against the features in the information base; if any one feature matches, the website attribute is safe. By summarizing multi-dimensional features of web front-end development practice, the embodiments of the invention can identify highly credible websites more accurately, effectively reduce false alarms under a whitelist mechanism, and avoid missed reports under a blacklist mechanism.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a website authentication method based on web front-end development data according to an embodiment of the present invention;
FIG. 2 is a flowchart of another website authentication method based on web front-end development data according to an embodiment of the present invention;
FIG. 3 is a flowchart of a website reputation evaluation method based on web front-end development data according to an embodiment of the present invention;
FIG. 4 is a block diagram of a website authentication apparatus based on web front-end development data according to an embodiment of the present invention;
FIG. 5 is a structural diagram of another website authentication apparatus based on web front-end development data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, a detailed description will be given below of a specific implementation of a web site authentication method based on web front-end development data according to an embodiment of the present invention with reference to the accompanying drawings. It should be understood that the preferred embodiments described below are only for illustrating and explaining the present invention and are not to be used for limiting the present invention. And the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The embodiment of the invention provides a flow chart of a website identification method based on web front-end development data, as shown in fig. 1, comprising the following steps:
Step 14: if any one feature is matched, the website attribute is safe.
The page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
Adaptation technology automatically adjusts a product to its best presentation so that it suits different operating systems, browsers, devices and the like. The adaptation technical feature comprises a plurality of sub-features including but not limited to platform adaptation features, device adaptation features, interface adaptation features and browser adaptation features; and the adaptation technical feature is adaptation code. For example, <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=0"> is a relatively common standard adaptation code for mobile interfaces.
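As an illustration of how such an adaptation code could be detected, the following is a minimal sketch using Python's standard html.parser module. The names ViewportFinder and has_mobile_adaptation_code are hypothetical and do not come from the patent, and treating "width=device-width" as the decisive marker is an assumption.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Collects the content attribute of every <meta name="viewport"> tag."""

    def __init__(self):
        super().__init__()
        self.viewport_contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None when absent.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "viewport":
            self.viewport_contents.append(attr_map.get("content", ""))


def has_mobile_adaptation_code(html: str) -> bool:
    """Return True if the page declares a viewport sized to the device width.

    Assumption: "width=device-width" is taken as the marker of the
    adaptation code; a real information base may store other patterns.
    """
    finder = ViewportFinder()
    finder.feed(html)
    return any(
        "width=device-width" in content.replace(" ", "").lower()
        for content in finder.viewport_contents
    )
```

A page whose head contains the meta tag quoted above would be reported as carrying the mobile adaptation sub-feature; a page without any viewport declaration would not.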
The framework technical feature comprises a plurality of sub-features including, but not limited to, special marks such as keywords and calling modes used in frameworks such as CSS frameworks, modular development frameworks and tool frameworks; the framework technical feature is keywords in the framework and/or the keywords together with their number. For example, "aria-valuew" is a distinctive attribute name in a CSS framework, that is, a CSS framework keyword.
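Counting framework keywords in a page can be sketched as follows. This is only an illustration: the function name count_framework_keywords and the sample keywords "col-md-" and "data-toggle" are hypothetical choices, and plain substring counting is an assumed matching rule that the patent does not specify.

```python
import re
from collections import Counter


def count_framework_keywords(html: str, keywords: list[str]) -> Counter:
    """Count how often each framework keyword occurs in the page source."""
    counts = Counter()
    for keyword in keywords:
        # re.escape keeps characters such as '-' or '.' literal.
        counts[keyword] = len(re.findall(re.escape(keyword), html))
    return counts
```

The resulting counts give both parts of the feature described above: which keywords appear, and how many times each one appears.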
The interface design feature comprises a plurality of sub-features, and the interface design feature is that the page HTML does not contain code in other languages; for example, CSS code is referenced separately by URL and is not intermixed with the HTML code.
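One possible way to check the CSS example is sketched below with Python's standard html.parser module. The class and function names are hypothetical, and the rule that any style element or style attribute counts as intermixed CSS is an assumption made for illustration.

```python
from html.parser import HTMLParser


class InlineStyleDetector(HTMLParser):
    """Counts <style> elements and style="..." attributes in a page."""

    def __init__(self):
        super().__init__()
        self.inline_style_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.inline_style_count += 1
        if any(name == "style" for name, _ in attrs):
            self.inline_style_count += 1


def css_is_externally_referenced(html: str) -> bool:
    """Return True when no CSS is embedded directly in the HTML source."""
    detector = InlineStyleDetector()
    detector.feed(html)
    return detector.inline_style_count == 0
```

A page that only links a stylesheet through a link element passes the check, while a page with a style block or inline style attribute does not.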
The appearance design feature comprises a plurality of sub-features including, but not limited to: page layout, color richness, color matching scheme, and material quantity and quality; and the appearance design feature is a graphic composition feature and/or a color number feature and/or a resolution. Through image processing and image recognition technology, the appearance design is abstracted into features such as graphic composition, color number and resolution.
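As a simplified illustration of the color number and resolution features, the sketch below works on a page screenshot that has already been decoded into rows of RGB tuples. That input format and the name appearance_features are assumptions; the patent does not say how the image is obtained or processed, and graphic composition is not covered here.

```python
def appearance_features(pixels):
    """Derive resolution and color number from decoded screenshot pixels.

    pixels: list of rows, each row a list of (r, g, b) tuples (assumed format).
    """
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    # A set keeps each distinct color once, so its size is the color number.
    distinct_colors = {pixel for row in pixels for pixel in row}
    return {"resolution": (width, height), "color_number": len(distinct_colors)}
```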
The webpage content feature comprises a plurality of sub-features, and the webpage content feature is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature. For example, the title format is usually concise and intuitive: its length is kept within 20 words (within 10 English words), it contains no more than 3 symbols, and it contains no operation symbols, so that the user can easily understand the core content of the current webpage.
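The title rules in this example can be sketched as a simple check. Several points are assumptions rather than statements from the patent: "20 words" is read as 20 CJK characters, a symbol is any character in a Unicode punctuation or symbol category, and the set OPERATOR_SYMBOLS is a hypothetical list of operation symbols.

```python
import re
import unicodedata

# Assumed set of operation symbols; the source does not enumerate them.
OPERATOR_SYMBOLS = set("+*/=<>")


def title_looks_standard(title: str) -> bool:
    """Apply the length, symbol-count and operator rules to a page title."""
    cjk_chars = re.findall(r"[\u4e00-\u9fff]", title)
    english_words = re.findall(r"[A-Za-z]+", title)
    # Unicode categories starting with P (punctuation) or S (symbol).
    symbols = [ch for ch in title if unicodedata.category(ch)[0] in ("P", "S")]
    if len(cjk_chars) > 20 or len(english_words) > 10:
        return False
    if len(symbols) > 3:
        return False
    return not any(ch in OPERATOR_SYMBOLS for ch in symbols)
```

Each rule could also be reported separately so that it acts as its own sub-feature with its own weight.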
By summarizing multi-dimensional features of web front-end development practice, the method provided by the embodiment of the invention can accurately identify highly credible websites, effectively reduce false alarms under a whitelist mechanism, and avoid missed reports under a blacklist mechanism.
The flow chart of the website authentication method based on the web front-end development data provided by the embodiment of the invention is shown in fig. 2, and comprises the following steps:
wherein the matching degree $S_n = T_{na_1} + T_{na_2} + \dots + T_{na_n}$; if a sub-feature matches, $T_{na_n}$ is the weight set for that sub-feature in the information base, and if it does not match, the weight of the corresponding sub-feature is taken as zero;
if the matching degree $S_n$ is greater than or equal to the threshold $T_n$, the feature matches the corresponding feature in the information base and the website attribute is safe; if the matching degree $S_n$ is smaller than the threshold $T_n$, other features continue to be selected for matching;
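The threshold comparison described here can be sketched in a few lines. The Feature class, the function names and the sample weights used in testing are hypothetical; only the rule itself (sum the weights of matched sub-features, count unmatched ones as zero, declare the site safe as soon as one feature reaches its threshold) follows the text.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One feature of the information base with its sub-feature weights."""
    name: str
    threshold: float               # T_n
    sub_weights: dict[str, float]  # sub-feature name -> weight T_{n a_i}


def matching_degree(feature: Feature, matched_subs: set[str]) -> float:
    """S_n: sum of the weights of matched sub-features; unmatched ones add 0."""
    return sum(w for sub, w in feature.sub_weights.items() if sub in matched_subs)


def identify(features: list[Feature], matches: dict[str, set[str]]):
    """Try features in order; stop at the first one with S_n >= T_n.

    Returns (is_safe, degrees), where degrees maps each feature examined so
    far to its S_n, so the values can be reused if no feature matched.
    """
    degrees = {}
    for feature in features:
        s_n = matching_degree(feature, matches.get(feature.name, set()))
        degrees[feature.name] = s_n
        if s_n >= feature.threshold:
            return True, degrees
    return False, degrees
```

Returning the per-feature degrees alongside the verdict is a design choice of this sketch: when no feature reaches its threshold, the same values feed the statistical evaluation that follows.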
wherein $P = 1 - (1 - S_1/t_1) \times (1 - S_2/t_2) \times (1 - S_3/t_3) \times \dots \times (1 - S_n/t_n)$;
The preset value can be set according to user requirements. If the statistical value $P$ meets the preset value, the website attribute is safe; otherwise, the website attribute is unsafe. If the website attribute is judged to be unsafe, measures are taken against the unsafe website according to user requirements, including but not limited to: raising an alarm, prohibiting further operation, closing the webpage and collecting the website information.
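The statistical value can be computed directly from the formula. In the sketch below, two points are assumptions: each $t_n$ is taken to be the threshold $T_n$ of the corresponding feature, and the site is treated as safe when $P$ is at least the preset value, because the comparison symbol is not legible in the source. The function names are hypothetical.

```python
import math


def statistical_value(degrees, thresholds):
    """P = 1 - product over all features of (1 - S_n / t_n)."""
    product = math.prod(1 - s / t for s, t in zip(degrees, thresholds))
    return 1 - product


def is_safe_by_statistic(p: float, preset: float) -> bool:
    """Assumed rule: safe when P reaches the user-chosen preset value."""
    return p >= preset
```

With two features whose matching degrees are 0.5 and 0.25 against thresholds 1.0 and 0.5, both ratios are one half, so $P = 1 - 0.5 \times 0.5 = 0.75$. Several features that each fall somewhat short of their thresholds can therefore still yield a high combined value.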
If the website attribute is judged to be safe and reputation evaluation needs to be further performed on the website, the embodiment of the invention provides a website reputation evaluation method flow chart based on web front-end development data, as shown in fig. 3;
Step 301: for a website whose attribute is safe, setting reputation level values $R_n$ for different reputations;
The reputation level values $R_n$ can be set as required.
wherein $X = S_1 \times S_{1b_1} + S_2 \times S_{2b_2} + \dots + S_n \times S_{nb_n}$;
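The reputation value is a weighted sum of the matching degrees, which can be sketched as below. The mapping from $X$ to a level is an assumption: the text only says that $X$ is compared with the reputation level values $R_n$, so the sketch treats each $R_n$ as the minimum $X$ of a named level. The function names and level labels are hypothetical.

```python
def reputation_value(degrees, reputation_weights):
    """X = S_1 * S_{1 b_1} + S_2 * S_{2 b_2} + ... + S_n * S_{n b_n}."""
    return sum(s * b for s, b in zip(degrees, reputation_weights))


def reputation_level(x, levels):
    """Return the label of the highest level whose minimum value x reaches.

    levels: list of (minimum_x, label) pairs standing in for R_n (assumed form).
    """
    reached = [(minimum, label) for minimum, label in levels if x >= minimum]
    if not reached:
        return None
    return max(reached)[1]
```

For matching degrees 0.5 and 0.25 with reputation weights 2.0 and 4.0, $X = 1.0 + 1.0 = 2.0$, which falls in the middle band when the assumed level minimums are 0.0, 1.5 and 3.0.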
By summarizing web front-end development practice along multiple dimensions, the method provided by the embodiment of the invention can identify highly credible websites more accurately, and its logic is clear and easy to understand. It can effectively reduce false alarms under a whitelist mechanism and avoid missed reports under a blacklist mechanism. It can also further evaluate the reputation of a website, which is convenient for users.
The embodiment of the present invention further provides a website authentication apparatus based on web front-end development data, as shown in fig. 4, including: the apparatus includes a memory 410 and a processor 420, the memory 410 is configured to store a plurality of instructions, and the processor 420 is configured to load the instructions stored in the memory 410 to perform:
collecting front-end development data of a safe website page; extracting the front-end development data characteristics of the page and forming an information base; extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base; if any one of the features is matched, the website attribute is safe.
The processor 420 is configured to load instructions stored in the memory 410 to perform:
the page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is adaptation code; the framework technical feature comprises a plurality of sub-features and is keywords in the framework and/or the keywords together with their number; the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages; the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color number feature and/or a resolution; the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
The processor 420 is configured to load instructions stored in the memory 410 to perform:
extracting the page front-end development data features of the website with unknown attributes and matching them against the features in the information base, specifically:
setting a threshold $T_n$ for each feature in the information base;
extracting any page front-end development data feature of the website with unknown attributes, matching its sub-features against the corresponding sub-features in the database, and calculating the feature matching degree $S_n$;
if $S_n \ge T_n$, the feature matches the corresponding feature in the information base and the website attribute is safe; if $S_n < T_n$, continuing to select other features for matching;
wherein the matching degree $S_n = T_{na_1} + T_{na_2} + \dots + T_{na_n}$, $T_{na_n}$ being the weight set for each sub-feature of each feature in the information base; if a sub-feature matches, $T_{na_n}$ is the weight set for that sub-feature in the information base, and if it does not match, the weight of the corresponding sub-feature is taken as zero.
The processor 420 is configured to load instructions stored in the memory 410 to perform:
if the matching degrees $S_n$ of all features are less than their thresholds $T_n$, the unknown website attribute is identified as follows:
setting a statistical value $P$, where $P = 1 - (1 - S_1/t_1) \times (1 - S_2/t_2) \times (1 - S_3/t_3) \times \dots \times (1 - S_n/t_n)$;
if the statistical value $P$ meets the preset value, the website attribute is safe; otherwise the website attribute is unsafe.
The processor 420 is configured to load the instructions stored in the memory 410 to perform:
if reputation evaluation of the website is also required, reputation level values $R_n$ for different reputations are set;
a reputation weight $S_{nb_n}$ is set for each feature in the information base, the feature matching degree $S_n$ is calculated for all features, and the reputation value $X$ of the website is calculated, where $X = S_1 \times S_{1b_1} + S_2 \times S_{2b_2} + \dots + S_n \times S_{nb_n}$;
comparing the reputation value $X$ with the reputation level values $R_n$ gives the reputation of the website.
The embodiment of the present invention also provides another website authentication apparatus based on web front-end development data, as shown in fig. 5, including:
the data collection module 51: collecting front-end development data of a safe website page;
the feature extraction module 52: extracting the front-end development data characteristics of the page and forming an information base;
the matching module 53: extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base;
the authentication module 54: if any one of the features is matched, the website attribute is safe.
The embodiment of the invention also provides a storage device, wherein a plurality of instructions are stored in the storage device, and the instructions are suitable for being loaded by a processor and executing the steps of the website authentication method based on the web front-end development data provided by the embodiment of the invention.
Through the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
Those skilled in the art will appreciate that the drawings are merely schematic representations of one preferred embodiment and that the blocks or flow diagrams in the drawings are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices in the embodiments may be distributed in the devices in the embodiments according to the description of the embodiments, and may be correspondingly changed in one or more devices different from the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A website identification method based on web front-end development data is characterized in that:
collecting front-end development data of a safe website page;
extracting the front-end development data characteristics of the page and forming an information base;
extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base;
if any one feature is matched, the website attribute is safe;
the page front-end development data features include: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is adaptation code;
the framework technical feature comprises a plurality of sub-features and is keywords in the framework and/or the keywords together with their number;
the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages;
the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color number feature and/or a resolution;
the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
2. The method of claim 1, wherein extracting page front-end development data features of unknown-attribute websites to match with features in an information base is specifically:
setting a threshold $T_n$ for each feature in the information base, and setting a weight $T_n a_n$ for each sub-feature of each feature in the information base;
extracting any front-end development data feature of the page of the website with unknown attributes, matching its sub-features with the corresponding sub-features in the database, and calculating the feature matching degree $S_n$;
if the matching degree $S_n \ge T_n$, the feature matches the corresponding feature in the information base, and the website attribute is set to safe; if the matching degree $S_n < T_n$, continuing to select other features for matching;
wherein the matching degree $S_n = T_n a_1 + T_n a_2 + \dots + T_n a_n$; if a sub-feature matches, $T_n a_n$ is the weight set for that sub-feature in the information base, and if a sub-feature does not match, the weight of the corresponding sub-feature is set to zero.
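The weighted matching in claim 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the sub-feature names, the weights, the thresholds and the helper names `matching_degree` and `is_safe` are all hypothetical, chosen only to show how a matching degree S_n is summed from sub-feature weights and compared with a threshold T_n.

```python
from typing import Dict, Mapping

# Hypothetical information base: feature name -> {sub-feature name: weight}.
# The sub-feature names and every number here are illustrative assumptions.
INFO_BASE: Dict[str, Dict[str, float]] = {
    "webpage_content": {"filing_info": 0.5, "copyright_info": 0.25, "url": 0.25},
    "framework": {"keywords": 0.5, "keyword_count": 0.5},
}
# Hypothetical per-feature thresholds T_n.
THRESHOLDS: Dict[str, float] = {"webpage_content": 0.75, "framework": 1.0}


def matching_degree(weights: Mapping[str, float], matched: Mapping[str, bool]) -> float:
    """S_n: sum of sub-feature weights; an unmatched sub-feature contributes zero."""
    return sum(w for name, w in weights.items() if matched.get(name, False))


def is_safe(observed: Mapping[str, Mapping[str, bool]]) -> bool:
    """Return True as soon as one feature reaches its threshold (S_n >= T_n)."""
    for feature, weights in INFO_BASE.items():
        s_n = matching_degree(weights, observed.get(feature, {}))
        if s_n >= THRESHOLDS[feature]:
            return True
    return False  # every feature stayed below its threshold


# Sub-feature match results for a page with unknown attributes (made-up data).
observed = {
    "webpage_content": {"filing_info": True, "copyright_info": False, "url": True},
    "framework": {"keywords": True, "keyword_count": False},
}
print(is_safe(observed))
```

In this made-up example the webpage content feature scores 0.5 + 0.25 = 0.75, which reaches its assumed threshold of 0.75, so the page is treated as safe without evaluating further features.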
3. The method of claim 2, wherein, if the matching degrees $S_n$ of all features are all less than the thresholds $T_n$, the unknown website attribute is identified by adopting the following method:
setting a statistical value $P$, where $P = 1 - (1 - S_1/t_1)(1 - S_2/t_2)(1 - S_3/t_3) \cdots (1 - S_n/t_n)$;
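The statistical value in claim 3 can be computed as in the short sketch below. It assumes that t_n in the formula denotes the same per-feature threshold as T_n, which the claim text does not state explicitly. The function name `statistical_value` and the sample numbers are hypothetical. The claim text above does not say how P is used afterwards, so the sketch only computes it.

```python
from typing import Sequence


def statistical_value(scores: Sequence[float], thresholds: Sequence[float]) -> float:
    """P = 1 - prod(1 - S_i / t_i) over all features.

    Assumption: t_i is the threshold of feature i and every S_i < t_i,
    so each factor lies in (0, 1].
    """
    if len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must have the same length")
    product = 1.0
    for s, t in zip(scores, thresholds):
        product *= 1.0 - s / t
    return 1.0 - product


# Two features, each at half of its threshold (made-up numbers).
print(statistical_value([0.5, 0.25], [1.0, 0.5]))
```

With these inputs each factor is 0.5, so P = 1 - 0.25 = 0.75: several features that each fall short of their threshold can still combine into a high value.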
4. The method of claim 2, wherein, if reputation evaluation is further to be performed on the website, reputation level values $R_n$ for different reputations are set;
a reputation weight $S_n b_n$ is set for each feature in the information base, the feature matching degree $S_n$ is calculated for all features, and the reputation value $X$ of the website is calculated, where $X = S_1 \cdot S_1 b_1 + S_2 \cdot S_2 b_2 + \dots + S_n \cdot S_n b_n$; the reputation value $X$ is compared with the reputation level values $R_n$ to evaluate the credibility of the website.
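The reputation value in claim 4 is a weighted sum of the matching degrees, sketched below. The claim only says that X is compared with the reputation level values R_n. The rule used here, which picks the highest level whose value does not exceed X, is an assumption, as are the level names, the numbers and the function names.

```python
from typing import Dict, Optional, Sequence


def reputation_value(scores: Sequence[float], rep_weights: Sequence[float]) -> float:
    """X = sum of S_i times the reputation weight of feature i."""
    if len(scores) != len(rep_weights):
        raise ValueError("scores and rep_weights must have the same length")
    return sum(s * b for s, b in zip(scores, rep_weights))


def reputation_level(x: float, levels: Dict[str, float]) -> Optional[str]:
    """Assumed comparison rule: the highest level whose value R is <= X."""
    best = None
    for name, r in sorted(levels.items(), key=lambda kv: kv[1]):
        if x >= r:
            best = name
    return best


# Hypothetical reputation level values R_n.
LEVELS = {"low": 0.0, "medium": 1.5, "high": 3.0}

x = reputation_value([0.5, 1.0], [2.0, 1.0])  # 0.5*2.0 + 1.0*1.0
print(x, reputation_level(x, LEVELS))
```

Unlike the safe or unsafe decision in claim 2, this calculation uses the matching degree of every feature, so it needs all features to be scored rather than stopping at the first match.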
5. An apparatus for web site authentication based on web front-end development data, the apparatus comprising a memory for storing a plurality of instructions and a processor for loading the instructions stored in the memory to perform:
collecting front-end development data of a safe website page;
extracting the front-end development data characteristics of the page and forming an information base;
extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base;
if any one feature is matched, the website attribute is safe;
the page front-end development data features comprise: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is an adaptation code;
the framework technical feature comprises a plurality of sub-features and is the keywords in the framework and/or the keywords and the number of the keywords;
the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages;
the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color count feature and/or a resolution;
the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
6. The apparatus of claim 5, wherein the processor is further configured to load instructions stored in the memory to perform:
extracting the page front-end development data features of the website with unknown attributes and matching them with the features in the information base, specifically:
setting a threshold $T_n$ for each feature in the information base;
extracting any front-end development data feature of the page of the website with unknown attributes, matching its sub-features with the corresponding sub-features in the database, and calculating the feature matching degree $S_n$;
if the matching degree $S_n \ge T_n$, the feature matches the corresponding feature in the information base, and the website attribute is set to safe; if the matching degree $S_n < T_n$, continuing to select other features for matching;
wherein the matching degree $S_n = T_n a_1 + T_n a_2 + \dots + T_n a_n$, $T_n a_n$ being the weight set for each sub-feature of each feature in the information base; if a sub-feature matches, $T_n a_n$ is the weight set for that sub-feature in the information base, and if a sub-feature does not match, the weight of the corresponding sub-feature is set to zero.
7. The apparatus of claim 6, wherein the processor is further configured to load instructions stored in the memory to perform:
if the matching degrees $S_n$ of all features are all less than the thresholds $T_n$, the unknown website attribute is identified by adopting the following method:
setting a statistical value $P$, where $P = 1 - (1 - S_1/t_1)(1 - S_2/t_2)(1 - S_3/t_3) \cdots (1 - S_n/t_n)$;
8. The apparatus of claim 6, wherein the processor is further configured to load instructions stored in the memory to perform:
if reputation evaluation is further to be performed on the website, setting reputation level values $R_n$ for different reputations;
setting a reputation weight $S_n b_n$ for each feature in the information base, calculating the feature matching degree $S_n$ for all features, and calculating the reputation value $X$ of the website, where $X = S_1 \cdot S_1 b_1 + S_2 \cdot S_2 b_2 + \dots + S_n \cdot S_n b_n$;
comparing the reputation value $X$ with the reputation level values $R_n$ to evaluate the credibility of the website.
9. A web site authentication apparatus based on web front-end development data, comprising:
a data collection module: collecting front-end development data of a safe website page;
a feature extraction module: extracting the front-end development data characteristics of the page and forming an information base;
a matching module: extracting the front-end development data characteristics of the web pages with unknown attributes, and matching the characteristics with the characteristics in the information base;
an authentication module: if any one feature is matched, the website attribute is safe;
the page front-end development data features comprise: adaptation technical features, framework technical features, interface design features, appearance design features and webpage content features;
wherein the adaptation technical feature comprises a plurality of sub-features and is an adaptation code;
the framework technical feature comprises a plurality of sub-features and is the keywords in the framework and/or the keywords and the number of the keywords;
the interface design feature comprises a plurality of sub-features and is that the page HTML does not contain code in other languages;
the appearance design feature comprises a plurality of sub-features and is a graphic composition feature and/or a color count feature and/or a resolution;
the webpage content feature comprises a plurality of sub-features and is a filing information feature and/or a copyright information feature and/or a URL feature and/or a title and annotation feature.
10. A storage device having stored therein a plurality of instructions adapted to be loaded by a processor and to perform the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458634.4A CN110191124B (en) | 2019-05-29 | 2019-05-29 | Web front-end development data-based website identification method and device and storage equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110191124A (en) | 2019-08-30 |
CN110191124B (en) | 2022-02-22 |
Family
ID=67718703
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458634.4A Active CN110191124B (en) | 2019-05-29 | 2019-05-29 | Web front-end development data-based website identification method and device and storage equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110191124B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112785130B (en) * | 2021-01-13 | 2024-04-16 | 上海派拉软件股份有限公司 | Website risk level identification method, device, equipment and storage medium |
CN113535458B (en) * | 2021-09-17 | 2021-12-28 | 上海观安信息技术股份有限公司 | Abnormal false alarm processing method and device, storage medium and terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7559085B1 (en) * | 2004-08-13 | 2009-07-07 | Sun Microsystems, Inc. | Detection for deceptively similar domain names |
CN103927480A (en) * | 2013-01-14 | 2014-07-16 | 腾讯科技(深圳)有限公司 | Method, device and system for identifying malicious web page |
CN104537303A (en) * | 2014-12-30 | 2015-04-22 | 中国科学院深圳先进技术研究院 | Distinguishing system and method for phishing website |
CN104954372A (en) * | 2015-06-12 | 2015-09-30 | 中国科学院信息工程研究所 | Method and system for performing evidence acquisition and verification on phishing website |
CN108650250A (en) * | 2018-04-27 | 2018-10-12 | 北京奇安信科技有限公司 | Illegal page detection method, system, computer system and readable storage medium storing program for executing |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982129B (en) * | 2012-11-14 | 2016-10-19 | 优视科技有限公司 | Content in webpage is marked the method, system and device of prompting |
CN109242487A (en) * | 2018-09-26 | 2019-01-18 | 石帅 | A kind of value assessment method of internet block chain environment lower network domain name |
Also Published As
Publication number | Publication date |
---|---|
CN110191124A (en) | 2019-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110362370B (en) | Webpage language switching method and device and terminal equipment | |
Sun et al. | Dom based content extraction via text density | |
CN105677764B (en) | Information extraction method and device | |
US8515212B1 (en) | Image relevance model | |
US8630972B2 (en) | Providing context for web articles | |
US20130339840A1 (en) | System and method for logical chunking and restructuring websites | |
CA2918840C (en) | Presenting fixed format documents in reflowed format | |
CN108566399B (en) | Phishing website identification method and system | |
CN103136228A (en) | Image search method and image search device | |
CN102054024A (en) | Information processing apparatus, information extracting method, program, and information processing system | |
CN115982376B (en) | Method and device for training model based on text, multimode data and knowledge | |
CN103942211B (en) | A kind of recognition methods of text page and device | |
CN110191124B (en) | Web front-end development data-based website identification method and device and storage equipment | |
JP2014112433A (en) | Device and method for search result ordering using reliability of representative | |
CN104881428A (en) | Information graph extracting and retrieving method and device for information graph webpages | |
EP3467633B1 (en) | Method, device, and terminal device for extracting data | |
US11520835B2 (en) | Learning system, learning method, and program | |
CN106202349A (en) | Web page classifying dictionary creation method and device | |
EP3706014A1 (en) | Methods, apparatuses, devices, and storage media for content retrieval | |
US20090313558A1 (en) | Semantic Image Collection Visualization | |
CN106570003B (en) | Data pushing method and device | |
WO2016105334A1 (en) | Providing a print-ready document | |
CN104866545B (en) | The method of search key on information displayed page | |
CN113806667B (en) | Method and system for supporting webpage classification | |
CN108595453B (en) | URL (Uniform resource locator) identifier mapping obtaining method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | | Address after: 150010 building 7, innovation and entrepreneurship Plaza, science and technology innovation city, Harbin high tech Industrial Development Zone, Harbin, Heilongjiang Province (No. 838 Shikun Road); Applicant after: Antan Technology Group Co.,Ltd.; Address before: 150010 building 7, innovation and entrepreneurship Plaza, science and technology innovation city, Harbin high tech Industrial Development Zone, Harbin, Heilongjiang Province (No. 838 Shikun Road); Applicant before: Harbin Antian Science and Technology Group Co.,Ltd. |
GR01 | Patent grant | ||