CN110837619B - Website auditing method, device, equipment and storage medium - Google Patents

Website auditing method, device, equipment and storage medium

Info

Publication number
CN110837619B
Authority
CN
China
Prior art keywords
website
data stream
current data
feature
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911072110.8A
Other languages
Chinese (zh)
Other versions
CN110837619A (en)
Inventor
宋同珍
谢永恒
万月亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN201911072110.8A
Publication of CN110837619A
Application granted
Publication of CN110837619B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

The embodiments of the invention disclose a website auditing method, device, equipment and storage medium. The method comprises: determining the website building tool corresponding to a current data stream according to the result of matching the features of the current data stream against first key features under different website building tools; and, if the website building tool corresponding to the current data stream is a website building tool to be audited, determining the audit category of the website where the current data stream is located according to the result of matching the features of the current data stream against second key features under different website categories. With this technical solution, the audit category can be obtained from the features of the current data stream alone, without determining the domain name of the website where the current data stream is located. Unregistered websites can therefore be audited, audit omissions for such websites are avoided, and the comprehensiveness of website auditing is improved.

Description

Website auditing method, device, equipment and storage medium
Technical Field
The embodiments of the invention relate to the technical field of information security processing, and in particular to a website auditing method, device, equipment and storage medium.
Background
With the rapid development of internet technology, a large number of forum websites have become part of people's daily lives. Lawbreakers may publish non-compliant or illegal statements on unregistered forum websites, which endangers the safety and well-being of internet users. To avoid this hidden danger, the information content published on websites needs to be audited to determine whether the websites are in violation.
With open-source website building tools such as discuz and phpwind, building forum websites on the internet has become increasingly convenient, and a large number of unregistered niche forums are built on frameworks such as discuz and phpwind. Because the domain names of these unregistered niche forums are unknown, the information published on them cannot be collected by crawler software, so whether they are in violation cannot be audited, which leaves a blind spot in website management.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for website auditing, which can avoid auditing omission of unregistered websites and improve the comprehensiveness of website auditing.
In a first aspect, an embodiment of the present invention provides a website auditing method, where the method includes:
determining the website building tool corresponding to the current data stream according to the result of matching the features of the current data stream against first key features under different website building tools;
and, if the website building tool corresponding to the current data stream is the website building tool to be audited, determining the audit category of the website where the current data stream is located according to the result of matching the features of the current data stream against second key features under different website categories.
In a second aspect, an embodiment of the present invention provides a website auditing device, where the device includes:
a website building tool determining module, configured to determine the website building tool corresponding to the current data stream according to the result of matching the features of the current data stream against first key features under different website building tools;
and an audit category determining module, configured to determine, if the website building tool corresponding to the current data stream is a target website building tool, the audit category of the website where the current data stream is located according to the result of matching the features of the current data stream against second key features under different website categories.
In a third aspect, an embodiment of the present invention provides equipment, where the equipment includes:
one or more processors;
a storage device for storing one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the website auditing method described in any of the embodiments of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for website auditing according to any embodiment of the present invention.
The embodiments of the invention provide a website auditing method, device, equipment and storage medium. When the website building tool corresponding to the current data stream is determined, from the result of matching the current data stream against the first key features under different website building tools, to be a website building tool to be audited, the current data stream is further matched against the preset second key features under different website categories, and the audit category of the website where the current data stream is located is then determined according to the degree of feature matching.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flowchart of a website auditing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a website auditing method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a website auditing method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a website auditing device according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of equipment according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a website auditing method according to a first embodiment of the present invention. This embodiment is applicable to auditing the website where any data stream on the internet is located. The website auditing method provided by this embodiment may be performed by the website auditing device provided by the embodiments of the present invention. The device may be implemented in software and/or hardware and integrated into the equipment that carries out the method, and the equipment may be a cluster of multiple physical computers.
Referring to Fig. 1, the method includes the following steps:
S110: determine the website building tool corresponding to the current data stream according to the result of matching the features of the current data stream against the first key features under different website building tools.
Specifically, as open-source website building tools make it increasingly convenient to set up forum websites on the internet, such websites are usually not registered, so improper statements published on them by lawbreakers cannot be discovered and handled in time, which endangers the safety and well-being of internet users. The websites where data streams transmitted over the internet are located therefore need to be audited to determine whether they are violating websites.
In this embodiment, the current data stream is any data stream transmitted over the internet via the Hypertext Transfer Protocol (HTTP). A website building tool is an application that builds forum websites on a corresponding framework. Data streams issued by websites built with different website building tools carry feature information that indicates which website building tool the website was built with. The first key features in this embodiment are identification features preset for each website building tool; they indicate the website building tool used to build the website where a data stream is located.
Optionally, a current data stream transmitted via HTTP on the internet is parsed to obtain the features it contains, and the first key features preset for each known website building tool are determined. The features of the current data stream are then matched against the first key features under each website building tool, that is, it is determined whether the features of the current data stream completely include the first key features under some website building tool. If they do, the current data stream has the highest degree of matching with the first key features under that website building tool, and that tool is taken as the website building tool used to build the website where the current data stream is located. If the features of the current data stream do not completely include the first key features under any website building tool, the current data stream does not match any known website building tool, and the website where it is located was built with some other, unidentified website building tool; websites built with unidentified website building tools are not audited in this embodiment.
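The complete-inclusion check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tool names come from the Background section, while the feature strings in `FIRST_KEY_FEATURES` and the function name `match_site_builder` are hypothetical.

```python
from typing import Optional, Set

# Hypothetical first key features per website building tool (illustrative values only).
FIRST_KEY_FEATURES = {
    "discuz": frozenset({"forum.php", "mod=viewthread", "tid"}),
    "phpwind": frozenset({"read.php", "tid", "fid"}),
}


def match_site_builder(stream_features: Set[str]) -> Optional[str]:
    """Return the first tool whose key features are all present in the stream, else None."""
    for tool, key_features in FIRST_KEY_FEATURES.items():
        if key_features <= stream_features:  # subset test = complete inclusion
            return tool
    return None  # unidentified tool: not audited in this embodiment


# Example feature sets parsed from two data streams (made-up values).
discuz_like = {"forum.php", "mod=viewthread", "tid", "page"}
unknown = {"index.html", "id"}
```

Here `match_site_builder(discuz_like)` yields `"discuz"`, while `match_site_builder(unknown)` yields `None`, which corresponds to the case of an unidentified website building tool.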
S120: if the website building tool corresponding to the current data stream is the website building tool to be audited, determine the audit category of the website where the current data stream is located according to the result of matching the features of the current data stream against the second key features under different website categories.
A website building tool to be audited is a website building tool that does not require websites built with it to be registered. Lawbreakers may publish illegal statements on websites built with such a tool, so these websites need to be audited to determine whether the data streams on them consist mostly of violating information. A website category is the forum type to which most of the information content published on a website may belong. Website categories may include a normal category and violation categories, where the violation categories cover various kinds of non-compliant or illegal statements, such as a violence category, a terrorism category, or a politically sensitive category. It should be noted that this embodiment mainly determines whether lawbreakers have published a large number of improper statements on an unregistered website, and thereby whether the unregistered website is a violating website. Because the data streams of a website may contain many different types of violating content, and only some violation types currently need to be determined, the website categories in this embodiment are the specified violation types to be determined.
Specifically, if the website building tool corresponding to the current data stream is determined to be the website building tool to be audited, the website where the current data stream is located was built with that tool and, being unregistered, may carry improper statements published by lawbreakers. In this case, the pre-specified website categories to be audited are first identified; there may be several. The second key features preset under each website category, which clearly indicate that a data stream belongs to that category, are also determined. The features of the current data stream are then matched against the second key features under each website category to obtain the feature matching results, and the audit category of the website where the current data stream is located is determined according to the degree of feature matching. If the website where the current data stream is located is determined to be a violating website, the responsible department is notified to manage and rectify it, which safeguards internet information security and prevents information that endangers the safety and well-being of internet users from spreading on the internet.
Meanwhile, once the website building tool corresponding to the current data stream is determined to be the website building tool to be audited, the website where the current data stream is located needs to be audited. This embodiment may therefore further include: resolving the website where the current data stream is located. Specifically, the current data stream is parsed to obtain the corresponding host information, and the information of the website where the current data stream is located is determined from the host information.
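One way to obtain host information from an HTTP data stream is to read the `Host` request header. The sketch below assumes the data stream is available as the raw bytes of an HTTP/1.x request; the function name `extract_host` and the sample request are hypothetical, and the patent does not specify how the host information is parsed.

```python
from typing import Optional


def extract_host(raw_request: bytes) -> Optional[str]:
    """Return the value of the Host header of a raw HTTP/1.x request, if present."""
    # Headers end at the first blank line; ISO-8859-1 decodes any byte safely.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()
    return None


# Made-up request used only for illustration.
sample = (
    b"GET /forum.php?mod=viewthread&tid=1 HTTP/1.1\r\n"
    b"Host: bbs.example.com\r\n"
    b"User-Agent: demo\r\n\r\n"
)
```

For the sample request, `extract_host(sample)` returns `"bbs.example.com"`; a request without a `Host` header returns `None`.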
In addition, the first key features and the second key features differ in nature and are located in different parts of the data stream. Therefore, to match the current data stream against the first key features under different website building tools and the second key features under different website categories, this embodiment may further include: extracting the identification features and the content features of the current data stream.
Correspondingly, in this embodiment, determining the website building tool corresponding to the current data stream according to the result of matching the current data stream against the first key features under different website building tools may specifically include: determining the website building tool corresponding to the current data stream according to the result of matching the identification features of the current data stream against the first key features under different website building tools. Determining the audit category of the website where the current data stream is located according to the result of matching the current data stream against the second key features under different website categories may specifically include: determining the audit category of the website where the current data stream is located according to the result of matching the content features of the current data stream against the second key features under different website categories.
Specifically, the current data stream includes a Uniform Resource Locator (URL) and a data body carrying the specific content of the data. The identification features are extracted from the URL of the current data stream and subsequently matched against the first key features under different website building tools; the content features are extracted from the data body of the current data stream and subsequently matched against the second key features under different website categories. This ensures that each feature matching is efficient and accurate.
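The split between URL-derived identification features and body-derived content features might look like the following sketch. The choice of features (script name, query keys, and `key=value` pairs for the URL; lowercase word tokens for the body) is an assumption made for illustration, as are the function names. Real forum text, especially Chinese text, would need proper word segmentation, which is outside the scope of this sketch.

```python
import re
from typing import List, Set
from urllib.parse import parse_qsl, urlsplit


def identification_features(url: str) -> Set[str]:
    """Collect the script name and query tokens of a URL as identification features."""
    parts = urlsplit(url)
    features = set()
    script = parts.path.rsplit("/", 1)[-1]  # last path segment, e.g. "forum.php"
    if script:
        features.add(script)
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        features.add(key)
        features.add(f"{key}={value}")
    return features


def content_features(body: str) -> List[str]:
    """Split a data body into lowercase word tokens, keeping repeats for later counting."""
    return re.findall(r"\w+", body.lower())
```

Keeping the content features as a list rather than a set preserves repeated tokens, which matters when occurrence counts are needed later.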
According to the technical solution provided by this embodiment, when the website building tool corresponding to the current data stream is determined, from the result of matching the current data stream against the first key features under different website building tools, to be a website building tool to be audited, the current data stream is further matched against the preset second key features under different website categories, and the audit category of the website where the current data stream is located is determined according to the degree of feature matching. The audit category can thus be obtained from the features of the current data stream alone, without determining the domain name of the website where the current data stream is located, so unregistered websites can be audited, audit omissions for such websites are avoided, and the comprehensiveness of website auditing is improved.
Example two
Fig. 2 is a flowchart of a website auditing method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the foregoing embodiment. It mainly explains in detail how the audit category of the website where the current data stream is located is determined.
Referring to Fig. 2, the method of this embodiment may include:
S210: determine the website building tool corresponding to the current data stream according to the result of matching the features of the current data stream against the first key features under different website building tools.
S220: if the website building tool corresponding to the current data stream is the website building tool to be audited, calculate the matching frequency between the current data stream and each second key feature under the different website categories.
Optionally, if the website building tool corresponding to the current data stream is determined to be the website building tool to be audited, the features parsed from the current data stream are matched against each second key feature under the different preset website categories, and the number of times each second key feature occurs among the features of the current data stream is counted. This count is the matching frequency of the second key feature, and it reflects, from one aspect, the content expressed by the current data stream.
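Counting the matching frequency of each second key feature can be sketched as below. The category names follow the examples given in the description, but the keyword lists, the token list, and the function name `matching_frequencies` are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List


def matching_frequencies(
    tokens: Iterable[str],
    second_key_features: Dict[str, List[str]],
) -> Dict[str, Dict[str, int]]:
    """For each website category, count how often each second key feature occurs."""
    counts = Counter(tokens)  # missing keys count as 0
    return {
        category: {feature: counts[feature] for feature in features}
        for category, features in second_key_features.items()
    }


# Hypothetical second key features per website category.
SECOND_KEY_FEATURES = {
    "violence": ["fight", "weapon"],
    "normal": ["recipe"],
}
tokens = ["weapon", "fight", "weapon", "recipe"]
frequencies = matching_frequencies(tokens, SECOND_KEY_FEATURES)
```

With these inputs, `frequencies["violence"]` is `{"fight": 1, "weapon": 2}` and `frequencies["normal"]` is `{"recipe": 1}`.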
S230: calculate the likelihood of the website where the current data stream is located under the different website categories according to the matching frequencies and the corresponding weights of the second key features.
Optionally, a weight may be preset for each second key feature under each website category according to how strongly the feature is associated with that category; in this embodiment, the weight of each second key feature may be set according to the feature's part of speech and to experience. After the matching frequency between the current data stream and each second key feature under the different website categories has been calculated, the likelihood of the website where the current data stream is located under each website category is calculated as a weighted sum of the matching frequencies and the corresponding weights of the second key features under that category. The likelihood indicates the degree of feature matching between the current data stream and each website category.
Illustratively, the likelihood in this embodiment is calculated as:

$$\mathrm{Value} = \sum_{i} \gamma_i \cdot \mathrm{Count}_i$$

where $\mathrm{Value}$ is the likelihood of the website where the current data stream is located under a given website category, $\gamma_i$ is the weight of the $i$-th second key feature under that website category, and $\mathrm{Count}_i$ is the matching frequency between the current data stream and the $i$-th second key feature under that website category.
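The weighted sum above can be evaluated directly. The following sketch uses made-up weights and matching frequencies for a single website category; the feature names and the function name `likelihood` are hypothetical.

```python
from typing import Dict


def likelihood(frequencies: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of matching frequencies: Value = sum_i gamma_i * Count_i."""
    return sum(weights[feature] * count for feature, count in frequencies.items())


# Made-up numbers for one website category.
example_frequencies = {"fight": 1, "weapon": 2}   # Count_i
example_weights = {"fight": 0.5, "weapon": 0.25}  # gamma_i
value = likelihood(example_frequencies, example_weights)  # 0.5*1 + 0.25*2 = 1.0
```

The same function would be called once per website category, each with that category's own weights and matching frequencies.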
S240: determine the audit category of the website where the current data stream is located according to the likelihood of the website under the different website categories and the corresponding likelihood thresholds.
Optionally, a likelihood threshold may be preset for each website category. Whether the likelihood of the website where the current data stream is located under each website category exceeds the likelihood threshold of that category is then checked to determine the audit category of the website. Specifically, if the likelihood under a website category exceeds the likelihood threshold of that category, the website category becomes one component of the audit category. Repeating this for every website category yields the full composition of the audit category, so the audit category of the website where the current data stream is located is not limited to a single category.
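The per-category threshold comparison can be sketched as follows. The likelihoods and thresholds are made-up numbers, the function name `audit_categories` is hypothetical, and "exceeds" is interpreted here as a strict greater-than comparison, which is an assumption.

```python
from typing import Dict, List


def audit_categories(
    likelihoods: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return every website category whose likelihood exceeds its own threshold."""
    return [
        category
        for category, value in likelihoods.items()
        if value > thresholds[category]  # strict comparison (assumed)
    ]


# Made-up likelihoods and thresholds for three violation categories.
example_likelihoods = {"violence": 1.0, "terrorism": 0.2, "political": 3.0}
example_thresholds = {"violence": 0.8, "terrorism": 0.5, "political": 2.5}
result = audit_categories(example_likelihoods, example_thresholds)
```

Because each category is tested independently, `result` here is `["violence", "political"]`, illustrating that a website may receive more than one audit category.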
According to the technical solution provided by this embodiment, when the website building tool corresponding to the current data stream is determined, from the result of matching the current data stream against the first key features under different website building tools, to be a website building tool to be audited, the current data stream is further matched against the preset second key features under different website categories, and the audit category of the website where the current data stream is located is determined according to the degree of feature matching. The audit category can thus be obtained from the features of the current data stream alone, without determining the domain name of the website where the current data stream is located, so unregistered websites can be audited, audit omissions for such websites are avoided, and the comprehensiveness of website auditing is improved.
Example three
Fig. 3 is a flowchart of a website auditing method according to a third embodiment of the present invention. This embodiment is optimized on the basis of the foregoing embodiments. It mainly explains in detail how the first key features under different website building tools are determined.
Referring to Fig. 3, the method of this embodiment may include:
S310: obtain two or more data stream samples under each of the different website building tools.
Optionally, because data streams issued by websites built with different website building tools carry feature information that indicates the website building tool used, the first key features under a website building tool can be determined by analysing the features of a large number of known data streams issued by websites built with that tool. To this end, a large number of historical data streams issued by known websites built with different website building tools are first obtained and used as the data stream samples in this embodiment.
S320: take the features common to the data stream samples under each website building tool as the first key features under that website building tool.
Optionally, the large number of data stream samples under the different website building tools are parsed to obtain the features contained in each sample. For each website building tool, the features of the data stream samples under that tool are analysed to select the features common to all of the samples, and these are taken as the first key features under that tool. The first key features under each website building tool are obtained in turn in this way.
For example, if there are many first key features under a website building tool, matching them against the features of the current data stream takes a long time. In this embodiment, taking the features common to the data stream samples under each website building tool as the first key features under that tool may therefore specifically include: selecting the features common to the data stream samples under each website building tool and calculating the frequency of each feature; and taking a target number of the features with the highest frequencies as the first key features under that tool.
Specifically, after the features common to the data stream samples under each website building tool have been selected, the frequency with which each feature occurs in the data stream samples, that is, the feature frequency in this embodiment, is calculated to indicate the feature's share under the corresponding website building tool. A top-N algorithm is then used to select the target number of features with the highest frequencies as the first key features under that tool; the target number is generally any value from 3 to 5. Using the common features with the largest shares as the first key features ensures the accuracy of the first key features and improves the efficiency of subsequent feature matching.
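The selection of common features followed by a top-N cut can be sketched as below. Each data stream sample is assumed to be a list of feature strings in which a feature may repeat; the sample values, the alphabetical tie-breaking rule, and the function name `first_key_features` are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def first_key_features(samples: List[List[str]], target_number: int = 3) -> List[str]:
    """Pick the most frequent features among those present in every sample."""
    if not samples:
        return []
    # Features common to all data stream samples of one website building tool.
    common = set(samples[0]).intersection(*map(set, samples[1:]))
    # Total occurrences of each common feature across all samples.
    frequency = Counter(f for sample in samples for f in sample if f in common)
    # Highest frequency first; ties broken alphabetically for determinism.
    ranked = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:target_number]]


# Made-up samples for one website building tool.
tool_samples = [
    ["forum.php", "tid", "tid", "mod", "page"],
    ["forum.php", "tid", "mod", "mod", "mod"],
    ["forum.php", "tid", "mod", "fid"],
]
top_two = first_key_features(tool_samples, target_number=2)
```

In these samples the common features are `forum.php` (3 occurrences), `tid` (4) and `mod` (5), so `top_two` is `["mod", "tid"]`. The description suggests a target number between 3 and 5; 2 is used here only to make the cut visible.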
S330: determine the website building tool corresponding to the current data stream according to the result of matching the features of the current data stream against the first key features under different website building tools.
S340: if the website building tool corresponding to the current data stream is the website building tool to be audited, determine the audit category of the website where the current data stream is located according to the result of matching the features of the current data stream against the second key features under different website categories.
In this embodiment, the determination of the first key features, the determination of the website building tool of the website where the current data stream is located, and the calculation of the website's likelihood may all be deployed on the same physical computer. Alternatively, to improve the computational efficiency of each step, they may be deployed on different physical computers in the same cluster.
The technical solution provided by this embodiment takes the features common to a large number of data stream samples under each website building tool as the first key features under that tool, which ensures the accuracy of the first key features. Meanwhile, when the website building tool corresponding to the current data stream is determined, from the result of matching the current data stream against the first key features under different website building tools, to be the website building tool to be audited, the current data stream is further matched against the preset second key features under different website categories, and the audit category of the website where the current data stream is located is determined according to the degree of feature matching. The audit category can thus be obtained from the features of the current data stream alone, without determining the domain name of the website where the current data stream is located, so unregistered websites can be audited, audit omissions for such websites are avoided, and the comprehensiveness of website auditing is improved.
Example four
Fig. 4 is a schematic structural diagram of an apparatus for website review according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus may include:
the station building tool determining module 410 is configured to determine a station building tool corresponding to a current data stream according to a feature matching result of the current data stream and first key features of different station building tools;
and the audit category determining module 420 is configured to determine, if the station building tool corresponding to the current data stream is the target station building tool, the auditing category of the website where the current data stream is located according to the feature matching result of the current data stream and the second key features under different website categories.
According to the technical solution provided by this embodiment, when the station building tool corresponding to the current data stream is determined, according to the feature matching result of the current data stream and the first key features under different station building tools, to be a station building tool to be audited, feature matching continues between the current data stream and the preset second key features under different website types, and the auditing type of the website where the current data stream is located is then determined according to the feature matching degree. The corresponding auditing type can thus be obtained from the features of the current data stream alone, without determining the domain name of the website where the current data stream is located. Unregistered websites can therefore be audited, audit omissions for unregistered websites are avoided, and the comprehensiveness of website auditing is improved.
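By way of illustration only, the gating role of the station building tool determining module can be sketched as follows. The tool names, feature strings, the `TOOLS_TO_REVIEW` set and the "largest overlap wins" matching rule are all assumptions made for this sketch; the embodiment does not specify them.

```python
from typing import Dict, Optional, Set

# Hypothetical first key features per station building tool (illustrative values only).
FIRST_KEY_FEATURES: Dict[str, Set[str]] = {
    "tool_a": {"x-powered-by: tool-a", "/tool-a/static/"},
    "tool_b": {"generator: tool-b", "/tb-content/"},
}

# Hypothetical set of tools whose websites are to be audited.
TOOLS_TO_REVIEW: Set[str] = {"tool_a"}


def determine_tool(stream_features: Set[str]) -> Optional[str]:
    """Return the tool whose first key features overlap most with the stream.

    Returns None when no first key feature matches at all.
    """
    best_tool: Optional[str] = None
    best_hits = 0
    for tool, key_features in FIRST_KEY_FEATURES.items():
        hits = len(stream_features & key_features)
        if hits > best_hits:
            best_tool, best_hits = tool, hits
    return best_tool


def needs_category_review(stream_features: Set[str]) -> bool:
    """Stage-one gate: only streams built with a tool to be audited proceed
    to matching against the second key features of the website categories."""
    return determine_tool(stream_features) in TOOLS_TO_REVIEW
```

Note that the gate relies only on features carried by the data stream itself, which is why no domain name lookup is needed in this sketch.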
Further, the audit class determination module 420 may be specifically configured to:
calculating the matching frequency of the current data stream and each second key feature under different website categories;
calculating the likelihood of the website where the current data stream is located under different website types according to the matching frequency and the corresponding weight of the second key feature;
and determining the auditing type of the website where the current data stream is located according to the likelihood of the website where the current data stream is located under different website types and the corresponding likelihood threshold value.
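The likelihood calculation described above can be sketched as follows. This is a minimal sketch that assumes the matching frequency is the share of stream tokens equal to a key feature and that the likelihood is a weighted sum of those frequencies; the exact formula, the function names and the weights are assumptions, not taken from the embodiment.

```python
from typing import Dict, List


def matching_frequencies(tokens: List[str], key_features: List[str]) -> Dict[str, float]:
    """Share of tokens in the current data stream that equal each second key feature."""
    total = len(tokens)
    if total == 0:
        return {feature: 0.0 for feature in key_features}
    return {feature: tokens.count(feature) / total for feature in key_features}


def likelihood(frequencies: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed form: sum of matching frequency times the weight of the feature.

    Features without a configured weight contribute nothing.
    """
    return sum(freq * weights.get(feature, 0.0) for feature, freq in frequencies.items())
```

In use, one likelihood would be computed per website type, each with its own list of second key features and weights.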
Further, the audit category determining module 420 may be specifically configured to:
and if the likelihood of the website where the current data stream is located under a given website type exceeds the corresponding likelihood threshold, taking that website type as one component of the auditing type, until the complete auditing type composition of the website where the current data stream is located is obtained.
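A minimal sketch of this threshold step is given below. It assumes one likelihood and one threshold per website type, held in dictionaries, and a strict "exceeds" comparison; the type names in the usage note are hypothetical.

```python
from typing import Dict, List


def review_categories(likelihoods: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Collect every website type whose likelihood strictly exceeds its own threshold.

    A website may therefore receive several auditing types, or none.
    Types without a configured threshold are skipped.
    """
    return sorted(
        category
        for category, value in likelihoods.items()
        if category in thresholds and value > thresholds[category]
    )
```

For example, with likelihoods `{"type_x": 0.8, "type_y": 0.3}` and a threshold of 0.5 for both types, only `type_x` would enter the auditing type composition.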
Further, the apparatus for checking a website may further include:
the sample acquisition module is used for acquiring two or more data stream samples under different station building tools;
and the first characteristic determining module is used for taking the characteristic which commonly exists in the data stream samples under each station building tool as the first key characteristic under the station building tool.
Further, the first characteristic determining module may be specifically configured to:
selecting the features commonly present in the data stream samples under each station building tool, and calculating the frequency of each such feature;
and taking the target number of features with the highest frequencies as the first key features under the station building tool.
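The selection of first key features can be sketched as follows. The sketch assumes that each data stream sample has already been reduced to a set of feature strings, that "commonly present" means appearing in at least two samples, and that "frequency" is the number of samples containing the feature; all three are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set


def first_key_features(samples: List[Set[str]], target_count: int) -> List[str]:
    """Return up to target_count features, ranked by how many samples contain them."""
    # Each sample is a set, so a feature is counted at most once per sample.
    counts = Counter(feature for sample in samples for feature in sample)
    # Keep only features shared by at least two samples (assumed meaning of "common").
    common = [(feature, n) for feature, n in counts.items() if n >= 2]
    # Highest frequency first; ties broken alphabetically for a stable result.
    common.sort(key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in common[:target_count]]
```

The function would be run once per station building tool, on the samples collected for that tool.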
Further, the apparatus for checking a website may further include:
the characteristic extraction module is used for extracting the identification characteristic and the content characteristic of the current data stream;
accordingly, the station building tool determining module 410 may be specifically configured to:
determining a station building tool corresponding to the current data stream according to the feature matching result of the identification feature in the current data stream and the first key feature under different station building tools;
the audit class determination module 420 may be specifically configured to:
and determining the auditing type of the website where the current data stream is located according to the feature matching result of the content features in the current data stream and the second key features under different website types.
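The split between identification features and content features can be illustrated with the sketch below. It assumes the current data stream is an HTTP response, that identification features are normalized header lines, and that content features are word tokens of the page text; the embodiment does not state how either kind of feature is represented, so these choices and the function name are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def extract_features(headers: Dict[str, str], body: str) -> Tuple[List[str], List[str]]:
    """Split a response into (identification features, content features)."""
    # Identification features: lowercased "name: value" header lines, sorted by header name.
    identification = [f"{name.lower()}: {value.lower()}" for name, value in sorted(headers.items())]
    # Content features: word tokens of the body after stripping markup tags.
    text = re.sub(r"<[^>]+>", " ", body)
    content = re.findall(r"\w+", text.lower())
    return identification, content
```

Under these assumptions, the first list would be matched against the first key features of the station building tools and the second list against the second key features of the website types.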
Further, the apparatus for checking a website may further include:
and the website analysis module is used for analyzing the website where the current data stream is located.
The website auditing device provided by the embodiment can be applied to the website auditing method provided by any embodiment, and has corresponding functions and beneficial effects.
Example five
Fig. 5 is a schematic structural diagram of a device according to a fifth embodiment of the present invention. As shown in Fig. 5, the device comprises a processor 50, a storage device 51 and a communication device 52. The number of processors 50 in the device may be one or more, and one processor 50 is taken as an example in Fig. 5. The processor 50, the storage device 51 and the communication device 52 of the device may be connected by a bus or by other means; a bus connection is taken as an example in Fig. 5.
The storage device 51 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as modules corresponding to the website auditing method in the embodiment of the present invention. The processor 50 executes various functional applications and data processing of the device by executing software programs, instructions and modules stored in the storage device 51, that is, the above-described website auditing method is realized.
The storage device 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the storage device 51 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 51 may further include memory located remotely from the processor 50, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication device 52 may implement a network connection or a mobile data connection.
The device provided by the embodiment can be used for executing the website auditing method provided by any embodiment, and has corresponding functions and beneficial effects.
Example six
The sixth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the method for website auditing in any of the above embodiments. The method specifically comprises the following steps:
determining a station building tool corresponding to the current data stream according to the feature matching result of the current data stream and the first key features under different station building tools;
and if the website building tool corresponding to the current data stream is the website building tool to be audited, determining the auditing category of the website where the current data stream is located according to the feature matching result of the current data stream and the second key features under different website categories.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method for website review provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the apparatus for checking a website, each unit and each module included in the apparatus are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for website review, comprising:
determining a station building tool corresponding to the current data stream according to the feature matching result of the current data stream and first key features under different station building tools;
if the website building tool corresponding to the current data stream is the website building tool to be audited, determining the auditing category of the website where the current data stream is located according to the feature matching result of the current data stream and second key features under different website categories;
determining the auditing category of the website where the current data stream is located according to the feature matching result of the current data stream and second key features under different website categories, wherein the determining step comprises the following steps:
calculating the matching frequency of the current data stream and each second key feature under different website categories;
calculating the likelihood of the website where the current data stream is located under different website categories according to the matching frequency and the corresponding weight of the second key feature;
and determining the auditing category of the website where the current data stream is located according to the likelihood of the website where the current data stream is located in different website categories and the corresponding likelihood threshold value.
2. The method of claim 1, wherein determining the review category of the website where the current data flow is located according to the likelihoods of the websites where the current data flow is located in different website categories and corresponding likelihood thresholds comprises:
and if the likelihood of the website where the current data stream is located under the corresponding website category exceeds the corresponding likelihood threshold, taking the website category as one of the audit category compositions until the audit category composition of the website where the current data stream is located is obtained.
3. The method of claim 1, before determining the site creation tool corresponding to the current data stream according to the feature matching result between the current data stream and the first key feature under different site creation tools, further comprising:
acquiring two or more data stream samples under different station building tools;
and taking the feature which is commonly existed in the data stream samples under each station building tool as the first key feature under the station building tool.
4. The method of claim 3, wherein using the feature that is commonly present in the data stream samples under each site creation tool as the first key feature under the site creation tool comprises:
selecting the common existing characteristics in each data stream sample under each station building tool, and calculating the frequency of the characteristics;
and taking the characteristic of the target quantity with the top frequency as a first key characteristic under the station building tool.
5. The method according to any one of claims 1-4, further comprising:
extracting identification features and content features of the current data stream;
correspondingly, determining the station building tool corresponding to the current data stream according to the feature matching result of the current data stream and the first key feature under different station building tools comprises:
determining a station building tool corresponding to the current data stream according to the feature matching result of the identification feature in the current data stream and a first key feature under different station building tools;
determining the auditing category of the website where the current data stream is located according to the feature matching result of the current data stream and second key features under different website categories, wherein the determining step comprises the following steps:
and determining the auditing category of the website where the current data stream is located according to the feature matching result of the content features in the current data stream and the second key features under different website categories.
6. The method according to any one of claims 1 to 4, wherein before determining the review category of the website where the current data stream is located according to the feature matching result between the current data stream and the second key feature in different website categories, the method further comprises:
and analyzing the website where the current data stream is located.
7. An apparatus for website auditing, comprising:
the station building tool determining module is used for determining a station building tool corresponding to the current data stream according to the feature matching result of the current data stream and the first key features under different station building tools;
an audit category determining module, configured to determine, if the site building tool corresponding to the current data stream is a target site building tool, an audit category of a website where the current data stream is located according to a feature matching result of second key features of the current data stream and different website categories;
the audit category determining module is specifically configured to:
calculating the matching frequency of the current data stream and each second key feature under different website categories;
calculating the likelihood of the website where the current data stream is located under different website types according to the matching frequency and the corresponding weight of the second key feature;
and determining the auditing category of the website where the current data stream is located according to the likelihood of the website where the current data stream is located under different website categories and the corresponding likelihood threshold.
8. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for website review as recited in any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out a method of website review as set forth in any one of claims 1-6.
CN201911072110.8A 2019-11-05 2019-11-05 Website auditing method, device, equipment and storage medium Active CN110837619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911072110.8A CN110837619B (en) 2019-11-05 2019-11-05 Website auditing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110837619A CN110837619A (en) 2020-02-25
CN110837619B true CN110837619B (en) 2022-07-12

Family

ID=69576331

Country Status (1)

Country Link
CN (1) CN110837619B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9654495B2 (en) * 2006-12-01 2017-05-16 Websense, Llc System and method of analyzing web addresses
CN102663000B (en) * 2012-03-15 2016-08-03 北京百度网讯科技有限公司 The maliciously recognition methods of the method for building up of network address database, maliciously network address and device
CN102722567B (en) * 2012-05-30 2016-08-03 杭州遥指科技有限公司 The screening technique of a kind of internal information of standing and device
CN103530562A (en) * 2013-10-23 2014-01-22 腾讯科技(深圳)有限公司 Method and device for identifying malicious websites
CN104125209B (en) * 2014-01-03 2015-09-09 腾讯科技(深圳)有限公司 Malice website prompt method and router
CN107612893B (en) * 2017-09-01 2020-06-02 北京百悟科技有限公司 Short message auditing system and method and short message auditing model building method
CN109067726B (en) * 2018-07-24 2021-04-13 北京知道创宇信息技术股份有限公司 Identification method and device for station building system, electronic equipment and storage medium
CN109672678B (en) * 2018-12-24 2021-05-14 亚信科技(中国)有限公司 Phishing website identification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Information Filtering: Overview of Issues, Research and Systems;ri Hanani,Bracha Shapira,Peretz Shoval;《User Modeling and User-Adapted Interaction》;20010331;全文 *
基于内容过滤的企业建站审核系统;翟艳娣;《中国优秀硕士学位论文全文数据库 (信息科技辑)》;20110315;全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant