CN112733057A - Network content security detection method, electronic device and storage medium - Google Patents

Info

Publication number
CN112733057A
Authority
CN
China
Prior art keywords
data
network
content
preset
network content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011355159.7A
Other languages
Chinese (zh)
Inventor
龙文洁
莫金友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Anheng Information Security Technology Co Ltd
Original Assignee
Hangzhou Anheng Information Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Anheng Information Security Technology Co Ltd
Priority to CN202011355159.7A
Publication of CN112733057A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a network content security detection method, an electronic device and a storage medium. The network content security detection method comprises the following steps: acquiring first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling; detecting first data of the first network content through a deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content; and determining the network content security detection result according to the similarity. The application addresses the problem of a limited network content detection range: network content is detected both by analyzing network traffic and by crawling, which expands the detection range.

Description

Network content security detection method, electronic device and storage medium
Technical Field
The present application relates to the field of security detection, and in particular, to a method, an electronic device, and a storage medium for detecting security of network content.
Background
With the rapid development of the internet, smart devices and various new businesses, the volume of data on the internet is growing explosively, and interactive content such as pictures, videos, messages and chats has become an indispensable part of how people express feelings, record events and carry out daily work. This growing body of content also carries various uncontrollable risk factors, and there is currently no effective means of detecting the content compliance of pictures and videos in websites and traffic.
Existing website content security detection devices and methods are mainly based on crawler technology. The source of detection objects is single and the detection range is limited: data cannot be passively acquired from large-scale network traffic, and illegal information found in the data cannot be stored. As a result, the detection range of network content is limited.
At present, the related art offers no effective solution to the problem of the limited network content detection range.
Disclosure of Invention
The embodiment of the application provides a network content security detection method, an electronic device and a storage medium, which are used for at least solving the problem of limited network content detection range in the related art.
In a first aspect, an embodiment of the present application provides a method for detecting network content security, including:
acquiring first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling;
detecting first data of the first network content through a deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content;
and determining the network content security detection result according to the similarity.
In some embodiments, determining the network content security detection result according to the similarity includes:
judging whether the similarity of the first data and the preset network data is greater than a preset threshold value or not;
and determining that the network content has illegal content under the condition that the similarity is judged to be larger than a preset threshold value.
In some embodiments, the preset data acquisition mode includes the network content analysis based on network traffic, and acquiring the first network content collected according to the preset data acquisition mode includes:
acquiring access data generated by website access, wherein the access data comprises at least traffic data;
intercepting target traffic data from the traffic data according to a preset interception mode, wherein the preset interception mode comprises at least traffic mirroring;
parsing the target traffic data to obtain at least first picture data, and determining that the first network content comprises the first picture data.
In some embodiments, intercepting the target traffic data from the traffic data according to the preset interception mode includes: intercepting HTTP/HTTPS POST requests with a preset traffic interpreter to obtain the target traffic data.
In some embodiments, the preset data acquisition mode includes web crawler crawling, and acquiring the first network content collected according to the preset data acquisition mode includes:
using a web crawler to obtain at least the home page content of a first target website, and determining that the first network content comprises at least the home page content of the first target website.
In some embodiments, the first network content includes second picture data, the preset network data includes sample pictures, detecting first data of the first network content through a deep learning model, and determining similarity between the first data and the preset network data includes:
detecting picture content of the second picture data through a deep learning model;
and comparing the picture content of the second picture data with the sample picture to determine the similarity.
In some embodiments, the first network content further includes first target information of first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes acquiring the first target information, where the target information includes a source address and a target address corresponding to the first data;
and storing the target information and the first data into a preset database.
In some embodiments, the first network content further includes second target information of the first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes:
acquiring the second target information, wherein the second target information comprises at least the URL (uniform resource locator) of a second target website from which the web crawler crawled the first network content;
crawling the second target website according to the URL to obtain webpage content of the second target website, wherein the webpage content comprises the home page and the site-wide pages of the second target website;
and storing at least the webpage content, the URL and the first data in a preset database.
In a second aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the network content security detection method according to the first aspect is implemented.
In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the network content security detection method according to the first aspect.
Compared with the related art, the network content security detection method, electronic device and storage medium provided by the embodiments of the present application acquire first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling; detect first data of the first network content through a deep learning model, and determine the similarity between the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content; and determine the network content security detection result according to the similarity. This solves the problem of the limited network content detection range by detecting network content both through network traffic analysis and through crawling.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for the network content security detection method according to an embodiment of the present application;
FIG. 2 is a flow chart of a network content security detection method according to an embodiment of the present application;
FIG. 3 is a flow chart of a network content security detection method according to a preferred embodiment of the present application;
FIG. 4 is a block diagram of a network content security detection method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The method provided by the embodiment can be executed in a terminal, a computer or a similar operation device. Taking the operation on the terminal as an example, fig. 1 is a hardware structure block diagram of the terminal of the network content security detection method according to the embodiment of the present invention. As shown in fig. 1, the terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the network content security detection method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The technologies described in this application can be used in a content security detection system. Based on deep learning, content security provides intelligent content risk identification services for multimedia such as pictures, videos, voice and text, and can greatly reduce the cost of manual review.
Before the embodiments of the present application are described and explained, the related technologies used in the present application are described as follows:
Deep learning algorithm: Deep Learning (DL), which builds on artificial neural networks, is a subfield of machine learning. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images and sound.
Optical character recognition: Optical Character Recognition (OCR) uses optical and computer technology to read characters printed or written on paper and convert them into a format that a computer can accept and a human can understand. At present, it mainly uses a convolutional neural network as feature extractor and classifier, taking a character image as input and outputting the recognition result.
Natural language processing: Natural Language Processing (NLP) is an important research direction in computer science and artificial intelligence. It uses computers to process, understand and use human languages (such as Chinese and English) so that humans and computers can communicate effectively.
The present embodiment provides a method for detecting network content security, and fig. 2 is a flowchart of a method for detecting network content security according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
Step S201, acquiring first network content collected according to a preset data acquisition mode, where the preset data acquisition mode includes at least one of the following: network content analysis based on network traffic, and web crawler crawling.
Step S202, detecting first data of the first network content through a deep learning model, and determining similarity between the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content.
In this embodiment, the first data includes pictures, videos, and texts in the website, and the deep learning model includes a deep learning algorithm, an OCR algorithm, and an NLP algorithm.
Step S203, determining a network content security detection result according to the similarity.
In this embodiment, the similarity refers to the similarity between the pictures, videos, and texts in the website and the illegal network data.
Through steps S201 to S203, first network content collected according to a preset data acquisition mode is acquired, where the preset data acquisition mode includes at least one of the following: network content analysis based on network traffic, and web crawler crawling; first data of the first network content is detected through a deep learning model, and the similarity between the first data and preset network data is determined, where the preset network data includes preset illegal network data and is used for determining whether the first data is illegal content; and the network content security detection result is determined according to the similarity. This solves the problem of the limited network content detection range: network content is detected both by analyzing network traffic and by crawling, which expands the detection range of network content.
In this embodiment, determining the network content security detection result according to the similarity includes the following steps:
Step 1, judging whether the similarity between the first data and the preset network data is greater than a preset threshold.
Step 2, determining that the network content contains illegal content when the similarity is judged to be greater than the preset threshold.
In this embodiment, the preset threshold is set according to actual needs. The smaller the threshold, the more illegal-content-related information is output; the larger the threshold, the more closely the output illegal content resembles the preset network data. For example, the preset threshold may be set to 100%, in which case the output illegal content is completely consistent with the preset network data.
Whether the network content has illegal content or not is determined through the similarity and the preset threshold in the steps, so that the safety detection of the network content is realized, and the quality of the network content is improved.
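The threshold decision described above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the names `DetectionResult` and `decide`, the default threshold of 0.8, and the representation of similarity as a number in [0, 1] are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    similarity: float  # similarity to the preset illegal data, in [0.0, 1.0]
    is_illegal: bool   # True when the similarity exceeds the threshold


def decide(similarity: float, threshold: float = 0.8) -> DetectionResult:
    """Flag content as illegal when similarity is strictly greater than threshold.

    The default threshold is a hypothetical value; the text only says it is
    set according to actual needs.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return DetectionResult(similarity, similarity > threshold)
```

Note that with a strict greater-than comparison, a threshold of 1.0 (100%) would never flag anything; an implementation that wants exact matches at 100% would presumably compare with greater-than-or-equal instead.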
In this embodiment, the preset data acquisition mode includes analyzing network content based on network traffic, and acquiring the first network content acquired according to the preset data acquisition mode includes the following steps:
Step 1, obtaining access data generated by website access, wherein the access data comprises at least traffic data.
Step 2, intercepting target traffic data from the traffic data according to a preset interception mode, wherein the preset interception mode comprises at least traffic mirroring.
In this embodiment, the preset interception mode further includes stored logs of the network data, which can be obtained by configuring the address at which the logs of the network data are stored.
Step 3, parsing the target traffic data to obtain at least first picture data, and determining that the first network content comprises the first picture data.
Through the above steps, target traffic data is obtained from the traffic data by the preset interception mode and parsed to obtain the first picture data, so that pictures in the network content are extracted from the network traffic in preparation for the subsequent network content security detection.
In this embodiment, intercepting the target traffic data from the traffic data according to the preset interception mode includes: intercepting HTTP/HTTPS POST requests with a preset traffic interpreter to obtain the target traffic data.
By the method, the interception of the target flow data is realized, and preparation is made for subsequently acquiring the picture data in the target flow data.
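As an illustration of selecting POST requests as target traffic data, the sketch below filters a collection of raw HTTP request messages by method. It assumes the mirrored traffic has already been reassembled into complete request messages and, for HTTPS, already decrypted upstream; neither step is described in the text. The function names `parse_request_line` and `select_post_requests` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def parse_request_line(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (method, path) from the first line of a raw HTTP request, or None."""
    first_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = first_line.split(" ")
    # A well-formed request line looks like "POST /upload HTTP/1.1".
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    return parts[0], parts[1]


def select_post_requests(requests: Iterable[bytes]) -> List[bytes]:
    """Keep only the requests whose method is POST (the target traffic data)."""
    selected = []
    for raw in requests:
        parsed = parse_request_line(raw)
        if parsed is not None and parsed[0] == "POST":
            selected.append(raw)
    return selected
```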
In this embodiment, analyzing the target traffic data to obtain at least the first picture data includes the following steps:
Step 1, parsing the target traffic data to obtain key field information in the target traffic data.
In this embodiment, the key field information includes image, which indicates that the target traffic data contains picture content.
Step 2, acquiring the byte stream containing the picture according to the key field information.
In this embodiment, the byte stream containing the picture is acquired according to the key field information image.
Step 3, restoring the byte stream containing the picture to the first picture data by means of a content restoration function.
In this embodiment, before the first picture data is obtained through the content restoration function, the content restoration function is adapted to the format of the first picture data. For example, assuming the content restoration function is a buff2Image function and the format of the first picture data is jpg, the buff2Image function is adapted so that the byte stream containing the picture can be decoded by it and output as first picture data in jpg format.
Through the above steps, the field value content is obtained from the key field information and restored to the first picture data by the content restoration function, so that the target traffic data is converted into the first picture data in preparation for the subsequent identification of the first picture data.
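One way to picture steps 1 and 2 is a POST body encoded as multipart/form-data, where a part header of the form `Content-Type: image/...` plays the role of the key field image. The sketch below extracts the byte streams of such parts. The multipart encoding, the use of the Content-Type header as the key field, and the function names are assumptions for illustration; the text does not specify how the key field appears on the wire.

```python
import re
from typing import List, Optional


def boundary_from_content_type(content_type: str) -> Optional[bytes]:
    """Extract the multipart boundary from a Content-Type header value."""
    match = re.search(r'boundary="?([^";]+)"?', content_type)
    return match.group(1).encode("ascii") if match else None


def extract_image_parts(content_type: str, body: bytes) -> List[bytes]:
    """Return the raw byte streams of all parts declared as image/*."""
    boundary = boundary_from_content_type(content_type)
    if boundary is None:
        return []
    images = []
    for chunk in body.split(b"--" + boundary):
        # A real part looks like "\r\n<headers>\r\n\r\n<payload>\r\n".
        headers, sep, payload = chunk.partition(b"\r\n\r\n")
        if not sep:
            continue  # preamble, closing "--" marker, or malformed part
        if b"content-type: image/" in headers.lower():
            # Drop the CRLF that precedes the next boundary line.
            images.append(payload[:-2] if payload.endswith(b"\r\n") else payload)
    return images
```

The returned byte streams are what a restoration function such as the buff2Image function mentioned above would then decode.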
In this embodiment, the preset data acquisition mode includes web crawler crawling, and acquiring the first network content collected according to the preset data acquisition mode includes: using a web crawler to obtain at least the home page content of a first target website, and determining that the first network content comprises at least the home page content of the first target website.
In this way, the crawler actively crawls the content of the target website in preparation for the subsequent recognition of the website content.
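Once a home page has been fetched, the crawler needs to locate the pictures it references. The sketch below shows one possible way to do this with Python's standard `html.parser` module; it works on an HTML string and leaves the actual download step out. The class and function names are hypothetical, and the text does not say how the crawler module is implemented.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs of <img> elements found in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html: str, base_url: str) -> List[str]:
    """Return the picture URLs referenced by the given home page HTML."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```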
In this embodiment, the first network content includes second picture data, the preset network data includes a sample picture, the detecting of the first data of the first network content by the deep learning model, and the determining of the similarity between the first data and the preset network data includes the following steps:
Step 1, detecting the picture content of the second picture data through the deep learning model;
Step 2, comparing the picture content of the second picture data with the sample picture to determine the similarity.
Through the above steps, the picture content of the second picture data is detected by the deep learning model and compared with the sample picture to determine the similarity, in preparation for subsequently identifying illegal content in the network content.
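The text does not state how the similarity between detected picture content and a sample picture is computed. One common choice, shown here purely as an assumption, is to have the deep learning model map each picture to a feature vector and to take the cosine similarity between vectors. The sketch starts from already-computed feature vectors; the model itself is out of scope, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def max_similarity(features: Sequence[float],
                   samples: Sequence[Sequence[float]]) -> float:
    """Highest similarity between one picture and any preset sample picture."""
    return max((cosine_similarity(features, s) for s in samples), default=0.0)
```

Taking the maximum over all sample pictures means a picture is compared against its closest match in the preset illegal data, which is the value a threshold test would then act on.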
In some embodiments, the first network content further includes first target information of the first data, and after it is determined that the network content has illegal content when the similarity is greater than the preset threshold, the method further includes acquiring the first target information, where the target information includes a source address and a target address corresponding to the first data; and storing the target information and the first data into a preset database.
In this way, the illegal content found in the network content and its related information are stored; the related information includes the source address and the target address corresponding to the first data, which facilitates tracing the illegal content to its source.
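A minimal sketch of such storage, assuming SQLite as the preset database, is shown below. The table name, column names and helper functions are hypothetical; the text only requires that the first data be stored together with its source and target addresses.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database that holds flagged content."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS illegal_content (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               source_ip TEXT NOT NULL,
               source_port INTEGER NOT NULL,
               target_ip TEXT NOT NULL,
               target_port INTEGER NOT NULL,
               similarity REAL NOT NULL,
               data BLOB NOT NULL)"""
    )
    return conn


def store_record(conn: sqlite3.Connection, source_ip: str, source_port: int,
                 target_ip: str, target_port: int,
                 similarity: float, data: bytes) -> int:
    """Insert one flagged item and return its row id."""
    cur = conn.execute(
        "INSERT INTO illegal_content "
        "(source_ip, source_port, target_ip, target_port, similarity, data) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source_ip, source_port, target_ip, target_port, similarity, data),
    )
    conn.commit()
    return cur.lastrowid
```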
In some embodiments, the first network content further includes second target information of the first data, and after determining that the network content has illegal content when the similarity is greater than the preset threshold, the method further includes the following steps:
Step 1, acquiring second target information, wherein the second target information comprises at least the URL (uniform resource locator) of a second target website from which the web crawler crawled the first network content;
Step 2, crawling the second target website according to the URL to obtain webpage content of the second target website, wherein the webpage content comprises the home page and the site-wide pages of the second target website;
Step 3, storing at least the webpage content, the URL and the first data in a preset database.
In this embodiment, the preset database includes a cloud server and a mobile terminal device.
Through the above steps, the second target website is crawled according to the URL to obtain its webpage content, and the webpage content, the URL and the first data are stored in the preset database. The illegal content in the network content and its related information are thereby stored, the related information including the entire webpage content containing the illegal content and the corresponding URL.
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 3 is a flowchart of a network content security detection method according to a preferred embodiment of the present application, and as shown in fig. 3, the network content security detection method includes the following steps:
Step S301, setting configuration rules.
Configuration rules are set before webpage content is acquired. They include the configuration of the detection scenario, whether the traffic analysis and restoration function is enabled, and the configuration of the screening strategy. The screening strategy includes removing sensitive pictures containing illegal information, removing sensitive words containing illegal information, and removing videos containing illegal information; for example, if an image field in a web page contains illegal text, the illegal text is deleted.
Step S302, setting configuration data.
Setting configuration data includes setting the traffic analysis rules, user name/password, operation log and webpage content interception mode.
The webpage content interception mode includes traffic mirroring and log collection. Traffic mirroring is performed with an open-source traffic collection tool and is configured with the IP and port of the monitored website; log collection is achieved by configuring the storage address of the website logs.
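The configuration described in steps S301 and S302 could be represented as a small data structure. The sketch below is one hypothetical layout: every field and method name is an assumption, and only the concepts (traffic restoration switch, monitored IP and port for traffic mirroring, log storage address) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MirrorTarget:
    ip: str    # IP of the monitored website
    port: int  # port of the monitored website


@dataclass
class DetectionConfig:
    traffic_restore_enabled: bool = True  # traffic analysis and restoration switch
    mirror_targets: List[MirrorTarget] = field(default_factory=list)
    log_storage_path: Optional[str] = None  # where the website logs are stored

    def interception_modes(self) -> List[str]:
        """List the interception modes this configuration activates."""
        modes = []
        if self.traffic_restore_enabled and self.mirror_targets:
            modes.append("traffic_mirror")
        if self.log_storage_path:
            modes.append("log_collection")
        return modes
```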
Step S303, acquiring the web page content.
The method comprises the steps of acquiring webpage content by adopting a mechanism combining passive detection and active detection, wherein the passive detection comprises the step of acquiring network flow through flow mirroring and the step of acquiring log data of the network flow through stored logs, and the active detection comprises the step of acquiring a home page of a target website and a page of a total station by using a crawler module, so that original pictures and characters in the home page of the target website and the page of the total station can be acquired.
In this way, key network traffic data is collected by the traffic analysis and restoration device, and the home page and full-site pages of the target website are obtained with the crawler module, so the information sources are richer, and compliance detection of pictures and videos in the egress traffic of large broadband networks is supported.
SIP (source IP), SPORT (source port), DIP (destination IP) and DPORT (destination port) are obtained by parsing the log/traffic data. For example, in captured website traffic, picture-related content generally carries an identifier such as image; the specific identifier name, which varies from website to website, is obtained by analyzing the traffic content. The content from the start of a picture to its end is extracted from the website traffic and converted from a byte stream to obtain the original picture and text. Pictures, text and videos in the log/traffic data are restored through the following steps.
Step 1, a passive data collector intercepts HTTP/HTTPS POST requests in the network traffic to obtain network traffic data, and log data of the network traffic is obtained from stored logs;
Step 2, the network traffic data and the log data are parsed to obtain the key fields in the network traffic; the byte stream of the network content is obtained according to those key fields; the content restoration function is switched according to the output format of the network content; and the byte stream of the network content is restored into the corresponding pictures, text and videos by that function.
For example, the key field for pictures is Image and the content restoration function is buff2Image. The byte stream containing the picture is obtained according to the key field Image. Because many illegal pictures on illegal websites are disguised, for example a.jpg disguised as a.jpg.bak, the content restoration function buff2Image must be adapted before the picture is restored, so that the byte stream containing the picture can be decoded into a picture of the corresponding format; such formats include jpg, bmp and png.
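The application does not disclose how buff2Image works internally. One plausible approach, sketched here purely as an assumption, is to ignore the claimed file name and recognize the picture format from the leading bytes of the stream (the well-known JPEG, PNG and BMP signatures). The lowercase `buff2image` below is a hypothetical stand-in, not the function named in the text.

```python
# File signatures ("magic numbers") for the three formats named in the text.
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"BM": "bmp",
}


def sniff_image_format(data: bytes):
    """Return 'jpg', 'png' or 'bmp' based on the first bytes, else None."""
    for magic, fmt in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return fmt
    return None


def buff2image(data: bytes, claimed_name: str):
    """Return (restored_file_name, data), or None if no known picture format.

    The claimed name is used only for its stem, so a disguised name such as
    'a.jpg.bak' is restored to 'a.jpg' when the bytes are really a JPEG.
    """
    fmt = sniff_image_format(data)
    if fmt is None:
        return None
    stem = claimed_name.split(".")[0] or "picture"
    return f"{stem}.{fmt}", data


jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 4  # truncated JPEG header, for illustration
print(buff2image(jpeg_bytes, "a.jpg.bak"))
```

The two-byte BMP signature is weak, so ordinary data that happens to start with "BM" would be misclassified; a production decoder would validate the rest of the header as well.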
Through the above steps, after the network traffic data and the log data are parsed, the pictures, text and videos in the traffic are restored by the content restoration function, in preparation for subsequently detecting whether their contents are legal.
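Extraction of the SIP, SPORT, DIP and DPORT values mentioned above can be sketched as follows. The log line layout (`sip=... sport=... dip=... dport=...`) is an assumed format chosen for illustration; real website or probe logs will differ, and the names `FlowTuple` and `parse_flow_tuple` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class FlowTuple(NamedTuple):
    sip: str    # source IP
    sport: int  # source port
    dip: str    # destination IP
    dport: int  # destination port


# Assumed key=value log layout; adjust the pattern to the actual log format.
LOG_PATTERN = re.compile(
    r"sip=(?P<sip>\S+)\s+sport=(?P<sport>\d+)\s+"
    r"dip=(?P<dip>\S+)\s+dport=(?P<dport>\d+)"
)


def parse_flow_tuple(line: str) -> Optional[FlowTuple]:
    """Extract the four-tuple from one log line, or None if it is absent."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return FlowTuple(
        match["sip"], int(match["sport"]), match["dip"], int(match["dport"])
    )


line = "2020-11-27T10:00:00 sip=10.0.0.5 sport=51234 dip=192.0.2.10 dport=80 POST /upload"
print(parse_flow_tuple(line))
```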
Step S304, intelligently detecting and storing the webpage content.
The intelligent detection and storage of the webpage content comprises the following steps:
Step 1, the obtained original pictures and text are input into a content security detection model. The model comprises a plurality of content security detection algorithms and sample data; the algorithms include a deep learning algorithm, an OCR algorithm and an NLP algorithm. The deep learning algorithm audits pictures and videos for unsafe information; the OCR algorithm recognizes common characters and rarely used characters; the NLP algorithm analyzes the semantics, emotional tendencies and comment viewpoints of online articles. If that analysis finds illegal information, the corresponding documents and comments are stored to facilitate subsequent tracing.
Step 2, the original pictures and text are recognized by the content security detection algorithms, and the recognition result is compared with the sample data to obtain the type of the original pictures and text and their similarity to the sample data, wherein the sample data includes illegal pictures and text, and the type includes scenes containing illegal information.
Step 3, a threshold is set and the similarity is compared with it. If the similarity is greater than the set threshold, the corresponding picture or text is marked as illegal information; for example, with a threshold of 70%, a similarity greater than 70% indicates that the corresponding picture or text is illegal information. When illegal information is found, it is retained and its SIP and DIP are stored, wherein the storage locations include a server, the device that intercepts network traffic and the device on which the crawler module runs.
Through the above steps, the SIP and DIP of the illegal information let the user determine who uploaded the illegal pictures and text and who accessed the illegal content; storing the illegal information together with its SIP and DIP thus makes it possible to trace the source of the illegal information.
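The threshold judgment of step 3 can be sketched as below, using the 70% example value from the text. The record type `Finding` and the function names are hypothetical, and the similarity is assumed to be expressed as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD = 0.70  # the 70% example threshold from the text


@dataclass
class Finding:
    content_id: str
    similarity: float  # similarity to the sample data, in [0, 1]
    sip: str           # source IP of the content
    dip: str           # destination IP of the content


def is_illegal(similarity: float, threshold: float = THRESHOLD) -> bool:
    # Strictly greater than, matching "greater than the set threshold".
    return similarity > threshold


def retain_illegal(findings: List[Finding], threshold: float = THRESHOLD) -> List[Finding]:
    """Keep only findings judged illegal, together with their SIP and DIP."""
    return [f for f in findings if is_illegal(f.similarity, threshold)]


findings = [
    Finding("pic-1", 0.93, "10.0.0.5", "192.0.2.10"),
    Finding("pic-2", 0.70, "10.0.0.6", "192.0.2.10"),
    Finding("txt-1", 0.12, "10.0.0.7", "192.0.2.10"),
]
for f in retain_illegal(findings):
    print(f.content_id, f.sip, f.dip)
```

Note that a similarity of exactly 70% is not flagged, because the text requires the similarity to be greater than the threshold.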
In one embodiment, a crawler module acquires the home page and full-site pages of a target website. After illegal pictures and text are found by the content security detection model, the crawler module crawls and stores the original webpage, and a report is subsequently output for tracing and evidence collection by the supervising authority; the report includes the webpage address, the storage time and the illegal pictures and text.
In this way, the crawler module crawls and stores the illegal original webpages and outputs corresponding reports according to the illegal information, which facilitates subsequent tracing and evidence collection.
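A report with the three contents listed above (webpage address, storage time, illegal pictures and text) might be assembled as in the following sketch. The JSON layout and the key names are assumptions; the application does not specify a report format.

```python
import json
from datetime import datetime, timezone


def build_report(page_url, illegal_pictures, illegal_texts, stored_at=None):
    """Assemble a tracing report for one stored illegal webpage."""
    stored_at = stored_at or datetime.now(timezone.utc)
    return {
        "webpage_address": page_url,
        "storage_time": stored_at.isoformat(),
        "illegal_pictures": list(illegal_pictures),
        "illegal_texts": list(illegal_texts),
    }


report = build_report(
    "http://example.com/page.html",
    ["http://example.com/a.jpg"],
    ["example flagged sentence"],
    stored_at=datetime(2020, 11, 27, tzinfo=timezone.utc),
)
print(json.dumps(report, indent=2))
```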
Step S305, returning the compliance detection result of the webpage content.
The illegal information in the webpage content is determined in step S304 and removed from the webpage by the compliance control layer, and a content compliance check result is returned that includes the URL, the original webpage, and the type and similarity of the illegal information.
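The behavior of the compliance control layer can be pictured with the following sketch, which separates illegal items from the rest and returns a result carrying the URL, the original webpage, and the type and similarity of each illegal item. The function name, the item layout and the result keys are all hypothetical.

```python
def apply_compliance_control(url, original_page, items):
    """Remove illegal items and build the compliance check result.

    items: list of dicts with keys 'content', 'type', 'similarity', 'illegal'.
    """
    compliant = [item["content"] for item in items if not item["illegal"]]
    violations = [
        {"type": item["type"], "similarity": item["similarity"]}
        for item in items
        if item["illegal"]
    ]
    return {
        "url": url,
        "original_page": original_page,
        "violations": violations,
        "compliant_content": compliant,
    }


result = apply_compliance_control(
    "http://example.com/page.html",
    "<html>...</html>",  # placeholder for the stored original webpage
    [
        {"content": "a.jpg", "type": "illegal_scene", "similarity": 0.93, "illegal": True},
        {"content": "b.jpg", "type": "normal", "similarity": 0.10, "illegal": False},
    ],
)
print(result["violations"], result["compliant_content"])
```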
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here. For example, step S301 and step S302 may be interchanged.
The present embodiment further provides a network content security detection device, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a network content security detection apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes:
an obtaining module 41, configured to obtain first network content collected according to a preset data collection manner, where the preset data collection manner includes at least one of the following: network content analysis based on network traffic, and web crawler crawling;
a detecting module 42, connected to the obtaining module 41, configured to detect first data of the first network content through the deep learning model, and determine a similarity between the first data and preset network data, where the preset network data includes preset illegal network data, and is used to determine whether the first data is illegal content;
and the content determining module 43 is connected to the detecting module 42 and is used for determining the network content security detection result according to the similarity.
In one embodiment, the content determining module 43 is configured to determine whether the similarity between the first data and the preset network data is greater than a preset threshold; and determining that the illegal content exists in the network content under the condition that the similarity is judged to be larger than the preset threshold value.
In one embodiment, the preset data acquisition mode includes network content analysis based on network traffic, and the obtaining module 41 is configured to acquire access data generated by website access, where the access data includes at least traffic data; intercept target traffic data from the traffic data according to a preset intercepting mode, where the preset intercepting mode includes at least traffic mirroring; and parse the target traffic data to obtain at least first picture data, and determine that the first network content includes the first picture data.
In one embodiment, the obtaining module 41 is configured to intercept HTTP/HTTPS POST requests by using a preset traffic interpreter to obtain the target traffic data.
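The filtering of POST requests can be sketched as follows for requests that are already available as plaintext HTTP/1.x byte strings. This is only an illustration under that assumption: reassembling TCP streams from mirrored traffic and decrypting HTTPS are outside its scope, and the function names are hypothetical.

```python
def parse_http_request(raw: bytes):
    """Split a raw HTTP/1.x request into (method, path, headers, body).

    Returns None when the request line is malformed.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None
    method, path, _version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so normalize to lowercase.
            headers[name.strip().lower()] = value.strip()
    return method, path, headers, body


def intercept_post(raw_requests):
    """Keep only the POST requests from an iterable of raw request bytes."""
    kept = []
    for raw in raw_requests:
        parsed = parse_http_request(raw)
        if parsed and parsed[0] == "POST":
            kept.append(parsed)
    return kept


raws = [
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"POST /upload HTTP/1.1\r\nHost: example.com\r\n"
    b"Content-Type: image/jpeg\r\n\r\n\xff\xd8\xff",
]
posts = intercept_post(raws)
print(len(posts), posts[0][1])
```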
In one embodiment, the preset data acquisition manner includes web crawler crawling, and the obtaining module 41 is configured to use the web crawler to obtain at least the website homepage content of the first target website, and determine that the first web content at least includes the website homepage content of the first target website.
In one embodiment, the first network content includes second picture data, the preset network data includes sample pictures, and the detection module 42 is configured to detect the picture content of the second picture data through a deep learning model; and comparing the picture content of the second picture data with the sample picture to determine the similarity.
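The application does not state how the similarity between the detected picture content and a sample picture is computed. A common choice, shown here only as an assumption, is the cosine similarity between feature vectors produced by the deep learning model; the feature vectors below are made-up numbers, and `best_match` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def best_match(features, samples):
    """Return (label, similarity) of the most similar sample.

    samples: dict mapping a sample label to its feature vector.
    """
    scored = ((label, cosine_similarity(features, vec)) for label, vec in samples.items())
    return max(scored, key=lambda pair: pair[1])


samples = {"sample_a": [1.0, 0.0], "sample_b": [0.0, 1.0]}
print(best_match([1.0, 0.0], samples))
```

For non-negative feature vectors the result lies between 0 and 1, so it can be compared directly with a percentage threshold.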
In one embodiment, the first network content further includes first target information of the first data, and the network content security detection apparatus is further configured to obtain the first target information, where the target information includes a source address and a target address corresponding to the first data; and storing the target information and the first data into a preset database.
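Storing the target information together with the first data can be sketched with an SQLite table. The choice of SQLite, the table name `illegal_content` and its columns are assumptions made for illustration; the application only refers to "a preset database".

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS illegal_content ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "source_address TEXT NOT NULL, "
        "target_address TEXT NOT NULL, "
        "first_data BLOB NOT NULL)"
    )
    return conn


def store_finding(conn, source_address, target_address, first_data):
    """Insert one record and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO illegal_content "
            "(source_address, target_address, first_data) VALUES (?, ?, ?)",
            (source_address, target_address, first_data),
        )
    return cur.lastrowid


conn = open_store()
row_id = store_finding(conn, "10.0.0.5", "192.0.2.10", b"\xff\xd8\xff")
print(row_id)
```

Parameterized queries are used so that addresses or content taken from untrusted traffic cannot alter the SQL statement.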
In one embodiment, the first network content further includes second target information of the first data, and the network content security detection device is further configured to obtain the second target information, where the second target information includes at least the URL of a second target website from which the web crawler crawled the first network content; crawl the second target website according to the URL to obtain webpage content of the second target website, where the webpage content includes the website home page and the full-site pages of the second target website; and store at least the webpage content, the URL and the first data in a preset database.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode includes at least one of the following: network content analysis based on network traffic, and web crawler crawling.
S2, detecting first data of the first network content through the deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data includes preset illegal network data and is used for determining whether the first data is illegal content.
S3, determining the network content security detection result according to the similarity.
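Steps S1 to S3 can be tied together in a short sketch in which acquisition and similarity scoring are passed in as functions, so that either acquisition mode and any model can be plugged in. The function and parameter names are hypothetical, and the default threshold of 0.7 is taken from the 70% example in the description rather than from the steps themselves.

```python
from typing import Callable, Iterable, List, Tuple


def detect_network_content(
    acquire: Callable[[], Iterable[bytes]],
    similarity_to_samples: Callable[[bytes], float],
    threshold: float = 0.7,
) -> List[Tuple[bytes, float, bool]]:
    """Run S1 to S3 and return (first_data, similarity, is_illegal) tuples."""
    results = []
    for first_data in acquire():                         # S1: acquire content
        similarity = similarity_to_samples(first_data)   # S2: score against samples
        results.append((first_data, similarity, similarity > threshold))  # S3
    return results


# Stub acquisition and scoring functions, for illustration only.
def fake_acquire():
    return [b"picture-1", b"picture-2"]


def fake_similarity(data: bytes) -> float:
    return 0.9 if data == b"picture-1" else 0.2


for item in detect_network_content(fake_acquire, fake_similarity):
    print(item)
```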
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the network content security detection method in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any one of the network content security detection methods in the above embodiments.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples express only several embodiments of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for detecting the security of network contents is characterized by comprising the following steps:
acquiring first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling;
detecting first data of the first network content through a deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data comprise preset illegal network data and are used for determining whether the first data are illegal contents;
and determining the network content security detection result according to the similarity.
2. The method according to claim 1, wherein determining the network content security detection result according to the similarity comprises:
judging whether the similarity of the first data and the preset network data is greater than a preset threshold value or not;
and determining that the network content has illegal content under the condition that the similarity is judged to be larger than a preset threshold value.
3. The method according to claim 1, wherein the preset data acquisition mode includes network content analysis based on network traffic, and acquiring the first network content collected according to the preset data acquisition mode includes:
acquiring access data generated by website access, wherein the access data at least comprises traffic data;
intercepting target traffic data from the traffic data according to a preset intercepting mode, wherein the preset intercepting mode at least comprises traffic mirroring;
parsing the target traffic data to obtain at least first picture data, and determining that the first network content comprises the first picture data.
4. The method for detecting the security of the network content according to claim 3, wherein intercepting the target traffic data from the traffic data according to the preset intercepting mode comprises:
intercepting HTTP/HTTPS POST requests by using a preset traffic interpreter to obtain the target traffic data.
5. The method for detecting the security of the network contents according to claim 1, wherein the preset data acquisition manner comprises web crawler crawling, and the acquiring of the first network contents acquired according to the preset data acquisition manner comprises:
using a web crawler to obtain at least website home page content of a first target website, and determining that the first network content at least comprises the website home page content of the first target website.
6. The method according to claim 1, wherein the first network content includes second picture data, the preset network data includes sample pictures, the detecting the first data of the first network content through a deep learning model, and the determining the similarity between the first data and the preset network data includes:
detecting picture content of the second picture data through a deep learning model;
and comparing the picture content of the second picture data with the sample picture to determine the similarity.
7. The method according to claim 2, wherein the first network content further includes first target information of first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes obtaining the first target information, where the target information includes a source address and a target address corresponding to the first data;
and storing the target information and the first data into a preset database.
8. The method according to claim 2, wherein the first network content further includes second target information of the first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes:
acquiring the second target information, wherein the second target information at least comprises a URL (uniform resource locator) of a second target website from which the web crawler crawled the first network content;
crawling the second target website according to the URL to obtain webpage content of the second target website, wherein the webpage content comprises a website home page and full-site pages of the second target website;
and storing at least the webpage content, the URL and the first data into a preset database.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the network content security detection method according to any one of claims 1 to 8.
10. A storage medium having a computer program stored thereon, wherein the computer program is configured to execute the network content security detection method according to any one of claims 1 to 8 when the computer program runs.
CN202011355159.7A 2020-11-27 2020-11-27 Network content security detection method, electronic device and storage medium Pending CN112733057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355159.7A CN112733057A (en) 2020-11-27 2020-11-27 Network content security detection method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011355159.7A CN112733057A (en) 2020-11-27 2020-11-27 Network content security detection method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112733057A true CN112733057A (en) 2021-04-30

Family

ID=75597819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355159.7A Pending CN112733057A (en) 2020-11-27 2020-11-27 Network content security detection method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112733057A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106899549A (en) * 2015-12-18 2017-06-27 北京奇虎科技有限公司 A kind of network security detection method and device
CN107743128A (en) * 2017-10-31 2018-02-27 哈尔滨工业大学(威海) It is a kind of that domain name and the illegal website method for digging with service IP are associated based on homepage
CN108304584A (en) * 2018-03-06 2018-07-20 百度在线网络技术(北京)有限公司 Illegal page detection method, apparatus, intruding detection system and storage medium
CN109274632A (en) * 2017-07-12 2019-01-25 中国移动通信集团广东有限公司 A kind of recognition methods of website and device
CN109391706A (en) * 2018-11-07 2019-02-26 顺丰科技有限公司 Domain name detection method, device, equipment and storage medium based on deep learning
CN111651658A (en) * 2020-06-05 2020-09-11 杭州安恒信息技术股份有限公司 Method and computer equipment for automatically identifying website based on deep learning
CN111859234A (en) * 2020-06-03 2020-10-30 北京神州泰岳智能数据技术有限公司 Illegal content identification method and device, electronic equipment and storage medium
US20200366712A1 (en) * 2019-05-14 2020-11-19 International Business Machines Corporation Detection of Phishing Campaigns Based on Deep Learning Network Detection of Phishing Exfiltration Communications

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821681A (en) * 2021-09-17 2021-12-21 深圳力维智联技术有限公司 Video tag generation method, device and equipment
CN113821681B (en) * 2021-09-17 2023-09-26 深圳力维智联技术有限公司 Video tag generation method, device and equipment
CN113904851A (en) * 2021-10-11 2022-01-07 中国电信股份有限公司 Network information processing method, user plane function system, medium, and electronic device
CN114610982A (en) * 2022-04-06 2022-06-10 微纵联合网络科技(武汉)有限公司 Computer network data acquisition, analysis and management method, equipment and storage medium
CN114610982B (en) * 2022-04-06 2023-01-06 中咨数据有限公司 Computer network data acquisition, analysis and management method, equipment and storage medium
CN115905600A (en) * 2022-12-25 2023-04-04 合肥仟佰策科技有限公司 Network security analysis system and method based on big data platform
CN115905600B (en) * 2022-12-25 2023-12-12 广东朝阳企讯通科技有限公司 Network security analysis system and method based on big data platform

Similar Documents

Publication Publication Date Title
CN112733057A (en) Network content security detection method, electronic device and storage medium
CN108737423B (en) Phishing website discovery method and system based on webpage key content similarity analysis
CN112468520B (en) Data detection method, device and equipment and readable storage medium
KR100848319B1 (en) Harmful web site filtering method and apparatus using web structural information
CN106844685B (en) Method, device and server for identifying website
CN111008405A (en) Website fingerprint identification method based on file Hash
CN111931188A (en) Vulnerability testing method and system under login scene
CN113038153B (en) Financial live broadcast violation detection method, device, equipment and readable storage medium
CN104023046B (en) Mobile terminal recognition method and device
US20130191323A1 (en) System and method for identifying the context of multimedia content elements displayed in a web-page
CN114422271B (en) Data processing method, device, equipment and readable storage medium
CN115757991A (en) Webpage identification method and device, electronic equipment and storage medium
CN114422211A (en) HTTP malicious traffic detection method and device based on graph attention network
CN115563600A (en) Data auditing method and device, electronic equipment and storage medium
CN113810375B (en) Webshell detection method, device and equipment and readable storage medium
CN114448664A (en) Phishing webpage identification method and device, computer equipment and storage medium
CN107786529B (en) Website detection method, device and system
CN106982147B (en) Communication monitoring method and device for Web communication application
US20130230248A1 (en) Ensuring validity of the bookmark reference in a collaborative bookmarking system
US20160315886A1 (en) Network information push method, apparatus and system based on instant messaging
CN115437930B (en) Webpage application fingerprint information identification method and related equipment
CN113762040B (en) Video identification method, device, storage medium and computer equipment
CN115410201A (en) Method, device and related equipment for processing verification code characters
CN113300915A (en) Device identification method, system, electronic apparatus, and storage medium
Li et al. Edge‐Based Detection and Classification of Malicious Contents in Tor Darknet Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210430