CN112733057A

CN112733057A - Network content security detection method, electronic device and storage medium

Info

Publication number: CN112733057A
Application number: CN202011355159.7A
Authority: CN
Inventors: 龙文洁; 莫金友
Original assignee: Hangzhou Anheng Information Security Technology Co Ltd
Current assignee: Hangzhou Anheng Information Security Technology Co Ltd
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2021-04-30

Abstract

The application relates to a network content security detection method, an electronic device and a storage medium. The network content security detection method comprises the following steps: obtaining first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling; detecting first data of the first network content through a deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data comprise preset illegal network data and are used for determining whether the first data are illegal content; and determining the network content security detection result according to the similarity. The method and the device address the problem of a limited network content detection range: network content is detected both by analyzing network traffic and by crawling, which expands the detection range.

Description

Network content security detection method, electronic device and storage medium

Technical Field

The present application relates to the field of security detection, and in particular, to a method, an electronic device, and a storage medium for detecting security of network content.

Background

With the rapid development of the internet, smart devices and various new services, the volume of data on the internet is growing explosively, and interactive content such as pictures, videos, messages and chats has become an indispensable part of how people express feelings, record events and carry out daily work. This growing body of content also carries various uncontrollable risk factors, and there is currently a lack of effective means for checking the compliance of pictures and videos in websites and network traffic.

Existing website content security detection devices and methods are mainly based on crawler technology. The source of detection objects is single and the detection range is limited: data cannot be passively acquired from large-scale network traffic, and illegal information found in the data cannot be stored. As a result, the detection range of network content is limited.

At present, no effective solution is provided for the problem of limited network content detection range in the related art.

Disclosure of Invention

The embodiment of the application provides a network content security detection method, an electronic device and a storage medium, which are used for at least solving the problem of limited network content detection range in the related art.

In a first aspect, an embodiment of the present application provides a method for detecting network content security, including:

obtaining first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling;

detecting first data of the first network content through a deep learning model, and determining the similarity between the first data and preset network data, wherein the preset network data comprise preset illegal network data and are used for determining whether the first data are illegal contents;

and determining the network content security detection result according to the similarity.

In some embodiments, determining the network content security detection result according to the similarity includes:

judging whether the similarity of the first data and the preset network data is greater than a preset threshold value or not;

and determining that the network content has illegal content under the condition that the similarity is judged to be larger than a preset threshold value.

In some embodiments, the preset data acquisition mode includes the network content analysis based on network traffic, and obtaining the first network content collected according to the preset data acquisition mode includes:

acquiring access data generated by website access, wherein the access data at least comprises traffic data;

intercepting target traffic data from the traffic data according to a preset interception mode, wherein the preset interception mode at least comprises traffic mirroring;

parsing the target traffic data to obtain at least first picture data, and determining that the first network content comprises the first picture data.

In some embodiments, intercepting the target traffic data from the traffic data according to the preset interception mode includes: intercepting HTTP/HTTPS POST requests with a preset traffic interpreter to obtain the target traffic data.

In some embodiments, the preset data acquisition mode includes web crawler crawling, and obtaining the first network content collected according to the preset data acquisition mode includes:

using a web crawler to obtain at least the home page content of a first target website, and determining that the first network content at least comprises the home page content of the first target website.

In some embodiments, the first network content includes second picture data, the preset network data includes sample pictures, detecting first data of the first network content through a deep learning model, and determining similarity between the first data and the preset network data includes:

detecting picture content of the second picture data through a deep learning model;

and comparing the picture content of the second picture data with the sample picture to determine the similarity.

In some embodiments, the first network content further includes first target information of the first data, and after determining that the network content has illegal content when the similarity is judged to be greater than the preset threshold, the method further includes acquiring the first target information, where the first target information includes a source address and a target address corresponding to the first data;

and storing the first target information and the first data into a preset database.

In some embodiments, the first network content further includes second target information of the first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes:

acquiring the second target information, wherein the second target information at least comprises the URL (uniform resource locator) of a second target website from which the web crawler crawled the first network content;

crawling the second target website according to the URL to obtain webpage content of the second target website, wherein the webpage content comprises the home page and the site-wide pages of the second target website;

and storing at least the webpage content, the URL and the first data into a preset database.

In a second aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the network content security detection method according to the first aspect is implemented.

In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the network content security detection method according to the first aspect.

Compared with the related art, the network content security detection method, the electronic device and the storage medium provided by the embodiments of the application obtain first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling; detect first data of the first network content through a deep learning model, and determine the similarity between the first data and preset network data, wherein the preset network data comprise preset illegal network data and are used for determining whether the first data are illegal content; and determine the network content security detection result according to the similarity. This addresses the problem of a limited network content detection range, since network content is detected both by analyzing network traffic and by crawling.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a block diagram of a hardware structure of a terminal of a network content security detection method according to an embodiment of the present application;

FIG. 2 is a flow chart of a method for detecting network content security according to an embodiment of the present application;

FIG. 3 is a flow chart of a method for web content security detection according to a preferred embodiment of the present application;

FIG. 4 is a block diagram of a network content security detection method according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.

The method provided by this embodiment can be executed in a terminal, a computer or a similar computing device. Taking execution on a terminal as an example, fig. 1 is a hardware structure block diagram of the terminal of the network content security detection method according to the embodiment of the present invention. As shown in fig. 1, the terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the network content security detection method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

The technologies described in this application can be used in a content security detection system. Content security here is based on deep learning technology and provides intelligent content risk identification services for multimedia such as pictures, videos, voice and text, which can greatly reduce the cost of manual review.

Before describing and explaining embodiments of the present application, the related technologies used in the present application are described as follows:

Deep learning algorithm: Deep Learning (DL), which is based on artificial neural networks, is a subfield of machine learning. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images, and sound.

Optical character recognition technology: Optical Character Recognition (OCR) uses optical and computer technology to read characters printed or written on paper and convert them into a format that computers can accept and humans can understand. At present, it mainly uses a convolutional neural network as the feature extractor and classifier, taking a character image as input and outputting the recognition result.

Natural language processing: Natural Language Processing (NLP) is an important research direction in computer science and artificial intelligence. It uses computers to process, understand and use human languages (such as Chinese and English) in order to achieve effective communication between humans and computers.

The present embodiment provides a method for detecting network content security, and fig. 2 is a flowchart of a method for detecting network content security according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

Step S201, obtaining first network content collected according to a preset data acquisition mode, where the preset data acquisition mode includes at least one of the following: network content analysis based on network traffic, and web crawler crawling.

Step S202, detecting first data of the first network content through a deep learning model, and determining similarity between the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content.

In this embodiment, the first data includes pictures, videos, and texts in the website, and the deep learning model includes a deep learning algorithm, an OCR algorithm, and an NLP algorithm.

Step S203, determining the network content security detection result according to the similarity.

In this embodiment, the similarity refers to the similarity between the pictures, videos, and texts in the website and the illegal network data.

Through steps S201 to S203, first network content collected according to a preset data acquisition mode is obtained, where the preset data acquisition mode includes at least one of the following: network content analysis based on network traffic, and web crawler crawling; first data of the first network content is detected through a deep learning model, and the similarity between the first data and preset network data is determined, where the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content; and the network content security detection result is determined according to the similarity. This addresses the problem of a limited network content detection range: network content is detected both by analyzing network traffic and by crawling, which expands the detection range.

In this embodiment, determining the network content security detection result according to the similarity includes the following steps:

Step 1, judging whether the similarity between the first data and the preset network data is greater than a preset threshold.

Step 2, determining that the network content has illegal content in the case that the similarity is judged to be greater than the preset threshold.

In this embodiment, the preset threshold is set according to actual needs. The smaller the threshold, the more illegal-content-related information is output; the larger the threshold, the higher the similarity between the output illegal content and the preset network data. For example, the preset threshold may be set to 100%, in which case the output illegal content is completely consistent with the preset network data.

Through the above steps, whether the network content has illegal content is determined from the similarity and the preset threshold, so that security detection of the network content is realized and the quality of the network content is improved.
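The threshold decision described above can be sketched as follows. This is only an illustration under stated assumptions: the similarity is taken to be a number in [0, 1], and the function name, the default threshold of 0.8 and the range check are hypothetical choices that do not come from the application.

```python
def has_illegal_content(similarity: float, threshold: float = 0.8) -> bool:
    """Return True when the similarity is strictly greater than the preset threshold.

    `threshold` is an assumed default; the application only says it is
    set according to actual needs.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    # "Greater than" is taken literally from the text above.
    return similarity > threshold
```

With a strictly-greater-than comparison, a threshold of exactly 1.0 can never be exceeded. An implementation that wants the 100% setting to report exact matches would presumably need a greater-than-or-equal comparison instead.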

In this embodiment, the preset data acquisition mode includes network content analysis based on network traffic, and obtaining the first network content collected according to the preset data acquisition mode includes the following steps:

Step 1, acquiring access data generated by website access, wherein the access data at least comprises traffic data.

Step 2, intercepting target traffic data from the traffic data according to a preset interception mode, wherein the preset interception mode at least comprises traffic mirroring.

In this embodiment, the preset interception mode further includes collecting stored logs of the network data; the logs can be obtained by configuring the address at which the logs of the network data are stored.

Step 3, parsing the target traffic data to obtain at least first picture data, and determining that the first network content comprises the first picture data.

Through the above steps, target traffic data is obtained from the traffic data according to the preset interception mode and parsed to obtain the first picture data, so that pictures in the network content are obtained from network traffic in preparation for subsequent network content security detection.

In this embodiment, intercepting the target traffic data from the traffic data according to the preset interception mode includes: intercepting HTTP/HTTPS POST requests with a preset traffic interpreter to obtain the target traffic data.

In this way, the target traffic data is intercepted in preparation for subsequently obtaining the picture data in the target traffic data.

In this embodiment, analyzing the target traffic data to obtain at least the first picture data includes the following steps:

Step 1, parsing the target traffic data to obtain key field information in the target traffic data.

In this embodiment, the key field information includes image, which indicates that the target traffic data contains picture content.

Step 2, acquiring the byte stream containing the picture according to the key field information.

In this embodiment, the byte stream containing the picture is acquired according to the key field information image.

Step 3, restoring the byte stream containing the picture into the first picture data by means of a content restoration function.

In this embodiment, before the first picture data is obtained through the content restoration function, the content restoration function is adapted to the format of the first picture data. Assuming that the content restoration function is a buff2Image function and the format of the first picture data is jpg, the buff2Image function is adapted so that the byte stream containing the picture can be decoded by the buff2Image function and output as first picture data in jpg format.

Through the above steps, the field value content is obtained via the key field information and restored into the first picture data by the content restoration function, so that the target traffic data is converted into the first picture data in preparation for subsequent recognition of the first picture data.
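One possible reading of the three parsing steps above, for the case where the intercepted POST request carries a multipart/form-data body, is sketched below. The application does not specify the wire format, so everything here is an assumption: the function name `extract_image_parts` is hypothetical, the "key field" is taken to be a part header of the form `Content-Type: image/...`, and the boundary handling is deliberately simplified (it splits on the bare delimiter and does not guard against the delimiter bytes occurring inside a payload).

```python
from typing import List


def extract_image_parts(content_type: str, body: bytes) -> List[bytes]:
    """Return the raw byte streams of all image parts in a multipart body."""
    marker = "boundary="
    index = content_type.find(marker)
    if index < 0:
        return []  # not a multipart request, nothing to extract
    boundary = content_type[index + len(marker):].split(";")[0].strip().strip('"')
    delimiter = b"--" + boundary.encode("ascii")

    images: List[bytes] = []
    # The first chunk is the (usually empty) preamble before the first delimiter.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--" reached
        head, separator, payload = chunk.partition(b"\r\n\r\n")
        if not separator:
            continue  # malformed part without a header/body separator
        # Step 1: look for the key field in the part headers.
        if b"content-type: image/" in head.lower():
            # Step 2: the payload is the byte stream containing the picture;
            # drop the CRLF that precedes the next delimiter.
            if payload.endswith(b"\r\n"):
                payload = payload[:-2]
            images.append(payload)
    return images
```

Step 3, turning the byte stream back into a picture, would then be the job of the content restoration function (buff2Image in the text), which is not reproduced here.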

In this embodiment, the preset data acquisition mode includes web crawler crawling, and obtaining the first network content collected according to the preset data acquisition mode includes: using a web crawler to obtain at least the home page content of a first target website, and determining that the first network content at least comprises the home page content of the first target website.

In this way, the crawler can actively crawl the content of the target website in preparation for subsequent recognition of the website content.
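A minimal sketch of this active acquisition path, using only the Python standard library, might look as follows. The application does not describe its crawler, so the names `fetch_home_page`, `ImageLinkCollector` and `collect_image_urls` are hypothetical, and the sketch only gathers the picture URLs referenced by `img` tags on the home page.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageLinkCollector(HTMLParser):
    """Collect absolute URLs of all <img src=...> tags on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(base_url: str, html: str) -> List[str]:
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.image_urls


def fetch_home_page(url: str, timeout: float = 10.0) -> str:
    """Download the home page of a target website (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In use, the text returned by `fetch_home_page` would be passed to `collect_image_urls`, and each collected URL would then be downloaded and handed to the detection step.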

In this embodiment, the first network content includes second picture data, the preset network data includes a sample picture, and detecting the first data of the first network content through the deep learning model and determining the similarity between the first data and the preset network data includes the following steps:

Step 1, detecting the picture content of the second picture data through the deep learning model;

Step 2, comparing the picture content of the second picture data with the sample picture to determine the similarity.

Through the above steps, the picture content of the second picture data is detected by the deep learning model and compared with the sample picture to determine the similarity, in preparation for subsequently determining illegal content in the network content.
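The application does not state how the similarity is computed. One common choice, shown here purely as an assumption, is to let the deep learning model map each picture to a feature vector and to take the cosine similarity between the vector of the detected picture and the vectors of the sample pictures. The function names and the use of plain Python lists as feature vectors are illustrative.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def best_sample_similarity(
    features: Sequence[float], sample_features: Iterable[Sequence[float]]
) -> float:
    """Highest similarity between a picture and any sample picture."""
    return max(
        (cosine_similarity(features, sample) for sample in sample_features),
        default=0.0,
    )
```

The value returned by `best_sample_similarity` is what would then be compared against the preset threshold.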

In some embodiments, the first network content further includes first target information of the first data, and after it is determined that the network content has illegal content when the similarity is greater than the preset threshold, the method further includes acquiring the first target information, where the first target information includes a source address and a target address corresponding to the first data; and storing the first target information and the first data into a preset database.

In this way, the illegal content found in the network content and its related information are stored; the related information includes the source address and the target address corresponding to the first data, which facilitates tracing the source of the illegal content.
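The storage step can be sketched with SQLite from the Python standard library. The application does not name a database product or schema, so the table name `flagged_content`, its columns and the helper functions below are assumptions chosen for illustration.

```python
import sqlite3


def open_evidence_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) a store for flagged content."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS flagged_content (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            source_address TEXT NOT NULL,
            target_address TEXT NOT NULL,
            similarity REAL NOT NULL,
            data BLOB NOT NULL
        )
        """
    )
    return conn


def store_flagged(
    conn: sqlite3.Connection,
    source_address: str,
    target_address: str,
    similarity: float,
    data: bytes,
) -> int:
    """Persist the first data together with its source and target address."""
    cursor = conn.execute(
        "INSERT INTO flagged_content "
        "(source_address, target_address, similarity, data) VALUES (?, ?, ?, ?)",
        (source_address, target_address, similarity, data),
    )
    conn.commit()
    return cursor.lastrowid
```

Keeping the source and target address in the same row as the content is what makes later tracing possible, since a flagged picture can be looked up together with the endpoints it travelled between.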

In some embodiments, the first network content further includes second target information of the first data, and after determining that the network content has illegal content when the similarity is greater than the preset threshold, the method further includes the following steps:

Step 1, acquiring the second target information, wherein the second target information at least comprises the URL (uniform resource locator) of a second target website from which the web crawler crawled the first network content;

Step 2, crawling the second target website according to the URL to obtain webpage content of the second target website, wherein the webpage content comprises the home page and the site-wide pages of the second target website;

Step 3, storing at least the webpage content, the URL and the first data into a preset database.

In this embodiment, the preset database includes a cloud server and a mobile terminal device.

Through the above steps, the second target website is crawled according to the URL to obtain its webpage content, and the webpage content, the URL and the first data are stored into the preset database. The illegal content in the network content and its related information are thereby stored, where the related information comprises the full webpage content containing the illegal content and the corresponding URL.

The embodiments of the present application are described and illustrated below by means of preferred embodiments.

Fig. 3 is a flowchart of a network content security detection method according to a preferred embodiment of the present application, and as shown in fig. 3, the network content security detection method includes the following steps:

Step S301, setting configuration rules.

Configuration rules are set before webpage content is acquired. The configuration rules comprise the configuration of the detection scenario, whether the traffic analysis and restoration function is enabled, and the configuration of a screening strategy. The screening strategy comprises removing sensitive pictures containing illegal information, removing sensitive words containing illegal information, and removing videos containing illegal information; for example, if an image field in a webpage contains illegal wording, the illegal wording is deleted.

Step S302, setting configuration data.

Setting configuration data comprises setting the traffic analysis rules, user names/passwords, operation logs, and the webpage content interception mode.

The webpage content interception mode comprises traffic mirroring and log collection. Traffic mirroring is performed with an open-source traffic capture tool and requires configuring the IP and port of the monitored website; log collection is performed by configuring the storage address of the website logs.
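The two interception modes and their settings could be represented by a small configuration object such as the one below. The class names, field names and mode labels are hypothetical; the application only states that traffic mirroring needs the IP and port of the monitored website and that log collection needs the log storage address.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MirrorTarget:
    """IP and port of a monitored website for traffic mirroring."""
    ip: str
    port: int


@dataclass
class CaptureConfig:
    """Settings for the webpage content interception mode."""
    mirror_targets: List[MirrorTarget] = field(default_factory=list)
    log_path: Optional[str] = None  # storage address of the website logs

    def enabled_modes(self) -> List[str]:
        """List the interception modes that have enough settings to run."""
        modes: List[str] = []
        if self.mirror_targets:
            modes.append("traffic_mirroring")
        if self.log_path:
            modes.append("log_collection")
        return modes
```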

Step S303, acquiring the web page content.

Webpage content is acquired using a mechanism that combines passive detection and active detection. Passive detection comprises acquiring network traffic through traffic mirroring and acquiring log data of the network traffic from stored logs. Active detection comprises using a crawler module to obtain the home page and site-wide pages of a target website, so that the original pictures and text in those pages can be obtained.

In this way, key network traffic data is collected by the traffic analysis and restoration device, and the home page and site-wide pages of the target website are obtained by the crawler module, so the information sources are richer and content compliance detection of pictures and videos in the egress traffic of large-bandwidth networks is supported.

The SIP (source IP), SPORT (source port), DIP (target IP) and DPORT (target port) are obtained by parsing the data in the logs/traffic. For example, in the captured website traffic, picture-related content generally carries the identifier image. The specific identifier name is obtained by parsing the traffic content, since different websites follow different conventions. The content from the start of a picture to the end of the picture is obtained by parsing the website traffic, and is then converted from a byte stream to obtain the original pictures and text. The pictures, text and videos in the log/traffic data are restored by the following steps.

Step 1, using a passive data acquisition unit to intercept HTTP/HTTPS POST requests in the network traffic to obtain network traffic data, and obtaining log data of the network traffic from stored logs;

Step 2, parsing the network traffic data and the log data to obtain the key field in the network traffic, acquiring the byte stream of the network content according to the key field, adapting the content restoration function to the output format of the network content, and restoring the byte stream of the network content into the corresponding pictures, text and videos through the content restoration function.

For example, the key field of a picture is image, the content restoration function is buff2Image, and the byte stream containing the picture is obtained according to the key field image. Because many illegal pictures on illegal websites are disguised, for example a.jpg is disguised as a.jpg.bak, the content restoration function buff2Image needs to be adapted before the picture is restored, so that the byte stream containing the picture can be decoded by the content restoration function to generate a picture in the corresponding format; the corresponding formats include jpg, bmp and png.
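The application does not explain how buff2Image is adapted to a disguised file. One standard technique that serves the same purpose, shown here as a substitute and not as the patented method, is to ignore the file name and identify the format from the leading signature bytes of the byte stream. The function name `sniff_image_format` is hypothetical; the signatures for jpg, png and bmp are the well-known file headers of those formats.

```python
from typing import Optional

# Leading signature bytes of the three formats named in the text.
IMAGE_SIGNATURES = {
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "bmp": b"BM",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the picture format from the byte stream, ignoring any file name."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

A byte stream uploaded as a.jpg.bak that begins with the jpg signature would thus still be recognized as jpg and could be passed to the matching decoder.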

Through the above steps, after the network traffic data and the log data are parsed, the pictures, text and videos in the traffic are restored through the content restoration function, in preparation for subsequently detecting whether their contents are legal.
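As a non-limiting illustration of restoring a disguised picture, the following Python sketch identifies the actual picture format from the leading bytes of the byte stream rather than from the file name. The application does not disclose the internals of buff2Image; the function names `detect_image_format` and `restore_picture` and the signature table are assumptions introduced only for this example.

```python
from typing import Optional

# Leading signature bytes of the three formats named in the description.
MAGIC_NUMBERS = {
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "bmp": b"BM",
}


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'jpg', 'png' or 'bmp' based on the leading bytes, else None."""
    for fmt, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return fmt
    return None


def restore_picture(data: bytes, claimed_name: str) -> Optional[str]:
    """Return a corrected file name for the byte stream.

    Hypothetical helper: the claimed name (for example 'a.jpg.bak') is used
    only for its stem; the extension comes from the content itself.
    Returns None when the bytes are not a recognized picture.
    """
    fmt = detect_image_format(data)
    if fmt is None:
        return None
    stem = claimed_name.split(".", 1)[0]
    return f"{stem}.{fmt}"
```

Under these assumptions, a byte stream that begins with the JPEG signature and arrives under the name a.jpg.bak would be restored as a.jpg, while a stream with no recognized signature is left for other restoration functions.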

Step S304, intelligently detecting and storing the webpage content.

The intelligent detection and storage of the webpage content comprises the following steps:

Step 1, inputting the obtained original pictures and text into a content security detection model, wherein the content security detection model comprises a plurality of content security detection algorithms and sample data, and the content security detection algorithms comprise a deep learning algorithm, an OCR algorithm and an NLP algorithm. The deep learning algorithm audits pictures and videos to determine whether they contain unsafe information; the OCR algorithm recognizes both common and rarely used characters; and the NLP algorithm analyzes the semantics, emotional tendency and comment viewpoints of network articles. If the NLP analysis finds illegal information in an article, the corresponding documents and comments are stored to facilitate subsequent tracing.

Step 2, recognizing the original pictures and text with the content security detection algorithms, and comparing the recognition results with the sample data to obtain the type of each original picture and text and its similarity to the sample data, wherein the sample data comprises illegal pictures and text, and the type comprises scenes containing illegal information.

Step 3, setting a threshold and judging whether the similarity is greater than the threshold; if the similarity is greater than the set threshold, the corresponding picture or text is marked as illegal information. For example, with the threshold set to 70%, a similarity greater than 70% indicates that the corresponding picture or text is illegal information. When illegal information is found, it is retained and its SIP and DIP are stored, wherein the storage locations comprise a server, the device that intercepts the network traffic, and the device on which the crawler module runs.

Through the above steps, the SIP and DIP of the illegal information allow the user to learn who uploaded the illegal pictures and text and who accessed the illegal content. Storing the illegal information together with its SIP and DIP thus achieves the purpose of tracing the source of the illegal information.
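The threshold judgment and tracing record described above may be sketched as follows. This is an illustrative Python example only; the `Detection` record, its field names and the `flag_illegal` function are assumptions made for the example, and the similarity is expressed as a fraction between 0 and 1 so that the 70% example threshold becomes 0.70.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD = 0.70  # the 70% example threshold from the description


@dataclass
class Detection:
    sip: str           # source IP of the traffic carrying the content
    dip: str           # destination IP of the traffic carrying the content
    content_id: str    # hypothetical identifier of the picture or text
    similarity: float  # similarity to the sample data, in the range 0.0 to 1.0


def flag_illegal(detections: List[Detection],
                 threshold: float = THRESHOLD) -> List[Detection]:
    """Keep only detections whose similarity is strictly greater than the threshold."""
    return [d for d in detections if d.similarity > threshold]
```

The comparison is strict, matching the wording "greater than the threshold"; a detection whose similarity equals the threshold exactly is not flagged in this sketch. The retained records carry the SIP and DIP needed for tracing.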

In one embodiment, a crawler module is used to acquire the home page and site-wide pages of a target website. After illegal pictures and text are found by the content security detection model, the crawler module crawls and stores the original webpage, and a report is subsequently output for tracing and evidence collection by a supervisory unit. The report contents comprise the webpage address, the storage time, and the illegal pictures and text.

In this way, the crawler module crawls and stores the illegal original webpages and outputs corresponding reports according to the illegal information, which facilitates subsequent tracing and evidence collection.
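One possible form of such a report is sketched below in Python. The application only states that the report contains the webpage address, the storage time, and the illegal pictures and text; the JSON layout, the key names and the `build_report` function are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_report(web_address: str,
                 illegal_items: List[str],
                 stored_at: Optional[datetime] = None) -> str:
    """Serialize a tracing report as JSON text.

    illegal_items holds hypothetical references (for example file names or
    text excerpts) to the illegal pictures and text found on the page.
    """
    if stored_at is None:
        stored_at = datetime.now(timezone.utc)
    report = {
        "web_address": web_address,
        "storage_time": stored_at.isoformat(),
        "illegal_items": illegal_items,
    }
    return json.dumps(report, ensure_ascii=False, indent=2)
```

Keeping the storage time in the report allows the stored copy of the original webpage to be matched to the moment at which the illegal information was observed.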

Step S305, returning the webpage content compliance detection result.

The illegal information in the webpage content is determined in step S304, the illegal information in the webpage is removed by the compliance control layer, and the content compliance check result, which includes the URL, the original webpage, and the type and similarity of the illegal information, is returned.

It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here. For example, step S301 and step S302 may be interchanged.

The present embodiment further provides a device for detecting network content security. The device is used to implement the foregoing embodiments and preferred embodiments, and descriptions that have already been given are omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 4 is a block diagram of a network content security detection apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes:

an obtaining module 41, configured to obtain first network content collected according to a preset data collection manner, where the preset data collection manner includes at least one of the following: network content analysis based on network traffic, and web crawler crawling;

a detecting module 42, connected to the obtaining module 41, configured to detect first data of the first network content through the deep learning model, and determine a similarity between the first data and preset network data, where the preset network data includes preset illegal network data, and is used to determine whether the first data is illegal content;

and a content determining module 43, connected to the detecting module 42, configured to determine the network content security detection result according to the similarity.
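The cooperation of the three modules may be illustrated by the following Python sketch. It is a simplified software-only arrangement; the class and method names are assumptions made for the example, and the deep learning model is represented by a caller-supplied scoring function rather than an actual model.

```python
from typing import Callable, Iterable, List, Tuple


class ObtainingModule:
    """Gathers first network content from one or more collection sources."""

    def __init__(self, sources: Iterable[Callable[[], List[bytes]]]):
        # Each source stands in for traffic analysis or web crawler crawling.
        self.sources = list(sources)

    def obtain(self) -> List[bytes]:
        content: List[bytes] = []
        for source in self.sources:
            content.extend(source())
        return content


class DetectingModule:
    """Scores each item; similarity_fn is a placeholder for the model."""

    def __init__(self, similarity_fn: Callable[[bytes], float]):
        self.similarity_fn = similarity_fn

    def detect(self, items: List[bytes]) -> List[Tuple[bytes, float]]:
        return [(item, self.similarity_fn(item)) for item in items]


class ContentDeterminingModule:
    """Decides whether any item exceeds the preset threshold."""

    def __init__(self, threshold: float = 0.70):
        self.threshold = threshold

    def has_illegal_content(self, scored: List[Tuple[bytes, float]]) -> bool:
        return any(score > self.threshold for _, score in scored)
```

In this sketch the output of each module is passed directly to the next, mirroring the connections between modules 41, 42 and 43; the modules could equally be distributed across different processors.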

In one embodiment, the content determining module 43 is configured to determine whether the similarity between the first data and the preset network data is greater than a preset threshold; and determining that the illegal content exists in the network content under the condition that the similarity is judged to be larger than the preset threshold value.

In one embodiment, the preset data acquisition mode includes network content analysis based on network traffic, and the obtaining module 41 is configured to acquire access data generated by website access, where the access data at least includes traffic data; intercept target traffic data from the traffic data according to a preset interception mode, where the preset interception mode at least includes traffic mirroring; and parse the target traffic data to obtain at least first picture data, and determine that the first network content includes the first picture data.

In one embodiment, the obtaining module 41 is configured to intercept a POST request of HTTP/HTTPS by using a preset traffic interpreter, and obtain target traffic data.
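By way of illustration only, selecting POST requests from already captured request bytes may be sketched in Python as follows. The sketch assumes that each captured message is a complete, plaintext HTTP/1.x request; HTTPS traffic would first have to be decrypted by means not shown here. The function names are assumptions made for the example.

```python
from typing import Optional, Tuple


def parse_request_line(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (method, target) from raw HTTP/1.x request bytes, or None if malformed."""
    first_line = raw.split(b"\r\n", 1)[0]
    parts = first_line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None
    method = parts[0].decode("ascii", "replace")
    target = parts[1].decode("ascii", "replace")
    return method, target


def extract_post_body(raw: bytes) -> Optional[bytes]:
    """Return the body of a POST request, or None for any other message."""
    parsed = parse_request_line(raw)
    if parsed is None or parsed[0] != "POST":
        return None
    # Headers and body are separated by an empty line.
    _head, separator, body = raw.partition(b"\r\n\r\n")
    return body if separator else None
```

The returned body is the byte stream from which pictures, text and videos would subsequently be restored; requests using other methods are ignored in this sketch.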

In one embodiment, the preset data acquisition manner includes web crawler crawling, and the obtaining module 41 is configured to use the web crawler to obtain at least the website homepage content of the first target website, and determine that the first web content at least includes the website homepage content of the first target website.
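Once the homepage content has been fetched, the pictures it references may be located as in the following Python sketch, which uses the standard library HTML parser. Fetching the page itself is omitted; the class name `ImageCollector` and the function `collect_image_urls` are assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects absolute URLs of <img> elements found in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page address.
                self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html: str, base_url: str) -> List[str]:
    """Return the picture URLs referenced by the given homepage HTML."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls
```

Each collected URL can then be downloaded by the crawler and passed to the detection module as picture data.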

In one embodiment, the first network content includes second picture data, the preset network data includes sample pictures, and the detection module 42 is configured to detect the picture content of the second picture data through a deep learning model; and comparing the picture content of the second picture data with the sample picture to determine the similarity.
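The application does not specify how the similarity between picture content and a sample picture is computed. As one hedged possibility, if the deep learning model outputs a feature vector for each picture, the comparison could use cosine similarity, as in the Python sketch below. The feature vectors, the function names and the choice of cosine similarity are all assumptions made for this example.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def best_match(features: Sequence[float],
               samples: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the name and similarity of the closest sample picture.

    samples maps a hypothetical sample name to its feature vector and
    must not be empty.
    """
    scored = ((name, cosine_similarity(features, vec))
              for name, vec in samples.items())
    return max(scored, key=lambda pair: pair[1])
```

The similarity returned for the closest sample is the value that would then be compared against the preset threshold.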

In one embodiment, the first network content further includes first target information of the first data, and the network content security detection apparatus is further configured to obtain the first target information, where the target information includes a source address and a target address corresponding to the first data; and storing the target information and the first data into a preset database.
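Storing the target information together with the first data may be sketched with an embedded SQLite database, as shown below in Python. The application does not name a database product or schema; the table name `illegal_content`, its columns, and the functions `open_store` and `store_record` are assumptions made for the example.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the hypothetical table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS illegal_content ("
        "id INTEGER PRIMARY KEY, "
        "sip TEXT NOT NULL, "   # source address
        "dip TEXT NOT NULL, "   # target (destination) address
        "data BLOB NOT NULL)"   # the first data, e.g. picture bytes
    )
    return conn


def store_record(conn: sqlite3.Connection, sip: str, dip: str, data: bytes) -> int:
    """Insert one record and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO illegal_content (sip, dip, data) VALUES (?, ?, ?)",
            (sip, dip, data),
        )
    return cursor.lastrowid
```

Parameterized queries are used so that content taken from network traffic is never interpolated into the SQL text.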

In one embodiment, the first network content further includes second target information of the first data, and the network content security detection device is further configured to obtain the second target information, where the second target information at least includes the URL of a second target website from which the web crawler crawled the first network content; crawl the second target website according to the URL to obtain webpage content of the second target website, where the webpage content comprises the home page and site-wide pages of the second target website; and store at least the webpage content, the URL and the first data in a preset database.

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.

The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

S1, acquiring first network content collected according to a preset data acquisition mode, wherein the preset data acquisition mode comprises at least one of the following: network content analysis based on network traffic, and web crawler crawling.

S2, detecting first data of the first network content through the deep learning model, and determining the similarity of the first data and preset network data, wherein the preset network data comprises preset illegal network data and is used for determining whether the first data is illegal content.

S3, determining the network content security detection result according to the similarity.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In addition, in combination with the network content security detection method in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any one of the network content security detection methods in the above embodiments.

It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.

The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for detecting the security of network contents is characterized by comprising the following steps:

2. The method according to claim 1, wherein determining the network content security detection result according to the similarity comprises:

3. The method according to claim 1, wherein the preset data acquisition mode includes the network content analysis based on the network traffic, and the acquiring of the first network content acquired by the preset data acquisition mode includes:

4. The method for detecting the security of the network content according to claim 3, wherein intercepting the target traffic data according to a preset interception mode for the traffic data comprises:

and intercepting the POST request of the HTTP/HTTPS by adopting a preset flow interpreter to obtain the target flow data.

5. The method for detecting the security of the network contents according to claim 1, wherein the preset data acquisition manner comprises web crawler crawling, and the acquiring of the first network contents acquired according to the preset data acquisition manner comprises:

6. The method according to claim 1, wherein the first network content includes second picture data, the preset network data includes sample pictures, the detecting the first data of the first network content through a deep learning model, and the determining the similarity between the first data and the preset network data includes:

7. The method according to claim 2, wherein the first network content further includes first target information of first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes obtaining the first target information, where the target information includes a source address and a target address corresponding to the first data;

and storing the target information and the first data into a preset database.

8. The method according to claim 2, wherein the first network content further includes second target information of the first data, and after determining that the network content has illegal content when determining that the similarity is greater than a preset threshold, the method further includes:

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the network content security detection method according to any one of claims 1 to 8.

10. A storage medium having a computer program stored thereon, wherein the computer program is configured to execute the network content security detection method according to any one of claims 1 to 8 when the computer program runs.