CN114553486A - Illegal data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114553486A
Authority
CN
China
Prior art keywords
carrier
carriers
illegal data
time period
preset time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210066367.8A
Other languages
Chinese (zh)
Other versions
CN114553486B (en
Inventor
刘伟
林赛群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210066367.8A
Publication of CN114553486A
Application granted
Publication of CN114553486B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure provides an illegal data processing method and device, electronic equipment and a storage medium, and relates to the technical field of big data processing and the like. The specific implementation scheme is as follows: acquiring characteristic information of a carrier in a preset time period; detecting and determining that the carrier is an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier; and shielding the webpage of the illegal data carrier. According to the technology disclosed by the invention, the harm of illegal data is effectively reduced, so that the safety of a network environment is effectively improved.

Description

Illegal data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to the field of big data processing, and more specifically to a method and an apparatus for processing illegal data, an electronic device, and a storage medium.
Background
Illegal data on the internet, such as black and gray market content, has formed a relatively mature industrial chain.
The production of illegal data such as black and gray market content can be fully automated; cheating links can even be produced on a scale of hundreds of billions per day and distributed across the internet, which severely degrades the internet environment.
Disclosure of Invention
The disclosure provides an illegal data processing method and device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an illegal data processing method, including:
acquiring characteristic information of a carrier in a preset time period;
detecting and determining that the carrier is an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier;
and shielding the webpage of the illegal data carrier.
According to another aspect of the present disclosure, there is provided an illegal data processing apparatus including:
the acquisition module is used for acquiring the characteristic information of the carrier within a preset time period;
the detection module is used for detecting and determining the carrier as an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier;
and the processing module is used for shielding the webpage of the illegal data carrier.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the aspects and any possible implementation described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the above-described aspect and any possible implementation.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspects and any possible implementation as described above.
According to the technology disclosed by the invention, the harm of illegal data is effectively reduced, so that the safety of a network environment is effectively improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is to be understood that the described embodiments are only a few, and not all, of the disclosed embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terminal device involved in the embodiments of the present disclosure may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a tablet computer, and other intelligent devices; the display device may include, but is not limited to, a personal computer, a television, and other devices having a display function.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
At present, the discovery and handling of illegal data such as black and gray market content are essentially after the fact: the illegal data is typically discovered only through verification after users have already been harmed and have provided feedback. Therefore, it is desirable to provide a technical solution capable of detecting illegal data in a timely and effective manner, so that illegal data is detected at an earlier stage and its harm is reduced.
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure. As shown in FIG. 1, this embodiment provides a method for processing illegal data, which may be applied to a server in a network, and specifically includes the following steps:
S101, acquiring characteristic information of a carrier in a preset time period;
S102, detecting and determining that the carrier is an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier;
S103, shielding the webpage of the illegal data carrier.
The illegal data of the present embodiment may refer to abnormal data that is illegal or non-compliant, such as black and gray market content. The appearance of such data not only damages the internet ecosystem but also gives users a poor experience. Therefore, it is necessary to detect and handle illegal data in a timely and effective manner.
Illegal data may be hidden in the network under an account corresponding to a certain entity, under a main domain corresponding to a certain account, under a site corresponding to a certain main domain, under a web page corresponding to a certain site, or even within a certain link of a certain web page. Accordingly, the carrier of illegal data in the network may be a web page, a site, a main domain, an account, or entity information, and may even be a link.
Whatever form the carrier takes, if illegal data is present, certain characteristic information will be generated on the carrier. For example, in order to distribute illegal data, some illegal data producers purchase an entity and then operate illegal data under its corresponding accounts, main domains, sites, or web pages. In this case, a change in the entity information, such as the entity information registered in official records, can be detected and calls for special attention. Moreover, once an illegal data producer produces illegal data on a carrier, the carrier will exhibit related characteristic information. In this embodiment, the characteristic information of the carrier within the preset time period is acquired; the carrier is detected and determined to be an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier; and the web page of the illegal data carrier is then shielded.
Illegal data makes the corresponding carrier abnormal, but the abnormality does not show at a single moment; it becomes apparent only after accumulating over a certain period. Therefore, in this embodiment, the characteristic information of the carrier within a preset time period needs to be acquired. The length of the preset time period may be several hours, one day, multiple consecutive days, or another duration, and may be set according to the nature of the carrier.
In the illegal data processing method of this embodiment, when the carrier is detected and determined to be the carrier of the illegal data by the acquired characteristic information of the carrier within the preset time period and the characteristic range of the normal carrier, the web page of the carrier of the illegal data is shielded. By adopting the technical scheme of the embodiment, when the illegal data are generated on the carrier, the corresponding carrier can be detected, the webpage of the corresponding carrier can be shielded at the earliest stage of the production of the illegal data, the harm of the illegal data is effectively reduced, and the safety of a network environment is effectively improved.
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure; the illegal data processing method of the present embodiment is further described in more detail based on the technical solution of the embodiment shown in fig. 1. As shown in fig. 2, the illegal data processing method of this embodiment may specifically include the following steps:
S201, acquiring the quantity information of lower-level carriers included in a carrier in a preset time period, the quantity information of diversion links included in the carrier, the data quality information of the carrier, the number of outbreaks of the quantity of lower-level carriers of the carrier, and the number ratio of the carrier to carriers of adjacent levels;
If the carrier is entity information, the corresponding lower-level carriers may include, from top to bottom, accounts, main domains, sites, web pages, and so on. The corresponding lower-level carrier information may include the number of accounts, the number of main domains, the number of sites, and the number of web pages under the entity. Similarly, if the carrier is an account, the corresponding lower-level carriers include, from top to bottom, main domains, sites, web pages, and so on. In this case the lower-level carrier information may include the number of main domains, the number of sites, and the number of web pages under the account. Other carrier types are handled similarly and are not described again here.
In practical applications, the quantity information may cover only the immediately adjacent lower-level carriers of the current carrier, or it may cover the immediately adjacent lower-level carriers, the carriers one level further down, and so on down to the lowest level, as required.
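The counting of lower-level carriers described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Page` record, the `LEVELS` tuple, and the function `lower_level_counts` are hypothetical names, and it assumes each crawled web page is already annotated with the entity, account, main domain, and site it belongs to.

```python
from typing import Dict, Iterable, NamedTuple


class Page(NamedTuple):
    """One observed web page with its (assumed known) carrier hierarchy."""
    entity: str
    account: str
    main_domain: str
    site: str
    url: str


# Carrier levels from top to bottom; "url" stands for the web-page level.
LEVELS = ("entity", "account", "main_domain", "site", "url")


def lower_level_counts(pages: Iterable[Page], level: str, carrier_id: str) -> Dict[str, int]:
    """Count the distinct lower-level carriers under one carrier, level by level."""
    idx = LEVELS.index(level)
    matching = [p for p in pages if getattr(p, level) == carrier_id]
    return {
        lower: len({getattr(p, lower) for p in matching})
        for lower in LEVELS[idx + 1:]
    }


# Hypothetical observations within the preset time period.
pages = [
    Page("e1", "a1", "d1.example", "s1.d1.example", "u1"),
    Page("e1", "a1", "d1.example", "s2.d1.example", "u2"),
    Page("e1", "a2", "d2.example", "s3.d2.example", "u3"),
    Page("e2", "a3", "d3.example", "s4.d3.example", "u4"),
]

print(lower_level_counts(pages, "entity", "e1"))
# {'account': 2, 'main_domain': 2, 'site': 3, 'url': 3}
```

Restricting the result to the first key gives only the immediately adjacent lower level; keeping all keys gives the counts down to the lowest level.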
Based on the characteristics of the existing illegal data generation, if an illegal data producer uses a lower level carrier of a certain carrier to produce illegal data, the number of the lower level carrier of the certain carrier will increase suddenly, so that in this embodiment, it is necessary to obtain the number information of the lower level carriers included in the carrier within a preset time period.
In practical applications, illegal data can be spread through the network by means of traffic diversion. Therefore, in this embodiment, the acquired characteristic information of the carrier may further include the quantity information of the diversion links included in the carrier.
In addition, since the presence of illegal data in a carrier inevitably affects the data quality of the carrier, the acquired characteristic information in this embodiment may further include the data quality information of the carrier. Specifically, web pages in the carrier are sampled and each sampled page is checked for quality problems. A page with a quality problem is suspected to be illegal data and is given a score of 1; a page without a quality problem is given a score of 0. The average score of the sampled web pages is then calculated, with a higher score indicating lower quality of the carrier. For example, if 10 web pages under a certain carrier are sampled, of which 8 are scored 1 and 2 are scored 0, the data quality information of the carrier is 0.8. Of course, in practical applications, other data quality identification methods may also be used to identify the data quality information, which is not limited herein.
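The sampling-based score can be illustrated with a short sketch. The function name `data_quality_score` and the `has_quality_problem` callback are assumptions made for illustration; the disclosure does not specify how an individual page is judged.

```python
from typing import Callable, Sequence


def data_quality_score(
    pages: Sequence[str],
    has_quality_problem: Callable[[str], bool],
) -> float:
    """Average of per-page scores: 1 for a page with a quality problem, else 0."""
    if not pages:
        raise ValueError("at least one sampled page is required")
    scores = [1 if has_quality_problem(page) else 0 for page in pages]
    return sum(scores) / len(scores)


# Hypothetical sample reproducing the example in the text:
# 10 sampled pages, 8 of which have a quality problem.
sampled = [f"bad-{i}" for i in range(8)] + ["ok-0", "ok-1"]
score = data_quality_score(sampled, lambda page: page.startswith("bad"))
print(score)  # 0.8
```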
As with the quantity information of lower-level carriers, if the number of lower-level carriers of a certain carrier breaks out multiple times within the preset time period, the carrier is highly likely to be an illegal data carrier. In this embodiment, the definition of an outbreak may be set according to actual requirements. For example, an outbreak may be deemed to occur when the number of lower-level carriers exceeds a preset quantity threshold set based on experience or historical data. Alternatively, an outbreak may be deemed to occur only when the preset threshold is exceeded by 5%, 10%, or some other percentage.
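One possible reading of the outbreak definition is sketched below. It assumes the preset time period is divided into observation intervals with one lower-level carrier count per interval; that granularity, the function name `count_outbreaks`, and the sample numbers are illustrative assumptions rather than details from the disclosure.

```python
from typing import Sequence


def count_outbreaks(counts: Sequence[int], threshold: int, margin: float = 0.0) -> int:
    """Count intervals whose lower-level carrier count exceeds the threshold.

    `margin` is the fraction by which the threshold must be exceeded,
    e.g. 0.05 for 5%; 0.0 means any excess counts as an outbreak.
    """
    limit = threshold * (1 + margin)
    return sum(1 for count in counts if count > limit)


# Hypothetical hourly counts of new sites under one main domain.
hourly_site_counts = [12, 15, 240, 18, 310, 14]

print(count_outbreaks(hourly_site_counts, threshold=200))               # 2
print(count_outbreaks(hourly_site_counts, threshold=200, margin=0.25))  # 1
```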
Furthermore, consider the attributes of illegal data carriers: for example, some illegal data operators operate multiple accounts, with multiple main domains under each account, multiple sites under each main domain, and only a single home page under each site. Based on this special attribute of illegal data, the number ratio of the carrier to carriers of adjacent levels may also be acquired in this embodiment. Specifically, the ratio may be taken between the number of current carriers and the number of adjacent upper-level carriers, or between the number of current carriers and the number of adjacent lower-level carriers; it may also be taken across levels, between the number of current carriers and the number of lower-level carriers that are not adjacent. In short, different number ratios have different corresponding normal ranges.
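As a rough illustration of the number ratios, the sketch below uses hypothetical per-level counts that mimic the pattern in the text (many sites, each with a single page). The helper `level_ratio` and the figures are assumptions for illustration only.

```python
def level_ratio(current_count: int, other_count: int) -> float:
    """Ratio of the number of carriers at one level to the number at another level."""
    if other_count == 0:
        raise ValueError("the other level has no carriers; the ratio is undefined")
    return current_count / other_count


# Hypothetical counts under one operator: every site holds only a home page.
counts = {"account": 5, "main_domain": 50, "site": 5000, "web_page": 5000}

site_to_domain = level_ratio(counts["site"], counts["main_domain"])  # adjacent upper level
page_to_site = level_ratio(counts["web_page"], counts["site"])       # pages per site
page_to_domain = level_ratio(counts["web_page"], counts["main_domain"])  # cross-level

print(site_to_domain, page_to_site, page_to_domain)  # 100.0 1.0 100.0
```

A pages-per-site ratio of exactly 1 across thousands of sites is the kind of value that would be compared against the normal range for that particular ratio.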
In this embodiment, taking the simultaneous acquisition of the above feature information of the carrier within the preset time period as an example, in practical application, the acquired feature information of the carrier may also include only one of the feature information, or may also include a combination of two or more of the feature information. The more feature information is acquired, the higher the accuracy of detection of illegal data.
In addition, in practical applications, other characteristic information may also exist, which is not illustrated here.
S202, detecting whether the quantity information of lower-level carriers included in the carrier in the preset time period, the quantity information of diversion links included in the carrier, the data quality information of the carrier, the number of outbreaks of the quantity of lower-level carriers of the carrier, and the number ratio of the carrier to carriers of adjacent levels are all within the characteristic ranges corresponding to a normal carrier; if yes, executing step S203; if not, executing step S204;
S203, determining that the carrier is a normal carrier; end.
S204, determining that the carrier is an illegal data carrier; executing step S205;
S205, shielding the webpage of the illegal data carrier.
According to statistics over whole-network data, each item of characteristic information of each type of normal carrier has a relatively stable normal range within a preset time period, and this range can be obtained through statistics over historical whole-network data.
In step S202, as long as any one item of characteristic information is not within the characteristic range of the corresponding type of normal carrier, the carrier can be regarded as an illegal data carrier. The detection can be divided into the following checks:
(1) detecting whether the quantity information of lower-level carriers included in the carrier in the preset time period is within a first preset quantity range of lower-level carriers included in a normal carrier; if so, determining that the carrier is a normal carrier, otherwise determining that the carrier is an illegal data carrier;
(2) detecting whether the quantity information of diversion links included in the carrier in the preset time period is within a second preset quantity range of diversion links included in a normal carrier; if so, determining that the carrier is a normal carrier, otherwise determining that the carrier is an illegal data carrier;
(3) detecting whether the data quality information of the carrier in the preset time period is within a preset score range of data quality of a normal carrier; if so, determining that the carrier is a normal carrier, otherwise determining that the carrier is an illegal data carrier;
(4) detecting whether the number of outbreaks of the quantity of lower-level carriers of the carrier in the preset time period is within the outbreak threshold range for the quantity of lower-level carriers of a normal carrier; if so, determining that the carrier is a normal carrier, otherwise determining that the carrier is an illegal data carrier;
(5) detecting whether the number ratio of the carrier to carriers of adjacent levels in the preset time period is within a preset ratio range of a normal carrier; if so, determining that the carrier is a normal carrier, otherwise determining that the carrier is an illegal data carrier.
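The five checks above share one rule: a carrier is treated as an illegal data carrier as soon as any acquired feature falls outside the normal range for that feature. A minimal sketch of that rule follows; the feature names and the numeric ranges in `NORMAL_RANGES` are invented for illustration, whereas in the disclosure the ranges come from statistics over historical whole-network data.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds of a normal carrier


def is_illegal_carrier(features: Dict[str, float], normal_ranges: Dict[str, Range]) -> bool:
    """Return True if any acquired feature lies outside its normal range."""
    for name, value in features.items():
        low, high = normal_ranges[name]
        if not (low <= value <= high):
            return True
    return False


# Hypothetical normal ranges for one carrier type.
NORMAL_RANGES: Dict[str, Range] = {
    "lower_level_count": (0, 500),
    "diversion_link_count": (0, 50),
    "data_quality": (0.0, 0.2),
    "outbreak_count": (0, 1),
    "adjacent_level_ratio": (2.0, 1000.0),
}

normal = {
    "lower_level_count": 120,
    "diversion_link_count": 3,
    "data_quality": 0.1,
    "outbreak_count": 0,
    "adjacent_level_ratio": 40.0,
}
suspect = dict(normal, data_quality=0.8)  # only the quality score is abnormal

print(is_illegal_carrier(normal, NORMAL_RANGES))   # False
print(is_illegal_carrier(suspect, NORMAL_RANGES))  # True
```

Because only the features present in `features` are checked, the same function covers the case where one feature or any combination of features is acquired.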
In the illegal data processing method of this embodiment, by comparing the above feature information of the carrier with the corresponding feature range of the normal carrier, it can be detected whether the carrier is the carrier of the illegal data, and when the carrier is determined to be the carrier of the illegal data, the web page of the carrier is shielded. By adopting the mode of the embodiment, the carrier is detected when the illegal data is produced, the webpage of the corresponding carrier can be shielded at the earliest stage of the production of the illegal data, the harm of the illegal data is effectively reduced, and the safety of the network environment is effectively improved.
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure; the illegal data processing method of the present embodiment is further described in more detail based on the technical solution of the embodiment shown in fig. 1. As shown in fig. 3, the illegal data processing method of this embodiment may specifically include the following steps:
S301, acquiring the ratio of the number of lower-level carriers included in a carrier in a preset time period to that in a preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks of the quantity of lower-level carriers of the carrier in the preset time period to that in the preset historical period, and the number ratio of the carrier to carriers of adjacent levels;
The difference from step S201 of the embodiment shown in FIG. 2 is that the characteristic information of the carrier acquired in this step mainly takes the form of ratios.
In this embodiment, the preset historical period may be the most recent historical period, selected as required, or may be a historical period of one month, one year, or longer.
For example, the number of subordinate carriers included in a carrier in a preset time period and the number of subordinate carriers included in a carrier in a preset history period may be statistically obtained; then, the quantity ratio of the former to the latter is taken.
For example, the number of the flow guide links included in the carrier in a preset time period and the number of the flow guide links included in the carrier in a preset history period may also be counted; then, the quantity ratio of the former to the latter is taken.
For example, the number of outbreaks of the number of the lower-level carriers of the carrier in the preset time period and the number of outbreaks of the number of the lower-level carriers of the carrier in the preset history period may be counted; then, the quantity ratio of the former to the latter is taken.
For the data quality information of the carriers within the preset time period and the acquisition of the number ratio of the carriers to the carriers of the adjacent level, reference may be made to the related description in step S201.
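The three ratios described above each divide a count from the preset time period by the same count from the preset historical period. A small sketch under assumed numbers follows; `period_ratio` and the sample counts are hypothetical, and how a zero historical count should be handled is not stated in the source, so the sketch simply rejects it.

```python
def period_ratio(current: float, historical: float) -> float:
    """Ratio of a count in the preset time period to the same count in the historical period."""
    if historical <= 0:
        raise ValueError("historical count must be positive")
    return current / historical


# Hypothetical counts for one carrier.
current = {"lower_level_carriers": 900, "diversion_links": 60, "outbreaks": 6}
history = {"lower_level_carriers": 300, "diversion_links": 40, "outbreaks": 2}

ratios = {name: period_ratio(current[name], history[name]) for name in current}
print(ratios)
# {'lower_level_carriers': 3.0, 'diversion_links': 1.5, 'outbreaks': 3.0}
```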
Similarly, in this embodiment, taking the simultaneous acquisition of the above feature information as an example, in practical application, only one, two, or a combination of multiple of the feature information may be acquired. Of course, the more feature information is obtained, the more accurate it is to detect whether the current carrier is a carrier of illegal data.
S302, generating a comprehensive characteristic score of the carrier based on at least one of the number ratio of lower-level carriers included by the carrier in a preset time period and a preset historical period, the number ratio of diversion links included by the carrier, the ratio of the number of outbreaks of the lower-level carriers of the carrier, the data quality information of the carrier in the preset time period and the ratio of the carrier to the adjacent-level carrier, and a preset weight ratio;
The preset weight ratios in this embodiment may be set empirically according to the importance of each item of characteristic information: a more important feature may be given a higher weight and a less important one a lower weight. Of course, the weights may also be set uniformly. Each item of characteristic information is then multiplied by its weight, the products are summed, and the sum is averaged. Of course, the averaging step may be omitted according to actual requirements.
S303, detecting whether the comprehensive characteristic score of the carrier is within a preset score range of a normal carrier, and if so, executing a step S304; if not, go to step S305;
The preset score range of a normal carrier in this embodiment may be obtained by computing statistics over the above characteristic information of normal carriers in a plurality of historical time periods.
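Steps S302 and S303 can be sketched as a weighted sum followed by a range check. The feature values, weights, and the normal score range below are made-up numbers, and the names `composite_score` and `is_within` are hypothetical; the disclosure leaves the weights to experience and the range to historical statistics.

```python
from typing import Dict, Tuple


def composite_score(
    features: Dict[str, float],
    weights: Dict[str, float],
    average: bool = True,
) -> float:
    """Weighted sum of the features, optionally divided by the number of features."""
    if not features:
        raise ValueError("at least one feature is required")
    total = sum(value * weights[name] for name, value in features.items())
    return total / len(features) if average else total


def is_within(score: float, normal_range: Tuple[float, float]) -> bool:
    """True if the score lies inside the inclusive normal score range."""
    low, high = normal_range
    return low <= score <= high


# Hypothetical inputs for one carrier.
features = {"lower_level_ratio": 4.0, "diversion_link_ratio": 2.0, "data_quality": 0.5}
weights = {"lower_level_ratio": 0.5, "diversion_link_ratio": 0.25, "data_quality": 0.25}
NORMAL_SCORE_RANGE = (0.0, 0.6)  # assumed, for illustration only

score = composite_score(features, weights)   # (2.0 + 0.5 + 0.125) / 3
print(score)                                 # 0.875
print(is_within(score, NORMAL_SCORE_RANGE))  # False, so the carrier would be flagged
```

Passing `average=False` returns the plain weighted sum, matching the remark that averaging may be omitted; the normal score range would then need to be derived on the same scale.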
S304, determining that the carrier is a normal carrier; end.
S305, determining that the carrier is an illegal data carrier; executing step S306;
S306, shielding the webpage of the illegal data carrier.
In the illegal data processing method of this embodiment, the comprehensive characteristic score of the carrier is generated by matching the obtained characteristic information of the carrier with the corresponding weight ratio, and whether the carrier is an illegal carrier is detected based on the comprehensive characteristic score and the preset score range of the normal carrier, and when the carrier is determined to be an illegal data carrier, the web page of the carrier is shielded. By adopting the mode of the embodiment, the carrier is detected when the illegal data is produced, the webpage of the corresponding carrier can be shielded at the earliest stage of the production of the illegal data, the harm of the illegal data is effectively reduced, and the safety of the network environment is effectively improved.
It should be noted that the embodiments shown in FIG. 2 and FIG. 3 can be used in combination: as long as either of them detects that the carrier is an illegal data carrier, the carrier is confirmed to be an illegal data carrier.
In addition, it should be noted that carriers in different fields have different characteristic ranges and different preset score ranges for normal carriers. In the embodiments shown in FIG. 2 and FIG. 3, illegal carriers in a given field can be accurately detected based on the field information of the carrier.
Further optionally, in the above embodiments, before steps S101, S201, and S301, the method may further include:
monitoring the carrier scale of each carrier type in the preset time period based on the normal scale range of each carrier type, and determining whether every carrier scale is within its normal scale range; if the carrier scale of a certain carrier type is abnormal, determining that the carrier type has a scale abnormality, whereupon an alarm or other notification message may be sent.
In this embodiment, during monitoring, the scale of the carriers corresponding to each carrier type across the whole network may be checked against the normal scale range of that carrier type; if the scale is not within the range, it may be determined that carriers of that type have a scale abnormality. Each carrier of the abnormal carrier type is then detected using any of the embodiments of FIGS. 1-3 described above.
This approach avoids having to run detection directly on every type of carrier, which effectively reduces the number of carrier types to be detected and improves detection efficiency.
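The pre-filter described above can be sketched as a scale check per carrier type, with only the abnormal types passed on to per-carrier detection. The whole-network counts and the normal scale ranges below are invented for illustration, and `abnormal_carrier_types` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def abnormal_carrier_types(
    scales: Dict[str, int],
    normal_scale_ranges: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Return the carrier types whose whole-network scale is outside its normal range."""
    abnormal = []
    for carrier_type, scale in scales.items():
        low, high = normal_scale_ranges[carrier_type]
        if not (low <= scale <= high):
            abnormal.append(carrier_type)
    return abnormal


# Hypothetical whole-network scales observed in the preset time period.
observed = {
    "account": 1_050,
    "main_domain": 9_800,
    "site": 2_400_000,
    "web_page": 51_000_000,
}
normal_ranges = {
    "account": (900, 1_200),
    "main_domain": (9_000, 11_000),
    "site": (90_000, 120_000),
    "web_page": (40_000_000, 60_000_000),
}

to_inspect = abnormal_carrier_types(observed, normal_ranges)
print(to_inspect)  # ['site']
```

In this example only carriers of type `site` would go through the per-carrier detection of FIGS. 1-3; an alarm or notification could be raised for each entry in `to_inspect`.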
The embodiments shown in FIGS. 1-3 described above address only the cases where the carrier is an entity, an account, a main domain, a site, or a web page. In practical applications, some links can also be used directly as carriers to spread illegal data. In this case, whether the link is a carrier of illegal data can be detected and determined based on a preset rule or by using a pre-trained illegal data recognition model; if it is a carrier of illegal data, it is shielded directly.
The preset rule in this embodiment may be derived by summarizing the characteristics of illegal data propagated through links.
In addition, the link carrying illegal data can be collected in advance, and an illegal data identification model is trained, so that the model can identify whether the illegal data exists in the link or not. When the link detection method is used, the current link to be detected is input to the illegal data identification model, and the illegal data identification model can predict and output the probability that the link carries illegal data. If the probability is greater than or equal to a preset probability threshold, the link is considered to carry illegal data, otherwise, if the probability is smaller than the preset probability threshold, the link is considered to be a normal link and does not carry illegal data.
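The thresholding step can be sketched as follows. The trained recognition model is not described in the source, so `toy_model` below is only a stand-in that returns a fixed probability based on a substring; the function name `link_carries_illegal_data`, the default threshold of 0.5, and the example URLs are all assumptions.

```python
from typing import Callable


def link_carries_illegal_data(
    link: str,
    predict_probability: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Flag the link if the predicted probability is at or above the threshold."""
    return predict_probability(link) >= threshold


def toy_model(link: str) -> float:
    """Stand-in for a trained model; NOT a real classifier."""
    return 0.9 if "free-prize" in link else 0.1


print(link_carries_illegal_data("https://example.com/free-prize?id=1", toy_model))  # True
print(link_carries_illegal_data("https://example.com/docs/index.html", toy_model))  # False
```

A probability exactly equal to the threshold is treated as carrying illegal data, matching the "greater than or equal to" condition in the text.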
By the method, the condition that the carrier is the link can be accurately identified, and whether the link carries illegal data or not can be accurately and effectively detected.
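The threshold decision for links can be sketched as follows. This is only an illustration under stated assumptions: the model is represented by an arbitrary callable that returns a probability, `toy_model` is a placeholder and not a trained recognition model, and the default threshold of 0.5 is hypothetical.

```python
from typing import Callable


def link_carries_illegal_data(link: str,
                              predict_probability: Callable[[str], float],
                              threshold: float = 0.5) -> bool:
    """Treat the link as carrying illegal data when the predicted
    probability is greater than or equal to the preset threshold."""
    return predict_probability(link) >= threshold


def toy_model(link: str) -> float:
    # Placeholder standing in for a pre-trained recognition model.
    # The substring test is purely illustrative.
    return 0.9 if "promo-redirect" in link else 0.1
```

A link scored exactly at the threshold is treated as carrying illegal data, matching the "greater than or equal to" wording above.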
In addition, it should be noted that, in the above embodiments, the detection is performed in a preset time period, and in practical applications, the detection may be performed in some special scenes by shortening the preset time period. Or the detection time may also be adjusted according to actual needs, which is not limited herein.
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure; the embodiment provides an illegal data processing apparatus 400, which includes:
an obtaining module 401, configured to obtain characteristic information of a carrier within a preset time period;
a detecting module 402, configured to detect and determine that the carrier is an illegal data carrier based on the characteristic information of the carrier and the characteristic range of the normal carrier;
a processing module 403, configured to mask a webpage of the carrier of the illegal data.
The implementation principle and technical effect of the illegal data processing apparatus 400 of this embodiment, which uses the above modules to process illegal data, are the same as those of the related method embodiments. For details, refer to the description of those embodiments, which is not repeated here.
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure; the embodiment provides an illegal data processing apparatus 500, which includes functional modules with the same names as those shown in fig. 4: an obtaining module 501, a detecting module 502, and a processing module 503.
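One possible way to picture the three-module structure in software is sketched below. The class name `IllegalDataProcessor`, its fields, and the use of plain callables for the obtaining and detecting roles are assumptions made for illustration; the disclosure does not prescribe any particular code structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class IllegalDataProcessor:
    # Role of the obtaining module: fetch characteristic information for a carrier.
    acquire: Callable[[str], Features]
    # Role of the detecting module: decide whether the features indicate illegal data.
    is_illegal: Callable[[Features], bool]
    # Carriers whose web pages have been shielded (role of the processing module).
    blocked: List[str] = field(default_factory=list)

    def process(self, carrier_id: str) -> bool:
        """Run acquire -> detect -> shield for one carrier; return True if shielded."""
        features = self.acquire(carrier_id)
        if self.is_illegal(features):
            self.blocked.append(carrier_id)
            return True
        return False
```

Keeping the obtaining and detecting roles as injected callables lets the same pipeline be reused with different characteristic sources or decision rules.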
In this embodiment, the obtaining module 501 is configured to:
acquiring, for the preset time period, at least one of: quantity information of the subordinate carriers included in the carrier, quantity information of the diversion links included in the carrier, data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level.
Further, the detecting module 502 is configured to:
detecting and determining that, within the preset time period, at least one of the quantity information of the subordinate carriers included in the carrier, the quantity information of the diversion links included in the carrier, the data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level is not within the characteristic range of a normal carrier; and
determining that the carrier is a carrier of illegal data.
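The range check described above can be sketched as follows. The characteristic names and the numeric ranges in `NORMAL_RANGES` are hypothetical values chosen for illustration and do not come from the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical characteristic ranges of a normal carrier (inclusive bounds).
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "subordinate_carrier_count": (0, 50),
    "diversion_link_count": (0, 20),
    "data_quality": (0.6, 1.0),
    "subordinate_outbreak_count": (0, 2),
    "adjacent_level_ratio": (0.1, 10.0),
}


def out_of_range_features(features: Dict[str, float],
                          normal_ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """List the acquired characteristics that fall outside the normal range."""
    return [
        name for name, value in features.items()
        if name in normal_ranges
        and not (normal_ranges[name][0] <= value <= normal_ranges[name][1])
    ]


def is_illegal_carrier(features: Dict[str, float],
                       normal_ranges: Dict[str, Tuple[float, float]]) -> bool:
    """A carrier is flagged when at least one characteristic is out of range."""
    return bool(out_of_range_features(features, normal_ranges))
```

Because the text requires only "at least one" characteristic, the sketch checks whichever characteristics were actually acquired and ignores the rest.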
Alternatively, the obtaining module 501 is configured to:
acquiring at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in a preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level.
Further, the detecting module 502 is configured to:
generating a comprehensive characteristic score of the carrier based on a preset weight ratio and at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, the data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level; and
detecting and determining that the comprehensive characteristic score of the carrier is not within the preset value range of a normal carrier, and determining that the carrier is a carrier of illegal data.
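A minimal sketch of the comprehensive characteristic score follows. The disclosure states only that the score is generated from the characteristics and a preset weight ratio; computing it as a weighted average (normalized by the sum of the weights used) is an assumption made here, as are the characteristic names, the weights, and the normal value range.

```python
from typing import Dict, Tuple


def composite_score(ratios: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the acquired characteristics.

    Normalizing by the total weight is an assumption of this sketch, so that
    the score stays comparable when only some characteristics are available.
    """
    used = [name for name in ratios if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted characteristics available")
    return sum(ratios[name] * weights[name] for name in used) / total_weight


def score_is_abnormal(score: float, normal_range: Tuple[float, float]) -> bool:
    """True when the score lies outside the preset value range of a normal carrier."""
    low, high = normal_range
    return not (low <= score <= high)
```

For example, with hypothetical ratios of 1.0 and 3.0 weighted equally, the score is 2.0, which would be flagged against a hypothetical normal range of 0.5 to 1.5.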
Further, in an embodiment of the present disclosure, the illegal data processing apparatus 500 of this embodiment further includes:
a monitoring module 504, configured to monitor carrier scales of various carrier types based on normal scale ranges of the various carrier types within a preset time period;
a determining module 505, configured to determine that the carrier type corresponding to the carrier has a scale abnormality.
Further, if the carrier type is a link, the detecting module 502 is further configured to detect and determine that the carrier is a carrier of illegal data based on a preset rule or by using a pre-trained illegal data recognition model;
the processing module 503 is further configured to shield the carrier of illegal data.
The implementation principle and technical effect of the illegal data processing apparatus 500 of this embodiment are the same as those of the related method embodiments described above. For details, refer to the description of those embodiments, which is not repeated here.
In the technical solution of the present disclosure, the acquisition, storage, and use of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the above-described methods of the present disclosure. For example, in some embodiments, the above-described methods of the present disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the above-described methods of the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the above-described methods of the present disclosure.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method for processing illegal data, comprising:
acquiring characteristic information of a carrier in a preset time period;
detecting and determining that the carrier is an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier;
and shielding the webpage of the illegal data carrier.
2. The method of claim 1, wherein the obtaining of the characteristic information of the carrier within the preset time period comprises:
acquiring, for the preset time period, at least one of: quantity information of the subordinate carriers included in the carrier, quantity information of the diversion links included in the carrier, data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level.
3. The method of claim 2, wherein detecting and determining that the carrier is a carrier of illegal data based on the characteristic information of the carrier and the characteristic range of a normal carrier comprises:
detecting and determining that at least one of the quantity information of the subordinate carriers included in the carrier, the quantity information of the diversion links included in the carrier, the data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level is not within the characteristic range of a normal carrier;
and determining that the carrier is an illegal data carrier.
4. The method of claim 1, wherein the obtaining of the characteristic information of the carrier within the preset time period comprises:
acquiring at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in a preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level.
5. The method of claim 4, wherein detecting and determining that the carrier is a carrier of illegal data based on the characteristic information of the carrier and the characteristic range of a normal carrier comprises:
generating a comprehensive characteristic score of the carrier based on a preset weight ratio and at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, the data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level;
and detecting and determining that the comprehensive characteristic score of the carrier is not within the preset value range of a normal carrier, and determining that the carrier is an illegal data carrier.
6. The method according to any one of claims 1 to 5, wherein before obtaining the characteristic information of the carrier within the preset time period, the method comprises:
monitoring the carrier scales of various carrier types based on the normal scale ranges of various carrier types in the preset time period;
and determining that the carrier type corresponding to the carrier has abnormal scale.
7. The method according to any one of claims 1-6, wherein if the carrier type is a link, the method further comprises:
detecting and determining that the carrier is an illegal data carrier based on a preset rule or by adopting a pre-trained illegal data recognition model;
and shielding the carrier of the illegal data.
8. An illegal data processing apparatus comprising:
the acquisition module is used for acquiring the characteristic information of the carrier within a preset time period;
the detection module is used for detecting and determining the carrier as an illegal data carrier based on the characteristic information of the carrier and the characteristic range of a normal carrier;
and the processing module is used for shielding the webpage of the illegal data carrier.
9. The apparatus of claim 8, wherein the acquisition module is configured to:
acquiring, for the preset time period, at least one of: quantity information of the subordinate carriers included in the carrier, quantity information of the diversion links included in the carrier, data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level.
10. The apparatus of claim 9, wherein the detection module is configured to:
detecting and determining that at least one of the quantity information of the subordinate carriers included in the carrier, the quantity information of the diversion links included in the carrier, the data quality information of the carrier, the number of outbreaks in the quantity of subordinate carriers of the carrier, and the quantity ratio of the carrier to carriers of an adjacent level is not within the characteristic range of a normal carrier;
and determining that the carrier is an illegal data carrier.
11. The apparatus of claim 8, wherein the acquisition module is configured to:
acquiring at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in a preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level.
12. The apparatus of claim 11, wherein the detection module is configured to:
generating a comprehensive characteristic score of the carrier based on a preset weight ratio and at least one of: the ratio of the number of subordinate carriers included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of diversion links included in the carrier in the preset time period to that in the preset historical period, the ratio of the number of outbreaks in the quantity of subordinate carriers of the carrier in the preset time period to that in the preset historical period, the data quality information of the carrier in the preset time period, and the quantity ratio of the carrier to carriers of an adjacent level;
and detecting and determining that the comprehensive characteristic score of the carrier is not within the preset value range of a normal carrier, and determining that the carrier is an illegal data carrier.
13. The apparatus of any of claims 8-12, wherein the apparatus further comprises:
the monitoring module is used for monitoring the carrier scales of various carrier types based on the normal scale ranges of the various carrier types in the preset time period;
and the determining module is used for determining that the carrier type corresponding to the carrier has scale abnormality.
14. The apparatus according to any of claims 8-12, wherein, if the carrier type is a link:
the detection module is further used for detecting and determining that the carrier is an illegal data carrier based on a preset rule or by adopting a pre-trained illegal data recognition model;
the processing module is also used for shielding the carrier of the illegal data.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210066367.8A 2022-01-20 2022-01-20 Illegal data processing method and device, electronic equipment and storage medium Active CN114553486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210066367.8A CN114553486B (en) 2022-01-20 2022-01-20 Illegal data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114553486A true CN114553486A (en) 2022-05-27
CN114553486B CN114553486B (en) 2023-07-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant