CN110413866B - Data processing method and device for data processing - Google Patents

Info

Publication number
CN110413866B
Authority
CN
China
Prior art keywords
page
detected
filtering
detection result
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810394877.1A
Other languages
Chinese (zh)
Other versions
CN110413866A (en
Inventor
何筱妍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201810394877.1A priority Critical patent/CN110413866B/en
Publication of CN110413866A publication Critical patent/CN110413866A/en
Application granted granted Critical
Publication of CN110413866B publication Critical patent/CN110413866B/en

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the invention provide a data processing method, a data processing apparatus, and a device for data processing. The method comprises: determining page content of a page to be detected; detecting the page to be detected according to the page content to obtain a first detection result page; and filtering the first detection result page according to attribute information of the first detection result page to obtain a second detection result page. Embodiments of the invention can detect newly added malicious promotion content, reduce the review workload, save labor costs, and improve review efficiency.

Description

Data processing method and device for data processing
Technical Field
The present invention relates to the field of network security technologies, and in particular, to a data processing method and apparatus, and a device for data processing.
Background
The rapid development of the Internet has brought large numbers of users to the platforms of major companies. At the same time, Internet advertising technology has matured, and delivering promotion content to users over the network, which exploits the network's efficiency and wide reach, has become a common profit model for Internet companies. Because a client's promotion content may contain commercial spam or vulgar content, the promotion content needs to be reviewed in order to maintain a healthy network environment.
In the course of making the invention, the inventor found that the prior art generally reviews a client's promotion content manually before the advertisement is delivered, and delivers the uploaded content only if it passes the review. However, after passing the review, some clients may modify the uploaded content to carry out malicious promotion, which increases the risk that non-compliant content is delivered.
Moreover, because the prior art reviews clients' promotion content manually, it incurs high labor costs and has low review efficiency.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a data processing method, a data processing apparatus, and a device for data processing that overcome or at least partially solve the above problems. The embodiments can detect newly added malicious promotion content, reduce the review workload, save labor costs, and improve review efficiency.
In order to solve the above problems, an embodiment of the present invention discloses a data processing method, including:
determining page content of a page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:
the page content determining module is used for determining page content of the page to be detected;
the detection module is used for detecting the page to be detected according to the page content so as to obtain a first detection result page; and
the filtering module is used for filtering the first detection result page according to the attribute information of the first detection result page so as to obtain a second detection result page.
Optionally, the attribute information includes: at least one of page address, customer identification, and matching keywords.
Optionally, the filtering module includes:
the first filtering sub-module is used for performing first filtering on the first detection result page according to a first filtering feature obtained based on the attribute information, so as to obtain a first filtering result; and
the second filtering sub-module is used for performing second filtering on the first filtering result according to a second filtering feature obtained based on the attribute information, so as to obtain the second detection result page;
wherein the first filtering feature comprises a page address and a customer identification, and the second filtering feature comprises a customer identification and a matching keyword; or
the first filtering feature comprises a customer identification and a matching keyword, and the second filtering feature comprises a page address and a customer identification.
Optionally, the page content includes: text content obtained by recognizing a picture in the page to be detected.
Optionally, the page content determining module includes:
the analysis sub-module is used for analyzing the page to be detected to obtain at least one picture in the page to be detected; and
the identification sub-module is used for recognizing the at least one picture in the page to be detected so as to obtain text content corresponding to the at least one picture.
Optionally, the detection module includes:
the matching sub-module is used for matching the page content with keywords in the keyword set; and
the determining sub-module is used for taking the page to be detected as a first detection result page if content matching a keyword exists in the page content.
Optionally, the apparatus further comprises:
the adding module is used for adding an expansion word corresponding to a keyword to the keyword set if the expansion word exists in the first detection result page.
Optionally, the apparatus further comprises:
the page determining module is used for determining the page to be detected from a page click log according to at least one of a client identifier, a field, and a click region identifier.
Optionally, the page determining module includes:
the page determining sub-module is used for taking a page in the page click log as a page to be detected if the client identifier corresponding to the page does not match a white list.
In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs comprising operational instructions for:
determining page content of a page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
In yet another aspect, embodiments of the present invention disclose a machine-readable medium having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the aforementioned data processing method.
According to the embodiment of the invention, the page to be detected can be automatically detected according to the page content of the page to be detected, so that a first detection result page is obtained; the scheme of the embodiment of the invention can be executed at any time or periodically, so that the modification of the page content can be detected, namely, in the case that the page content is modified (for example, the malicious promotion content is newly added in the page), the newly added malicious promotion content can also be detected.
In addition, the embodiment of the invention can filter the first detection result page according to the attribute information, and the filtered second detection result page can be used for rechecking.
Drawings
FIG. 1 is a schematic illustration of an application environment for a data processing method of the present invention;
FIG. 2 is a flowchart illustrating steps of a first embodiment of a data processing method according to the present invention;
FIG. 3 is a flowchart illustrating steps of a second embodiment of a data processing method according to the present invention;
FIG. 4 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 5 is a block diagram illustrating an apparatus for data processing as a terminal according to an exemplary embodiment;
FIG. 6 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The embodiment of the invention provides a data processing scheme, which can comprise the following steps: determining page content of a page to be detected; detecting the page to be detected according to the page content to obtain a first detection result page; and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
In the embodiment of the invention, "page" is an everyday term that originally refers to one side of a leaf of a book or other reading material. In a network environment, information is organized into pages implemented in a given language, and hypertext links are established between different pages for browsing. Optionally, the page may correspond to delivered promotion content; for example, when advertisements are delivered, the page may be the landing page of an advertisement. In network marketing, a landing page is the web page that opens when a potential user clicks an advertisement or a search engine result. Typically, the landing page displays expanded content related to the clicked advertisement or search result link, and is search-engine optimized for a certain keyword (or phrase). The detection of the embodiment of the invention can be used to judge whether the page to be detected is a malicious page or a page suspected of being malicious, so a detection result page can be a malicious page or a page suspected of being malicious.
According to the embodiment of the invention, the page to be detected can be automatically detected according to the page content of the page to be detected, so that a first detection result page is obtained; the scheme of the embodiment of the invention can be executed at any time or periodically, so that the modification of the page content can be detected, namely, in the case that the page content is modified (for example, the malicious promotion content is newly added in the page), the newly added malicious promotion content can also be detected.
In the embodiment of the invention, the first detection result page can be reviewed. If the detection result page is forbidden, the client associated with it is processed, for example by suspending the client; if the detection result page is not forbidden, the client need not be processed. The review may be performed manually or automatically. The review can exclude semantic interference (for example, the phrase "secret method" appears in the introduction page of a book and is wrongly segmented so that it matches the keyword "secret recipe"), so that only clients that are actually in violation are processed. For example, if "secret recipe" genuinely appears in a page promoted by a certain client, the client is determined to be in violation and is suspended.
In addition, the embodiment of the invention can filter the first detection result page, and the filtered second detection result page can be used for rechecking, and the repeated data in the filtered second detection result page can be reduced, so that the rechecking workload can be reduced, and the rechecking efficiency can be improved.
The data processing method provided by the embodiment of the present invention may be applied to the application environment shown in fig. 1, where as shown in fig. 1, the first terminal 100, the server 200 and the second terminal 300 are located in a wired or wireless network, and through the wired or wireless network, the first terminal 100 performs data interaction with the server 200, or the second terminal 300 performs data interaction with the server 200, or the first terminal 100 performs data interaction with the second terminal 300.
The first terminal 100 may refer to a terminal used by a client. A client may be a natural person or an organization that receives a service in exchange for money or something else of value. The client is the buyer of the service and may be the final consumer, an agent, or an intermediary in the supply chain. Optionally, clients may have stronger demand in a particular market segment. In particular, the service may be a promotion service, such as an advertising service.
The second terminal 300 may refer to a terminal used by an audience. Audience refers to the recipients of information distribution, including readers of newspapers and books, listeners of broadcasts, viewers of movies and television, netizens. In particular, for embodiments of the present invention, the audience may be recipients of promotional content, such as clickers of advertisements.
The first terminal 100 and the second terminal 300 may include, but are not limited to, a mobile terminal or a fixed terminal. Specifically, the first terminal and the second terminal may include, but are not limited to: smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, in-vehicle computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like.
The server 200 may receive the request of the first terminal 100 and establish a mapping relationship between the client identifier and the page address. One client may correspond to at least one client identifier. The page address may correspond to a page for promotion content, where the page address may include, but is not limited to: a URL (Uniform Resource Locator), a URI (Uniform Resource Identifier), etc.
The server 200 may perform page delivery for the page corresponding to the page address in the mapping relationship, for example, may perform page delivery through channels such as a search engine, a portal platform, a headline platform, a self-media platform, and the like. In the page delivery process, the second terminal 300 may load the corresponding page by clicking the page address. Accordingly, the server 200 may record the pages clicked by the audience through the page click log, and the pages recorded in the page click log may be used as a source of the pages to be detected.
The server 200 may use the data processing scheme of the embodiment of the present invention to determine whether the page to be detected is a malicious page or a page suspected of being malicious. Optionally, if the page to be detected is determined to be a malicious page, the corresponding client is suspended; specifically, the target client corresponding to the malicious page is determined according to the mapping relationship, and delivery of the pages corresponding to the client identifier of the target client is stopped. Of course, embodiments of the present invention are not limited to a particular suspension process.
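The mapping relationship and the suspension step described above can be illustrated with a small in-memory registry. This is only a sketch under stated assumptions: the `DeliveryRegistry` class, its method names, and the example addresses are hypothetical and do not come from the source, which does not limit the suspension process to any particular form.

```python
from collections import defaultdict
from typing import Dict, Set


class DeliveryRegistry:
    """Hypothetical in-memory store of the client-identifier/page-address mapping."""

    def __init__(self) -> None:
        self._pages_by_client: Dict[str, Set[str]] = defaultdict(set)
        self._client_by_page: Dict[str, str] = {}
        self._suspended: Set[str] = set()

    def register(self, client_id: str, page_address: str) -> None:
        """Record that a client delivers the page at page_address."""
        self._pages_by_client[client_id].add(page_address)
        self._client_by_page[page_address] = client_id

    def suspend_client_of(self, malicious_page: str) -> Set[str]:
        """Suspend the client that owns the page; return the pages no longer delivered."""
        client_id = self._client_by_page[malicious_page]
        self._suspended.add(client_id)
        return set(self._pages_by_client[client_id])

    def is_delivered(self, page_address: str) -> bool:
        """A page is delivered only if it is registered and its client is not suspended."""
        client_id = self._client_by_page.get(page_address)
        return client_id is not None and client_id not in self._suspended


registry = DeliveryRegistry()
registry.register("18920653", "https://a.example/p1")
registry.register("18920653", "https://a.example/p2")
registry.register("77", "https://b.example/p1")
stopped = registry.suspend_client_of("https://a.example/p1")
```

Suspending by client identifier rather than by single page reflects the text above, in which delivery is stopped for the pages corresponding to the target client's identifier.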
Method embodiment one
Referring to fig. 2, a flowchart illustrating steps of a first embodiment of a data processing method according to the present invention may specifically include the following steps:
Step 201, determining page content of a page to be detected;
step 202, detecting the page to be detected according to the page content to obtain a first detection result page;
step 203, filtering the first detection result page according to the attribute information of the first detection result page, so as to obtain a second detection result page.
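The way steps 201 to 203 chain together can be sketched as a small driver function. The function name `run_detection` and its callable parameters are illustrative assumptions; the source does not prescribe how each step is implemented, so each step is passed in as a function.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def run_detection(
    pages: Iterable[Page],
    get_content: Callable[[Page], str],
    is_suspicious: Callable[[str], bool],
    filter_results: Callable[[List[Page]], List[Page]],
) -> List[Page]:
    """Chain steps 201-203 for a batch of pages to be detected."""
    first_results: List[Page] = []
    for page in pages:
        content = get_content(page)        # step 201: determine page content
        if is_suspicious(content):         # step 202: detect based on the content
            first_results.append(page)     # page becomes a first detection result
    return filter_results(first_results)   # step 203: filter the first results


# Toy usage: pages are (address, text) tuples; filtering drops exact duplicates.
pages = [
    ("a.example/1", "secret recipe"),
    ("a.example/1", "secret recipe"),
    ("a.example/2", "plain text"),
]
second_results = run_detection(
    pages,
    get_content=lambda page: page[1],
    is_suspicious=lambda text: "secret recipe" in text,
    filter_results=lambda found: list(dict.fromkeys(found)),
)
```

In this toy run only one entry for `a.example/1` remains, since the second page contains no keyword and the duplicate entry is filtered out.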
The embodiment of the present invention does not limit the source corresponding to the page to be detected in step 201.
According to an embodiment, the source corresponding to the page to be detected may be a platform where a page detection requirement exists.
According to another embodiment, the source corresponding to the page to be detected in step 201 may be a page click log. Optionally, the page to be detected may be determined from the page click log according to at least one of the client identifier, the field, and the click region identifier.
Determining the page to be detected from the page click log according to the client identifier means that pages are selected at the granularity of the client identifier, so that the pages of client identifiers relevant to the keywords can be detected, which can improve detection efficiency and accuracy. In an optional embodiment of the present invention, determining the page to be detected from the page click log according to the client identifier may include: if the client identifier corresponding to a page in the page click log does not match the white list, taking the page as a page to be detected.
The white list can be used to store the client identifiers of normal clients, that is, clients considered highly safe. This avoids, to a certain extent, pages of normal clients being flagged (that is, treated as malicious pages), and can therefore improve detection accuracy. Traditionally, when pages of normal clients are flagged, they usually have to be removed manually; the embodiment of the invention can exclude the pages of normal clients in advance, which can improve detection efficiency.
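The white-list check described above can be sketched as follows. The log entry layout (dictionaries with `client_id` and `page_address` keys) and the function name are assumptions made for illustration; collapsing repeated clicks on the same address is also an added convenience, not something the source requires.

```python
from typing import Dict, Iterable, List, Set


def select_pages_to_detect(
    click_log: Iterable[Dict[str, str]], whitelist: Set[str]
) -> List[str]:
    """Return distinct page addresses whose client identifier is not whitelisted."""
    selected: List[str] = []
    seen: Set[str] = set()
    for entry in click_log:
        if entry["client_id"] in whitelist:
            continue  # trusted client: its pages are excluded before detection
        address = entry["page_address"]
        if address not in seen:  # assumption: one entry per address is enough
            seen.add(address)
            selected.append(address)
    return selected


click_log = [
    {"client_id": "trusted-1", "page_address": "https://t.example/home"},
    {"client_id": "18920653", "page_address": "https://a.example/p1"},
    {"client_id": "18920653", "page_address": "https://a.example/p1"},
    {"client_id": "18920653", "page_address": "https://a.example/p2"},
]
to_detect = select_pages_to_detect(click_log, whitelist={"trusted-1"})
```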
In an alternative embodiment of the invention, it may be determined whether to add the customer to the whitelist, depending on the authority of the customer. The authority of a customer can be measured by external links. Generally, the more external links of high quality, the higher the authority of the web site or web page itself. In addition, details such as domain name registration history, stability, privacy policy, etc. can also affect the authority of the client.
In another alternative embodiment of the invention, it may be determined whether to add the customer to the whitelist based on the customer's feedback information on the page detection behavior. Specifically, if the feedback rate of the client to the page detection behavior is high, the client may be added to the white list, for example, in a case where the client senses that the page is subjected to the page detection behavior, the client may trigger a complaint operation, and in this case, the feedback rate of the client to the page detection behavior may be considered to be high.
It will be appreciated that the above process of determining whether to add a customer to a whitelist is merely an alternative embodiment, and in fact, embodiments of the present invention are not limited to the specific process of determining whether to add a customer to a whitelist, and are not limited to the specific identification of a customer in a whitelist.
In embodiments of the present invention, a field may refer to a specific area of expertise and everything related to that area. Examples of fields include medicine, electronics, household appliances, entertainment, machinery, sports, science and technology, environmental protection, and the like. For example, the keyword "cure" is rarely used in the "machinery" field, so when the keyword "cure" is used, the "machinery" field need not be selected for page detection. Determining the page to be detected from the page click log according to the field means that pages are selected at the granularity of the field, so that pages in the fields relevant to the keywords can be detected, which can improve detection efficiency and accuracy.
The click region identifier may be used to characterize the geographic region where a user is located when the user clicks the page. Because some clients promote content only to certain regions, the embodiment of the invention can select the pages to be detected at the granularity of the click region identifier, so that pages whose click region identifier is relevant to the keywords can be detected, which can improve detection efficiency and accuracy.
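Selection by field and click region can be sketched in the same spirit. The table `keyword_fields`, which maps each keyword to the fields where it is worth checking, is a hypothetical structure introduced here; the source only states that fields and click regions can serve as the selection granularity.

```python
from typing import Dict, Iterable, List, Optional, Set


def pages_for_keyword(
    click_log: Iterable[Dict[str, str]],
    keyword: str,
    keyword_fields: Dict[str, Set[str]],
    regions: Optional[Set[str]] = None,
) -> List[str]:
    """Select page addresses whose field (and, optionally, click region) suits the keyword."""
    fields = keyword_fields.get(keyword, set())
    selected: List[str] = []
    for entry in click_log:
        if entry["field"] not in fields:
            continue  # e.g. skip "machinery" pages when checking for "cure"
        if regions is not None and entry["click_region"] not in regions:
            continue  # restrict to the regions of interest, if any were given
        selected.append(entry["page_address"])
    return selected


log = [
    {"page_address": "p1", "field": "medicine", "click_region": "Beijing"},
    {"page_address": "p2", "field": "machinery", "click_region": "Beijing"},
    {"page_address": "p3", "field": "medicine", "click_region": "Shanghai"},
]
fields_for = {"cure": {"medicine"}}
medicine_pages = pages_for_keyword(log, "cure", fields_for)
beijing_pages = pages_for_keyword(log, "cure", fields_for, regions={"Beijing"})
```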
In an alternative embodiment of the present invention, the page content of the page to be detected in step 201 may include: text content obtained by recognizing a picture in the page to be detected. Including the text content corresponding to pictures widens the scope of the page content that is detected, so that malicious promotion content delivered through pictures can also be detected, which can improve the coverage and accuracy of page detection.
In an alternative embodiment of the present invention, determining the page content of the page to be detected in step 201 may include: parsing the page to be detected to obtain at least one picture in the page to be detected; and recognizing the at least one picture to obtain text content corresponding to the at least one picture. Optionally, the HTML (HyperText Markup Language) code corresponding to the page to be detected may be parsed to obtain the at least one picture. Further, a picture in the page to be detected can be recognized by calling a picture recognition interface, so as to obtain the text content corresponding to the picture.
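A minimal sketch of the parsing step, using Python's standard `html.parser` module to collect picture addresses from the HTML code. The picture recognition interface is not specified in the source, so it is represented here by a caller-supplied `recognize` function; in the usage example it is a stand-in lookup table, not a real recognition service.

```python
from html.parser import HTMLParser
from typing import Callable, List


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def picture_text(html: str, recognize: Callable[[str], str]) -> List[str]:
    """Parse the page, then pass each picture address to the recognition function."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return [recognize(src) for src in collector.sources]


# Stand-in for a picture recognition interface (hypothetical data).
fake_results = {"a.png": "secret recipe", "b.png": "opening hours"}
page_html = '<p>Welcome</p><img src="a.png"><img alt="no source"><IMG SRC="b.png"/>'
texts = picture_text(page_html, lambda src: fake_results.get(src, ""))
```

The recognized text can then be appended to the page's ordinary text so that both are covered by the detection in step 202.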
In step 202, the page to be detected may be detected according to the page content obtained in step 201, so as to obtain a first detection result page.
In an alternative embodiment of the present invention, detecting the page to be detected in step 202 may include: matching the page content against the keywords in a keyword set; and if content matching a keyword exists in the page content, taking the page to be detected as a first detection result page. Content may be considered to match a keyword if it is the same as, similar to, or related to the keyword.
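The matching step can be sketched with the simplest of the matching modes mentioned above, namely identical content found by substring search. Matching on similar or related content would need additional logic that the source does not detail, so it is deliberately left out; the function names are illustrative.

```python
from typing import Iterable, List


def matched_keywords(page_content: str, keyword_set: Iterable[str]) -> List[str]:
    """Return the keywords that occur verbatim in the page content, sorted."""
    return sorted(k for k in set(keyword_set) if k and k in page_content)


def is_first_detection_result(page_content: str, keyword_set: Iterable[str]) -> bool:
    """A page becomes a first detection result page if any keyword matches."""
    return bool(matched_keywords(page_content, keyword_set))


keywords = {"secret recipe", "cure", "gamble"}
hits = matched_keywords("This secret recipe can cure anything", keywords)
```

Plain substring matching is what makes the later review necessary: it cannot tell a genuine occurrence from text that merely happens to contain the same characters.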
In the embodiment of the invention, keywords are words that conform to preset malicious rules, such as vulgar words, malicious words, forbidden words, sensitive words, and extreme (superlative) words. Optionally, the keywords may be preset manually or obtained from forbidden pages, for example by extracting keywords from a forbidden page. A forbidden page may refer to a page that has been banned; the embodiment of the invention does not limit the specific forbidden pages or the way in which they are obtained.
It should be noted that, the keywords in the keyword set may be updated along with the update of the forbidden page. Accordingly, the method of the embodiment of the invention can further comprise: and if the expansion word corresponding to the keyword exists in the first detection result page, adding the expansion word into the keyword set.
Expansion words are words derived from a keyword and may include, but are not limited to, synonyms, near-synonyms, and related words. For example, the expansion words of "secret" include "secret recipe" and "folk remedy", and the expansion words of "cure" include "treatment" and "diagnosis".
The embodiment of the invention can judge whether the expansion word corresponding to the keyword exists in the detection result page, if so, the expansion word can be added into the keyword set, thereby realizing expansion of the keyword. It can be understood that, according to actual application requirements, a person skilled in the art can update the keyword set regularly according to the detection result page and daily feedback of supervision, so as to improve the coverage rate and accuracy of page detection.
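Updating the keyword set with expansion words can be sketched as follows. The `expansions` table, which lists candidate expansion words per keyword, is a hypothetical input; the source does not say how expansion words are produced, only that they are added when they appear in a detection result page.

```python
from typing import Dict, Iterable, Set


def expand_keyword_set(
    keyword_set: Set[str],
    expansions: Dict[str, Iterable[str]],
    result_page_content: str,
) -> Set[str]:
    """Add expansion words found in the page to keyword_set; return the words added."""
    added: Set[str] = set()
    for keyword in list(keyword_set):
        for word in expansions.get(keyword, ()):
            if word in result_page_content and word not in keyword_set:
                added.add(word)
    keyword_set |= added  # update the set in place after the scan
    return added


keyword_set = {"cure"}
expansions = {"cure": ["treatment", "diagnosis"]}
added = expand_keyword_set(
    keyword_set, expansions, "free diagnosis and a guaranteed cure"
)
```

Here only "diagnosis" is added, because "treatment" does not occur in the example page.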
In one embodiment of the present invention, the first detection result page obtained in step 202 may be reviewed. If the detection result page is forbidden, the client associated with it is processed, for example by suspending the client; if the detection result page is not forbidden, the client need not be processed. The review may be performed manually or automatically. The review can exclude semantic interference (for example, the phrase "secret method" appears in the introduction page of a book and is wrongly segmented so that it matches the keyword "secret recipe"), so that only clients that are actually in violation are processed. For example, if "secret recipe" genuinely appears in a page promoted by a certain client, the client is determined to be in violation and is suspended.
In step 203, the first detection result page may be filtered according to the attribute information of the first detection result page obtained in step 202, so as to obtain a second detection result page.
The attribute information may refer to a property of the first detection result page, and according to the attribute information, the embodiment of the present invention filters the first detection result page. The filtered second detection result page can be used for rechecking, and the filtering can reduce repeated data in the first detection result page, so that the rechecking workload can be reduced, and the rechecking efficiency can be improved.
Alternatively, the attribute information may include: at least one of page address, customer identification, and matching keywords. The page address may refer to an address of the page in the internet, that is, the address may be used to locate the page; the client identifier can be used for identifying clients so as to realize the distinction between different clients, and further can realize the distinction between pages corresponding to promotion contents of different clients; the matching keyword may refer to a word included in the first detection result page and matched with the keyword.
Those skilled in the art can filter the first detection result page using any one of, or any combination of, the page address, the customer identification, and the matching keyword, according to actual application requirements. Optionally, in an embodiment of the present invention, the filtering basis may include at least two of the page address, the customer identification, and the matching keyword corresponding to the first detection result page. Specifically, the filtering basis may include: page address and customer identification; customer identification and matching keyword; or page address and matching keyword; and so on.
The scheme of the embodiments of the present invention may be executed at any time or periodically. If the data processing method needs to be executed repeatedly for one page to be detected, it may be executed repeatedly on the page address of that page, or on a page screenshot of that page; however, because page screenshots occupy a large amount of storage space, using screenshots tends to reduce detection efficiency.
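Filtering on a combination of attributes amounts to removing records that share the same values for the chosen attributes. The sketch below assumes each detection record is a dictionary with `page_address`, `client_id`, and `matched_keyword` keys, and that the first record of each group is the one kept; both are illustrative choices not fixed by the source.

```python
from typing import Dict, Iterable, List, Sequence


def filter_by_attributes(
    records: Iterable[Dict[str, str]], basis: Sequence[str]
) -> List[Dict[str, str]]:
    """Keep the first record for each distinct combination of the basis attributes."""
    seen = set()
    kept: List[Dict[str, str]] = []
    for record in records:
        key = tuple(record[name] for name in basis)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


records = [
    {"page_address": "a/1", "client_id": "c1", "matched_keyword": "cure"},
    {"page_address": "a/1", "client_id": "c1", "matched_keyword": "secret recipe"},
    {"page_address": "a/2", "client_id": "c1", "matched_keyword": "cure"},
]
by_page = filter_by_attributes(records, ("page_address", "client_id"))
by_keyword = filter_by_attributes(records, ("client_id", "matched_keyword"))
```

Choosing a different basis changes which records count as duplicates: the first call collapses the two records for `a/1`, while the second collapses the two "cure" records.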
In summary, according to the data processing method of the embodiment of the invention, the page to be detected can be automatically detected according to the page content of the page to be detected, so as to obtain a first detection result page; the scheme of the embodiment of the invention can be executed at any time or periodically, so that the modification of the page content can be detected, namely, in the case that the page content is modified (for example, the malicious promotion content is newly added in the page), the newly added malicious promotion content can also be detected.
In addition, the page content used for detection in the embodiment of the invention can include text content obtained by recognizing pictures in the page to be detected. This widens the scope of the page content that is detected, so that malicious promotion content delivered through pictures can also be detected, which can improve the coverage and accuracy of page detection.
Method embodiment II
Referring to fig. 3, a flowchart illustrating steps of a second embodiment of a data processing method according to the present invention may specifically include the following steps:
step 301, determining page content of a page to be detected;
step 302, detecting the page to be detected according to the page content to obtain a first detection result page;
step 303, performing a first filtering on the first detection result page according to a first filtering feature obtained based on the attribute information, so as to obtain a first filtering result;
step 304, performing second filtering on the first filtering result according to the second filtering characteristic obtained based on the attribute information to obtain a second detection result page;
wherein the first filtering feature may comprise a page address and a customer identification, and the second filtering feature may comprise a customer identification and a matching keyword; or
the first filtering feature may comprise a customer identification and a matching keyword, and the second filtering feature may comprise a page address and a customer identification.
In contrast to the first embodiment of the method shown in fig. 2, the present embodiment refines the process of filtering the first detection result page through steps 303 and 304.
The embodiment of the invention applies the first filtering feature and the second filtering feature in sequence to perform the first filtering and the second filtering on the first detection result page, so as to reduce repeated auditing of the same page of the same client and repeated auditing of the same matching keyword of the same client.
In one embodiment of the present invention, the first filtering in step 303 uses "page address+customer identifier" filtering, which can reduce repeated audits of the same customer on the same page; the second filtering in step 304 adopts "customer identification+matching keyword" filtering, so that repeated audits of the same customer and the same matching keyword can be reduced.
In an application example of the invention, assume that a client with client identifier 18920653 has placed 8 pages and that, as is common in practice, the page addresses of the 8 pages are all different because their promotion contents are not completely identical. Suppose detection finds the matching keyword "secret recipe" in 3 pages and the matching keyword "cure" in 2 pages. Since no page address is repeated, the first filtering removes nothing and the first filtering result includes the 8 pages; after the second filtering, one audit record for "secret recipe" and one audit record for "cure" can be retained as the second detection result page to be audited, so that the auditing workload can be reduced.
Of course, the first filtering of step 303 may instead employ "customer identification+matching keyword" filtering, and the second filtering of step 304 may employ "page address+customer identification" filtering.
In summary, according to the data processing method provided by the embodiment of the invention, through the first filtering and the second filtering, repeated auditing of the same page of the same client can be reduced, and repeated auditing of the same matching keyword of the same client can be reduced; therefore, repeated data in the first detection result page can be reduced, the workload of rechecking can be reduced, and the rechecking efficiency can be improved.
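The two filtering passes described above can be sketched as two successive de-duplication steps, each keyed on a pair of attributes. This is a minimal illustration only: the record type `DetectionRecord`, its field names, the helper `dedupe`, and the sample addresses are assumptions made for the example and do not appear in the specification, which does not prescribe a data structure or a retention rule (here the first record seen for each key is kept).

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List


@dataclass(frozen=True)
class DetectionRecord:
    # Hypothetical field names for the attribute information of a detected page.
    page_address: str
    customer_id: str
    matched_keyword: str


def dedupe(records: Iterable[DetectionRecord],
           key: Callable[[DetectionRecord], Hashable]) -> List[DetectionRecord]:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    kept = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept


def by_page_and_customer(record: DetectionRecord) -> Hashable:
    return (record.page_address, record.customer_id)


def by_customer_and_keyword(record: DetectionRecord) -> Hashable:
    return (record.customer_id, record.matched_keyword)


def two_stage_filter(records: Iterable[DetectionRecord],
                     first_key=by_page_and_customer,
                     second_key=by_customer_and_keyword) -> List[DetectionRecord]:
    """First filtering, then second filtering on the first filtering result."""
    first_result = dedupe(records, first_key)
    return dedupe(first_result, second_key)


# Illustrative data: one customer, five distinct page addresses.
sample = [
    DetectionRecord("http://example.com/p1", "18920653", "secret recipe"),
    DetectionRecord("http://example.com/p2", "18920653", "secret recipe"),
    DetectionRecord("http://example.com/p3", "18920653", "secret recipe"),
    DetectionRecord("http://example.com/p4", "18920653", "cure"),
    DetectionRecord("http://example.com/p5", "18920653", "cure"),
]
to_review = two_stage_filter(sample)
```

Passing the two key functions in the opposite order gives the alternative arrangement, in which "customer identification+matching keyword" is applied first and "page address+customer identification" second.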
It should be noted that, for simplicity of description, the method embodiments are presented as a series of actions, but those skilled in the art should understand that the embodiments are not limited by the order of actions described, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art should understand that the embodiments described in the specification are all preferred embodiments and that the actions involved are not necessarily required by the embodiments of the present invention.
Device embodiment
With reference to FIG. 4, there is shown a block diagram of an embodiment of a data processing apparatus of the present invention, which may include in particular:
a page content determining module 401, configured to determine page content of a page to be detected; and
the detection module 402 is configured to detect the page to be detected according to the page content, so as to obtain a first detection result page; and
and the filtering module 403 is configured to filter the first detection result page according to the attribute information of the first detection result page, so as to obtain a second detection result page.
Optionally, the attribute information may include: at least one of page address, customer identification, and matching keywords.
Optionally, the filtering module 403 may include:
the first filtering sub-module is used for carrying out first filtering on the first detection result page according to the first filtering characteristics obtained based on the attribute information so as to obtain a first filtering result;
the second filtering sub-module performs second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature may comprise: the page address and the customer identification, the second filtering feature may include: customer identification and matching keywords; or alternatively
The first filtering feature may include: customer identification and matching keywords, the second filtering feature may include: page address and customer identification.
Optionally, the page content may include: text content obtained by recognizing pictures in the page to be detected.
Optionally, the page content determining module 401 may include:
the analysis sub-module is used for analyzing the page to be detected to obtain at least one picture in the page to be detected; and
and the identification sub-module is used for identifying at least one picture in the page to be detected so as to obtain text content corresponding to the at least one picture.
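As a rough sketch of how the analysis and identification sub-modules could cooperate, the following parses a page for picture references and passes each one to a recognizer. The function names and the `recognize` callable are hypothetical; the specification does not name a parser or a recognition engine, so the recognizer is injected by the caller rather than implemented here.

```python
from html.parser import HTMLParser
from typing import Callable, List


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_sources(html: str) -> List[str]:
    """Analysis step: parse the page and return its picture references."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def recognize_page_pictures(html: str,
                            recognize: Callable[[str], str]) -> List[str]:
    """Identification step: apply a caller-supplied recognizer to each picture.

    `recognize` is an assumption of this sketch: any function that maps a
    picture reference to the text found in that picture.
    """
    return [recognize(src) for src in extract_image_sources(html)]
```

The recognized text can then be treated as part of the page content and passed through the same detection as ordinary page text.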
Optionally, the detection module 402 may include:
the matching sub-module is used for matching the page content with keywords in the keyword set; and
and the determining submodule is used for taking the page to be detected as a first detection result page if the content matched with the keyword exists in the page content.
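The matching and determining sub-modules can be illustrated with a short sketch. It assumes plain substring matching and a dictionary from page address to page content; both are choices made for the example, since the specification does not fix a matching algorithm or a storage format, and the names `match_keywords` and `detect_pages` are hypothetical.

```python
from typing import Dict, Iterable, List


def match_keywords(page_content: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in the page content (substring match)."""
    return sorted(k for k in keywords if k in page_content)


def detect_pages(pages: Dict[str, str],
                 keywords: Iterable[str]) -> Dict[str, List[str]]:
    """Map each page that matches at least one keyword to its matching keywords.

    Pages with no match are omitted, so the returned mapping plays the role of
    the first detection result pages together with their matching keywords.
    """
    keyword_list = list(keywords)
    results: Dict[str, List[str]] = {}
    for address, content in pages.items():
        hits = match_keywords(content, keyword_list)
        if hits:
            results[address] = hits
    return results
```

Keeping the matching keywords alongside each detected page makes them available later as attribute information for filtering.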
Optionally, the apparatus may further include:
and the adding module is used for adding an expansion word to the keyword set if the expansion word corresponding to the keyword exists in the first detection result page.
Optionally, the apparatus may further include:
and the page determining module is used for determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
Optionally, the page determining module may include:
and the page determining sub-module is used for taking the page as a page to be detected if the client identifier corresponding to the page in the page clicking log is not matched with the white list.
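A minimal sketch of selecting pages to be detected from a page click log follows. The log entry layout, the field names, and the function `select_pages_to_detect` are assumptions made for illustration; the specification states only that the client identifier, the field, and the click region identifier may be used, and that pages whose client identifier does not match the white list are taken as pages to be detected.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ClickLogEntry:
    # Hypothetical layout of one page click log record.
    page_address: str
    customer_id: str
    field: str
    click_region: str


def select_pages_to_detect(log: Iterable[ClickLogEntry],
                           whitelist: Set[str],
                           field: Optional[str] = None,
                           click_region: Optional[str] = None) -> List[str]:
    """Return distinct page addresses whose customer is not whitelisted.

    The optional `field` and `click_region` arguments narrow the selection;
    leaving them as None applies only the white list check.
    """
    selected: List[str] = []
    seen: Set[str] = set()
    for entry in log:
        if entry.customer_id in whitelist:
            continue
        if field is not None and entry.field != field:
            continue
        if click_region is not None and entry.click_region != click_region:
            continue
        if entry.page_address not in seen:
            seen.add(entry.page_address)
            selected.append(entry.page_address)
    return selected
```

Returning each page address once avoids detecting the same page repeatedly when it appears in many click records.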
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail herein.
The embodiment of the invention also provides a device for data processing, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise operation instructions for performing operations included in one or more of the methods in fig. 2-3.
Fig. 5 is a block diagram illustrating an apparatus for data processing as a terminal according to an exemplary embodiment. For example, terminal 1100 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 5, a terminal 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.
The processing component 1102 generally controls overall operation of the terminal 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute the download instructions to perform all or part of the steps of the methods described above. Further, the processing component 1102 can include one or more modules that facilitate interactions between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support operations at the terminal 1100. Examples of such data include download instructions for any application or method operating on terminal 1100, contact data, phonebook data, messages, pictures, video, and the like. The memory 1104 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 1106 provides power to the various components of the terminal 1100. Power supply component 1106 can comprise a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for terminal 1100.
The multimedia component 1108 includes a screen that provides an output interface between the terminal 1100 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1108 includes a front camera and/or a rear camera. When the terminal 1100 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1110 is configured to output and/or input an audio signal. For example, the audio component 1110 includes a Microphone (MIC) configured to receive external audio signals when the terminal 1100 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio component 1110 further comprises a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1114 includes one or more sensors for providing status assessments of various aspects of the terminal 1100. For example, the sensor assembly 1114 may detect the on/off state of the terminal 1100 and the relative positioning of components, such as the display and keypad of the terminal 1100. The sensor assembly 1114 may also detect a change in position of the terminal 1100 or a component of the terminal 1100, the presence or absence of user contact with the terminal 1100, the orientation or acceleration/deceleration of the terminal 1100, and a change in temperature of the terminal 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the terminal 1100 and other devices. The terminal 1100 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1116 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1116 further includes a Near Field Communication (NFC) module to facilitate short range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1104, including download instructions that are executable by processor 1120 of terminal 1100 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Fig. 6 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) that store applications 1942 or data 1944. The memory 1932 and storage medium 1930 may be transitory or persistent. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Still further, the central processor 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD, and the like.
The download instructions in the storage medium, when executed by a processor of an apparatus (terminal or server), enable the apparatus to perform a data processing method comprising: determining page content of a page to be detected; detecting the page to be detected according to the page content to obtain a first detection result page; and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
The embodiment of the invention discloses A1, a data processing method, which comprises the following steps:
determining page content of a page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
A2, the method according to A1, wherein the attribute information includes: at least one of page address, customer identification, and matching keywords.
A3, the method according to A1, wherein the filtering the first detection result page includes:
according to the first filtering characteristics obtained based on the attribute information, performing first filtering on the first detection result page to obtain a first filtering result;
performing second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or alternatively
The first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification.
A4, the method according to any of A1 to A3, the page content comprising: text content obtained by recognizing pictures in the page to be detected.
A5, the method according to A4, wherein the determining page content of the page to be detected includes:
analyzing a page to be detected to obtain at least one picture in the page to be detected;
and identifying at least one picture in the page to be detected to obtain text content corresponding to the at least one picture.
A6, the method according to any one of A1 to A3, wherein the detecting the page to be detected according to the page content includes:
matching the page content with keywords in the keyword set;
and if the content matched with the keyword exists in the page content, taking the page to be detected as a first detection result page.
A7, the method according to A6, the method further comprising:
and if the expansion word corresponding to the keyword exists in the first detection result page, adding the expansion word into the keyword set.
A8, the method according to any one of A1 to A3, wherein the page to be detected is determined through the following steps:
and determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
A9, the method according to A8, wherein the determining the page to be detected from the page click log includes:
and if the client identifier corresponding to the page in the page click log is not matched with the white list, taking the page as the page to be detected.
The embodiment of the invention discloses B10, a data processing device, which comprises:
the page content determining module is used for determining page content of the page to be detected;
the detection module is used for detecting the page to be detected according to the page content so as to obtain a first detection result page; and
and the filtering module is used for filtering the first detection result page according to the attribute information of the first detection result page so as to obtain a second detection result page.
B11, the apparatus of B10, the attribute information comprising: at least one of page address, customer identification, and matching keywords.
B12, the apparatus of B10, the filtering module comprising:
the first filtering sub-module is used for carrying out first filtering on the first detection result page according to the first filtering characteristics obtained based on the attribute information so as to obtain a first filtering result;
the second filtering sub-module performs second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or alternatively
The first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification.
B13, the apparatus of any one of B10 to B12, the page content comprising: text content obtained by recognizing pictures in the page to be detected.
B14, the apparatus of B13, the page content determination module comprising:
the analysis sub-module is used for analyzing the page to be detected to obtain at least one picture in the page to be detected; and
and the identification sub-module is used for identifying at least one picture in the page to be detected so as to obtain text content corresponding to the at least one picture.
B15, the apparatus of any one of B10 to B12, the detection module comprising:
the matching sub-module is used for matching the page content with keywords in the keyword set; and
and the determining submodule is used for taking the page to be detected as a first detection result page if the content matched with the keyword exists in the page content.
B16, the apparatus of B15, the apparatus further comprising:
and the adding module is used for adding an expansion word to the keyword set if the expansion word corresponding to the keyword exists in the first detection result page.
B17, the apparatus of any one of B10 to B12, further comprising:
and the page determining module is used for determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
B18, the apparatus of B17, the page determination module comprising:
and the page determining sub-module is used for taking the page as a page to be detected if the client identifier corresponding to the page in the page clicking log is not matched with the white list.
The embodiment of the invention discloses C19, a device for data processing, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and are configured to be executed by one or more processors, and the one or more programs comprise operation instructions for:
determining page content of a page to be detected; the page content includes: text content obtained by recognizing pictures in the page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
and filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page.
C20, the apparatus of C19, the attribute information comprising: at least one of page address, customer identification, and matching keywords.
C21, the apparatus of C19, the filtering the first detection result page, including:
according to the first filtering characteristics obtained based on the attribute information, performing first filtering on the first detection result page to obtain a first filtering result;
performing second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or alternatively
The first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification.
C22, the apparatus of any one of C19 to C20, the page content comprising: text content obtained by recognizing pictures in the page to be detected.
C23, according to the device of C22, the determining the page content of the page to be detected includes:
analyzing a page to be detected to obtain at least one picture in the page to be detected;
And identifying at least one picture in the page to be detected to obtain text content corresponding to the at least one picture.
C24, the apparatus according to any one of C19 to C20, wherein the detecting the page to be detected according to the page content includes:
matching the page content with keywords in the keyword set;
and if the content matched with the keyword exists in the page content, taking the page to be detected as a first detection result page.
C25, the device of C24, the device further configured to be executed by one or more processors, the one or more programs including operational instructions for:
and if the expansion word corresponding to the keyword exists in the first detection result page, adding the expansion word into the keyword set.
C26, the device of any one of C19 to C20, the device further configured to be executed by one or more processors, the one or more programs including operational instructions for:
and determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
C27, the device of C26, the determining the page to be detected from the page click log, includes:
and if the client identifier corresponding to the page in the page click log is not matched with the white list, taking the page as the page to be detected.
The embodiment of the invention discloses D28, a machine-readable medium, on which is stored downloading instructions, which when executed by one or more processors, cause an apparatus to perform the data processing method as described in one or more of A1 to A9.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
The data processing method, the data processing apparatus, the apparatus for data processing, and the machine readable medium provided by the invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help understand the method and its core concepts. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the application scope. In view of the above, the present description should not be construed as limiting the present invention.

Claims (22)

1. A method of data processing, the method comprising:
determining page content of a page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page;
wherein the detecting the page to be detected according to the page content includes:
matching the page content with keywords in the keyword set;
if the content matched with the keyword exists in the page content, the page to be detected is used as the first detection result page;
the filtering the first detection result page includes:
according to the first filtering characteristics obtained based on the attribute information, performing first filtering on the first detection result page to obtain a first filtering result;
performing second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or alternatively
The first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification;
and rechecking the first detection result page according to the second detection result page.
2. The method of claim 1, wherein the attribute information comprises: at least one of page address, customer identification, and matching keywords.
3. The method according to any one of claims 1 to 2, wherein the page content comprises: text content obtained by recognizing pictures in the page to be detected.
4. A method according to claim 3, wherein said determining page content of the page to be detected comprises:
analyzing a page to be detected to obtain at least one picture in the page to be detected;
and identifying at least one picture in the page to be detected to obtain text content corresponding to the at least one picture.
5. The method according to claim 1, wherein the method further comprises:
and if the expansion word corresponding to the keyword exists in the first detection result page, adding the expansion word into the keyword set.
6. The method according to any one of claims 1 to 2, wherein the page to be detected is determined by:
and determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
7. The method of claim 6, wherein the determining the page to be detected from a page click log comprises:
And if the client identifier corresponding to the page in the page click log is not matched with the white list, taking the page as the page to be detected.
8. A data processing apparatus, the apparatus comprising:
the page content determining module is used for determining page content of the page to be detected;
the detection module is used for detecting the page to be detected according to the page content so as to obtain a first detection result page; and
the filtering module is used for filtering the first detection result page according to the attribute information of the first detection result page so as to obtain a second detection result page;
the detection module comprises:
the matching sub-module is used for matching the page content with keywords in the keyword set; and
a determining submodule, configured to take the page to be detected as the first detection result page if the content matched with the keyword exists in the page content;
the filter module includes:
the first filtering sub-module is used for carrying out first filtering on the first detection result page according to the first filtering characteristics obtained based on the attribute information so as to obtain a first filtering result;
the second filtering sub-module is used for carrying out second filtering on the first filtering result according to the second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or
the first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification;
and rechecking the first detection result page according to the second detection result page.
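As an illustration only, the detection and two-stage filtering recited in claim 8 might look like the Python sketch below. It assumes that "filtering" means keeping one representative page per distinct feature tuple (a deduplication reading that the claims do not state explicitly). The names `DetectedPage`, `detect`, `filter_by` and `two_stage_filter` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedPage:
    url: str          # page address
    customer_id: str  # customer identification
    keyword: str      # keyword that matched the page content


def detect(pages, keywords):
    """pages: iterable of (url, customer_id, content) tuples.

    Returns the first detection result pages: every page whose content
    contains at least one keyword from the (ordered) keyword list.
    """
    results = []
    for url, customer_id, content in pages:
        for kw in keywords:
            if kw in content:
                results.append(DetectedPage(url, customer_id, kw))
                break  # one match is enough to flag the page
    return results


def filter_by(pages, key):
    """Keep the first page for each distinct key value (assumed meaning of filtering)."""
    seen = set()
    kept = []
    for page in pages:
        k = key(page)
        if k not in seen:
            seen.add(k)
            kept.append(page)
    return kept


def two_stage_filter(first_results):
    # First filtering feature: page address and customer identification.
    stage1 = filter_by(first_results, lambda p: (p.url, p.customer_id))
    # Second filtering feature: customer identification and matching keyword.
    return filter_by(stage1, lambda p: (p.customer_id, p.keyword))
```

Under this reading, the second detection result pages are a smaller sample of the first, which is what would reduce the recheck workload. Swapping the two `key` functions gives the alternative ordering recited in the claim.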
9. The apparatus of claim 8, wherein the attribute information comprises: at least one of page address, customer identification, and matching keywords.
10. The apparatus according to any one of claims 8 to 9, wherein the page content comprises: text content obtained by recognizing a picture in the page to be detected.
11. The apparatus of claim 10, the page content determination module comprising:
the analysis sub-module is used for analyzing the page to be detected to obtain at least one picture in the page to be detected; and
the identification sub-module is used for identifying the at least one picture in the page to be detected so as to obtain text content corresponding to the at least one picture.
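The two sub-modules of claim 11 can be sketched as a small pipeline: parse the page for pictures, then pass each picture to a recognizer. In the Python sketch below, the recognizer is a caller-supplied function (`recognize`, hypothetical) because the claims do not name a particular recognition technique; `_ImageCollector` and `page_text_from_pictures` are likewise illustrative names, and the page is assumed to be HTML.

```python
from html.parser import HTMLParser
from typing import Callable


class _ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def page_text_from_pictures(html: str, recognize: Callable[[str], str]) -> list:
    """Analyze the page to find pictures, then recognize text in each one."""
    collector = _ImageCollector()
    collector.feed(html)
    # `recognize` stands in for any picture-to-text step (for example OCR).
    return [recognize(src) for src in collector.sources]
```

Keeping the recognizer as a parameter lets the same parsing step be reused with whatever recognition component is available.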
12. The apparatus of claim 8, wherein the apparatus further comprises:
the adding module is used for adding the expansion word corresponding to the keyword into the keyword set if the expansion word exists in the first detection result page.
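A minimal Python sketch of the keyword-set update in claim 12 follows. The claims do not say how expansion words are derived, so the sketch assumes a caller-supplied mapping from each keyword to its candidate expansion words (`candidate_expansions`, hypothetical) and only checks whether those candidates occur in the flagged pages.

```python
def expand_keyword_set(keyword_set, candidate_expansions, flagged_contents):
    """Add to keyword_set every candidate expansion word found in a flagged page.

    keyword_set: set of current keywords (updated in place).
    candidate_expansions: dict mapping keyword -> set of expansion words (assumed input).
    flagged_contents: list of page-content strings of first detection result pages.
    Returns the set of expansion words that were added.
    """
    added = set()
    for keyword, expansions in candidate_expansions.items():
        if keyword not in keyword_set:
            continue  # only expand keywords that are already in the set
        for word in expansions:
            if any(word in content for content in flagged_contents):
                added.add(word)
    keyword_set |= added
    return added
```

Growing the keyword set this way is one plausible route to catching newly added variants on later detection runs.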
13. The apparatus according to any one of claims 8 to 9, further comprising:
and the page determining module is used for determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
14. The apparatus of claim 13, the page determination module comprising:
and the page determining sub-module is used for taking the page as a page to be detected if the client identifier corresponding to the page in the page clicking log is not matched with the white list.
15. An apparatus for data processing comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising operational instructions for:
determining page content of a page to be detected, wherein the page content includes: text content obtained by recognizing pictures in the page to be detected;
detecting the page to be detected according to the page content to obtain a first detection result page;
filtering the first detection result page according to the attribute information of the first detection result page to obtain a second detection result page;
wherein the detecting the page to be detected according to the page content includes:
matching the page content with keywords in the keyword set;
if the content matched with the keyword exists in the page content, the page to be detected is used as the first detection result page;
the filtering the first detection result page includes:
according to the first filtering characteristics obtained based on the attribute information, performing first filtering on the first detection result page to obtain a first filtering result;
performing second filtering on the first filtering result according to second filtering characteristics obtained based on the attribute information so as to obtain a second detection result page;
wherein the first filtering feature comprises: a page address and a customer identification, the second filtering feature comprising: customer identification and matching keywords; or
the first filtering feature comprises: customer identification and matching keywords, the second filtering feature comprising: page address and customer identification;
and rechecking the first detection result page according to the second detection result page.
16. The apparatus of claim 15, wherein the attribute information comprises: at least one of page address, customer identification, and matching keywords.
17. The apparatus according to any one of claims 15 to 16, wherein the page content comprises: text content obtained by recognizing a picture in the page to be detected.
18. The apparatus of claim 17, wherein the determining page content of the page to be detected comprises:
analyzing a page to be detected to obtain at least one picture in the page to be detected;
and identifying at least one picture in the page to be detected to obtain text content corresponding to the at least one picture.
19. The apparatus of claim 15, wherein the one or more programs, configured to be executed by the one or more processors, further comprise operational instructions for:
if the expansion word corresponding to the keyword exists in the first detection result page, adding the expansion word into the keyword set.
20. The apparatus of any one of claims 15 to 16, wherein the one or more programs, configured to be executed by the one or more processors, further comprise operational instructions for:
and determining the page to be detected from a page click log according to at least one of the client identifier, the field and the click region identifier.
21. The apparatus of claim 20, the determining the page to be detected from a page click log comprising:
and if the client identifier corresponding to the page in the page click log is not matched with the white list, taking the page as the page to be detected.
22. A machine readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform the data processing method of one or more of claims 1 to 7.
CN201810394877.1A 2018-04-27 2018-04-27 Data processing method and device for data processing Active CN110413866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810394877.1A CN110413866B (en) 2018-04-27 2018-04-27 Data processing method and device for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810394877.1A CN110413866B (en) 2018-04-27 2018-04-27 Data processing method and device for data processing

Publications (2)

Publication Number Publication Date
CN110413866A CN110413866A (en) 2019-11-05
CN110413866B true CN110413866B (en) 2024-02-02

Family

ID=68346803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810394877.1A Active CN110413866B (en) 2018-04-27 2018-04-27 Data processing method and device for data processing

Country Status (1)

Country Link
CN (1) CN110413866B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102647422A (en) * 2012-04-10 2012-08-22 中国科学院计算机网络信息中心 Phishing website detection method and device
CN102938041A (en) * 2012-10-30 2013-02-20 北京神州绿盟信息安全科技股份有限公司 Comprehensive detection method and system for page tampering
CN103279710A (en) * 2013-04-12 2013-09-04 深圳市易聆科信息技术有限公司 Method and system for detecting malicious codes of Internet information system
CN104935605A (en) * 2015-06-30 2015-09-23 北京奇虎科技有限公司 Detection method, device and system for fishing websites
CN104933056A (en) * 2014-03-18 2015-09-23 腾讯科技(深圳)有限公司 Uniform resource locator (URL) de-duplication method and device
WO2015165245A1 (en) * 2014-04-30 2015-11-05 广州市动景计算机科技有限公司 Webpage data processing method and device
CN105630780A (en) * 2014-10-27 2016-06-01 小米科技有限责任公司 Webpage information processing method and apparatus
CN106649787A (en) * 2016-12-28 2017-05-10 北京奇虎科技有限公司 Method and device for filtering advertisement in mobile terminal client
CN107943954A (en) * 2017-11-24 2018-04-20 杭州安恒信息技术有限公司 Detection method, device and the electronic equipment of webpage sensitive information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7817309B2 (en) * 2006-11-13 2010-10-19 Ricoh Company, Ltd. Double filter fax cover page
CN103810425B (en) * 2012-11-13 2015-09-30 腾讯科技(深圳)有限公司 The detection method of malice network address and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
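Liu Wenyin; Guanglin Huang; Liu Xiaoyue; Xiaotie Deng; Zhang Min (State Key Laboratory of Intelligent Technology & System, Tsinghua University, Beijing, China). Phishing Web page detection. Eighth International Conference on Document Analysis and Recognition (ICDAR'05). 2006, pp. 1-5. *
一个网页过滤改进算法的应用与实现 (Application and implementation of an improved web page filtering algorithm); 程基鹏; 电脑知识与技术 (Computer Knowledge and Technology), No. 33; pp. 9192-9194 *
基于动态测试的XSS漏洞检测方法研究 (Research on an XSS vulnerability detection method based on dynamic testing); 曹黎波, 曹天杰; 计算机应用与软件 (Computer Applications and Software), Vol. 32, No. 8; pp. 272-275 *
网页漏洞挖掘系统设计 (Design of a web page vulnerability mining system); 黄超; 李毅; 麻荣宽; 马建勋; 信息网络安全 (Netinfo Security), No. 09; pp. 76-80 *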

Also Published As

Publication number Publication date
CN110413866A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
US9589149B2 (en) Combining personalization and privacy locally on devices
US10862888B1 (en) Linking a forwarded contact on a resource to a user interaction on a requesting source item
CN107229527B (en) Information resource collection method and device and computer readable storage medium
WO2020082938A1 (en) Label processing method and apparatus, electronic device and storage medium
US11004163B2 (en) Terminal-implemented method, server-implemented method and terminal for acquiring certification document
WO2017181663A1 (en) Method and device for matching image to search information
RU2741479C2 (en) Mobile advertisement provisioning system and method
CN107622074B (en) Data processing method and device and computing equipment
US20160027044A1 (en) Presenting information cards for events associated with entities
CN117390330A (en) Webpage access method and device
CN107515870B (en) Searching method and device and searching device
US20210377628A1 (en) Method and apparatus for outputting information
CN107515869B (en) Searching method and device and searching device
CN108717403B (en) Processing method and device for processing
CN105976201B (en) Purchasing behavior monitoring method and device for e-commerce system
CN110633391B (en) Information searching method and device
CN105096162B (en) Content item display method and device
CN111246255B (en) Video recommendation method and device, storage medium, terminal and server
CN110413866B (en) Data processing method and device for data processing
Cahyani et al. An evidence‐based forensic taxonomy of Windows phone dating apps
WO2016127888A1 (en) Method and device for downloading multimedia file
CN108804181B (en) Control content obtaining method and device and storage medium
CN111752656A (en) Information display method and device, electronic equipment and storage medium
CN111242398B (en) Data processing method and device for data processing
CN109271615B (en) Entry processing method, apparatus and machine readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant