CN112966263A - Target information acquisition method and device and computer readable storage medium - Google Patents

Publication number
CN112966263A
CN112966263A CN202110214055.2A CN202110214055A CN112966263A CN 112966263 A CN112966263 A CN 112966263A CN 202110214055 A CN202110214055 A CN 202110214055A CN 112966263 A CN112966263 A CN 112966263A
Authority
CN
China
Prior art keywords
website
webpage
target
target information
recharging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110214055.2A
Other languages
Chinese (zh)
Inventor
郭琦
闵勇
肖梁
刘斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202110214055.2A
Publication of CN112966263A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Abstract

The application provides a target information acquisition method, a target information acquisition device and a computer-readable storage medium, wherein the method comprises the following steps: inputting webpage content characteristics of a website to be analyzed into a first model to judge whether the corresponding webpage is a target webpage; inputting webpage code structure characteristics of the target webpage into a second model to obtain the code structure type of the target webpage; and searching for a corresponding interaction strategy according to the code structure type, and interacting with the corresponding target website according to the found interaction strategy to acquire target information, wherein the target website is the website to which the target webpage belongs. By this method, the target website can be automatically identified and the target information can be automatically acquired.

Description

Target information acquisition method and device and computer readable storage medium
Technical Field
The application relates to a target information acquisition method, a target information acquisition device and a computer-readable storage medium.
Background
This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
The internet is flooded with a large number of gambling fraud websites, which cause huge financial losses to society. One effective way to prevent gambling fraud is to cut off the channel through which the criminals receive funds, that is, to block the collection card numbers used by gambling fraud websites; the precondition for blocking is to obtain the relevant collection card numbers. The existing method for acquiring gambling collection card numbers mainly consists of manually checking a large number of gambling websites to obtain their collection accounts. In addition, various other websites, such as phishing websites and marketing websites, also require manual investigation to determine target information from them. Manual methods are inefficient, have limited ability to obtain information, and consume a tremendous amount of labor.
Disclosure of Invention
In view of the foregoing problems in the prior art, embodiments of the present application provide a method and an apparatus for obtaining target information, and a computer-readable storage medium. With such a method and device, the above-mentioned problems can be at least partially solved.
The examples of the present application provide the following: a target information acquisition method includes:
inputting the webpage content characteristics of the website to be analyzed into the first model to judge whether the corresponding webpage is a target webpage or not;
inputting the webpage code structure characteristics of the target webpage into a second model to obtain the code structure type of the target webpage;
and searching a corresponding interaction strategy according to the code structure type, and interacting with a corresponding target website according to the searched interaction strategy to acquire target information, wherein the target website is a website to which the target webpage belongs.
In some embodiments, the web page code structure characteristics of the target web page include: at least one of the depth of the HTML elements, the number of the HTML parallel elements and the number of the picture elements.
In some embodiments, the target information comprises: a collection account of the target website;
the interaction policy includes: sequentially performing a registration operation, a login operation and a recharge operation to obtain the collection account, or sequentially performing a registration operation, a login operation, customer service contact and simulated chat to obtain the collection account.
In some embodiments, a registration operation is performed, including:
performing machine vision analysis on the webpage to locate a registration button, and performing a simulated click on the registration button; or,
determining the URL pointed to by the registration keyword in the webpage source code, and jumping to that URL.
In some embodiments, interacting with the corresponding target website according to the found interaction policy includes:
performing machine vision analysis on the webpage to locate prompt words of a text box where information to be filled is located;
and applying an offset relative to the located prompt words to perform a click operation inside the corresponding text box and start text filling.
In some embodiments, performing the recharge operation to obtain the collection account includes:
performing machine vision analysis on the webpage to locate a recharge button, and performing a simulated click on the recharge button to acquire the collection account; or,
analyzing the webpage source code, identifying the page element that starts recharging according to a recharge keyword, and performing a simulated click on the identified page element to acquire the collection account.
In some embodiments, when the collection link is obtained, the collection account is retrieved based on the collection link.
In some embodiments, the web page content characteristics of the website to be analyzed include:
and at least one of character characteristics, picture characteristics and video characteristics of the web page of the website to be analyzed.
In some embodiments, further comprising:
traversing and searching a seed website to obtain at least one URL of at least one associated website and the webpage content corresponding to each URL, wherein each URL found is used as a website (URL) to be analyzed.
In some embodiments, performing a traversal search of the seed website includes: performing a depth-first or breadth-first traversal search on the seed website.
In some embodiments, the first model comprises a machine learning model, and/or the second model comprises a machine learning model.
The examples of the present application provide the following: a target information acquisition apparatus comprising:
the first analysis module is used for inputting the webpage content characteristics of the website to be analyzed into the first model so as to judge whether the corresponding webpage is a target webpage or not;
the second analysis module is used for inputting the webpage code structure characteristics of the target webpage into the second model to obtain the code structure type of the target webpage;
and the target information acquisition module is used for searching a corresponding interaction strategy according to the code structure type and interacting with a corresponding target website according to the searched interaction strategy to acquire target information, wherein the target website is a website to which the target webpage belongs.
In some embodiments, the web page code structure characteristics of the target web page include: at least one of the depth of the HTML elements, the number of the HTML parallel elements and the number of the picture elements.
In some embodiments, the target information comprises: a collection account of the target website;
the interaction policy includes: sequentially performing a registration operation, a login operation and a recharge operation to obtain the collection account, or sequentially performing a registration operation, a login operation, customer service contact and simulated chat to obtain the collection account.
In some embodiments, the target information obtaining module is specifically configured to:
perform machine vision analysis on the webpage to locate a registration button, and perform a simulated click on the registration button; or,
determine the URL pointed to by the registration keyword in the webpage source code, and jump to that URL.
In some embodiments, the target information obtaining module is specifically configured to:
performing machine vision analysis on the webpage to locate prompt words of a text box where information to be filled is located;
and applying an offset relative to the located prompt words to perform a click operation inside the corresponding text box and start text filling.
In some embodiments, the target information obtaining module is specifically configured to:
perform machine vision analysis on the webpage to locate a recharge button, and perform a simulated click on the recharge button to obtain the collection account; or,
analyze the webpage source code, identify the page element that starts recharging according to a recharge keyword, and perform a simulated click on the identified page element to acquire the collection account.
In some embodiments, the target information obtaining module is specifically configured to: and when the collection link is acquired, extracting a collection account according to the collection link.
In some embodiments, the web page content characteristics of the website to be analyzed include:
and at least one of character characteristics, picture characteristics and video characteristics of the web page of the website to be analyzed.
In some embodiments, further comprising:
a web page search module, used for performing a traversal search on a seed website to obtain at least one URL of at least one associated website and the webpage content corresponding to each URL, wherein each URL found is used as a website (URL) to be analyzed.
In some embodiments, the web page search module is specifically configured to: perform a depth-first or breadth-first traversal search on the seed website.
In some embodiments, the first model comprises a machine learning model; and/or, the second model comprises a machine learning model.
The examples of the present application provide the following: a target information acquisition apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: the method as described above.
The examples of the present application provide the following: a computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform: the method as described above.
The embodiments of the application adopt at least one technical scheme that can achieve the following beneficial effects: the webpage content characteristics of target webpages of the same type have a certain similarity, so whether a webpage is a target webpage of the type of interest can be judged according to its webpage content characteristics. To reduce the cost of website development, target websites typically employ similar webpage code structures. If the webpage code structures of two websites are sufficiently similar, it can be inferred that the ways in which the two websites interact with users are also sufficiently similar. Therefore, the interaction strategy for the corresponding target website can be deduced from the webpage code structure characteristics of the target webpage, which raises the success rate of obtaining the target information when interacting with the target website. These processes can be executed by a program, which greatly reduces labor cost.
It should be understood that the above description is only an overview of the technical solutions of the present application, so as to enable the technical solutions of the present application to be more clearly understood, and thus can be implemented according to the content of the description. In order to make the aforementioned and other objects, features and advantages of the present application more comprehensible, embodiments of the present application are described below.
Drawings
The advantages and benefits described herein, as well as other advantages and benefits, will be apparent to those of ordinary skill in the art upon reading the following detailed description of the exemplary embodiments. The drawings are only for purposes of illustrating exemplary embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like elements throughout. In the drawings:
fig. 1 is a schematic flowchart of a target information obtaining method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a target information acquiring apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a target information acquiring apparatus according to another embodiment of the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In this application, it is to be understood that terms such as "including" or "having" are intended to indicate the presence of the disclosed features, integers, steps, acts, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, integers, steps, acts, components, parts, or groups thereof.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a schematic flowchart of a target information acquisition method according to an embodiment of the present application, used for determining a target website and obtaining target information from it. From a device perspective, the executing subject of the flow may be one or more electronic devices; from a program perspective, the executing subject may accordingly be a program loaded on these electronic devices.
The flow in fig. 1 may include the following steps 101 to 103.
Step 101, inputting webpage content characteristics of a website (URL) to be analyzed into a first model to judge whether a corresponding webpage is a target webpage or not;
step 102, inputting the webpage code structure characteristics of the target webpage into a second model to obtain the code structure type of the target webpage;
step 103, searching for a corresponding interaction strategy according to the code structure type, and interacting with the corresponding target website according to the found interaction strategy to acquire target information, wherein the target website is the website to which the target webpage belongs.
The webpage content characteristics of target webpages of the same type have a certain similarity, so whether a webpage is a target webpage of the type of interest can be judged according to its webpage content characteristics. To reduce the cost of website development, target websites typically employ similar webpage code structures. If the webpage code structures of two websites are sufficiently similar, it can be inferred that the ways in which the two websites interact with users are also sufficiently similar. Therefore, webpage code structures can be classified according to the webpage code structure characteristics of the target webpage. The same or similar webpage code structures typically share the same interaction policy, so the interaction strategy for the corresponding target website can be deduced, which raises the success rate of obtaining the target information when interacting with the target website. These processes can be executed by a program, which greatly reduces labor cost.
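The three steps above can be sketched as a small pipeline. This is only an illustrative outline: the function name `acquire_target_info`, the representation of features as lists of numbers, and the representation of each interaction policy as a callable are assumptions made for the sketch, not details given in the application.

```python
from typing import Callable, Dict, List, Optional


def acquire_target_info(
    content_features: List[float],
    structure_features: List[float],
    first_model: Callable[[List[float]], bool],
    second_model: Callable[[List[float]], str],
    policies: Dict[str, Callable[[], Optional[str]]],
) -> Optional[str]:
    """Hypothetical outline of steps 101 to 103."""
    # Step 101: the first model decides whether the page is a target webpage.
    if not first_model(content_features):
        return None
    # Step 102: the second model maps structure features to a code structure type.
    structure_type = second_model(structure_features)
    # Step 103: look up the preset interaction policy for that type and run it.
    policy = policies.get(structure_type)
    if policy is None:
        return None  # no policy has been preset for this structure type
    return policy()
```

Both models are passed in as plain callables so that the sketch stays independent of how they are built or trained, which the application leaves open.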
The correspondence between the code structure type and the interaction policy may be preset. This can be set empirically by the skilled person.
Based on the target information acquisition method of fig. 1, some embodiments of the present application also provide specific implementations and extensions of the method, which are explained below. In the following embodiments, the target website is a gambling website, and the target information is a collection account of the gambling website. Based on the same inventive concept, the target website can also be another type of website, and the target information can also be another type of information.
In some embodiments, the web page code structure characteristics of the target web page include: at least one of a hypertext markup language (HTML) element depth, a number of HTML parallel elements, and a number of picture elements.
That is, the webpage code of the target webpage is analyzed and its structural features are extracted. If the webpage code of two gambling websites is structurally similar, it can be inferred that the two websites were designed along the same lines by their developers or were generated by mass copying, and further that the two websites interact with users in the same way.
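A minimal sketch of extracting such structural features with Python's standard `html.parser` module is given below. The application does not define the features precisely, so the sketch makes assumptions: "HTML element depth" is taken as the maximum nesting depth, "number of HTML parallel elements" as the largest number of sibling elements under one parent, and "number of picture elements" as the count of `img` tags. The class and function names are hypothetical, and the parser assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StructureFeatureParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.max_siblings = 0
        self.image_count = 0
        # child_counts[-1] counts the children of the currently open element.
        self.child_counts = [0]

    def _register(self, tag):
        self.child_counts[-1] += 1
        self.max_siblings = max(self.max_siblings, self.child_counts[-1])
        if tag == "img":
            self.image_count += 1

    def handle_starttag(self, tag, attrs):
        self._register(tag)
        if tag in VOID_TAGS:
            # A void element sits one level below its parent but opens no scope.
            self.max_depth = max(self.max_depth, self.depth + 1)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.child_counts.append(0)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img/>.
        self._register(tag)
        self.max_depth = max(self.max_depth, self.depth + 1)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        self.child_counts.pop()


def extract_structure_features(html: str) -> dict:
    parser = StructureFeatureParser()
    parser.feed(html)
    parser.close()
    return {
        "max_depth": parser.max_depth,
        "max_siblings": parser.max_siblings,
        "image_count": parser.image_count,
    }
```

The resulting dictionary could be flattened into a feature vector for the second model; how that model consumes the features is not specified in the application.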
In some embodiments, the target information comprises: a collection account of the target website; the interaction policy includes: sequentially performing a registration operation, a login operation and a recharge operation to obtain the collection account, or sequentially performing a registration operation, a login operation, customer service contact and simulated chat to obtain the collection account.
If the collection account of the gambling website can be obtained, the collection account can be further monitored, analyzed and the like.
Some gambling websites have the following recharge process: the user first registers, then logs in to the gambling website, and then clicks the recharge button to recharge. For this type of gambling website, the program's interaction policy is: performing a registration operation, a login operation and a recharge operation in sequence to obtain the collection account.
Other gambling websites have the following recharge process: the user first registers, then logs in to the gambling website, and then chats with a customer service agent, who provides a collection account or collection link in the chat interface. For this type of gambling website, the program's interaction policy is: sequentially performing a registration operation, a login operation, customer service contact and simulated chat to obtain the collection account.
Those skilled in the art can set, based on experience, which interaction policy corresponds to websites of which webpage code structure. The application does not limit how the correspondence between the two is established.
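One simple way to hold such a preset correspondence is a lookup table from code structure type to an ordered list of steps. The sketch below is only an illustration: the type labels `"type_a"` and `"type_b"`, the `Step` enumeration and the function `lookup_policy` are hypothetical names, and the application does not prescribe any particular data structure.

```python
from enum import Enum
from typing import Dict, List


class Step(Enum):
    REGISTER = "register"
    LOGIN = "login"
    RECHARGE = "recharge"
    CONTACT_CUSTOMER_SERVICE = "contact_customer_service"
    SIMULATED_CHAT = "simulated_chat"


# Hypothetical preset table; the type labels are placeholders chosen for this sketch.
POLICIES: Dict[str, List[Step]] = {
    # Sites where the collection account appears after clicking recharge.
    "type_a": [Step.REGISTER, Step.LOGIN, Step.RECHARGE],
    # Sites where customer service provides the collection account in chat.
    "type_b": [Step.REGISTER, Step.LOGIN,
               Step.CONTACT_CUSTOMER_SERVICE, Step.SIMULATED_CHAT],
}


def lookup_policy(code_structure_type: str) -> List[Step]:
    """Return the ordered steps preset for a code structure type."""
    try:
        return POLICIES[code_structure_type]
    except KeyError:
        raise KeyError(f"no interaction policy preset for {code_structure_type!r}")
```

Keeping the table as data rather than code means that a new code structure type can be supported by adding one entry.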
How the program performs the registration operation is described below.
One embodiment is: machine vision analysis is performed on the web page to locate the registration button, and a simulated click is performed on the registration button to enter the registration page.
Another embodiment is: analyzing the webpage source code, extracting the URL pointed to by the keyword 'registration', and then jumping to that URL, thereby entering the registration page.
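The source-code-based variant can be sketched with the standard library alone. The sketch assumes that the registration entry is an ordinary `<a href="...">` link whose visible text contains a registration keyword; the keyword list, the class `KeywordLinkFinder` and the function `find_registration_url` are hypothetical, and real pages may use other markup (for example script-driven buttons) that this simple approach would miss.

```python
from html.parser import HTMLParser
from typing import Optional, Sequence
from urllib.parse import urljoin


class KeywordLinkFinder(HTMLParser):
    """Finds the first <a> link whose text contains one of the keywords."""

    def __init__(self, keywords: Sequence[str]):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.current_href = None
        self.current_text = []
        self.match = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).lower()
            if self.match is None and any(k in text for k in self.keywords):
                self.match = self.current_href
            self.current_href = None


def find_registration_url(
    page_url: str,
    html: str,
    keywords: Sequence[str] = ("register", "sign up", "注册"),
) -> Optional[str]:
    finder = KeywordLinkFinder(keywords)
    finder.feed(html)
    finder.close()
    # Resolve a relative href against the URL of the page it came from.
    return urljoin(page_url, finder.match) if finder.match else None
```

For example, given a page at `https://example.test/index.html` containing `<a href="/user/reg.html">Register now</a>`, the function returns `https://example.test/user/reg.html`.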
After entering the registration page, information such as an account name, a password, a mobile phone number and an email address generally needs to be filled in. This information can be preset; to avoid the risk control mechanism of the gambling website, the registration information can also be randomly generated and filled in on the registration interface.
The following is an implementation of how the program fills in information: performing machine vision analysis on the webpage to locate the prompt words of the text box where information is to be filled in; and applying an offset relative to the located prompt words to perform a click operation inside the corresponding text box and start text filling.
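The offset step can be illustrated with a little geometry. The sketch assumes that the machine vision stage has already returned a bounding box for the prompt words, and that the text box lies either to the right of or below that prompt; the `Box` type, the `gap` value and the function name are hypothetical, and the application does not state how the offset is chosen.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box of the located prompt words, in screen pixels."""
    x: int
    y: int
    width: int
    height: int


def click_point_for_field(prompt: Box, gap: int = 40,
                          direction: str = "right") -> Tuple[int, int]:
    """Return a point offset from the prompt that should fall inside the text box."""
    center_x = prompt.x + prompt.width // 2
    center_y = prompt.y + prompt.height // 2
    if direction == "right":
        # Text box on the same row, to the right of the prompt.
        return (prompt.x + prompt.width + gap, center_y)
    if direction == "below":
        # Text box directly underneath the prompt.
        return (center_x, prompt.y + prompt.height + gap)
    raise ValueError(f"unsupported direction: {direction!r}")
```

The returned coordinates would then be passed to whatever click simulation the program uses; that part is outside the scope of this sketch.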
On some gambling websites, a recharge operation is performed after login is completed in order to obtain the collection account.
Specifically, machine vision analysis is performed on the webpage to locate a recharge button, and a simulated click is performed on the recharge button to obtain the collection account; or the webpage source code is analyzed, the page element that starts recharging is identified according to a recharge keyword, and a simulated click is performed on the identified page element to acquire the collection account.
On some gambling websites, clicking the recharge button presents the user with a collection account or a collection link. If a collection link is shown, the program needs to extract the collection account from it.
In some embodiments, in step 101, at least one of text features, picture features and video features of the web page of the website to be analyzed is analyzed, so as to determine whether the web page to be analyzed is the target web page.
Taking a gambling website as an example, the text in the webpage content often includes terms such as 'baccarat', 'dealer' and 'horse race', and the pictures and videos displayed in the webpage also differ significantly from those of a normal website. A gambling website can therefore be recognized efficiently by extracting the text features, picture features and video features of the webpage and inputting them into a trained machine learning model (such as a neural network model).
The above character features can be extracted from the source code of the webpage, and the pictures and videos can be obtained according to the links in the source code of the webpage.
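As one possible text feature, keyword occurrence counts can be computed from the visible text of the page source. This is a sketch under assumptions: the application does not say which text features are used, the keyword list here simply reuses the example terms above, and the names `VisibleTextExtractor` and `keyword_features` are hypothetical. Picture and video features are not covered.

```python
from html.parser import HTMLParser
from typing import Dict, Sequence


class VisibleTextExtractor(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def keyword_features(html: str, keywords: Sequence[str]) -> Dict[str, int]:
    """Count case-insensitive occurrences of each keyword in the visible text."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    return {kw: text.count(kw.lower()) for kw in keywords}
```

The counts would form only part of the input to the first model; a trained model could equally work from other text representations.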
The websites to be analyzed can be acquired as follows: in step 100, a traversal search is performed on a seed website to obtain at least one URL of at least one associated website and the webpage content corresponding to each URL, wherein each URL found is used as a website (URL) to be analyzed.
If gambling websites are the websites of interest, the seed website is a known gambling website. Examples of the traversal method include depth-first traversal and breadth-first traversal.
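A breadth-first traversal from a seed URL can be sketched as follows. To keep the example self-contained, link extraction is abstracted behind a `get_links` callable supplied by the caller; that callable, the `max_pages` limit and the function name are assumptions of the sketch rather than details from the application.

```python
from collections import deque
from typing import Callable, Iterable, List


def breadth_first_urls(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Return URLs reachable from the seed in breadth-first order."""
    visited = [seed]      # discovery order, used as the result
    seen = {seed}         # fast membership test to avoid revisiting
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            visited.append(link)
            queue.append(link)
            if len(visited) >= max_pages:
                break
    return visited
```

Replacing `queue.popleft()` with `queue.pop()` turns the queue into a stack and gives a depth-first visiting order instead.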
In some embodiments, the method further comprises: recognizing and filling in a verification code, or recognizing the verification mode of a particular gambling website, so as to complete the verification required by the gambling website automatically.
At present, many gambling websites have some counter-reconnaissance and counter-investigation awareness; if they detect operations such as frequent visits or repeated test recharges, they often block the IP address, the hardware address and so on. Therefore, certain anti-blocking techniques need to be adopted, for example changing the IP and the device information through a simulator, in order to cope with the risk control measures of the gambling website.
In the execution process of the method, all the acquired websites (URLs), the website of the target website, the content of the target webpage, the acquired target information (for example, a payment account), the interaction strategy and other information can be stored.
The structure and form of the first and second models are not limited by this application, for example, both models are machine learning models. How to train and optimize the first model and the second model is also not limited, and a person skilled in the art can flexibly set the first model and the second model according to the purpose of model operation.
Based on the same technical concept, the embodiment of the present application further provides a target information obtaining apparatus, configured to execute the method provided in any of the above embodiments. Fig. 2 is a schematic structural diagram of a target information obtaining apparatus according to an embodiment of the present application.
As shown in fig. 2, the target information acquisition apparatus includes: a first analysis module 2, used for inputting the webpage content characteristics of the website to be analyzed into the first model so as to judge whether the corresponding webpage is a target webpage;
the second analysis module 3 is used for inputting the webpage code structure characteristics of the target webpage into the second model to obtain an interaction strategy aiming at the target website, wherein the target website is a website to which the target webpage belongs;
and the target information acquisition module 4 is used for interacting with the corresponding target website according to the interaction strategy to acquire the target information.
In some embodiments, the web page code structure characteristics of the target web page include: at least one of the depth of the HTML elements, the number of the HTML parallel elements and the number of the picture elements.
In some embodiments, the target information comprises: a collection account of the target website;
the interaction policy includes: sequentially performing a registration operation, a login operation and a recharge operation to obtain the collection account, or sequentially performing a registration operation, a login operation, customer service contact and simulated chat to obtain the collection account.
In some embodiments, the target information obtaining module 4 is specifically configured to:
perform machine vision analysis on the webpage to locate a registration button, and perform a simulated click on the registration button; or,
determine the URL pointed to by the registration keyword in the webpage source code, and jump to that URL.
In some embodiments, the target information obtaining module 4 is specifically configured to:
performing machine vision analysis on the webpage to locate prompt words of a text box where information to be filled is located;
and applying an offset relative to the located prompt words to perform a click operation inside the corresponding text box and start text filling.
In some embodiments, the target information obtaining module 4 is specifically configured to:
perform machine vision analysis on the webpage to locate a recharge button, and perform a simulated click on the recharge button to obtain the collection account; or,
analyze the webpage source code, identify the page element that starts recharging according to a recharge keyword, and perform a simulated click on the identified page element to acquire the collection account.
In some embodiments, the target information obtaining module 4 is specifically configured to: and when the collection link is acquired, extracting a collection account according to the collection link.
In some embodiments, the web page content characteristics of the website to be analyzed include:
and at least one of character characteristics, picture characteristics and video characteristics of the web page of the website to be analyzed.
In some embodiments, further comprising: a web page search module 1, configured to perform a traversal search on a seed website to obtain at least one URL of at least one associated website and the webpage content corresponding to each URL, where each URL found is used as a website (URL) to be analyzed.
In some embodiments, the web page search module 1 is specifically configured to: perform a depth-first or breadth-first traversal search on the seed website.
In some embodiments, the first model comprises a machine learning model; and/or, the second model comprises a machine learning model.
It should be noted that the apparatus in the embodiment of the present application may implement each process of the foregoing method embodiment, and achieve the same effect and function, which are not described herein again.
Fig. 3 shows a target information acquisition apparatus according to an embodiment of the present application, configured to execute the target information acquisition method shown in fig. 1, where the apparatus includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the target information acquisition method described above.
According to some embodiments of the application, there is provided a non-transitory computer storage medium having stored thereon computer-executable instructions configured to, when executed by a processor, perform the target information acquisition method described above.
The embodiments in the present application are described in a progressive manner; the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the description of the apparatus and computer-readable storage medium embodiments is simplified because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding descriptions of the method embodiments.
The apparatus and the computer-readable storage medium provided in the embodiments of the present application correspond one to one to the method, and therefore have advantageous technical effects similar to those of the corresponding method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.
While the spirit and principles of the application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the disclosed embodiments. The division into aspects is for convenience of description only and does not imply that features in those aspects cannot be combined to advantage. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (24)

1. A target information acquisition method, comprising:
inputting web page content characteristics of a website to be analyzed into a first model to determine whether the corresponding web page is a target web page;
inputting web page code structure characteristics of the target web page into a second model to obtain a code structure type of the target web page; and
looking up a corresponding interaction strategy according to the code structure type, and interacting with a corresponding target website according to the interaction strategy found, so as to acquire target information, wherein the target website is the website to which the target web page belongs.
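For orientation, the three steps of claim 1 can be chained as in the sketch below. Everything in it is an assumption made for illustration: the two models are replaced by trivial rule-based stand-ins rather than trained machine learning models, the structure type names (`type_a`, `type_b`) are invented, and the strategies only return a planned list of steps instead of interacting with any website.

```python
def acquire_target_information(page, first_model, second_model, strategies):
    """Run the three steps on one page; return the strategy result or None."""
    # Step 1: the first model decides whether the page is a target page.
    if not first_model(page["content_features"]):
        return None
    # Step 2: the second model maps code structure features to a structure type.
    structure_type = second_model(page["code_features"])
    # Step 3: look up the interaction strategy for that type and apply it.
    strategy = strategies.get(structure_type)
    if strategy is None:
        return None
    return strategy(page["site"])


def is_target_page(content_features):
    # Stand-in for the first model: a keyword rule, not a trained classifier.
    return "recharge" in content_features["keywords"]


def classify_structure(code_features):
    # Stand-in for the second model: a threshold, not a trained classifier.
    return "type_a" if code_features["max_depth"] > 5 else "type_b"


# Hypothetical strategy table; each entry only returns the planned steps.
STRATEGIES = {
    "type_a": lambda site: {"site": site, "steps": ["register", "login", "recharge"]},
    "type_b": lambda site: {"site": site, "steps": ["register", "login", "contact_support", "chat"]},
}

page = {
    "site": "site.example",
    "content_features": {"keywords": ["recharge", "bonus"]},
    "code_features": {"max_depth": 8},
}
result = acquire_target_information(page, is_target_page, classify_structure, STRATEGIES)
```

Keeping the strategies in a table keyed by structure type means that supporting a new page layout only requires adding an entry, without changing the pipeline itself.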
2. The method of claim 1, wherein the web page code structure characteristics of the target web page comprise: at least one of the depth of the HTML elements, the number of the HTML parallel elements and the number of the picture elements.
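By way of a hedged example, the three structure characteristics named in claim 2 could be computed with Python's standard `html.parser` module as sketched below. The interpretation is an assumption: "depth" is taken as the maximum nesting depth, "parallel elements" as the largest number of sibling elements under one parent, and "picture elements" as the number of `img` tags. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and therefore never open a new level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        # child_counts[-1] is the number of children seen so far under the open element.
        self.child_counts = [0]
        self.max_depth = 0
        self.max_siblings = 0
        self.image_count = 0

    def handle_starttag(self, tag, attrs):
        self.child_counts[-1] += 1
        self.max_siblings = max(self.max_siblings, self.child_counts[-1])
        self.max_depth = max(self.max_depth, len(self.child_counts))
        if tag == "img":
            self.image_count += 1
        if tag not in VOID_TAGS:
            self.child_counts.append(0)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.child_counts) > 1:
            self.child_counts.pop()


def extract_structure_features(html):
    parser = StructureFeatures()
    parser.feed(html)
    parser.close()
    return {
        "max_depth": parser.max_depth,
        "max_siblings": parser.max_siblings,
        "image_count": parser.image_count,
    }


sample = ("<html><body><div><img src='a.png'><p>x</p><p>y</p></div>"
          "<img src='b.png'></body></html>")
features = extract_structure_features(sample)
```

The sketch assumes reasonably well-formed markup; pages with unclosed or mismatched tags would need a more tolerant parser before the counts could be trusted.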
3. The method of claim 1, wherein the target information comprises: a collection account of the target website;
the interaction strategy comprises: sequentially performing a registration operation, a login operation and a recharging operation to obtain the collection account, or sequentially performing a registration operation, a login operation, a customer service contact and a simulated chat to obtain the collection account.
4. The method of claim 3, wherein performing the registration operation comprises:
performing machine vision analysis on the web page to locate a registration button, and performing a simulated click on the registration button; or
determining the URL pointed to by a registration keyword in the web page source code, and jumping to that URL.
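The source-code alternative in claim 4 can be illustrated with the following sketch, which scans anchor elements for a registration keyword and resolves the matching link against the page URL. The keyword list, the class `AnchorCollector`, the function `find_registration_url` and the sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; a deployment would use its own vocabulary.
REGISTER_KEYWORDS = ("register", "sign up", "signup")


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_registration_url(page_url, source):
    """Return the absolute URL of the first link matching a keyword, or None."""
    parser = AnchorCollector()
    parser.feed(source)
    parser.close()
    for href, text in parser.anchors:
        haystack = (href + " " + text).lower()
        if any(keyword in haystack for keyword in REGISTER_KEYWORDS):
            return urljoin(page_url, href)
    return None


source = '<a href="/help">Help</a><a href="/user/reg.html">Register</a>'
registration_url = find_registration_url("https://site.example/index.html", source)
```

Matching on both the link text and the `href` value lets the lookup succeed when the visible label is an image or icon but the target path still contains the keyword.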
5. The method according to claim 3, wherein interacting with the corresponding target website according to the found interaction policy comprises:
performing machine vision analysis on the web page to locate the prompt words of a text box in which information is to be filled; and
displacing the cursor relative to the located prompt words so as to perform a click operation in the corresponding text box and start filling in text.
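The cursor displacement in claim 5 amounts to simple coordinate arithmetic once the prompt words have been located. The sketch below assumes the vision step yields a bounding box for the prompt text and that the text box lies to its right; the `Box` type, the function name and the default offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of located prompt words, in screen pixels."""
    x: int
    y: int
    width: int
    height: int


def click_point_for_prompt(prompt, dx=40, dy=0):
    """Return the (x, y) point to click, displaced from the prompt's right edge."""
    click_x = prompt.x + prompt.width + dx
    click_y = prompt.y + prompt.height // 2 + dy
    return (click_x, click_y)


# Prompt words located at (100, 200) with a size of 60 x 20 pixels.
point = click_point_for_prompt(Box(x=100, y=200, width=60, height=20))
```

For layouts where the text box sits below its label, a caller could pass a negative `dx` and a positive `dy` instead of the defaults.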
6. The method of claim 3, wherein performing the recharging operation to obtain the collection account comprises:
performing machine vision analysis on the web page to locate a recharging button, and performing a simulated click on the recharging button to acquire the collection account; or
analyzing the web page source code, identifying the page element that starts recharging according to a recharging keyword, and performing a simulated click on the identified page element to acquire the collection account.
7. The method of claim 6, wherein upon obtaining the collection link, a collection account is retrieved based on the collection link.
8. The method of claim 1, wherein the web page content characteristics of the website to be analyzed comprise:
at least one of text characteristics, picture characteristics and video characteristics of a web page of the website to be analyzed.
9. The method of claim 1, further comprising:
performing a traversal search on a seed website to obtain at least one URL of at least one associated website and the web page content corresponding to each URL, wherein each website found is used as a website to be analyzed.
10. The method of claim 9, wherein performing the traversal search on the seed website comprises: performing a depth-first or breadth-first traversal search on the seed website.
11. The method of claim 1, wherein the first model comprises a machine learning model, and/or wherein the second model comprises a machine learning model.
12. A target information acquisition apparatus characterized by comprising:
a first analysis module, configured to input web page content characteristics of a website to be analyzed into a first model to determine whether the corresponding web page is a target web page;
a second analysis module, configured to input web page code structure characteristics of the target web page into a second model to obtain a code structure type of the target web page; and
a target information acquisition module, configured to look up a corresponding interaction strategy according to the code structure type and to interact with a corresponding target website according to the interaction strategy found, so as to acquire target information, wherein the target website is the website to which the target web page belongs.
13. The apparatus of claim 12, wherein the web page code structure characteristics of the target web page comprise: at least one of the depth of the HTML elements, the number of the HTML parallel elements and the number of the picture elements.
14. The apparatus of claim 12, wherein the target information comprises: a collection account of the target website;
the interaction strategy comprises: sequentially performing a registration operation, a login operation and a recharging operation to obtain the collection account, or sequentially performing a registration operation, a login operation, a customer service contact and a simulated chat to obtain the collection account.
15. The apparatus of claim 14, wherein the target information obtaining module is specifically configured to:
perform machine vision analysis on the web page to locate a registration button, and perform a simulated click on the registration button; or
determine the URL pointed to by a registration keyword in the web page source code, and jump to that URL.
16. The apparatus of claim 15, wherein the target information obtaining module is specifically configured to:
perform machine vision analysis on the web page to locate the prompt words of a text box in which information is to be filled; and
displace the cursor relative to the located prompt words so as to perform a click operation in the corresponding text box and start filling in text.
17. The apparatus of claim 14, wherein the target information obtaining module is specifically configured to:
perform machine vision analysis on the web page to locate a recharging button, and perform a simulated click on the recharging button to acquire the collection account; or
analyze the web page source code, identify the page element that starts recharging according to a recharging keyword, and perform a simulated click on the identified page element to acquire the collection account.
18. The apparatus of claim 17, wherein the target information obtaining module is specifically configured to: when a collection link is acquired, extract the collection account from the collection link.
19. The apparatus of claim 12, wherein the web page content characteristics of the website to be analyzed comprise:
at least one of text characteristics, picture characteristics and video characteristics of a web page of the website to be analyzed.
20. The apparatus of claim 12, further comprising:
a web page searching module, configured to perform a traversal search on a seed website to obtain at least one URL of at least one associated website and the web page content corresponding to each URL, wherein each website found is used as a website to be analyzed.
21. The apparatus of claim 20, wherein the web page searching module is specifically configured to: perform a depth-first or breadth-first traversal search on the seed website.
22. The apparatus of claim 12, wherein the first model comprises a machine learning model; and/or, the second model comprises a machine learning model.
23. A target information acquisition apparatus characterized by comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: the method according to any one of claims 1 to 11.
24. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when executed by a processor, causes the processor to perform: the method according to any one of claims 1 to 11.
CN202110214055.2A 2021-02-25 2021-02-25 Target information acquisition method and device and computer readable storage medium Pending CN112966263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110214055.2A CN112966263A (en) 2021-02-25 2021-02-25 Target information acquisition method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110214055.2A CN112966263A (en) 2021-02-25 2021-02-25 Target information acquisition method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112966263A (en) 2021-06-15

Family

ID=76275660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110214055.2A Pending CN112966263A (en) 2021-02-25 2021-02-25 Target information acquisition method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112966263A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114024729A (en) * 2021-10-29 2022-02-08 恒安嘉新(北京)科技股份公司 Website background detection method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685157A (en) * 2012-09-04 2014-03-26 珠海市君天电子科技有限公司 Method and system for collecting phishing websites based on payment
CN105302884A (en) * 2015-10-19 2016-02-03 天津海量信息技术有限公司 Deep learning-based webpage mode recognition method and visual structure learning method
CN108173814A (en) * 2017-12-08 2018-06-15 深信服科技股份有限公司 Detection method for phishing site, terminal device and storage medium
CN110110075A (en) * 2017-12-25 2019-08-09 中国电信股份有限公司 Web page classification method, device and computer readable storage medium
CN112199573A (en) * 2020-08-05 2021-01-08 宝付网络科技(上海)有限公司 Active detection method and system for illegal transaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination