CN111414525B - Method, device, computer equipment and storage medium for acquiring data of applet


Info

Publication number
CN111414525B
Authority
CN
China
Prior art keywords
applet
data
layer code
code
data acquisition
Legal status
Active
Application number
CN202010216892.4A
Other languages
Chinese (zh)
Other versions
CN111414525A (en)
Inventor
周江
王建行
罗德志
王枭
刘鹏
严明
Current Assignee
Shenzhen Tencent Domain Computer Network Co Ltd
Original Assignee
Shenzhen Tencent Domain Computer Network Co Ltd
Application filed by Shenzhen Tencent Domain Computer Network Co Ltd
Priority to CN202010216892.4A
Publication of CN111414525A
Application granted
Publication of CN111414525B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application relates to a method, an apparatus, a computer device, and a storage medium for acquiring data of an applet. The method comprises: acquiring a first applet and running the first applet; hijacking a loading interface of the first applet while the first applet is running; acquiring a data acquisition program and injecting the code of the data acquisition program into the code of the first applet to generate a second applet, the loading interface of the second applet being the same as that of the first applet; calling the loading interface of the second applet to load the code of the second applet and generate pages of the second applet; and acquiring the data in the pages of the second applet through the data acquisition program. The method makes it possible to acquire the data of an applet.

Description

Method, device, computer equipment and storage medium for acquiring data of applet
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for acquiring data of an applet, a computer device, and a storage medium.
Background
With the rapid development of networks, the World Wide Web has become a carrier of vast amounts of information, and extracting and using that information efficiently has become a significant challenge. Traditional data acquisition relies on network data acquisition (web crawling): a technique that sends network requests by simulating browser behavior and, after receiving the responses, automatically parses and stores the data according to certain rules. As network technology has developed further, applet technology has emerged. An applet is an application that runs on top of a host program and can be used without being downloaded and installed.
However, when network data acquisition is applied to an applet, the data of the applet cannot be crawled.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an applet data acquisition method, apparatus, computer device, and storage medium capable of acquiring the data of an applet.
A method of applet data acquisition, the method comprising:
acquiring a first applet and running the first applet;
hijacking a loading interface of the first applet in the running process of the first applet;
acquiring a data acquisition program, and injecting codes of the data acquisition program into codes of the first applet to generate a second applet; the loading interface of the second applet is the same as the loading interface of the first applet;
calling a loading interface of the second applet to load codes of the second applet and generating pages of the second applet;
and acquiring the data in the page of the second applet by the data acquisition program.
An applet data acquisition apparatus, the apparatus comprising:
a running module, configured to acquire a first applet and run the first applet;
a hijacking module, configured to hijack the loading interface of the first applet while the first applet is running;
a second applet generation module, configured to acquire a data acquisition program and inject the code of the data acquisition program into the code of the first applet to generate a second applet, the loading interface of the second applet being the same as that of the first applet;
a page generation module, configured to call the loading interface of the second applet to load the code of the second applet and generate a page of the second applet;
and a data acquisition module, configured to acquire the data in the page of the second applet through the data acquisition program.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method described above.
With the above applet data acquisition method, apparatus, computer device, and storage medium, a first applet is acquired and run; the loading interface of the first applet is hijacked while the first applet is running; a data acquisition program is acquired and its code is injected into the code of the first applet to generate a second applet whose loading interface is the same as that of the first applet; and the loading interface of the second applet is called to load the code of the second applet and generate its pages. Because the first applet runs on the host program and the code of the data acquisition program is injected into the code of the first applet, the data acquisition program and the first applet run on the same underlying framework and operating logic of the host program. The data acquisition program contained in the second applet can therefore acquire the data in the pages of the second applet, which realizes data acquisition for the applet.
Drawings
FIG. 1 is a diagram of an application environment for an applet data acquisition method in one embodiment;
FIG. 2 is a flow chart of an applet data acquisition method in one embodiment;
FIG. 3 is a schematic diagram of a security report generated for acquired text data in one embodiment;
FIG. 4 is a schematic diagram of a security report generated for acquired picture data in one embodiment;
FIG. 5 is a schematic illustration of the application of an applet in one embodiment;
FIG. 6 is a diagram of acquiring data in a page of a second applet in one embodiment;
FIG. 7 is a diagram of acquiring data in a page of a second applet in another embodiment;
FIG. 8 is a flow chart of an applet data acquisition method in another embodiment;
FIG. 9 is a block diagram of an applet data acquisition apparatus in one embodiment;
FIG. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The applet data acquisition method provided in this application can be applied in the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 over a network. The terminal 102 acquires the first applet and sends a data request to the server 104 over the network; on receiving the data returned by the server 104 in response to the request, it runs the first applet. While the first applet is running, the terminal hijacks the loading interface of the first applet; acquires a data acquisition program and injects its code into the code of the first applet to generate a second applet whose loading interface is the same as that of the first applet; calls the loading interface of the second applet to load the code of the second applet and generate pages of the second applet; and acquires the data in the pages of the second applet through the data acquisition program. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, there is provided an applet data acquisition method comprising the steps of:
Step 202, a first applet is obtained and run.
An applet is an application that runs on top of a host program and can be used without being downloaded and installed. The host program may be WeChat, a payment application, or another application program.
The terminal opens the host program of the first applet, acquires the first applet from the host program, and runs it. In one embodiment, the first applet may be obtained from an applet collection in the host program. The applet collection may be, but is not limited to, the applets the user has used before or the applets the user has saved as favorites.
In another embodiment, the host program may call the camera of the terminal to scan a code corresponding to the first applet, thereby acquiring the first applet. The code may be a bar code, a two-dimensional code, or the like.
After acquiring the first applet, the terminal receives a run instruction generated by the first applet, acquires the code of the first applet based on the run instruction, and parses the code, thereby running the first applet.
Step 204, hijacking the loading interface of the first applet during the running process of the first applet.
The execution environment of the first applet includes a rendering layer and a logic layer. The rendering layer of the first applet is used for displaying the data of the first applet, for example, displaying a default page of the first applet on a display interface of the terminal. The logic layer of the first applet is used for generating and processing the data of the first applet, such as transferring the data, checking the data, calling the interface, etc. The loading interface of the first applet refers to an interface for loading the code of the first applet so that the first applet can be presented on the display interface of the terminal.
Specifically, hook technology is used to hijack the loading interface of the first applet while the first applet is running. A hook is a special message-handling mechanism that can monitor event messages in a system or process and intercept and handle messages directed to a target window before they arrive. Hooks can be used to monitor the occurrence of specific events in the system and to implement functions such as on-screen word capture, log monitoring, and keyboard and mouse input interception.
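The hijacking step can be illustrated with a minimal sketch of a function hook, assuming the host exposes its loading interface as a replaceable method. `HostRuntime`, `load_applet_code`, and `hook_loader` are hypothetical names; a real host would be hooked at the native or JavaScript-engine level, not in Python. The point shown is that the hook keeps the original signature, so callers inside the host keep using the same loading interface while the code passing through it can be rewritten.

```python
from typing import Callable, List


class HostRuntime:
    """Hypothetical stand-in for the host program's applet runtime."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def load_applet_code(self, code: str) -> str:
        # The original loading interface: record the code and "render" a page.
        self.loaded.append(code)
        return f"page<{len(code)} chars>"


def hook_loader(runtime: HostRuntime,
                rewrite: Callable[[str], str]) -> Callable[[str], str]:
    """Install a hook on runtime.load_applet_code.

    The wrapper has the same signature as the original interface: it rewrites
    the incoming code, then delegates to the saved original. The original bound
    method is returned so the caller can restore it later.
    """
    original = runtime.load_applet_code

    def hooked(code: str) -> str:
        return original(rewrite(code))

    # An instance attribute shadows the class method for this runtime only.
    runtime.load_applet_code = hooked
    return original


if __name__ == "__main__":
    rt = HostRuntime()
    restore = hook_loader(rt, lambda code: code + "\n/* collector */")
    rt.load_applet_code("Page({})")
    print(rt.loaded)
    rt.load_applet_code = restore  # undo the hook
```

Returning the original callable makes the hook reversible, which is convenient when the same runtime must later load applets without injection.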
Step 206, acquiring a data acquisition program, and injecting the code of the data acquisition program into the code of the first applet to generate a second applet; the loading interface of the second applet is identical to the loading interface of the first applet.
The data acquisition program is a program that automatically captures the data of the first applet according to certain rules, and may be designed by a developer. The second applet is the applet whose data is to be acquired; it contains both the code of the first applet and the code of the data acquisition program.
It will be appreciated that the second applet includes the code of the data acquisition program and the code of the first applet. The data acquisition program only acquires data in the pages of the second applet and does not change the underlying architecture of the first applet, such as its loading interface, so the second applet has the same basic functions and basic architecture as the first applet, and its loading interface is the same as that of the first applet.
Step 208, calling the loading interface of the second applet to load the code of the second applet, and generating a page of the second applet.
The page of the second applet may include data such as a picture, text, link, and video, and may include elements such as a button (wx-button), an input box (wx-input), and a navigation bar (wx-navigator).
The terminal calls the loading interface of the second applet to load the code of the second applet (that is, the code of the first applet and the code of the data acquisition program) and generates the pages of the second applet. The pages of the second applet may include the default page of the first applet, for example its initial page, and may also include dynamic pages, which are pages reached by clicking links, buttons, and the like.
Step 210, acquiring data in the page of the second applet by the data acquisition program.
The data acquired by the data acquisition program from the pages of the second applet may include pictures, text, links, videos, and the like, and may also include elements such as buttons and input boxes. The data acquisition program may also capture the network requests sent by the second applet to the server.
The data acquisition program is a program that automatically captures the data of the second applet according to certain rules. When the page is the default page of the first applet, the data acquisition program acquires the data in the default page. When the page contains elements such as links, buttons, input boxes, and navigation bars, the data acquisition program can simulate user operations on them, for example clicking a link, button, or navigation bar, or entering input data, so that the page jumps to the next page, and then acquire the data in the next page.
In the above applet data acquisition method, a first applet is acquired and run; the loading interface of the first applet is hijacked while the first applet is running; a data acquisition program is acquired and its code is injected into the code of the first applet to generate a second applet whose loading interface is the same as that of the first applet; and the loading interface of the second applet is called to load the code of the second applet and generate its pages. Because the first applet runs on the host program and the code of the data acquisition program is injected into the code of the first applet, the data acquisition program and the first applet run on the same underlying framework and operating logic of the host program, so the data acquisition program contained in the second applet can acquire the data in the pages of the second applet, which realizes data acquisition for the second applet.
Because the code of the data acquisition program is injected into the code of the first applet, the data acquisition program can access the DOM (Document Object Model) data of the first applet as well as the native interfaces of the first applet's host program. The data acquisition program can therefore interact with the host program, for example by calling the host program's payment function, so that the data exchanged between the second applet and the host program can also be acquired, yielding more complete data for the second applet.
In addition, with the code of the data acquisition program injected into the code of the first applet, the element types defined by the DOM of the first applet can be obtained directly, and the data of the second applet can be acquired accurately based on those element types. This avoids a problem of network data acquisition: the browser uses element types different from those of the applet, so the applet's page elements cannot be rendered normally and the applet's data cannot be acquired.
It can be understood that the event mechanism of an applet depends on the applet's underlying architecture, so when a browser acquires applet data through conventional network data acquisition, it cannot trigger and handle these events correctly. In this embodiment, the code of the data acquisition program is injected into the code of the first applet to generate the second applet, so the second applet also runs on the applet infrastructure defined by the developer, and its data can be acquired through the data acquisition program.
Further, after the data in the pages of the second applet is acquired through the data acquisition program, the method further includes: scanning the acquired data to obtain the security attribute of each piece of data; and generating a security report for the second applet based on those security attributes.
The security attribute of a piece of data may be, for example, safe or unsafe, and may further indicate a category such as violence, pornography, person, or landscape. Scanning the acquired data to obtain a security attribute for each piece allows the security of the second applet to be evaluated more accurately.
When the data is text, reference keywords are acquired and matched against the text, and the security attribute of the text is determined from the matching result. For example, a reference keyword may be A (where A denotes a violent term); when the text acquired from a page of the second applet matches A, that is, the text contains A, the security attribute of the text may be "unsafe-violence".
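A minimal sketch of the keyword-matching rule, assuming "matching" means a plain case-sensitive substring test and that the reference keywords arrive as a keyword-to-attribute table. `text_security_attribute` and `REFERENCE_KEYWORDS` are hypothetical names, and "A" is the same placeholder keyword used in the example.

```python
from typing import Mapping


def text_security_attribute(text: str,
                            reference_keywords: Mapping[str, str],
                            default: str = "safe") -> str:
    """Return the attribute of the first reference keyword contained in text.

    Matching is a plain case-sensitive substring test (an assumption); a
    production matcher would likely need tokenization or a multi-pattern
    automaton such as Aho-Corasick. Text with no match gets `default`.
    """
    for keyword, attribute in reference_keywords.items():
        if keyword in text:
            return attribute
    return default


# Placeholder keyword table mirroring the example ("A" denotes a violent term).
REFERENCE_KEYWORDS = {"A": "unsafe-violence"}
```

Because dictionaries preserve insertion order, the first matching keyword in the table wins when several keywords occur in the same text.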
In one embodiment, as shown in FIG. 3, the second applet is a take-away applet; text is acquired from the second applet and a security report for that text is generated. Text item 1 is "A"; "A" is detected to be a word denoting violence, so its security attribute is "unsafe-violence". The take-away order in which "A" appears, "B1", is also recorded, and the acquisition time of "A" is 2019-05-31 09:47:21.
Text item 2 is "11 months", detected to be a word denoting time, so its security attribute is "safe-time"; it appears in order "B2" and was acquired at 2019-05-31 09:48:26.
Text item 3 is "High-tech Zone", detected to be a word denoting an address, so its security attribute is "safe-address"; it appears in order "B2" and was acquired at 2019-05-31 09:48:38.
Text item 4 is "poor service attitude", detected to be a word expressing an evaluation, so its security attribute is "safe-evaluation"; it appears in order "B3" and was acquired at 2019-05-31 09:50:21.
When the data is a picture or video, it is checked for sensitive information, such as violence, pornography, or abuse. If the picture or video contains sensitive information, its security attribute may be "unsafe"; otherwise, its security attribute may be "safe".
In one embodiment, pictures are acquired from a page of the second applet and a security report for them is generated. As shown in FIG. 4, picture 1 has the picture address "A1", its detected security attribute is "safe-person", and its acquisition time is 2019-05-31 09:47:21.
Picture 2 has the picture address "A2", its detected security attribute is "safe-landscape", and its acquisition time is 2019-05-31 09:48:26.
Picture 3 has the picture address "A3", its detected security attribute is "safe-person", and its acquisition time is 2019-05-31 09:48:38.
Picture 4 has the picture address "A4", its detected security attribute is "safe-person", and its acquisition time is 2019-05-31 09:50:21.
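The report rows in these examples share a common shape: a running number, the content (text, or a picture address), a security attribute, and an acquisition time. A minimal sketch of assembling such a report, assuming the classifier is supplied as a function; `ReportRow`, `build_report`, and `unsafe_rows` are hypothetical names, and the take-away order column is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Tuple


@dataclass
class ReportRow:
    number: int            # 1-based row number, as in the report figures
    content: str           # text content, or the picture address for images
    attribute: str         # security attribute, e.g. "unsafe-violence"
    acquired_at: datetime  # when the data was acquired


def build_report(items: Iterable[Tuple[str, datetime]],
                 classify: Callable[[str], str]) -> List[ReportRow]:
    """Number the acquired items and attach a security attribute to each."""
    return [ReportRow(i, content, classify(content), ts)
            for i, (content, ts) in enumerate(items, start=1)]


def unsafe_rows(rows: List[ReportRow]) -> List[ReportRow]:
    """Rows whose attribute belongs to the 'unsafe' family."""
    return [r for r in rows if r.attribute.startswith("unsafe")]
```

Keeping classification as a pluggable function lets the same report builder serve text, pictures, and links, each with its own detector.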
When the data is a link, a reference link set can be acquired from the server; the set also contains the reference security attribute of each reference link. The link is matched against each reference link in the set, and its security attribute is determined from the matching result: when the link acquired from the second applet matches a reference link, the reference security attribute of that reference link is used as the security attribute of the acquired link. For example, if link B1 acquired from the second applet matches reference link B2, and the reference security attribute of B2 is pornography, then the security attribute of B1 is also pornography.
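The text does not say what makes two links "match". A minimal sketch, assuming absolute URLs and treating two links as matching when their lower-cased host and path (without a trailing slash) agree, ignoring scheme, query string, and fragment. `normalize` and `link_security_attribute` are hypothetical names; `urllib.parse.urlsplit` is the standard-library parser.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def normalize(link: str) -> str:
    """Assumed matching key: lower-cased host plus path without trailing '/'.

    Scheme, query string and fragment are ignored. Scheme-less links would
    parse with an empty host, so absolute URLs are assumed.
    """
    parts = urlsplit(link)
    return parts.netloc.lower() + parts.path.rstrip("/")


def link_security_attribute(link: str,
                            reference_links: Mapping[str, str]) -> Optional[str]:
    """Return the reference attribute of the matching reference link, if any."""
    # Index the reference set by normalized key so each lookup is O(1).
    index = {normalize(ref): attr for ref, attr in reference_links.items()}
    return index.get(normalize(link))
```

Returning `None` for unmatched links leaves the decision to the caller, for example to fall back to a default attribute or to queue the link for further inspection.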
As shown in FIG. 5, when the host program of the first applet is WeChat, 502 denotes the WeChat client, which includes the WeChat application itself and the first applet. The running environment of the applet is divided into a rendering layer, in which WXML (WeiXin Markup Language) templates, WXSS (WeiXin Style Sheets) styles (DOM data), rendering logic, and the like operate, and a logic layer, in which JS (JavaScript) scripts run. WXML is a tag language designed for the framework; combined with the basic components and the event system, it builds the structure of a page. WXSS is a style language used to describe the styles of WXML components. The DOM (Document Object Model) is a standard programming interface, recommended by the W3C (World Wide Web Consortium), for processing extensible markup languages. The rendering logic includes data such as DOM operation interfaces and is implemented in JS.
The interfaces of the first applet's rendering layer are rendered by WebViews (504), and the logic layer runs JS scripts in the JsCore thread (506). Because the first applet has multiple interfaces, the rendering layer has multiple WebViews. Communication between the rendering layer and the logic layer is relayed through Native (508, the WeChat application itself); network requests sent by the logic layer are forwarded through Native (508) to a third-party server 510, and data returned by the third-party server 510 is likewise forwarded to the logic layer through Native (508). The third-party server 510 communicates with the client 502 in the terminal over WebSocket, a protocol that provides full-duplex communication over a single TCP (Transmission Control Protocol) connection.
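The relay topology can be pictured with a toy model, assuming each hop is recorded in a log. All class and method names (`Native`, `LogicLayer`, `WebView`, `relay`, `forward_request`) are hypothetical; in this sketch the relay only records each hop and the calls themselves are direct, whereas a real host passes messages across threads and processes.

```python
from typing import Callable, List, Tuple


class Native:
    """Hypothetical relay standing in for the host's native layer."""

    def __init__(self, server: Callable[[str], str]) -> None:
        self.server = server                       # third-party server
        self.log: List[Tuple[str, str, str]] = []  # (source, destination, payload)

    def relay(self, src: str, dst: str, payload: str) -> str:
        # Record a rendering-layer <-> logic-layer hop.
        self.log.append((src, dst, payload))
        return payload

    def forward_request(self, url: str) -> str:
        # Network requests from the logic layer go out through the native layer...
        self.log.append(("logic", "server", url))
        response = self.server(url)
        # ...and the server's response comes back the same way.
        self.log.append(("server", "logic", response))
        return response


class LogicLayer:
    """JS logic on the JsCore thread; it never touches a WebView directly."""

    def __init__(self, native: Native) -> None:
        self.native = native

    def handle_event(self, event: str, page: str) -> str:
        data = self.native.forward_request(f"/api/{event}")
        return self.native.relay("logic", page, data)


class WebView:
    """One WebView per applet page in the rendering layer."""

    def __init__(self, page: str, native: Native, logic: LogicLayer) -> None:
        self.page, self.native, self.logic = page, native, logic

    def tap(self, event: str) -> str:
        self.native.relay(self.page, "logic", event)
        return self.logic.handle_event(event, self.page)
```

The log makes the key property visible: every exchange between a page and the logic layer, and every server round trip, passes through the native relay.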
In one embodiment, the code of the first applet includes a first rendering layer code and a first logic layer code, and the code of the data acquisition program includes a second rendering layer code and a second logic layer code. Injecting the code of the data acquisition program into the code of the first applet to generate a second applet includes: injecting the second rendering layer code of the data acquisition program into the first rendering layer code of the first applet to obtain the rendering layer code of the second applet; injecting the second logic layer code of the data acquisition program into the first logic layer code of the first applet to obtain the logic layer code of the second applet; and generating the second applet based on the rendering layer code and the logic layer code of the second applet.
The rendering layer is used to present the data, for example, to display the individual elements in an interface. The logic layer is used for generating data and processing the data, such as transferring the data, checking the data, calling an interface and the like. The first rendering layer code refers to the code of the rendering layer of the first applet. The first logical layer code refers to the code of the logical layer of the first applet. The second rendering layer code refers to the code of the rendering layer of the data acquisition program. The second logical layer code refers to the code of the logical layer of the data acquisition program.
Specifically, the second rendering layer code of the data acquisition program is injected at the tail of the first rendering layer code of the first applet to obtain the rendering layer code of the second applet, and an interface of the V8 engine (a JavaScript engine) is called to inject the second logic layer code into the first logic layer code of the first applet to obtain the logic layer code of the second applet.
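A minimal sketch of the two injections, assuming each layer's code is available as a string. `AppletCode` and `inject` are hypothetical names. The rendering-layer step follows the text (append at the tail); the logic-layer step, which the text performs through a V8 engine interface, is replaced here by plain string concatenation purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppletCode:
    render: str  # rendering-layer code
    logic: str   # logic-layer JavaScript


def inject(first: AppletCode, collector: AppletCode) -> AppletCode:
    """Build the second applet's code from the first applet plus the collector.

    Rendering layer: the collector's code is appended at the tail, so it runs
    after the applet's own rendering code. Logic layer: the text injects via a
    V8 engine interface; concatenation stands in for that step here.
    The first applet's code object is left unchanged.
    """
    render = first.render + "\n" + collector.render
    logic = first.logic + ";\n" + collector.logic
    return AppletCode(render=render, logic=logic)
```

Appending at the tail is a reasonable choice because the collector then sees the page structure the applet has already set up, and nothing about the applet's own loading path changes.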
In this embodiment, the rendering layer code of the second applet includes the second rendering layer code of the data acquisition program, and the logic layer code of the second applet includes the second logic layer code of the data acquisition program, so that the data acquisition program can interact with the first applet on the rendering layer, for example, acquire data, add data, modify data, etc., and can verify the data on the logic layer, so that more complete and accurate data can be acquired.
In one embodiment, invoking the load interface of the second applet to load code of the second applet, generating a page of the second applet comprises: calling a loading interface of the second applet, loading a rendering layer code of the second applet, and checking logic of the rendering layer code of the second applet by adopting the logic layer code of the second applet; and when the logic verification of the rendering layer code of the second applet is passed, generating a page corresponding to the rendering layer code of the second applet.
The terminal calls the loading interface of the second applet and loads the rendering layer code of the second applet, from which the corresponding page can be generated. The logic of the rendering layer code of the second applet may include whether the APIs (Application Programming Interfaces) called in it are correct, whether the content of each element corresponds to the element's type, and so on. For example, if an element's type is a mobile phone number but its content is a picture, the content does not correspond to the type.
When the logic verification of the rendering layer code of the second applet passes, the corresponding page can be generated more accurately, so more accurate data is obtained, and problems such as disordered pages or garbled characters caused by incorrect rendering-layer logic are avoided.
In one embodiment, as shown in FIG. 6, acquiring the data in a page of the second applet through the data acquisition program includes:
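The two checks named above (called APIs are correct, element content corresponds to element type) can be sketched as a function that returns a list of problems, where an empty list means the page may be generated. `Element`, `VALIDATORS`, and `verify_render_code` are hypothetical names, and the per-type rules (a loose phone-number pattern, image file extensions) are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Element:
    type: str      # declared element type, e.g. "phone", "picture", "text"
    content: str   # content actually bound to the element


# Hypothetical per-type content checks.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "phone": lambda c: re.fullmatch(r"\+?\d[\d\- ]{5,}\d", c) is not None,
    "picture": lambda c: c.lower().endswith((".png", ".jpg", ".jpeg", ".gif", ".webp")),
    "text": lambda c: True,
}


def verify_render_code(elements: Iterable[Element],
                       called_apis: Iterable[str],
                       allowed_apis: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the page may be generated."""
    errors: List[str] = []
    for api in called_apis:
        if api not in allowed_apis:
            errors.append(f"unknown API: {api}")
    for i, el in enumerate(elements):
        check = VALIDATORS.get(el.type)
        # Unknown types are treated as failures rather than silently accepted.
        if check is None or not check(el.content):
            errors.append(f"element {i}: content does not match type {el.type!r}")
    return errors
```

Collecting every problem instead of stopping at the first one makes it easier to see why a page was not generated.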
step 602, acquiring a current page of a second applet through a data acquisition program, and acquiring data of each element in the current page; the data of the element includes a type of the element.
The current page may include various elements, such as pictures, text, links, videos, and navigation bars. The data acquisition program traverses the elements in the current page and acquires the data of each one. The data of an element may include its type and may further include its data size, position, dimensions, and the like. Element types include, for example, the input type, picture type, text type, and link type.
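Traversing a page and collecting per-element data can be sketched over a simple node tree, assuming the page is available as nested nodes with a tag, text, and bounding box. `Node`, `TAG_TYPES`, and `element_data` are hypothetical; only `wx-button`, `wx-input`, and `wx-navigator` are named in the text, and the remaining tag names and the tag-to-type table are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    tag: str
    text: str = ""
    rect: Tuple[int, int, int, int] = (0, 0, 0, 0)   # x, y, width, height
    children: List["Node"] = field(default_factory=list)


# Hypothetical tag-to-type table; buttons count as the input type because
# they need an input (a click) before the page can move on.
TAG_TYPES: Dict[str, str] = {
    "wx-input": "input",
    "wx-button": "input",
    "wx-navigator": "link",
    "wx-image": "picture",
    "wx-text": "text",
    "wx-video": "video",
}


def element_data(root: Node) -> Iterator[dict]:
    """Yield type, text, position and size for every recognised element (pre-order)."""
    stack = [root]
    while stack:
        node = stack.pop()
        etype = TAG_TYPES.get(node.tag)
        if etype is not None:
            x, y, w, h = node.rect
            yield {"type": etype, "text": node.text, "position": (x, y), "size": (w, h)}
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(node.children))
```

An explicit stack avoids recursion-depth limits on deeply nested pages; tags not in the table (such as layout containers) are skipped but their children are still visited.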
Step 604, when the type of the element is an input type, jumping to the next page according to the element of the input type.
Specifically, after the data of each element in the current page is acquired, the elements whose type is the input type are selected from them.
An element of the input type requires input before the page can move on, so the next page is reached through that element. For example, if the element is an address field, address information is entered into it to jump to the next page. If the element is a login button, a login instruction is input to it, for example by clicking it or dragging a slider to a preset position, to log in and reach the next page. If the element is of the link type, a click instruction is input to it, for example by clicking it or performing a preset operation, to jump to the next page.
Step 606: take the next page as the new current page and return to the step of acquiring the data of each element in the current page, until every page in the second applet has been traversed and the data of each element on each page has been acquired.
After jumping to the next page, the next page is taken as the new current page and the data of each element in it is acquired, and so on, until every page in the second applet has been traversed and the data of each element on each page has been acquired.
In this embodiment, the data acquisition program acquires the current page of the second applet and the data of each element in it, including each element's type; when an element is of the input type, the program jumps to the next page through that element; and the next page is then taken as the new current page and processed in the same way, until every page of the second applet has been traversed. In this way, every page of the second applet and the data of every element on those pages can be acquired, yielding more complete data for the second applet.
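The loop of steps 602 to 606 can be sketched as a traversal over a page graph. This is a minimal sketch under stated assumptions: the second applet is modelled as a hypothetical in-memory dictionary from page name to element list, and an input-type element's `target` stands in for the page its jump instruction leads to. The visited set is a design choice not stated in the text; it stops the loop from revisiting pages when two pages link to each other.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical element model: {"type": ..., "target": next page or None}.
Element = Dict[str, Optional[str]]


def traverse_pages(pages: Dict[str, List[Element]], start: str) -> Dict[str, List[Element]]:
    """Visit every page reachable from `start` and collect each page's element data."""
    collected: Dict[str, List[Element]] = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()          # take the next page as the current page
        if current in collected:           # page already traversed, skip it
            continue
        elements = pages[current]          # acquire the data of each element
        collected[current] = elements
        for element in elements:
            # Only input-type elements lead to a jump to another page.
            if element["type"] == "input" and element.get("target"):
                queue.append(element["target"])
    return collected
```

Pages that no input-type element leads to are never reached in this sketch, which matches the text's reliance on simulated input to move between pages.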
In one embodiment, jumping to the next page through an input-type element includes: when the type of the element is the input type, acquiring the input data corresponding to that input type through the data acquisition program; inputting the input data into the element to generate a jump instruction; and jumping to the next page according to the jump instruction.
The terminal may obtain the correspondence between input types and input data in advance. In one embodiment, the terminal obtains this correspondence from local memory. In another embodiment, the terminal obtains it from the data acquisition program. In yet another embodiment, the terminal obtains it from a background server, which may be the server corresponding to the host program of the first applet. For example, if the host program of the first applet is WeChat, the terminal may obtain the correspondence from the WeChat server.
The jump instruction contains the address of the next page; the next page is located and loaded according to this address and displayed on the display interface of the terminal.
Input-type elements may be buttons, input boxes, links, navigation bars, and the like. When the input-type element is a button, the corresponding input data is a click command, a slide command, or the like; inputting such a command into the button, that is, clicking the button, generates a jump instruction, and the program jumps to the next page.
When the input-type element is an input box, the corresponding input data may be text, a picture, a video, a link, or the like; entering such data into the input box generates a jump instruction, and the program jumps to the next page.
When the input-type element is an input box, its input type may be further distinguished so that the matching input data can be entered more accurately. For example, input boxes can be divided into phone number input boxes, verification code input boxes, password input boxes, picture input boxes, account input boxes, text input boxes, address input boxes, and the like.
When the input-type element is a link, the corresponding input data is a click command or another preset command; inputting such a command into the link, that is, clicking the link, generates a jump instruction, and the program jumps to the next page.
In this embodiment, the data acquisition program simulates user operations such as clicking a link, entering the corresponding data in an input box, clicking a button, or clicking a navigation bar, so that it can jump to the next page and acquire the data there, thereby acquiring the data of the second applet more completely.
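The three sources of the input-type correspondence can be combined as an ordered lookup. This is a minimal sketch, assuming each source is a callable that returns a dictionary from input type to input data, or `None` when it has nothing. The text presents local memory, the data acquisition program, and the background server as alternative embodiments; trying them in that order as a fallback chain is an assumption made only for illustration, and `load_correspondence` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional

# Maps an input type (e.g. "text", "phone") to the data to enter.
Correspondence = Dict[str, str]
# A source returns a correspondence, or None if it has none.
Source = Callable[[], Optional[Correspondence]]


def load_correspondence(sources: List[Source]) -> Correspondence:
    """Return the first non-empty correspondence, trying the sources in order.

    The order (local memory, data acquisition program, background server)
    is an assumption; the text describes these as separate embodiments.
    """
    for source in sources:
        mapping = source()
        if mapping:
            return mapping
    raise LookupError("no source provided an input-type correspondence")
```

In use, each source would wrap the actual retrieval, for example a read from local storage or a request to the host program's server; those calls are left out because the text does not specify them.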
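The mapping from input-type element to simulated operation can be written as a small dispatch. This is a minimal sketch, assuming element kinds named `"button"`, `"link"`, `"navigation_bar"`, and `"input_box"`, plus the input-box subtypes listed above. The kind names and the placeholder sample values are hypothetical; real values would come from the correspondence between input types and input data.

```python
from typing import Dict, Tuple

# Placeholder input data per input-box subtype (values are illustrative only).
INPUT_BOX_SAMPLES: Dict[str, str] = {
    "phone": "<phone number>",
    "verification_code": "<verification code>",
    "password": "<password>",
    "picture": "<picture file>",
    "account": "<account name>",
    "text": "<free text>",
    "address": "<address>",
}


def simulate_input(kind: str, subtype: str = "text") -> Tuple[str, str]:
    """Return (operation, payload) to simulate on an input-type element."""
    if kind in ("button", "link", "navigation_bar"):
        # Buttons, links and navigation bars receive a click command.
        return ("tap", "")
    if kind == "input_box":
        # Input boxes receive data matched to their subtype, falling back to text.
        return ("type", INPUT_BOX_SAMPLES.get(subtype, INPUT_BOX_SAMPLES["text"]))
    raise ValueError(f"unsupported input-type element: {kind}")
```

Distinguishing input-box subtypes lets the sketch enter, for example, password-shaped data into a password box rather than generic text, which is the accuracy benefit the text describes.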
In one embodiment, as shown in FIG. 7, the code of the data acquisition program includes a second rendering layer code and a second logic layer code.
Acquiring the input data corresponding to the input type through the data acquisition program includes:
Step 702: acquire the input data corresponding to the input type through the second rendering layer code of the data acquisition program.
The terminal obtains the correspondence between input types and input data in advance. When the data acquisition program finds an element of the input type, the input data matching that element's input type is taken from this correspondence.
Inputting the input data into the element to generate a jump instruction includes:
Step 704: input the input data into the element, and check the input data through the second logic layer code of the data acquisition program.
It can be understood that the jump to the next page succeeds only when the input data corresponds to the element. For example, inputting a click command into the login button leads to the next page, and entering the correct verification code in a verification code input box leads to the next page. Conversely, inputting a picture or text into the login button, or a click command into a mobile phone number input box, does not correspond to the element, so no jump to the next page can occur.
Therefore, the second logic layer code of the data acquisition program checks the input data to ensure that it corresponds to the input-type element, so that the jump to the next page can succeed.
Step 706: when the input data passes the check, generate a jump instruction.
In this embodiment, the input data corresponding to the input type is acquired through the second rendering layer code of the data acquisition program, input into the element, and checked through the second logic layer code, which ensures that the input data corresponds to the input-type element; a jump instruction is then generated to jump to the next page.
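Steps 704 and 706 pair a check with the creation of a jump instruction that carries the next page's address. This is a minimal sketch, assuming the check reduces to a table of which kinds of input data each element kind accepts, derived from the login-button and phone-number examples above; the rule table, the digit check, the element kind names, and the page address format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical rules: which kinds of input data each element kind accepts.
ACCEPTED_DATA: Dict[str, Set[str]] = {
    "login_button": {"click", "slide"},
    "link": {"click"},
    "phone_box": {"text"},
    "verification_code_box": {"text"},
}


@dataclass(frozen=True)
class JumpInstruction:
    next_page_address: str  # the jump instruction carries the next page's address


def check_input(element_kind: str, data_kind: str, value: str = "") -> bool:
    """Stand-in for the check performed by the second logic layer code."""
    if data_kind not in ACCEPTED_DATA.get(element_kind, set()):
        return False  # e.g. a picture entered into a login button
    if element_kind == "phone_box":
        return value.isdigit()  # a phone number box only accepts digits here
    return True


def make_jump(element_kind: str, data_kind: str, value: str,
              next_page_address: str) -> Optional[JumpInstruction]:
    """Generate a jump instruction only when the input data passes the check."""
    if not check_input(element_kind, data_kind, value):
        return None
    return JumpInstruction(next_page_address)
```

Returning `None` on a failed check keeps the traversal from following a jump that the real page would never perform.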
In one embodiment, the method further includes: parsing the code of the first applet to obtain the loading interface of the first applet. Hijacking the loading interface of the first applet while it runs then includes: monitoring the interfaces called by the first applet during its running; and, when the first applet calls the loading interface, hijacking the loading interface of the first applet.
In the code of the first applet, each interface of the first applet has a corresponding identifier, and the identifier of the loading interface can be found from the code of the first applet, thereby obtaining the loading interface of the first applet.
While the first applet runs, a hook technique is used to monitor the interfaces it calls; when the first applet is detected calling the loading interface, the loading interface of the first applet is hijacked, that is, the first applet is not allowed to call the loading interface.
Further, the installation package of the host program of the first applet is acquired and decompiled to obtain the code of the host program, and the code of the first applet is obtained from the code of the host program.
It will be appreciated that an applet runs on top of its host program and has no separate installation package; the applet's code resides in the host program's installation package. Therefore, the installation package of the host program of the first applet is obtained first and then decompiled to obtain the code of the host program. This code includes the code of the first applet, so the location of the first applet's code is found within the host program's code and the code of the first applet is extracted.
The system running on the terminal may be Android, iOS (iPhone Operating System), Linux, or the like. When the terminal runs Android, APK (Android application package) decompilation with a dex decompiling tool can be used to decompile the installation package of the host program into pseudocode of the host program; this pseudocode is analyzed to obtain the pseudocode of the first applet, which is then converted into the code of the first applet. Pseudocode is a description of an algorithm in words and symbols (including mathematical symbols), intermediate between natural language and a programming language. The installation package of the host program contains the logic used by the host program at runtime as well as the logic used by the first applet at runtime.
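The intercept-and-block behaviour of the hook can be illustrated by wrapping a loading function. This is a minimal sketch, assuming a hypothetical `HostRuntime` class whose `load` method plays the role of the loading interface. A real host would require a platform-level hook rather than Python attribute patching; the sketch only shows that the first applet's call is recorded and not forwarded, while the original interface is kept so the second applet's code can later be loaded through it.

```python
from typing import Callable, List, Tuple


class HostRuntime:
    """Hypothetical stand-in for the host program's applet runtime."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def load(self, code: str) -> None:  # plays the role of the loading interface
        self.loaded.append(code)


def hijack_load(runtime: HostRuntime) -> Tuple[Callable[[str], None], List[str]]:
    """Hook runtime.load so that calls from the first applet are intercepted.

    Returns the original loading interface and the list of intercepted calls.
    """
    original = runtime.load          # keep the original bound method
    intercepted: List[str] = []

    def hooked(code: str) -> None:
        intercepted.append(code)     # monitor the call...
        # ...and do not forward it, so the first applet cannot load its code

    runtime.load = hooked            # instance attribute shadows the method
    return original, intercepted
```

After the hook is installed, the returned `original` callable is the point at which the second applet's code would be loaded.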
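An APK is a ZIP archive, so locating candidate applet files inside the host program's installation package can start with listing its entries. This is a minimal sketch covering only that first step; it performs no dex decompilation or pseudocode conversion. The `marker` substring that identifies the first applet's files is a hypothetical parameter, because the text does not say where in the package the applet code lives.

```python
import io
import zipfile
from typing import List


def find_applet_entries(package: bytes, marker: str) -> List[str]:
    """List entries in an installation package whose path contains `marker`.

    The package is read as a ZIP archive (the APK container format); the
    marker identifying the first applet's code is a hypothetical input.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return sorted(name for name in archive.namelist() if marker in name)
```

In practice the matching entries, or the dex files that reference them, would then be passed to a decompiling tool as the text describes.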
In one embodiment, as shown in FIG. 8, after the first applet and the data acquisition program are acquired, step 802 is performed to inject the code of the data acquisition program into the code of the first applet, generating a second applet. Step 804 is performed to obtain the page of the second applet by the data acquisition program. Step 806 is performed to traverse the pages.
Step 808 is performed to determine whether all pages have been traversed. If not, step 810 is performed to jump to the next page, and step 812 is performed to traverse all elements in the page. Step 814 then determines whether the data of all elements in the page has been acquired. If yes, the flow returns to step 806 to continue traversing the pages. If no, step 816 is performed: the data acquisition program performs a simulated operation on an input-type element, specifically by acquiring the input data corresponding to the input type and inputting it into the element to generate a jump instruction, after which step 810 is performed to jump to the next page.
If step 808 determines that all pages have been traversed, the data acquisition program has acquired all elements on all pages of the second applet, and the process ends.
It should be understood that although the steps in the flowcharts of FIGS. 2 and 6 to 8 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 and 6 to 8 may include multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 9, an applet data acquisition apparatus 900 is provided. It may be implemented as software modules, hardware modules, or a combination of both, as part of a computer device, and specifically includes an operation module 902, a hijacking module 904, a second applet generation module 906, a page generation module 908, and a data acquisition module 910, where:
The operation module 902 is configured to acquire a first applet and run the first applet.
The hijacking module 904 is configured to hijack the loading interface of the first applet while the first applet is running.
The second applet generation module 906 is configured to acquire a data acquisition program and inject the code of the data acquisition program into the code of the first applet to generate a second applet, where the loading interface of the second applet is the same as the loading interface of the first applet.
The page generation module 908 is configured to call a loading interface of the second applet to load the code of the second applet, and generate a page of the second applet.
The data acquisition module 910 is configured to acquire, by the data acquisition program, data in a page of the second applet.
The above applet data acquisition apparatus acquires and runs a first applet; hijacks the loading interface of the first applet while it runs; acquires a data acquisition program and injects its code into the code of the first applet to generate a second applet whose loading interface is the same as that of the first applet; and calls the loading interface of the second applet to load the second applet's code and generate its pages. Because the first applet runs on the host program, injecting the code of the data acquisition program into the code of the first applet lets the data acquisition program run on the same underlying framework and operating logic of the host program as the first applet. The data acquisition program contained in the second applet can therefore acquire the data in the pages of the second applet, which realizes the acquisition of the second applet's data.
In one embodiment, the code of the first applet includes a first rendering layer code and a first logic layer code, and the code of the data acquisition program includes a second rendering layer code and a second logic layer code. The second applet generation module 906 is further configured to: inject the second rendering layer code of the data acquisition program into the first rendering layer code of the first applet to obtain the rendering layer code of the second applet; inject the second logic layer code of the data acquisition program into the first logic layer code of the first applet to obtain the logic layer code of the second applet; and generate the second applet based on the rendering layer code and the logic layer code of the second applet.
In one embodiment, the page generation module 908 is further configured to call the loading interface of the second applet, load the rendering layer code of the second applet, and verify the logic of that rendering layer code with the logic layer code of the second applet; and, when the logic verification passes, generate the page corresponding to the rendering layer code of the second applet.
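The layer-by-layer injection can be pictured as merging two code bundles. This is a minimal sketch, assuming each program's code is held as two strings, one per layer, and that injection means appending the data acquisition program's code after the first applet's code in the same layer. The text does not say where within each layer the code is inserted, so the append position, the `AppletCode` type, and the `inject` name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppletCode:
    render_layer: str  # rendering layer code (page markup)
    logic_layer: str   # logic layer code (scripts)


def inject(first: AppletCode, acquisition: AppletCode) -> AppletCode:
    """Build the second applet by injecting each layer into its counterpart."""
    return AppletCode(
        # second rendering layer code goes into the first rendering layer code
        render_layer=first.render_layer + "\n" + acquisition.render_layer,
        # second logic layer code goes into the first logic layer code
        logic_layer=first.logic_layer + "\n" + acquisition.logic_layer,
    )
```

Keeping the two layers separate mirrors the text: the rendering layer code of the second applet is what gets loaded, and its logic layer code is what verifies it.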
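The text does not define what "verifying the logic of the rendering layer code with the logic layer code" involves. One plausible reading, used here purely as an assumption, is that every event handler referenced in the rendering layer must be defined in the logic layer. The sketch below matches WXML-style attributes of the form `bindtap="handler"` with a regular expression and refuses to generate a page when a handler is missing; the function names are hypothetical.

```python
import re
from typing import Set

# Matches attributes such as bindtap="onTap" or bindinput="onInput".
HANDLER_ATTR = re.compile(r'\bbind\w*="(\w+)"')


def referenced_handlers(render_code: str) -> Set[str]:
    """Collect the handler names that the rendering layer code refers to."""
    return set(HANDLER_ATTR.findall(render_code))


def verify_render_layer(render_code: str, logic_handlers: Set[str]) -> Set[str]:
    """Return handlers referenced by the rendering layer but missing from the logic layer."""
    return referenced_handlers(render_code) - logic_handlers


def generate_page(render_code: str, logic_handlers: Set[str]) -> str:
    """Generate the page only when the verification passes (assumed check)."""
    missing = verify_render_layer(render_code, logic_handlers)
    if missing:
        raise ValueError(f"undefined handlers: {sorted(missing)}")
    return render_code  # a real runtime would build the page view here
```

Rejecting pages with dangling handlers is one way to avoid the disordered or broken pages that the text attributes to inaccurate rendering layer logic.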
In one embodiment, the data acquisition module 910 is further configured to: acquire, through the data acquisition program, the current page of the second applet and the data of each element in the current page, where the data of an element includes the type of the element; when the type of an element is the input type, jump to the next page through that input-type element; and take the next page as the new current page and repeat the step of acquiring the data of each element in the current page, until every page in the second applet has been traversed and the data of each element on each page has been acquired.
In one embodiment, the data acquisition module 910 is further configured to: when the type of an element is the input type, acquire the input data corresponding to the input type through the data acquisition program; input the input data into the element to generate a jump instruction; and jump to the next page according to the jump instruction.
In one embodiment, the code of the data acquisition program includes a second rendering layer code and a second logic layer code. The data acquisition module 910 is further configured to: acquire the input data corresponding to an input type through the second rendering layer code of the data acquisition program; input the input data into the element and check it through the second logic layer code of the data acquisition program; and generate a jump instruction when the input data passes the check.
In one embodiment, the applet data acquisition apparatus 900 further includes a parsing module, configured to parse the code of the first applet to obtain the loading interface of the first applet. The hijacking module 904 is further configured to monitor the interfaces called by the first applet while it runs, and to hijack the loading interface of the first applet when the first applet calls it.
For specific limitations of the applet data acquisition apparatus, refer to the limitations of the applet data acquisition method above; they are not repeated here. Each module of the above apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be implemented through Wi-Fi, a carrier network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an applet data acquisition method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen; keys, a trackball, or a touchpad on the housing of the computer device; or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored in a non-volatile computer-readable storage medium, which, when executed, may include the steps of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments merely represent several implementations of the present application. They are described specifically and in detail, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is defined by the appended claims.

Claims (10)

1. A method of applet data acquisition, the method comprising:
acquiring a first applet and running the first applet;
hijacking a loading interface of the first applet in the running process of the first applet;
acquiring a data acquisition program, and injecting the code of the data acquisition program into the code of the first applet to generate a second applet; the code of the data acquisition program comprises a second rendering layer code and a second logic layer code, and the loading interface of the second applet is the same as the loading interface of the first applet;
calling a loading interface of the second applet to load the code of the second applet and generating pages of the second applet;
acquiring a current page of the second applet through the data acquisition program, and acquiring data of each element in the current page; the data of the element includes a type of the element;
when the type of the element is an input type, performing simulation operation on the element of the input type through a second rendering layer code of the data acquisition program, acquiring input data corresponding to the input type, inputting the input data into the element, and checking the input data through a second logic layer code of the data acquisition program;
when the input data is checked to pass, generating a jump instruction;
jumping to the next page according to the jump instruction;
and taking the next page as a new current page, and executing the step of acquiring the data of each element in the current page until each page in the second applet is traversed, and acquiring the data of each element in each page.
2. The method of claim 1, wherein the code of the first applet comprises a first rendering layer code and a first logic layer code;
the step of injecting the code of the data acquisition program into the code of the first applet to generate a second applet comprises:
injecting the second rendering layer code of the data acquisition program into the first rendering layer code of the first applet to obtain a rendering layer code of a second applet;
injecting the second logic layer code of the data acquisition program into the first logic layer code of the first applet to obtain a logic layer code of a second applet;
the second applet is generated based on the rendering layer code of the second applet and the logic layer code of the second applet.
3. The method of claim 2, wherein the invoking the load interface of the second applet loads code of the second applet, generating a page of the second applet, comprising:
invoking a loading interface of the second applet, loading a rendering layer code of the second applet, and checking logic of the rendering layer code of the second applet by adopting the logic layer code of the second applet;
and when the logic verification of the rendering layer code of the second applet passes, generating a page corresponding to the rendering layer code of the second applet.
4. The method according to claim 1, wherein the method further comprises:
analyzing the code of the first applet to obtain a loading interface of the first applet;
the hijacking the loading interface of the first applet during the running process of the first applet comprises:
monitoring an interface called by the first applet in the running process of the first applet;
when the first applet calls the loading interface, hijacking the loading interface of the first applet.
5. An applet data acquisition apparatus, said apparatus comprising:
the operation module is used for acquiring a first applet and operating the first applet;
the hijacking module is used for hijacking the loading interface of the first applet in the running process of the first applet;
the second applet generation module is used for acquiring a data acquisition program and injecting the code of the data acquisition program into the code of the first applet to generate a second applet; the code of the data acquisition program comprises a second rendering layer code and a second logic layer code, and the loading interface of the second applet is the same as the loading interface of the first applet;
the page generation module is used for calling a loading interface of the second applet to load the code of the second applet and generating a page of the second applet;
the data acquisition module is used for acquiring a current page of the second applet through the data acquisition program and acquiring data of each element in the current page; the data of the element includes a type of the element; when the type of the element is an input type, performing simulation operation on the element of the input type through a second rendering layer code of the data acquisition program, acquiring input data corresponding to the input type, inputting the input data into the element, and checking the input data through a second logic layer code of the data acquisition program; when the input data is checked to pass, generating a jump instruction; jumping to the next page according to the jump instruction; and taking the next page as a new current page, and executing the step of acquiring the data of each element in the current page until each page in the second applet is traversed, and acquiring the data of each element in each page.
6. The apparatus of claim 5, wherein the code of the first applet comprises a first rendering layer code and a first logic layer code; the second applet generation module is further configured to inject the second rendering layer code of the data acquisition program into the first rendering layer code of the first applet to obtain a rendering layer code of a second applet; inject the second logic layer code of the data acquisition program into the first logic layer code of the first applet to obtain a logic layer code of a second applet; and generate the second applet based on the rendering layer code of the second applet and the logic layer code of the second applet.
7. The apparatus of claim 6, wherein the page generation module is further configured to invoke a loading interface of the second applet, load rendering layer code of the second applet, and verify logic of the rendering layer code of the second applet with the logic layer code of the second applet; and when the logic verification of the rendering layer code of the second applet passes, generating a page corresponding to the rendering layer code of the second applet.
8. The apparatus of claim 5, further comprising a parsing module configured to parse the code of the first applet to obtain a loading interface of the first applet; the hijacking module is also used for monitoring an interface called by the first applet in the running process of the first applet; when the first applet calls the loading interface, hijacking the loading interface of the first applet.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 4.
CN202010216892.4A 2020-03-25 2020-03-25 Method, device, computer equipment and storage medium for acquiring data of applet Active CN111414525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010216892.4A CN111414525B (en) 2020-03-25 2020-03-25 Method, device, computer equipment and storage medium for acquiring data of applet

Publications (2)

Publication Number Publication Date
CN111414525A CN111414525A (en) 2020-07-14
CN111414525B true CN111414525B (en) 2024-01-02

Family

ID=71491416

Country Status (1)

Country Link
CN (1) CN111414525B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112162871A * 2020-09-25 2021-01-01 Tongcheng Network Technology Co., Ltd. Method, device and storage medium for data exchange between applet and webview

Citations (8)

Publication number Priority date Publication date Assignee Title
CN106021257A * 2015-12-31 2016-10-12 Guangzhou Huaduo Network Technology Co., Ltd. Method, device, and system for crawler to capture data supporting online programming
US10083108B1 * 2017-12-18 2018-09-25 Clover Network, Inc. Automated stack-based computerized application crawler
CN108833264A * 2018-06-25 2018-11-16 Xiamen University of Technology Data acquisition management system, method and application based on WeChat mini program
CN109710831A * 2018-12-28 2019-05-03 Sichuan XW Bank Co., Ltd. Web crawler system based on browser plug-in
CN110263266A * 2019-05-20 2019-09-20 Jiangsu University Data display method based on WeChat mini program and crawler
CN110347562A * 2018-04-08 2019-10-18 Tencent Technology (Shenzhen) Co., Ltd. Collecting method, device, computer-readable medium and intelligent terminal
CN110750255A * 2019-09-25 2020-02-04 Alipay (Hangzhou) Information Technology Co., Ltd. Applet rendering method and device
CN110837473A * 2019-11-07 2020-02-25 Tencent Technology (Shenzhen) Co., Ltd. Application program debugging method, device, terminal and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20020198931A1 (en) * 2001-04-30 2002-12-26 Murren Brian T. Architecture and process for presenting application content to clients
US8065667B2 (en) * 2007-03-20 2011-11-22 Yahoo! Inc. Injecting content into third party documents for document processing

Non-Patent Citations (1)

Title
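操金金; 何贞铭; 冯梦琪; 张金星; 王丹媛. Design and implementation of a geological data display system based on WeChat mini program (基于微信小程序的地质资料展示系统的设计与实现). Computer and Information Technology (电脑与信息技术), No. 01, pp. 28-30. *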

Also Published As

Publication number Publication date
CN111414525A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
US10846402B2 (en) Security scanning method and apparatus for mini program, and electronic device
CN108595329B (en) Application testing method and device and computer storage medium
CN109376078B (en) Mobile application testing method, terminal equipment and medium
CN105940654B (en) Franchise static web application in trust
US20150012924A1 (en) Method and Device for Loading a Plug-In
US9762598B1 (en) Automatic dynamic vetting of browser extensions and web applications
US9262311B1 (en) Network page test system and methods
CN111737692B (en) Application program risk detection method and device, equipment and storage medium
US9443077B1 (en) Flagging binaries that drop malicious browser extensions and web applications
CN110244984A (en) Applied program processing method, device, storage medium and computer equipment
CN104115117A (en) Automatic synthesis of unit tests for security testing
CN111831538A (en) Debugging method, device and storage medium
CN108418797B (en) Webpage access method and device, computer equipment and storage medium
CN108399119B (en) Method and device for data processing and automatic testing of browsing service kernel engine
Solomos et al. The dangers of human touch: fingerprinting browser extensions through user actions
CN111177623A (en) Information processing method and device
GB2511329A (en) Web service black box testing
US9571557B2 (en) Script caching method and information processing device utilizing the same
CN113326539B (en) Method, device and system for private data leakage detection aiming at applet
CN111414525B (en) Method, device, computer equipment and storage medium for acquiring data of applet
CN111563260B (en) Android application program-oriented Web injection code execution vulnerability detection method and system
CN112182561B (en) Backdoor detection method and device, electronic equipment and medium
CN109492144B (en) Association relation analysis method, device and storage medium for software system
US9965744B1 (en) Automatic dynamic vetting of browser extensions and web applications
CN113672826B (en) Page jump method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant