CN112395347B - APP Wrapper construction method - Google Patents
APP Wrapper construction method
- Publication number
- CN112395347B
- Authority
- CN
- China
- Prior art keywords
- data
- page
- app
- swipe
- activity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a method for constructing an APP wrapper, comprising the following steps: 1. opening the target activity, either through an intent constructed directly from outside the app or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click on it; 2. scroll processing, in which a heuristic algorithm simulates screen swipes until the target interface is reached and data is extracted from it; 3. establishing a data acquisition path rule, under which pages are traversed and collected either by directly assembling URLs or by simulating a click on each item of the app index page, one by one; 4. establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. Through this process, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a construction method of an APP Wrapper.
Background
With the spread of smartphones, the way data is produced and consumed has changed considerably: a large amount of information is now generated and used interactively on mobile phones, which makes it valuable for research on large-scale data. Fields such as artificial intelligence, data mining, databases and information retrieval are increasingly turning to mobile platforms, and mining and extracting large amounts of information from mobile systems has become a technical problem that many practitioners are working to solve. Traditionally, an information extraction wrapper is used to extract fields automatically from web pages; for a book website, for example, the fields are mainly the title, price and author of each book. Such a wrapper is generally built on XPath and then adjusted automatically by machine learning methods, so that data can be collected from the entire website. This traditional approach, however, is completely ineffective for collecting data from an APP. The question is therefore whether a new APP acquisition method can be devised in which the wrapper is constructed from the interface XML of the APP, so that APP data can be collected.
Disclosure of Invention
To solve the above problems, the invention provides a method for constructing an APP Wrapper. The method opens a target activity in the APP, applies scroll processing to the opened target activity, establishes a data acquisition path rule for data extraction, and then establishes a data extraction rule, thereby building a data parsing mechanism based on Android apps. The method comprises the following specific steps:
(1) opening the target activity of the APP, either through an intent constructed directly from outside or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click on it;
if the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute of the activity is false or an access permission is specified, the click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework is used to find the corresponding widget on each activity interface and send a simulated click instruction to open it;
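The choice between the two opening modes in step (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ActivityInfo` record, the `open_plan` function and the example package name are hypothetical, and the only real tool assumed is the Android `am start -n` shell command. The text does not say how a true exported attribute combines with a specified access permission; the sketch assumes that a permission always forces the click path.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ActivityInfo:
    """Hypothetical summary of one <activity> entry in the app manifest."""
    package: str               # e.g. "com.example.books" (made-up name)
    name: str                  # fully qualified activity class name
    exported: bool             # value of the exported attribute
    has_intent_filter: bool    # True if an intent-filter is defined
    permission: str = ""       # non-empty if an access permission is specified


def open_plan(activity: ActivityInfo, click_path: List[str]) -> Union[str, List[str]]:
    """Return a shell command when the activity can be opened from outside,
    otherwise the manually observed widget IDs to click, in order."""
    reachable = activity.exported or activity.has_intent_filter
    if reachable and not activity.permission:
        # Explicit intent addressed to package/class.
        return f"am start -n {activity.package}/{activity.name}"
    # Fall back to the click path found by manual observation.
    return list(click_path)
```

With uiautomator2, each widget ID returned for the second case would then be clicked in turn on the device; that executor is left out so the sketch runs without a device.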
(2) simulating screen swipes through a heuristic algorithm until the target interface is reached, and extracting data from it;
scrolling the page refreshes the current screen, so the opened page must be converted into the Android XML structure; the heuristic algorithm then simulates screen swipes step by step until the target interface is reached, and data is extracted from it;
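The patent does not describe the heuristic itself, so the following is only one plausible reading of step (2): swipe one step at a time, re-read the screen XML after each swipe, and stop when a target node appears or the screen stops changing. The function name `swipe_to_target` and the two callables are assumptions; on a real device `dump` and `swipe` would wrap whatever the automation framework provides for dumping the view hierarchy and swiping.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def swipe_to_target(dump: Callable[[], str],
                    swipe: Callable[[], None],
                    target_xpath: str,
                    max_steps: int = 20) -> Optional[int]:
    """Swipe until target_xpath matches the current screen XML.

    Returns the number of swipes that were needed, or None if the target
    was not found before the screen stopped changing or max_steps ran out.
    """
    previous = None
    for step in range(max_steps + 1):
        xml = dump()
        if ET.fromstring(xml).find(target_xpath) is not None:
            return step
        if xml == previous:
            # The last swipe changed nothing: assume the end of the page.
            return None
        previous = xml
        swipe()
    return None
```

The returned count is the kind of value that the "number of swipe steps" part of an extraction rule could record.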
(3) establishing a data acquisition path rule, under which pages are traversed and collected either by directly assembling URLs or by simulating a click on each item of the app index page, one by one;
the data acquisition path rule is established on the target page: for activities that are exported, the pages to be collected are traversed by directly assembling URLs; for other types, collection is achieved by simulating a click on each item of the app index page one by one while scrolling continuously;
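For the exported case, "directly assembling URLs" might look like the sketch below, which expands a URL template over a set of item IDs and wraps each URL in an `am start` command with a VIEW action. The template `example://book/{id}`, the function name and the idea that items are addressed by a numeric ID are all assumptions made for illustration; the patent does not specify the URL format.

```python
from typing import Iterable, List


def deep_link_commands(url_template: str, item_ids: Iterable[int]) -> List[str]:
    """Build one `am start` command per item by filling {id} in the template."""
    commands = []
    for item_id in item_ids:
        url = url_template.format(id=item_id)
        # -a sets the intent action, -d sets the data URI.
        commands.append(f'am start -a android.intent.action.VIEW -d "{url}"')
    return commands
```

Each command opens one collected page directly, so no index page has to be clicked through for this case.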
(4) establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
after the data acquisition path rule is established, the data extraction rule is established, consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath; the number of swipe steps describes how many simulated swipe steps are needed to reach the parsing target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is mainly used to parse the data, which is the part of the real-time XML that changes as swipes are simulated; through the above steps, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
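The three-part extraction rule can be represented as a small record and applied to one screen dump, as in the sketch below. The class and function names, the resource IDs and the sample XML are hypothetical; the sketch assumes the screen XML uses `node` elements with `resource-id` and `text` attributes, as in a typical Android UI hierarchy dump, and it uses the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractionRule:
    swipe_steps: int   # swipes to perform before parsing starts (applied by the caller)
    frame_xpath: str   # screen-invariant container, relative to the XML root
    data_xpath: str    # changing data nodes, relative to each frame


def extract(rule: ExtractionRule, screen_xml: str) -> List[str]:
    """Return the text of every data node found inside the stable frame."""
    root = ET.fromstring(screen_xml)
    values = []
    for frame in root.findall(rule.frame_xpath):
        for node in frame.findall(rule.data_xpath):
            text = node.get("text", "")
            if text:
                values.append(text)
    return values


SAMPLE = """
<hierarchy>
  <node resource-id="app:id/list">
    <node><node resource-id="app:id/title" text="Book A"/></node>
    <node><node resource-id="app:id/title" text="Book B"/></node>
  </node>
</hierarchy>
"""

rule = ExtractionRule(
    swipe_steps=0,
    frame_xpath=".//node[@resource-id='app:id/list']",
    data_xpath=".//node[@resource-id='app:id/title']",
)
print(extract(rule, SAMPLE))  # ['Book A', 'Book B']
```

Keeping the data XPath relative to the frame means that only the frame has to stay stable while rows scroll in and out of view.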
As a further improvement of the invention, the data obtained through the data extraction rule is deduplicated, which addresses the problem that a simulated swipe cannot scroll old content off the screen precisely.
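Because consecutive screens overlap after an imprecise swipe, the same record can be extracted more than once. A minimal order-preserving deduplication, assuming each record can be compared as a plain string, could look like this; the function name is hypothetical, and note that this approach also collapses two genuinely distinct items that happen to have identical text.

```python
from typing import Iterable, List


def merge_screens(screens: Iterable[List[str]]) -> List[str]:
    """Concatenate per-screen results, keeping only the first copy of each record."""
    seen = set()
    merged = []
    for records in screens:
        for record in records:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


print(merge_screens([["A", "B", "C"], ["C", "D"]]))  # ['A', 'B', 'C', 'D']
```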
As a further improvement of the invention, the data extraction rule is created manually, and no strict relationship is imposed on the variable data extraction part, which addresses the problem that the real-time XML of the screen may change at any time.
The invention provides a method for constructing an APP Wrapper, with the following specific design:
1) the method opens the target activity of the APP: if the exported attribute of the activity is true or an intent-filter is defined, the target activity is opened through an intent constructed directly from outside; if the exported attribute of the activity is false or an access permission is specified, the click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework is used to find the corresponding widget on each activity interface and send a simulated click instruction to open it;
2) the method handles the refresh of the current screen that occurs when the opened target activity is scrolled: the opened page is converted into the Android XML structure, a heuristic algorithm simulates screen swipes step by step until the target interface is reached, and data is extracted from it;
3) the method establishes a data acquisition path rule when data is extracted: the rule is established on the target page; for activities that are exported, the pages to be collected are traversed by directly assembling URLs; for other types, data is collected by simulating a click on each item of the APP index page one by one while scrolling continuously;
4) the method establishes a data extraction rule after the data acquisition path rule is established, consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath; the number of swipe steps describes how many simulated swipe steps are needed to reach the parsing target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is mainly used to parse the data, which is the part of the real-time XML that changes as swipes are simulated; through the above steps, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
Drawings
FIG. 1 is a schematic diagram of opening the target activity according to the invention;
FIG. 2 is a schematic diagram of the scroll processing according to the invention;
FIG. 3 is a schematic diagram of establishing the data acquisition path rule according to the invention.
Detailed Description
The invention is described in further detail below with reference to the detailed description and the accompanying drawings:
the invention provides a method for constructing an APP Wrapper. The target activity of the APP is opened either through a directly constructed intent or by manually observing the click path from the application home page to the target activity, finding the corresponding widget on each activity interface through the uiautomator2 framework, and sending a simulated click instruction to open it; for scrolling of the opened target activity, a heuristic algorithm is designed that simulates screen swipes step by step until the target interface is reached, and data is extracted from it; a data acquisition path rule is established for data extraction, and after it a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath; through this process, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
As an embodiment, the invention provides a method for constructing an APP Wrapper, comprising the following specific steps:
Step 1: opening the target activity of the APP, either through an intent constructed directly from outside or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click on it;
FIG. 1 is a schematic diagram of opening the target activity. As shown in the figure, a widget ID is input from the APP home page and the target activity is opened through that widget ID. If the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute of the activity is false or an access permission is specified, the click path from the application home page to the target activity is found by manual observation to obtain the IDs of the widgets to open, and the uiautomator2 framework is used to find the corresponding widget on each activity interface and send a simulated click instruction to open the activity. Through this process, the APP reaches the target activity.
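Replaying a manually observed click path can be sketched against a very small device interface. The `Device` protocol and the function name below are hypothetical stand-ins; in practice the two methods would be thin adapters over the automation framework's "does this widget exist" and "click this widget" operations, and the widget IDs would be those recorded during manual observation.

```python
from typing import List, Protocol


class Device(Protocol):
    """Hypothetical adapter over a UI automation framework."""

    def exists(self, widget_id: str) -> bool: ...

    def click(self, widget_id: str) -> None: ...


def replay_click_path(device: Device, click_path: List[str]) -> bool:
    """Click each widget in order; stop and report failure if one is missing."""
    for widget_id in click_path:
        if not device.exists(widget_id):
            # The app layout differs from what was observed manually.
            return False
        device.click(widget_id)
    return True
```

Returning False instead of raising lets the caller decide whether to retry, for example after waiting for a slow page to load.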
Step 2: simulating screen swipes through a heuristic algorithm until the target interface is reached, and extracting data from it;
FIG. 2 is a schematic diagram of the scroll processing. As shown in the figure, after the target activity is opened and the interface is scrolled, the program checks the interface XML for changes; if a change is detected, it returns to the "scroll 5 steps" procedure; if no change occurs, it proceeds to the acquisition path rule and the procedure ends. Scrolling the opened target activity refreshes the current screen, so the opened page is converted into the Android XML structure, the heuristic algorithm simulates screen swipes step by step until the target interface is reached, and data is extracted from it.
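The loop of FIG. 2 (scroll, compare the XML, repeat until nothing changes) can be sketched as a generator. The text does not define precisely what "scroll 5 steps" means, so `scroll_once` below simply stands for one such scroll action supplied by the caller; the function name and the `max_rounds` safety limit are assumptions.

```python
from typing import Callable, Iterator


def scroll_until_stable(dump: Callable[[], str],
                        scroll_once: Callable[[], None],
                        max_rounds: int = 50) -> Iterator[str]:
    """Yield each distinct screen XML; stop once a scroll changes nothing."""
    current = dump()
    yield current
    for _ in range(max_rounds):
        scroll_once()
        latest = dump()
        if latest == current:
            # No XML change detected: hand over to the acquisition path rule.
            return
        current = latest
        yield current
```

On a real device the raw XML may differ between dumps for reasons unrelated to scrolling (a clock or an animation, for instance), so the comparison would probably need to be restricted to the relevant subtree.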
Step 3: establishing a data acquisition path rule, under which pages are traversed and collected either by directly assembling URLs or by simulating a click on each item of the app index page, one by one;
FIG. 3 is a schematic diagram of establishing the data acquisition path rule. As shown in the figure, the program starts by obtaining the latest XML page from the screen and establishes the data acquisition path rule on the target page. For activities that are exported, the pages to be collected are traversed by directly assembling URLs. For other types, the detail click buttons are obtained from the latest XML page; once the buttons are obtained, each item on the app index page is clicked in simulation one by one, the data is collected, and a value is returned when collection is complete. After collection, an algorithm judges whether the page is at the bottom, and the program enters "scroll 5 steps" acquisition; if the page is not at the bottom, the program returns to the procedure for obtaining the detail click buttons and performs simulated clicks on the buttons in the page. During "scroll 5 steps" acquisition, the program judges whether the page has ended: if not, the page continues to be refreshed and the refreshed page continues to be checked; if so, the path for that page is complete. The data acquisition path rule judges and parses the latest XML obtained from the screen; if a link to a next-level page acquisition button exists, each item on the app index page is clicked in simulation one by one until the bottommost XML page on each acquisition path is reached, thereby achieving full-coverage collection of the pages.
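The non-exported branch of FIG. 3 amounts to a crawl loop over the index page: read the visible detail buttons, open each one not yet collected, scroll, and stop when the screen no longer changes. The sketch below is an interpretation under stated assumptions: the `IndexDevice` protocol and its four methods are hypothetical, `open_item` is assumed to click an item, return the XML of its detail page and navigate back, and "bottom of page" is detected simply as two identical consecutive dumps.

```python
from typing import Dict, List, Protocol


class IndexDevice(Protocol):
    """Hypothetical adapter over the app's index page."""

    def dump(self) -> str: ...                      # current screen XML
    def items(self) -> List[str]: ...               # IDs of visible detail buttons
    def open_item(self, item_id: str) -> str: ...   # click, read detail XML, go back
    def scroll(self) -> None: ...                   # one scroll action


def crawl_index(device: IndexDevice, max_rounds: int = 100) -> Dict[str, str]:
    """Collect the detail page of every item reachable by scrolling the index."""
    collected: Dict[str, str] = {}
    previous = None
    for _ in range(max_rounds):
        screen = device.dump()
        if screen == previous:
            break  # scrolling changed nothing: bottom of the index page
        for item_id in device.items():
            if item_id not in collected:  # skip items seen on an overlapping screen
                collected[item_id] = device.open_item(item_id)
        previous = screen
        device.scroll()
    return collected
```

Keying the results by item ID also gives the deduplication across overlapping screens for free.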
Step 4: establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
after the data acquisition path rule is established, the data extraction rule is established, consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath; the number of swipe steps describes how many simulated swipe steps are needed to reach the parsing target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is mainly used to parse the data, which is the part of the real-time XML that changes as swipes are simulated. The data obtained through the data extraction rule is deduplicated, which addresses the problem that a simulated swipe cannot scroll old content off the screen precisely. The data extraction rule is created manually, and no strict relationship is imposed on the variable data extraction part, which addresses the problem that the real-time XML of the screen may change at any time. Through the above steps, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
The above is only a preferred embodiment of the invention and does not limit the invention in any form; any modification or equivalent variation made according to the technical spirit of the invention falls within the scope claimed by the invention.
Claims (2)
1. A method for constructing an APP Wrapper, characterized in that the method comprises the following specific steps:
(1) opening the target activity of the APP, either through an intent constructed directly from outside or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click on it;
if the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute of the activity is false or an access permission is specified, the click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework is used to find the corresponding widget on each activity interface and send a simulated click instruction to open it;
(2) simulating screen swipes through a heuristic algorithm until the target interface is reached, and extracting data from it;
scrolling the page refreshes the current screen, so the opened page must be converted into the Android XML structure; the heuristic algorithm then simulates screen swipes step by step until the target interface is reached, and data is extracted from it;
(3) establishing a data acquisition path rule, under which pages are traversed and collected either by directly assembling URLs or by simulating a click on each item of the app index page, one by one;
the data acquisition path rule is established on the target page: for activities that are exported, the pages to be collected are traversed by directly assembling URLs; for other types, collection is achieved by simulating a click on each item of the app index page one by one while scrolling continuously;
(4) establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
after the data acquisition path rule is established, the data extraction rule is established, consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath; the number of swipe steps describes how many simulated swipe steps are needed to reach the parsing target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is mainly used to parse the data, which is the part of the real-time XML that changes as swipes are simulated; through the above steps, a data parsing mechanism based on Android apps is established, so that relevant information can be extracted from the app.
2. The method for constructing an APP Wrapper according to claim 1, characterized in that the data obtained through the data extraction rule is deduplicated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110051477.2A CN112395347B (en) | 2021-01-15 | 2021-01-15 | APP Wrapper construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112395347A (en) | 2021-02-23 |
CN112395347B (en) | 2021-04-09 |
Family
ID=74624893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110051477.2A Active CN112395347B (en) | 2021-01-15 | 2021-01-15 | APP Wrapper construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395347B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115292571B (en) * | 2022-08-08 | 2023-03-28 | 烟台中科网络技术研究所 | App data acquisition method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101078987A (en) * | 2006-05-24 | 2007-11-28 | 国际商业机器公司 | Method and system for establishing customizable wrappers for web applications |
CN110865851A (en) * | 2019-11-18 | 2020-03-06 | 中国民航信息网络股份有限公司 | Automatic Android application data acquisition method and system |
CN112035112A (en) * | 2020-09-02 | 2020-12-04 | 北京思明启创科技有限公司 | Application program development method, system, medium and electronic device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170060891A1 (en) * | 2015-08-26 | 2017-03-02 | Quixey, Inc. | File-Type-Dependent Query System |
Also Published As
Publication number | Publication date |
---|---|
CN112395347A (en) | 2021-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||