CN112395347B - APP Wrapper construction method - Google Patents

APP Wrapper construction method

Info

Publication number
CN112395347B
Authority
CN
China
Prior art keywords
data
page
app
swipe
activity
Legal status
Active
Application number
CN202110051477.2A
Other languages
Chinese (zh)
Other versions
CN112395347A (en)
Inventor
邹睿泓
桂文明
Current Assignee
Jinling Institute of Technology
Original Assignee
Jinling Institute of Technology
Application filed by Jinling Institute of Technology
Priority to CN202110051477.2A
Publication of CN112395347A
Application granted
Publication of CN112395347B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a construction method of an APP wrapper, comprising the following steps: 1. opening the target activity, either through an intent constructed directly from outside the app or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click; 2. scroll processing, in which screen swipes are simulated through a heuristic algorithm until the target interface is reached, and data is extracted from it; 3. establishing a data acquisition path rule, in which pages are traversed either by directly assembling URLs or by simulating a click on each item of the app's index page one by one; 4. establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. Through this process, a data analysis mechanism based on the Android app is established for extracting relevant information from the app.

Description

APP Wrapper construction method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a construction method of an APP Wrapper.
Background
With the popularization of smartphones, the volume of data generated and used on mobile phones has grown enormously, and this data is of great value for research. Fields such as artificial intelligence, data mining, databases and information retrieval are increasingly turning to mobile systems, and mining and extracting large amounts of information from them has become a technical problem that many practitioners are working to overcome. Traditionally, an information-extraction wrapper automatically extracts fields from web pages; for a book website, for example, it extracts the title, price and author of each book. Such wrappers are generally built on XPath and then tuned automatically with machine-learning methods so that the data of an entire website can be collected. These traditional methods, however, do not work at all for collecting data from an APP. This invention therefore proposes a new APP data acquisition method that builds the wrapper on the APP's interface XML, thereby realizing APP data acquisition.
Disclosure of Invention
To solve these problems, the invention provides a construction method of an APP Wrapper in which the target activity of an APP is opened, the opened target activity is scrolled, a data acquisition path rule is established for the subsequent data extraction, and a data extraction rule is established after the path rule, so that a data analysis mechanism based on the Android APP is obtained. The construction method is characterized by comprising the following specific steps:
(1) opening the target activity of the APP, either through an intent constructed directly from outside the app or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click;
if the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute is false or an access permission is specified, a click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework locates the corresponding widget on each activity interface and sends a simulated click instruction to open it;
(2) simulating screen swipes through a heuristic algorithm until the target interface is reached, and extracting data from it;
scrolling the page refreshes the current screen; the opened page is converted into Android's XML structure, and screen swipes are simulated step by step through a heuristic algorithm until the target interface is reached, from which data is extracted;
(3) establishing a data acquisition path rule, and traversing pages either by directly assembling URLs or by simulating a click on each item of the app's index page one by one;
the data acquisition path rule is established on the target page: for an exported activity, the pages are traversed by directly assembling URLs; in other cases, acquisition is performed by simulating a click on each item of the app's index page one by one while scrolling continuously;
(4) establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
after the data acquisition path rule is established, the data extraction rule is established with three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. The number of swipe steps describes how many simulated swipes are needed to reach the parse target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is used to parse the data, which is the part of the real-time XML that changes as the simulated swipes proceed. Through these steps a data analysis mechanism based on the Android app is established, achieving the goal of extracting relevant information from the app.
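For exported activities, step (3) above builds the acquisition paths by assembling URLs directly instead of clicking through the UI. The patent does not give the URL format, so the sketch below assumes a hypothetical deep-link scheme (`myapp://book/detail?id=...`) and hypothetical item IDs. It only builds the URLs and the matching `adb shell am start` commands (using the standard `-a` action and `-d` data-URI options); it does not execute them.

```python
import shlex
from typing import List
from urllib.parse import urlencode


def assemble_urls(base: str, param: str, item_ids: List[str]) -> List[str]:
    """Build one deep-link URL per item (the scheme is supplied by the caller)."""
    return [f"{base}?{urlencode({param: item_id})}" for item_id in item_ids]


def launch_command(url: str) -> List[str]:
    """Return an adb command that opens `url` with a VIEW intent.

    `adb shell` joins its arguments into one string for the device shell,
    so the URL is quoted to keep characters such as '?' and '&' intact.
    This only works if the receiving activity is exported or has a
    matching intent-filter.
    """
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.VIEW", "-d", shlex.quote(url)]


if __name__ == "__main__":
    # Hypothetical scheme and IDs, for illustration only.
    for url in assemble_urls("myapp://book/detail", "id", ["1001", "1002"]):
        print(" ".join(launch_command(url)))
        # To actually run it: subprocess.run(launch_command(url), check=True)
```

Each assembled URL opens one page directly, so the traversal needs no scrolling or clicking for exported activities.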
As a further improvement of the invention, the data obtained by the data extraction rule is deduplicated, which solves the problem that simulated swipes cannot reliably scroll old content completely off the screen, so some items appear on two consecutive screens.
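The deduplication in this improvement can be as simple as an order-preserving merge of the records parsed from successive screens. A minimal sketch, assuming each record has already been reduced to a string key that identifies one item (two genuinely different items sharing the same key would be merged, so the key choice matters):

```python
from typing import Iterable, List


def merge_screens(screens: Iterable[List[str]]) -> List[str]:
    """Merge records parsed from consecutive screens, keeping first occurrences.

    A simulated swipe rarely moves by exactly one screen, so items near the
    edge are parsed twice; skipping keys already seen removes them while
    keeping the on-screen order.
    """
    seen = set()
    merged: List[str] = []
    for screen in screens:
        for record in screen:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


if __name__ == "__main__":
    print(merge_screens([["A", "B", "C"], ["C", "D"], ["D", "E"]]))
    # ['A', 'B', 'C', 'D', 'E']
```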
As a further improvement of the invention, the data extraction rule is generated manually, and no strict binding is imposed on the variable data-extraction part, which copes with the fact that the screen's real-time XML can change at any time.
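One way to avoid a strict binding on the variable part is to resolve each data field independently inside every repeated item and record a missing field as empty instead of rejecting the whole item. The sketch below assumes a uiautomator-style hierarchy dump (`node` elements with `resource-id` and `text` attributes), uses hypothetical resource IDs, and relies only on the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def extract_loose(xml_text: str, item_path: str,
                  fields: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Extract one row per item; each field is looked up on its own.

    If the real-time XML lacks a field for some item (for example, the item
    is only half on screen), that field becomes None instead of the whole
    row being discarded.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall(item_path):
        row: Dict[str, Optional[str]] = {}
        for name, rel_path in fields.items():
            node = item.find(rel_path)
            row[name] = node.get("text") if node is not None else None
        rows.append(row)
    return rows


# Hypothetical screen: the second item has scrolled partly out of view.
SCREEN = """<hierarchy>
  <node resource-id="item">
    <node resource-id="title" text="Book A"/><node resource-id="price" text="39.0"/>
  </node>
  <node resource-id="item">
    <node resource-id="title" text="Book B"/>
  </node>
</hierarchy>"""

if __name__ == "__main__":
    print(extract_loose(SCREEN, ".//node[@resource-id='item']",
                        {"title": ".//node[@resource-id='title']",
                         "price": ".//node[@resource-id='price']"}))
```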
The invention provides a construction method of an APP Wrapper, which is specifically designed as follows:
1) the target activity is opened through the APP: if the exported attribute of the activity is true or an intent-filter is defined, the target activity is opened through an intent constructed directly from outside; if the exported attribute is false or an access permission is specified, a click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework locates the corresponding widget on each activity interface and sends a simulated click instruction to open it;
2) when the opened target activity is scrolled, the current screen is refreshed and the opened page is converted into Android's XML structure; screen swipes are simulated step by step through a heuristic algorithm until the target interface is reached, and data is extracted from it;
3) a data acquisition path rule is established on the target page for the data extraction: for exported activities, the pages are traversed by directly assembling URLs; in other cases, data is acquired by simulating a click on each item of the APP index page one by one while scrolling continuously;
4) after the data acquisition path rule is established, the data extraction rule is established with three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. The number of swipe steps describes how many simulated swipes are needed to reach the parse target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is used to parse the data, which is the part of the real-time XML that changes as the simulated swipes proceed. Through these steps a data analysis mechanism based on the Android app is established, achieving the goal of extracting relevant information from the app.
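Design point 1) chooses the opening method from the activity's declaration in the manifest. A minimal sketch of that decision, assuming the manifest has already been decoded to plain XML (the AndroidManifest.xml packaged in an APK is binary, so a decoding tool is needed first) and using hypothetical package and activity names. It follows the rule in the text: `exported="false"` or a declared permission leads to the click path; `exported="true"` or an intent-filter means an intent can be sent; an activity with neither is assumed not reachable from outside.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ANDROID = "{http://schemas.android.com/apk/res/android}"  # android: namespace


def opening_strategy(manifest_xml: str) -> Dict[str, str]:
    """Map each activity name to 'intent' or 'click_path'."""
    root = ET.fromstring(manifest_xml)
    strategies: Dict[str, str] = {}
    for activity in root.iter("activity"):
        name = activity.get(ANDROID + "name")
        exported = activity.get(ANDROID + "exported")
        has_filter = activity.find("intent-filter") is not None
        protected = activity.get(ANDROID + "permission") is not None
        if exported == "false" or protected:
            strategies[name] = "click_path"  # replay a manually found click path
        elif exported == "true" or has_filter:
            strategies[name] = "intent"      # start it directly from outside
        else:
            strategies[name] = "click_path"  # neither declared: assume not exported
    return strategies


def explicit_launch(package: str, activity: str) -> List[str]:
    """adb command that starts an exported activity by component name."""
    return ["adb", "shell", "am", "start", "-n", f"{package}/{activity}"]


if __name__ == "__main__":
    manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
        package="com.example.shop"><application>
      <activity android:name=".DetailActivity" android:exported="true"/>
      <activity android:name=".CartActivity" android:exported="false"/>
    </application></manifest>"""
    print(opening_strategy(manifest))
    print(explicit_launch("com.example.shop", ".DetailActivity"))
```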
Drawings
FIG. 1 is a schematic diagram of opening the target activity according to the present invention;
FIG. 2 is a schematic view of the scrolling process of the present invention;
FIG. 3 is a schematic diagram of establishing the data acquisition path rule according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
The invention provides a construction method of an APP Wrapper. The target activity of an APP is opened either through a directly constructed intent or, when that is not possible, by manually finding the click path from the application home page to the target activity, locating the corresponding widget on each activity interface with the uiautomator2 framework, and sending a simulated click command. When the opened target activity is scrolled, a heuristic algorithm simulates screen swipes step by step until the target interface is reached, and data is extracted from it. A data acquisition path rule is established for the extraction, and after it a data extraction rule is established with three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. Through this process a data analysis mechanism based on the Android app is established, achieving the goal of extracting relevant information from the app.
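The step-by-step swipe described here also yields the first part of the extraction rule, the number of swipe steps. The patent does not specify its heuristic, so the sketch below assumes the simplest one: check the dumped XML for a node matching a target XPath, swipe once if it is absent, and give up when the screen stops changing or a step limit is reached. `dump` and `swipe` are placeholders for device calls (with uiautomator2 they could wrap `d.dump_hierarchy()` and `d.swipe(...)`); `FakeScreen` is a hypothetical stand-in so the sketch runs without a device.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional


def steps_to_target(dump: Callable[[], str], swipe: Callable[[], None],
                    target_xpath: str, max_steps: int = 30) -> Optional[int]:
    """Return how many swipes bring a node matching target_xpath on screen.

    Returns None if the screen stops changing (end of the list) or if
    max_steps swipes were made without finding the target.
    """
    previous = None
    steps = 0
    while True:
        xml_text = dump()
        if ET.fromstring(xml_text).find(target_xpath) is not None:
            return steps
        if xml_text == previous or steps == max_steps:
            return None
        previous = xml_text
        swipe()
        steps += 1


class FakeScreen:
    """Hypothetical stand-in for a device: each swipe shows the next page."""

    def __init__(self, pages: List[str]) -> None:
        self.pages, self.pos = pages, 0

    def dump(self) -> str:
        return self.pages[self.pos]

    def swipe(self) -> None:
        self.pos = min(self.pos + 1, len(self.pages) - 1)


if __name__ == "__main__":
    screen = FakeScreen(["<hierarchy><node text='a'/></hierarchy>",
                         "<hierarchy><node resource-id='target'/></hierarchy>"])
    print(steps_to_target(screen.dump, screen.swipe,
                          ".//node[@resource-id='target']"))
```

The returned count is the value that would be stored as the swipe-step part of the extraction rule.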
As an embodiment of the invention, the APP Wrapper construction method comprises the following specific steps:
Step 1: open the target activity of the APP, either through an intent constructed directly from outside the app or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click;
FIG. 1 is a schematic diagram of opening the target activity. As shown in the figure, a widget ID is taken from the APP home page and used to open the target activity. If the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute is false or an access permission is specified, the click path from the application home page to the target activity, i.e. the IDs of the widgets to click, is found by manual observation, and the uiautomator2 framework locates the corresponding widget on each activity interface and sends a simulated click instruction to open the activity. Through this process the target activity of the APP is obtained.
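For activities that are not exported, the manually observed click path can be stored as a list of widget resource IDs and replayed. A minimal sketch, assuming a uiautomator-style hierarchy dump in which each `node` carries `resource-id` and `bounds` (written as `[x1,y1][x2,y2]`); it taps the centre of each widget in turn. `dump` and `tap` are placeholders for device calls (with uiautomator2 they could be `d.dump_hierarchy` and `d.click`), and the resource IDs in the test are hypothetical. uiautomator2 can also click a selector directly, e.g. `d(resourceId=...).click()`; computing coordinates here only keeps the sketch testable without a device.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

BOUNDS = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def widget_center(hierarchy_xml: str, resource_id: str) -> Tuple[int, int]:
    """Find a widget by resource-id in the current dump and return its centre."""
    root = ET.fromstring(hierarchy_xml)
    node = root.find(f".//node[@resource-id='{resource_id}']")
    if node is None:
        raise LookupError(f"widget {resource_id!r} is not on the current screen")
    match = BOUNDS.fullmatch(node.get("bounds", ""))
    if match is None:
        raise ValueError(f"unparseable bounds on {resource_id!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def replay_click_path(click_path: List[str], dump: Callable[[], str],
                      tap: Callable[[int, int], None]) -> None:
    """Tap each widget of the recorded path in order, re-reading the screen each time."""
    for resource_id in click_path:
        x, y = widget_center(dump(), resource_id)
        tap(x, y)
```

Re-reading the screen before every tap matters because each click opens a new activity whose layout differs from the previous one.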
Step 2: simulate screen swipes through a heuristic algorithm until the target interface is reached, and extract data from it;
FIG. 2 is a schematic view of the scrolling process. As shown, after the target activity is opened, the program checks the interface's XML for changes as the interface is scrolled: if a change is detected, it returns to the "scroll 5 steps" routine; if no change occurs, it proceeds to the acquisition path rule and ends. While the opened target activity scrolls, the current screen is refreshed, the opened page is converted into Android's XML structure, screen swipes are simulated step by step through a heuristic algorithm until the target interface is reached, and data is extracted from it.
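The loop in FIG. 2 can be sketched as follows. We read "scroll 5 steps" as a batch of five single swipes, each followed by an XML dump, and treat a dump identical to the previous one as the bottom of the list; both readings are assumptions. On a real device some attributes of the dump (a clock, a focus flag) may change on every read and would need to be stripped before comparing. `dump` and `swipe` are placeholders for device calls.

```python
from typing import Callable, List


def scroll_and_capture(dump: Callable[[], str], swipe: Callable[[], None],
                       steps_per_round: int = 5, max_rounds: int = 20) -> List[str]:
    """Swipe in rounds of `steps_per_round`, keeping every distinct screen.

    Stops as soon as a swipe leaves the XML unchanged (bottom reached) or
    after max_rounds rounds, whichever comes first.
    """
    snapshots = [dump()]
    for _ in range(max_rounds):
        for _ in range(steps_per_round):
            swipe()
            xml_text = dump()
            if xml_text == snapshots[-1]:
                return snapshots  # no change: hand over to the acquisition path rule
            snapshots.append(xml_text)
    return snapshots  # safety cap for endless feeds
```

The round limit is a guard for infinite feeds, which never produce an unchanged screen.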
Step 3: establish a data acquisition path rule, and traverse pages either by directly assembling URLs or by simulating a click on each item of the app index page one by one;
FIG. 3 is a schematic diagram of establishing the data acquisition path rule. At the start, the program obtains the latest XML page from the screen and establishes the data acquisition path rule on the target page. For exported activities, the pages are traversed by directly assembling URLs. In other cases, the program obtains the detail click buttons on the latest XML page, simulates a click on each item of the app index page one by one, acquires the data, and returns the values once acquisition is complete. After acquisition, an algorithm judges whether the bottom of the page has been reached; if it has not, the program enters the "scroll 5 steps" acquisition, returns to obtaining the detail click buttons, and simulates clicks on the buttons in the page. During the "scroll 5 steps" acquisition the program also checks whether the page is finished: if not, the page continues to be refreshed and the refreshed page is examined again; if it is, the path for that page is complete. The data acquisition path rule then analyzes the latest XML obtained from the screen: if a button link to a next-level page exists, each item on the app index page is clicked one by one in simulation until the bottommost XML page on each acquisition path is reached, so that the pages are covered completely.
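The full-coverage traversal in FIG. 3 amounts to a depth-first search over pages: a page whose XML offers next-level buttons is expanded, and a page with none is a bottommost page whose data is collected. In the sketch below, `next_level` is a hypothetical callback standing in for "click each item, read the new XML, press back"; a visited set guards against apps whose links lead back to pages already seen. The page graph in the example is invented for illustration.

```python
from typing import Callable, List, Set


def collect_leaf_pages(start: str, next_level: Callable[[str], List[str]]) -> List[str]:
    """Depth-first traversal returning the bottommost pages in visiting order."""
    visited: Set[str] = set()
    leaves: List[str] = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        children = next_level(page)
        if children:
            # Push in reverse so the first button on screen is visited first.
            stack.extend(reversed(children))
        else:
            leaves.append(page)  # no next-level link: collect this page
    return leaves


if __name__ == "__main__":
    site = {
        "index": ["category_a", "category_b"],
        "category_a": ["book_1", "book_2"],
        "category_b": ["book_2", "book_3"],
    }
    print(collect_leaf_pages("index", lambda page: site.get(page, [])))
    # ['book_1', 'book_2', 'book_3']
```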
Step 4: establish a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
After the data acquisition path rule is established, the data extraction rule is established with three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. The number of swipe steps describes how many simulated swipes are needed to reach the parse target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is used to parse the data, which is the part of the real-time XML that changes as the simulated swipes proceed. The data obtained by the data extraction rule is deduplicated, which solves the problem that simulated swipes cannot reliably scroll old content completely off the screen. The data extraction rule is generated manually, and no strict binding is imposed on the variable data-extraction part, which copes with the fact that the screen's real-time XML can change at any time. Through the above steps a data analysis mechanism based on the Android app is established, achieving the goal of extracting relevant information from the app.
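The three-part extraction rule maps naturally onto a small record plus an apply function. A minimal sketch, assuming a uiautomator-style dump, hypothetical resource IDs, and the XPath subset of Python's `xml.etree.ElementTree`; the data XPath is evaluated relative to the frame node, so only the changing part depends on what is currently on screen. `dump` and `swipe` are placeholders for device calls.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ExtractionRule:
    swipe_steps: int   # simulated swipes needed before parsing starts
    frame_xpath: str   # screen-invariant container of the data
    data_xpath: str    # changing data nodes, relative to the frame


def apply_rule(rule: ExtractionRule, dump: Callable[[], str],
               swipe: Callable[[], None]) -> List[str]:
    """Swipe to the parse target, then read the text of every data node."""
    for _ in range(rule.swipe_steps):
        swipe()
    frame = ET.fromstring(dump()).find(rule.frame_xpath)
    if frame is None:
        return []  # frame not on screen: nothing to parse
    return [node.get("text", "") for node in frame.findall(rule.data_xpath)]


if __name__ == "__main__":
    screens = [
        "<hierarchy><node resource-id='com.example:id/banner'/></hierarchy>",
        "<hierarchy><node resource-id='com.example:id/list'>"
        "<node resource-id='com.example:id/title' text='Book A'/>"
        "<node resource-id='com.example:id/title' text='Book B'/>"
        "</node></hierarchy>",
    ]
    position = [0]

    def swipe() -> None:
        position[0] = min(position[0] + 1, len(screens) - 1)

    rule = ExtractionRule(
        swipe_steps=1,
        frame_xpath=".//node[@resource-id='com.example:id/list']",
        data_xpath="./node[@resource-id='com.example:id/title']",
    )
    print(apply_rule(rule, lambda: screens[position[0]], swipe))
    # ['Book A', 'Book B']
```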
The above description is only a preferred embodiment of the present invention and does not limit the present invention in any way; any modification or equivalent variation made according to the technical spirit of the present invention falls within the scope of the present invention as claimed.

Claims (2)

1. A construction method of an APP Wrapper, characterized in that the method comprises the following specific steps:
(1) opening the target activity of the APP, either through an intent constructed directly from outside the app or by using the uiautomator2 framework to find the corresponding widget on each activity interface and simulate a click;
if the exported attribute of the activity is true or an intent-filter is defined, the activity is opened through an intent constructed directly from outside; if the exported attribute is false or an access permission is specified, a click path from the application home page to the target activity is found by manual observation, and the uiautomator2 framework locates the corresponding widget on each activity interface and sends a simulated click instruction to open it;
(2) simulating screen swipes through a heuristic algorithm until the target interface is reached, and extracting data from it;
scrolling the page refreshes the current screen; the opened page is converted into Android's XML structure, and screen swipes are simulated step by step through a heuristic algorithm until the target interface is reached, from which data is extracted;
(3) establishing a data acquisition path rule, and traversing pages either by directly assembling URLs or by simulating a click on each item of the app's index page one by one;
the data acquisition path rule is established on the target page: for an exported activity, the pages are traversed by directly assembling URLs; in other cases, acquisition is performed by simulating a click on each item of the app's index page one by one while scrolling continuously;
(4) establishing a data extraction rule consisting of three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath;
after the data acquisition path rule is established, the data extraction rule is established with three parts: the number of swipe steps, the screen-invariant frame XPath, and the data XPath. The number of swipe steps describes how many simulated swipes are needed to reach the parse target, at which point parsing starts; the screen-invariant frame XPath describes the part of the rule that does not change; the data XPath is the final rule and is used to parse the data, which is the part of the real-time XML that changes as the simulated swipes proceed. Through these steps a data analysis mechanism based on the Android app is established, achieving the goal of extracting relevant information from the app.
2. The method of claim 1, wherein the data obtained by the data extraction rule is deduplicated.
CN202110051477.2A 2021-01-15 2021-01-15 APP Wrapper construction method Active CN112395347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110051477.2A CN112395347B (en) 2021-01-15 2021-01-15 APP Wrapper construction method

Publications (2)

Publication Number Publication Date
CN112395347A CN112395347A (en) 2021-02-23
CN112395347B true CN112395347B (en) 2021-04-09

Family

ID=74624893

Country Status (1)

Country Link
CN (1) CN112395347B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292571B (en) * 2022-08-08 2023-03-28 烟台中科网络技术研究所 App data acquisition method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101078987A (en) * 2006-05-24 2007-11-28 国际商业机器公司 Method and system for establishing customizable wrappers for web applications
CN110865851A (en) * 2019-11-18 2020-03-06 中国民航信息网络股份有限公司 Automatic Android application data acquisition method and system
CN112035112A (en) * 2020-09-02 2020-12-04 北京思明启创科技有限公司 Application program development method, system, medium and electronic device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20170060891A1 (en) * 2015-08-26 2017-03-02 Quixey, Inc. File-Type-Dependent Query System

Also Published As

Publication number Publication date
CN112395347A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
US8874542B2 (en) Displaying browse sequence with search results
CN105117474B (en) The method and apparatus of recommendation information load are carried out in the reading model of webpage
CN103942212B (en) The character detecting method and device of a kind of user interface
CN109190049B (en) Keyword recommendation method, system, electronic device and computer readable medium
CN107133345A (en) Exchange method and device based on artificial intelligence
CN102035883B (en) Method and device for optimizing webpage in network equipment
CN105138558B (en) The real time individual information collecting method of content is accessed based on user
WO2019024755A1 (en) Webpage information extraction method, apparatus and system, and electronic device
CN107562939A (en) Vertical domain news recommendation method and device and readable storage medium
CN107368550A (en) Information acquisition method, device, medium, electronic equipment, server and system
Xiang et al. Web page segmentation based on gestalt theory
CN112395347B (en) APP Wrapper construction method
CN104881428A (en) Information graph extracting and retrieving method and device for information graph webpages
CN108959204A (en) Internet monetary items information extraction method and system
CN113407678B (en) Knowledge graph construction method, device and equipment
Bako et al. Streamlining Visualization Authoring in D3 Through User-Driven Templates
CN114064913A (en) Knowledge graph-based document retrieval method and system
CN116663495B (en) Text standardization processing method, device, equipment and medium
CN104268246B (en) Generation accesses the method and access method and device of internet sites command script
CN117371950A (en) Robot flow automation method, device, all-in-one machine and storage medium
CN103810243A (en) Innovative hotspot pre-warning recognition system and method
Xu et al. Estimating similarity of rich internet pages using visual information
JP5380874B2 (en) Information retrieval method, program and apparatus
CN110543468A (en) Automatic construction method for big data knowledge base in public security field
CN106446198A (en) Recommending method and device of news based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant