WO2020155765A1 - Method, device, mobile terminal and storage medium for a mobile terminal to crawl data - Google Patents

Method, device, mobile terminal and storage medium for a mobile terminal to crawl data

Info

Publication number
WO2020155765A1
Authority
WO
WIPO (PCT)
Prior art keywords
mobile terminal
data
server
operation program
target data
Application number
PCT/CN2019/118169
Other languages
English (en)
French (fr)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155765A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines

Definitions

  • This application relates to the field of data crawling, and in particular to a method, device, mobile terminal and storage medium for a mobile terminal to crawl data.
  • Web crawlers are programs or scripts that automatically crawl World Wide Web information according to certain rules. They are widely used by Internet search engines and similar websites, and can automatically collect all the page content they can access in order to obtain or update the content and retrieval methods of these websites. In terms of function, a crawler is generally divided into three parts: data collection, processing, and storage.
  • A traditional crawler starts from the URLs of one or several initial web pages and obtains the URLs found on those pages. While crawling, it continuously extracts new URLs from the current page and puts them in a queue until a certain stopping condition of the system is met.
  • The workflow of a focused crawler is more complicated. It filters out links unrelated to the subject according to a certain web analysis algorithm, keeps the useful links, and puts them in the queue of URLs waiting to be crawled. It then selects the URL of the next web page to be crawled from the queue according to a certain search strategy, and repeats the above process until a certain condition of the system is reached.
  • In addition, all web pages crawled by crawlers are stored by the system, then analyzed, filtered, and indexed for later query and retrieval. For focused crawlers, the analysis results obtained in this process may also give feedback and guidance to the subsequent crawling process.
  • The prior art lacks a method for crawling data from a mobile terminal, mainly because mobile terminal data has its own encryption function and is therefore difficult to obtain. How to crawl the data of a mobile terminal is a problem that needs to be solved.
  • The main purpose of this application is to provide a method, device, mobile terminal and storage medium for a mobile terminal to crawl data, aiming to solve the problem that it is difficult to crawl the data of a mobile terminal that has its own encryption function.
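  • To achieve the above purpose, this application proposes a method for a mobile terminal to crawl data, in which a crawler application for crawling mobile terminal data is installed in the mobile terminal, and the method includes:
  • the first mobile terminal acquires a simulation operation program, where the simulation operation program is a program that simulates a user controlling the operation of the mobile terminal;
  • executing the simulation operation program;
  • acquiring, through the Fiddler tool, the response data produced when the simulation operation program is executed, and saving the response data in a designated database;
  • extracting target data that meets specified requirements from the database;
  • sending the target data to a designated server.
  • The present application also provides a device for a mobile terminal to crawl data, in which a crawler application for crawling mobile terminal data is installed in the mobile terminal, and the device includes:
  • an obtaining unit, configured for the first mobile terminal to obtain a simulation operation program, where the simulation operation program is a program that simulates a user controlling the operation of the mobile terminal;
  • an execution unit, configured to execute the simulation operation program;
  • an acquiring storage unit, configured to acquire, through the Fiddler tool, the response data produced when the simulation operation program is executed, and to save the response data in a designated database;
  • an extracting unit, configured to extract target data that meets specified requirements from the database;
  • a sending unit, configured to send the target data to a designated server.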
  • the present application also provides a mobile terminal, including a memory and a processor, the memory stores computer readable instructions, and the processor implements the steps of any one of the above methods when the computer readable instructions are executed.
  • the present application also provides a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the steps of any one of the methods described above are implemented.
  • A crawler application for crawling data is installed on each mobile terminal, and the related data of the mobile terminal is crawled from inside the mobile terminal.
  • The data is crawled quickly; there is no need to consider the encryption function of the mobile terminal, so the desired data is easier to obtain; the mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the method can be used in more scenarios.
  • FIG. 1 is a schematic flowchart of a method for crawling data by a mobile terminal according to an embodiment of this application;
  • FIG. 2 is a schematic block diagram of the structure of an apparatus for data crawling by a mobile terminal according to an embodiment of the application;
  • FIG. 3 is a schematic block diagram of the structure of a mobile terminal according to an embodiment of the application.
  • an embodiment of the present application provides a method for crawling data by a mobile terminal, and a crawler application program for crawling data of the mobile terminal is installed in the mobile terminal.
  • the aforementioned crawler application refers to applications that can be run on the mobile terminal, such as a crawler APP installed on a mobile terminal.
  • the above method of crawling data includes the steps:
  • the first mobile terminal acquires a simulation operation program, where the simulation operation program is a program that simulates a user to control the operation of the mobile terminal;
  • the above mobile terminal is a smart mobile device, including a smart phone, a tablet computer, etc.; the first is only to distinguish it from other mobile terminals that appear later.
  • the above-mentioned simulation operation program refers to a program used to simulate a user to control the operation of the mobile terminal.
  • The simulation operation program can simulate the user's operations in a designated operation process on the mobile phone. For example, the first step is to open program A, the second step is to control program A to send request B, followed by other preset steps.
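The description of the simulation operation program as an ordered list of preset steps can be illustrated with a minimal sketch. The `Step` structure, the action names `open_app` and `send_request`, and the stand-in handlers are all hypothetical; the source does not specify how the program is represented or how it drives the terminal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One preset action of a simulation operation program (hypothetical structure)."""
    action: str  # e.g. "open_app" or "send_request"
    target: str  # e.g. the program or request the action applies to


def run_program(steps: List[Step], handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Execute the steps in order and return one log entry per step."""
    log = []
    for step in steps:
        handler = handlers[step.action]  # raises KeyError for an unknown action
        log.append(handler(step.target))
    return log


# Stand-in handlers: a real terminal would drive the user interface here.
handlers = {
    "open_app": lambda target: f"opened {target}",
    "send_request": lambda target: f"sent {target}",
}

# "Open program A, then have program A send request B."
program = [Step("open_app", "A"), Step("send_request", "B")]
print(run_program(program, handlers))
```

Keeping the program as data rather than code is one possible design; it would let a server push or modify the step list without shipping new executable code.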
  • Fiddler is an HTTP debugging proxy tool. It can record and inspect all HTTP communication between the user's computer and the Internet, set breakpoints, and view all data passing in and out of Fiddler (cookies, HTML, JS, CSS and other files).
  • Fiddler is simpler than other network debuggers, because it not only exposes HTTP communication but also presents it in a user-friendly format.
  • When the simulation operation program is executed, various request data is sent out and the corresponding response data is received, for example, requests to various websites and requests from the different APPs installed on the mobile terminal. Each website and each APP feeds back response data according to the request data. This response data is obtained by the Fiddler tool and then stored in the designated database for later use.
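The store-then-extract flow described above can be sketched as follows. This is only an illustration under stated assumptions: the captured responses are hard-coded stand-ins for what a proxy such as Fiddler would record, SQLite stands in for the "designated database", and the table layout and the `kind` field used to express the "specified requirements" are hypothetical.

```python
import json
import sqlite3


def save_responses(conn, responses):
    """Store captured response records in the designated database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (source TEXT, kind TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO responses (source, kind, body) VALUES (?, ?, ?)",
        [(r["source"], r["kind"], json.dumps(r["body"])) for r in responses],
    )
    conn.commit()


def extract_target(conn, kind):
    """Extract the target data: here, every stored response of one kind."""
    rows = conn.execute(
        "SELECT source, body FROM responses WHERE kind = ? ORDER BY rowid", (kind,)
    ).fetchall()
    return [(source, json.loads(body)) for source, body in rows]


# Stand-in for responses captured while the simulation operation program runs.
captured = [
    {"source": "app_a", "kind": "usage_log", "body": {"opened": 3}},
    {"source": "app_b", "kind": "location", "body": {"city": "Shenzhen"}},
    {"source": "app_a", "kind": "location", "body": {"city": "Beijing"}},
]

conn = sqlite3.connect(":memory:")
save_responses(conn, captured)
print(extract_target(conn, "location"))
```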
  • the above target data is the data extracted according to the specified requirements. Specifically, extract usage log data of a certain APP, extract data related to geographic location in the database, and so on.
  • the above designated server is a server for receiving the target data extracted by the mobile terminal.
  • The server is generally connected to multiple mobile terminals, each with the crawler application installed, so that data can be crawled separately on multiple mobile phones to obtain more data. For example, when a data survey of a certain APP is needed, different people use the APP in different ways, so the data generated about the APP differs from phone to phone. The APP data on multiple mobile phones is therefore crawled and aggregated on the above server, so that the APP can be analyzed better and more comprehensively.
  • Step S1, in which the first mobile terminal obtains the preset simulation operation program, includes:
  • the first mobile terminal receives a connection request from the server;
  • if a command to allow the connection request, input by the user, is received, a network connection is established with the server;
  • the simulation operation program pushed by the server is received.
  • The above-mentioned simulation operation program is pushed by the server to the mobile terminal.
  • Before pushing the program, the server first establishes a network connection with the first mobile terminal.
  • Because the mobile terminal has an encryption function, the mobile terminal user must actively select "allow server connection request" before the terminal can receive the simulation operation program pushed by the server.
  • Before the step of receiving the simulation operation program pushed by the server, the method includes:
  • sending a system identification code to the server, where the system identification code is used by the server to identify the simulation operation program corresponding to the system installed on the mobile terminal.
  • The systems currently installed on mobile terminals fall into several camps, such as the Android system, Apple system, Symbian system, and Microsoft system.
  • Each system requires the corresponding version of the simulation operation program.
  • The server pre-stores multiple versions of the simulation operation program that have the same functional content but target different mobile phone systems.
  • After obtaining the system identification code uploaded by the mobile terminal, the server identifies the system installed on the mobile terminal, then retrieves the version of the simulation operation program that matches that system and pushes it to the first mobile terminal. This ensures that the first mobile terminal can run the simulation operation program normally.
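The server-side lookup described above amounts to mapping a system identification code to a pre-stored program version. A minimal sketch, assuming string identification codes and hypothetical package file names (the source specifies neither):

```python
# Hypothetical mapping from system identification code to the pre-stored
# version of the simulation operation program for that system.
PROGRAMS_BY_SYSTEM = {
    "android": "sim_program_android.pkg",
    "ios": "sim_program_ios.pkg",
}


def select_program(system_id: str) -> str:
    """Return the program version matching the terminal's system."""
    key = system_id.strip().lower()
    if key not in PROGRAMS_BY_SYSTEM:
        # Refuse to push a version the terminal could not run.
        raise ValueError(f"no simulation operation program for system {system_id!r}")
    return PROGRAMS_BY_SYSTEM[key]


print(select_program("Android"))
```

Failing loudly on an unknown code, rather than falling back to a default version, is a choice made for this sketch; it matches the stated goal of ensuring the terminal can run whatever it receives.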
  • Before step S2 of executing the simulation operation program, the method includes:
  • receiving the modification parameters sent by the server;
  • modifying the simulation operation program by using the modification parameters.
  • The crawler application installed on each mobile terminal is the same. If the simulation operation program is also the same, the crawled data is basically the same, differing only in the data generated by the APPs running on each mobile terminal, which reduces the efficiency of crawling data. Therefore, before the simulation operation program is executed, the server randomly generates some parameters and sends them to the mobile terminal to modify the parameters of the simulation operation program. The simulation operation program on each mobile terminal then differs, so different data can be crawled.
  • Before the step of receiving the modification parameters sent by the server, the method includes:
  • sending the current APP status information of the first mobile terminal to the server, where the server retrieves the corresponding modification parameters according to that status information.
  • The above-mentioned APP status information includes how many APPs are installed in the first mobile terminal, which APPs are running, the historical usage frequency of each APP, and so on, all of which affect the selection of parameters. For example, if 5 APPs are currently running, the parameters can specify cyclic access to these 5 APPs. As another example, if the APPs installed on the first mobile terminal are used with different frequencies, the parameters can specify that different APPs are accessed at different frequencies. If an APP is not updated or used, it can be accessed less often. In this way data is obtained in a targeted manner and unnecessary access is reduced.
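One way to picture how APP status information could drive parameter selection is sketched below. The status fields (`name`, `running`, `usage_per_day`), the output keys, and the proportional weighting rule are assumptions made for illustration; the source only states that running APPs can be accessed cyclically and that rarely used APPs can be accessed less often.

```python
def choose_parameters(app_status):
    """Derive hypothetical modification parameters from APP status information.

    app_status: list of dicts with keys "name", "running" and "usage_per_day".
    """
    # Running APPs are visited cyclically, in the order reported.
    cycle = [app["name"] for app in app_status if app["running"]]

    # Access weight is proportional to historical usage, so unused APPs get 0.
    total = sum(app["usage_per_day"] for app in app_status)
    weights = {
        app["name"]: (app["usage_per_day"] / total if total else 0.0)
        for app in app_status
    }
    return {"cycle": cycle, "access_weight": weights}


status = [
    {"name": "app_a", "running": True, "usage_per_day": 6},
    {"name": "app_b", "running": True, "usage_per_day": 2},
    {"name": "app_c", "running": False, "usage_per_day": 0},
]
print(choose_parameters(status))
```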
  • Step S5 of sending the target data to the designated server includes:
  • judging whether a first near-field interaction signal carrying the crawler application identifier is received;
  • if the first near-field interaction signal is received, acquiring the first data amount of the target data in the second mobile terminal that sent the first near-field interaction signal;
  • judging whether the second data amount of the target data in the first mobile terminal is greater than the first data amount;
  • if the second data amount is greater than the first data amount, receiving the target data in the second mobile terminal through near-field interaction;
  • the first mobile terminal sends its local target data together with the received target data to the server.
  • That is, a second mobile terminal that also has the crawler application installed is found. The terminal holding the smaller amount of target data then transmits it through near-field interaction to the terminal holding the larger amount, and the terminal holding the larger amount sends all the target data to the server.
  • The second mobile terminal is found by determining whether the crawler application identifier in the first near-field interaction signal is the same as the identifier of the crawler application installed in the first mobile terminal. If they are the same, the same crawler application is installed on the first mobile terminal and the second mobile terminal.
  • The crawler application identifier is the unique identifier of the crawler application, such as its unique package name.
  • The target data is integrated on one mobile terminal and then sent to the server in one batch, which reduces the number of interfaces the server needs for receiving data.
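The aggregation rule above can be sketched as a small decision function. The terminal records, the use of record count as the "data amount", and the tie-breaking choice (the first terminal uploads when the amounts are equal) are assumptions; the source does not define how data amount is measured or what happens on a tie.

```python
def plan_upload(first, second):
    """Decide which terminal uploads and what data it sends.

    Each terminal is a dict with "id", "app_id" (crawler application
    identifier, e.g. a package name) and "data" (list of target records).
    Returns (uploader_id, records_to_upload).
    """
    # Different crawler applications: no merging, the first terminal
    # simply uploads its own data.
    if first["app_id"] != second["app_id"]:
        return first["id"], list(first["data"])

    # The terminal holding more target data receives the other's data over
    # the near-field link and uploads everything (ties go to the first).
    if len(first["data"]) >= len(second["data"]):
        return first["id"], first["data"] + second["data"]
    return second["id"], second["data"] + first["data"]


t1 = {"id": "T1", "app_id": "com.example.crawler", "data": ["a", "b", "c"]}
t2 = {"id": "T2", "app_id": "com.example.crawler", "data": ["x"]}
print(plan_upload(t1, t2))
```

Moving the smaller data set keeps the near-field transfer short, which appears to be the intent of comparing data amounts before merging.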
  • Step S5 of sending the target data to the designated server includes:
  • judging whether a first near-field interaction signal carrying the crawler application identifier is received;
  • if the first near-field interaction signal carrying the crawler application identifier is received, acquiring the network environment of the second mobile terminal that sent the first near-field interaction signal and the network environment of the first mobile terminal;
  • if the second mobile terminal is in a wifi environment and the first mobile terminal is not, sending the target data of the first mobile terminal to the second mobile terminal through near-field interaction, and the second mobile terminal sends the target data of the two mobile terminals to the server together;
  • if the second mobile terminal is not in a wifi environment and the first mobile terminal is, the first mobile terminal obtains the target data sent by the second mobile terminal through near-field interaction, and sends the target data of the two mobile terminals to the server together;
  • if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, comparing the signal quality of the two mobile terminals, and sending the target data in the mobile terminal with poor signal quality to the mobile terminal with good signal quality through near-field interaction;
  • the target data in the two mobile terminals is then uploaded to the server through the mobile terminal with good signal quality.
  • A mobile terminal in a wifi environment is selected first to upload the target data, which saves traffic costs; if neither terminal is in a wifi environment, the mobile terminal with good signal quality is selected to upload the target data.
  • Alternatively, the free traffic of the two mobile terminals can be analyzed, and the mobile terminal with more free traffic can be used to upload the target data.
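The uploader selection by network environment can be summarized in a short sketch. The `on_wifi` and `signal` fields are hypothetical, and two cases the source leaves open are resolved by assumption here: when both terminals are on wifi, or when signal quality is equal, the first terminal uploads. The free-traffic variant is not modeled.

```python
def choose_uploader(first, second):
    """Return the id of the terminal that should upload both data sets.

    Each terminal is a dict with "id", "on_wifi" (bool) and "signal"
    (a comparable signal-quality score, higher is better).
    """
    # Exactly one terminal on wifi: that terminal uploads, saving traffic costs.
    if first["on_wifi"] != second["on_wifi"]:
        return first["id"] if first["on_wifi"] else second["id"]

    # Neither on wifi: the terminal with better signal quality uploads.
    if not first["on_wifi"]:
        return first["id"] if first["signal"] >= second["signal"] else second["id"]

    # Both on wifi: not specified by the source; assume the first terminal.
    return first["id"]


a = {"id": "T1", "on_wifi": False, "signal": 2}
b = {"id": "T2", "on_wifi": True, "signal": 1}
print(choose_uploader(a, b))
```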
  • A crawler application for crawling data is installed on each mobile terminal, and the related data of the mobile terminal is crawled from inside the mobile terminal.
  • The data is crawled quickly;
  • there is no need to consider the terminal's own encryption function, so the desired data is easier to obtain;
  • the mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the method can be used in more scenarios.
  • an embodiment of the present application provides an apparatus for crawling data by a mobile terminal, and a crawler application program for crawling data of the mobile terminal is installed in the mobile terminal.
  • the aforementioned crawler application refers to applications that can be run on the mobile terminal, such as a crawler APP installed on a mobile terminal.
  • The above device for crawling data includes:
  • the obtaining unit 10 is configured to obtain a simulation operation program by the first mobile terminal, where the simulation operation program is a program for simulating a user to control the operation of the mobile terminal;
  • the execution unit 20 is configured to execute the simulation operation program
  • the acquiring storage unit 30 is configured to acquire response data when the simulation operation program is executed through the Fiddler tool, and save the response data in a designated database;
  • the extracting unit 40 is configured to extract the target data specified and required from the database
  • the sending unit 50 is configured to send the target data to a designated server.
  • the above-mentioned mobile terminal is a smart mobile device, including a smart phone, a tablet computer, etc.; the first is only to distinguish it from other mobile terminals that appear later.
  • the above-mentioned simulation operation program refers to a program used to simulate a user to control the operation of the mobile terminal.
  • the simulation operation program can simulate the operation of the user in the designated operation process of the mobile phone. For example, the first step is to open program A, and the second step is to control program A to send B request and other preset steps.
  • The execution unit 20 is the unit that runs the above-mentioned simulation operation program.
  • Fiddler is an http protocol debugging proxy tool, which can record and check the http communication between all users’ computers and the Internet, set breakpoints, and view all "in and out" Fiddler data (referring to cookie , html, js, css and other files). Fiddler is simpler than other network debuggers, because it not only exposes http communication, but also provides a user-friendly format.
  • When the simulation operation program is executed, various request data is sent out and the corresponding response data is received, for example, requests to various websites and requests from the different APPs installed on the mobile terminal. Each website and each APP feeds back response data according to the request data. This response data is obtained by the Fiddler tool and then stored in the designated database for later use.
  • the above target data is the data extracted according to the specified requirements. Specifically, extract usage log data of a certain APP, extract data related to geographic location in the database, and so on.
  • the aforementioned designated server is a server for receiving target data extracted by the mobile terminal.
  • The server is generally connected to multiple mobile terminals, each with the crawler application installed, so that data can be crawled separately on multiple mobile phones to obtain more data. For example, when a data survey of a certain APP is needed, different people use the APP in different ways, so the data generated about the APP differs from phone to phone. The APP data on multiple mobile phones is therefore crawled and aggregated on the above server, so that the APP can be analyzed better and more comprehensively.
  • the above-mentioned obtaining unit 10 includes:
  • the first receiving module is configured to receive a connection request from the server
  • An establishment module which is used to establish a network connection with the server if a command to allow a connection request input by the user is received;
  • the second receiving module is configured to receive the simulation operation program pushed by the server.
  • the above-mentioned simulation operation program is pushed by the server to the mobile terminal.
  • Before pushing the program, the server first establishes a network connection with the first mobile terminal.
  • Because the mobile terminal has an encryption function, the mobile terminal user must actively select "allow server connection request" before the terminal can receive the simulation operation program pushed by the server.
  • the above-mentioned obtaining unit 10 further includes:
  • the sending identification code module is configured to send a system identification code to the server after establishing a connection with the server, wherein the system identification code is used for the server to identify the simulation operation program corresponding to the system installed on the mobile terminal.
  • the systems currently installed on the mobile terminal include several camps, such as Android system, Apple system, Symbian system, Microsoft system, etc.
  • Each system requires the corresponding version of the simulation operation program.
  • The server pre-stores multiple versions of the simulation operation program that have the same functional content but target different mobile phone systems.
  • After obtaining the system identification code uploaded by the mobile terminal, the server identifies the system installed on the mobile terminal, then retrieves the version of the simulation operation program that matches that system and pushes it to the first mobile terminal. This ensures that the first mobile terminal can run the simulation operation program normally.
  • the above device for crawling data by a mobile terminal further includes:
  • a receiving unit configured to receive the modified parameter sent by the server
  • the modification unit is used to modify the simulation operation program by using the modification parameters.
  • The crawler application installed on each mobile terminal is the same. If the simulation operation program is also the same, the crawled data is basically the same, differing only in the data generated by the APPs running on each mobile terminal, which reduces the efficiency of crawling data. Therefore, before the simulation operation program is executed, the server can randomly generate some parameters and send them to the mobile terminal to modify the parameters of the simulation operation program. The simulation operation program on each mobile terminal then differs, so different data can be crawled.
  • the above device for crawling data by a mobile terminal further includes:
  • the sending status unit is configured to send the status information of the current APP of the first mobile terminal to the server, wherein the server calls the corresponding modification parameters according to the status information of the current APP of the first mobile terminal.
  • The above-mentioned APP status information includes how many APPs are installed in the first mobile terminal, which APPs are running, the historical usage frequency of each APP, and so on, all of which affect the selection of parameters. For example, if 5 APPs are currently running, the parameters can specify cyclic access to these 5 APPs. As another example, if the APPs installed on the first mobile terminal are used with different frequencies, the parameters can specify that different APPs are accessed at different frequencies. If an APP is not updated or used, it can be accessed less often. In this way data is obtained in a targeted manner and unnecessary access is reduced.
  • the foregoing sending unit 50 includes:
  • the first judgment module is configured to judge whether the first near-field interaction signal with the crawler application identifier is received
  • the first acquisition module is configured to, if the first near-field interaction signal is received, acquire the first data amount of target data in the second mobile terminal that sends the first near-field interaction signal;
  • the second judgment module is configured to judge whether the second data amount of the target data in the first mobile terminal is greater than the first data amount
  • a near-field receiving module configured to receive target data in the second mobile terminal in a near-field interactive manner if the second data amount is greater than the first data amount
  • the integrated sending module is used for the first mobile terminal to send the target data in the machine and the received target data to the server together.
  • That is, a second mobile terminal that also has the crawler application installed is found. The terminal holding the smaller amount of target data then transmits it through near-field interaction to the terminal holding the larger amount, and the terminal holding the larger amount sends all the target data to the server.
  • The second mobile terminal is found by determining whether the crawler application identifier in the first near-field interaction signal is the same as the identifier of the crawler application installed in the first mobile terminal. If they are the same, the same crawler application is installed on the first mobile terminal and the second mobile terminal.
  • the crawler application identifier is the unique identifier of the crawler application, such as the unique package name of the crawler application.
  • The target data is integrated on one mobile terminal and then sent to the server in one batch, which reduces the number of interfaces the server needs for receiving data.
  • the foregoing sending unit 50 includes:
  • the first judgment module is used to judge whether the first near-field interaction signal with the crawler application identifier is received
  • the second acquisition module is configured to, if the first near-field interaction signal carrying the crawler application identifier is received, acquire the network environment of the second mobile terminal that sent the first near-field interaction signal and the network environment of the first mobile terminal;
  • the third acquisition module is configured to, if the second mobile terminal is in a wifi environment and the first mobile terminal is not, send the target data of the first mobile terminal to the second mobile terminal through near-field interaction, so that the second mobile terminal sends the target data of the two mobile terminals to the server;
  • the fourth acquisition module is configured to, if the second mobile terminal is not in a wifi environment and the first mobile terminal is, have the first mobile terminal acquire the target data sent by the second mobile terminal through near-field interaction, and send the target data of the two mobile terminals to the server through the first mobile terminal;
  • the fifth acquisition module is configured to, if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, compare the signal quality of the two mobile terminals, send the target data in the mobile terminal with poor signal quality to the mobile terminal with good signal quality through near-field interaction, and upload the target data in the two mobile terminals to the server through the mobile terminal with good signal quality.
  • the mobile terminal in the wifi environment is first selected to upload the target data, which can save traffic costs; if none is in the wifi environment, a mobile terminal with good signal quality is selected to upload the target data.
  • the free traffic of the two mobile terminals can be analyzed, and then the mobile terminal with more free traffic can be used to upload target data, etc.
  • a crawler application for crawling data is installed on each mobile terminal, and the relevant data of the mobile terminal is crawled from the inside of the mobile terminal.
  • the data crawling speed is fast;
  • there is no need to consider the terminal's own encryption function, so the desired data is easier to obtain; the mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the device can be used in more scenarios.
  • an embodiment of the present application also provides a mobile terminal.
  • the mobile terminal may be a server, and its internal structure may be as shown in FIG. 3.
  • the mobile terminal includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the mobile terminal is used to provide calculation and control capabilities.
  • the memory of the mobile terminal includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the memory provides an environment for the operation of the operating system and computer readable instructions in the non-volatile storage medium.
  • the database of the mobile terminal is used to store crawler applications and data crawled by crawler applications.
  • the network interface of the mobile terminal is used to communicate with an external terminal through a network connection.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the mobile terminal to which the solution of the present application is applied.
  • a crawler application that crawls data is installed on each mobile terminal, and the relevant data of the mobile terminal is crawled from the inside of the mobile terminal.
  • The data is crawled quickly; there is no need to consider the encryption function of the mobile terminal, so the desired data is easier to obtain; the mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the solution can be used in more scenarios.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile readable storage medium or a volatile readable storage medium on which computer-readable instructions are stored.
  • When the computer-readable instructions are executed by the processor, the method for data crawling by the mobile terminal in any of the foregoing embodiments is implemented.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

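This application discloses a method and device for a mobile terminal to crawl data, a mobile terminal, and a storage medium, wherein a crawler application is installed in the mobile terminal. The method includes: a first mobile terminal obtains a simulation operation program, where the simulation operation program is a program that simulates a user controlling the operation of the mobile terminal; executes the simulation operation program; obtains, through the Fiddler tool, the response data generated when the simulation operation program is executed, and saves the response data to a designated database; extracts target data that meets specified requirements from the database; and sends the target data to a designated server. This application crawls the relevant data of the mobile terminal from inside the mobile terminal, so data is crawled quickly; the mobile terminal's own encryption function need not be considered, so the desired data is easier to obtain; mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the solution can be used in more scenarios.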

Description

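Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium
This application claims priority to Chinese patent application No. 2019101004125, filed with the Chinese Patent Office on January 31, 2019 and entitled "Method and Device for a Mobile Terminal to Crawl Data, Mobile Terminal, and Storage Medium", the entire contents of which are incorporated herein by reference.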
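Technical Field
This application relates to the field of data crawling, and in particular to a method and device for a mobile terminal to crawl data, a mobile terminal, and a storage medium.
Background
A web crawler is a program or script that automatically fetches World Wide Web information according to certain rules. Web crawlers are widely used by Internet search engines and similar websites, and can automatically collect the content of every page they are able to access in order to obtain or update the content and retrieval methods of those websites. Functionally, a crawler is generally divided into three parts: data collection, processing, and storage.
A traditional crawler starts from the URLs of one or several initial web pages and obtains the URLs on those pages. While crawling, it continuously extracts new URLs from the current page and puts them into a queue until a stop condition of the system is met. The workflow of a focused crawler is more complex: it filters out links irrelevant to the topic according to a web page analysis algorithm, keeps the useful links, and puts them into the queue of URLs waiting to be crawled. It then selects the URL of the next web page to crawl from the queue according to a search strategy and repeats the above process until a stop condition of the system is reached. In addition, all web pages fetched by the crawler are stored by the system, analyzed, filtered, and indexed for later query and retrieval; for a focused crawler, the analysis results obtained in this process may also provide feedback and guidance for subsequent crawling.
The prior art lacks a method for crawling data from mobile terminals, mainly because mobile terminal data has its own encryption function and is therefore difficult to obtain. How to crawl data from a mobile terminal is a problem that needs to be solved.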
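Technical Problem
The main purpose of this application is to provide a method and device for a mobile terminal to crawl data, a mobile terminal, and a storage medium, aiming to solve the problem that it is difficult to crawl data from a mobile terminal that has its own encryption function.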
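Technical Solution
To achieve the above purpose, this application proposes a method for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, and the method includes:
a first mobile terminal obtains a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
executing the simulation operation program;
obtaining, through the Fiddler tool, response data generated when the simulation operation program is executed, and saving the response data to a designated database;
extracting target data that meets specified requirements from the database;
sending the target data to a designated server.
This application further provides a device for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, and the device includes:
an obtaining unit, configured for a first mobile terminal to obtain a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
an execution unit, configured to execute the simulation operation program;
an obtaining and storage unit, configured to obtain, through the Fiddler tool, response data generated when the simulation operation program is executed, and to save the response data to a designated database;
an extraction unit, configured to extract target data that meets specified requirements from the database;
a sending unit, configured to send the target data to a designated server.
This application further provides a mobile terminal, including a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps of any one of the above methods when executing the computer-readable instructions.
This application further provides a computer-readable storage medium on which computer-readable instructions are stored, wherein the steps of any one of the above methods are implemented when the computer-readable instructions are executed by a processor.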
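Beneficial Effects
The method and device for a mobile terminal to crawl data, the mobile terminal, and the storage medium of this application install the crawler application that crawls data on each mobile terminal and crawl the relevant data of the mobile terminal from inside the mobile terminal, so data is crawled quickly; the mobile terminal's own encryption function need not be considered, so the desired data is easier to obtain; mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the solution can be used in more scenarios.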
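Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for a mobile terminal to crawl data according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a device for a mobile terminal to crawl data according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a mobile terminal according to an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.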
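Best Mode for Carrying Out the Invention
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.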
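Referring to FIG. 1, an embodiment of this application provides a method for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal. The crawler application is an application that can run on the mobile terminal, such as a crawler APP installed on it. The method for crawling data includes the following steps:
S1. A first mobile terminal obtains a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
S2. Execute the simulation operation program;
S3. Obtain, through the Fiddler tool, response data generated when the simulation operation program is executed, and save the response data to a designated database;
S4. Extract target data that meets specified requirements from the database;
S5. Send the target data to a designated server.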
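As described in step S1, the mobile terminal is a smart mobile device, including a smartphone, a tablet computer, and the like; "first" serves only to distinguish it from other mobile terminals that appear later. The simulation operation program is a program used to simulate a user controlling the operation of the mobile terminal. It can simulate a specified sequence of user operations on a phone, for example preset steps such as opening program A in a first step and, in a second step, controlling program A to send request B.
As described in step S2, the simulation operation program is run.
As described in step S3, Fiddler is an HTTP debugging proxy tool. It can record and inspect all HTTP traffic between the user's computer and the Internet, set breakpoints, and view all data going "in and out" of Fiddler (cookies, HTML, JS, CSS, and other files). Fiddler is simpler than other network debuggers because it not only exposes HTTP traffic but also presents it in a user-friendly format. When the simulation operation program is executed, various request data is sent and response data for that request data is received, for example request data sent to various websites and request data sent to different APPs installed on the mobile terminal; each website and APP feeds back corresponding response data according to the request data. The Fiddler tool captures this response data, which is then stored in the designated database for later use.
As described in step S4, the database stores a large amount of data with differing data types and content, so the data needs to be extracted. The target data is the data extracted according to the specified requirements, for example the usage log data of a certain APP, or the data in the database that relates to geographic location.
As described in step S5, the designated server is the server that receives the target data extracted by the mobile terminal. The server is generally connected to multiple mobile terminals, each with the crawler application installed, so that data can be crawled on multiple phones separately and more data obtained. For example, when a data survey of a certain APP is needed, different people use the APP in different ways and therefore produce different data about it; crawling the APP's data from multiple phones and aggregating it on the server allows a better and more comprehensive analysis of the APP.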
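Steps S3 to S5 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the traffic capture itself is not shown, the captured responses are assumed to arrive as plain dictionaries, and the table name `responses`, its columns, and the `upload` callable are hypothetical stand-ins for the designated database and the designated server.

```python
import json
import sqlite3
from typing import Callable, Iterable


def store_responses(conn: sqlite3.Connection, responses: Iterable[dict]) -> None:
    """S3 (storage half): save already-captured response records to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses (source TEXT, kind TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO responses (source, kind, body) VALUES (?, ?, ?)",
        [(r["source"], r["kind"], json.dumps(r["body"])) for r in responses],
    )
    conn.commit()


def extract_target_data(conn: sqlite3.Connection, source: str, kind: str) -> list:
    """S4: keep only the rows that match the specified requirement."""
    rows = conn.execute(
        "SELECT body FROM responses WHERE source = ? AND kind = ? ORDER BY rowid",
        (source, kind),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


def send_to_server(target_data: list, upload: Callable[[str], None]) -> int:
    """S5: pass the serialized target data to an upload callable (hypothetical)."""
    upload(json.dumps(target_data))
    return len(target_data)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_responses(conn, [
        {"source": "app_a", "kind": "usage_log", "body": {"event": "open"}},
        {"source": "app_a", "kind": "location", "body": {"city": "Shenzhen"}},
        {"source": "app_b", "kind": "usage_log", "body": {"event": "search"}},
    ])
    outbox = []
    send_to_server(extract_target_data(conn, "app_a", "usage_log"), outbox.append)
    print(outbox)
```

Keeping the extraction requirement as query parameters means the same stored responses can serve different requirements, such as one APP's usage log or location-related data, without being captured again.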
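In one embodiment, step S1, in which the first mobile terminal obtains the preset simulation operation program, includes:
the first mobile terminal receives a connection request from the server;
if a command entered by the user to allow the connection request is received, establishing a network connection with the server;
receiving the simulation operation program pushed by the server.
In this embodiment, the simulation operation program is pushed to the mobile terminal by the server. Before pushing it, the server first establishes a network connection with the first mobile terminal. Because the mobile terminal has an encryption function, the user of the mobile terminal must actively choose to "allow the server's connection request" before the simulation operation program pushed by the server is received.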
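In one embodiment, before the step of receiving the simulation operation program pushed by the server, the method includes:
after the connection with the server is established, sending a system identification code to the server, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.
In this embodiment, the systems currently installed on mobile terminals fall into several camps, such as Android, Apple, Symbian, and Microsoft systems, and each system requires a corresponding version of the simulation operation program. The server stores in advance multiple versions of the simulation operation program that have the same functional content but target different phone systems. After receiving the system identification code uploaded by the mobile terminal, the server identifies the system installed on the mobile terminal, retrieves the version of the simulation operation program that corresponds to that system, and pushes it to the first mobile terminal, which ensures that the first mobile terminal can run the simulation operation program normally.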
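The server-side lookup described above amounts to a mapping from system identification code to program version. A minimal sketch follows, assuming the code is a short string; the code values and package file names are hypothetical and are not specified in the application.

```python
# Hypothetical mapping from system identification code to the stored program version.
PROGRAM_VERSIONS = {
    "android": "sim_program_android.pkg",
    "ios": "sim_program_ios.pkg",
    "symbian": "sim_program_symbian.pkg",
    "windows": "sim_program_windows.pkg",
}


def select_program(system_id: str) -> str:
    """Return the program version for the reported system, or fail loudly."""
    key = system_id.strip().lower()
    if key not in PROGRAM_VERSIONS:
        raise LookupError(f"no simulation operation program stored for {system_id!r}")
    return PROGRAM_VERSIONS[key]
```

Raising an error for an unknown code, rather than pushing a default version, reflects the stated goal that the terminal must be able to run whatever it receives.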
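In one embodiment, before step S2 of executing the simulation operation program, the method includes:
receiving modification parameters sent by the server;
modifying the simulation operation program using the modification parameters.
In this embodiment, the crawler applications installed on the mobile terminals are identical. If the simulation operation programs they run are also identical, the crawled data is largely the same, differing only in the data produced by the APPs each mobile terminal happens to run, which lowers crawling efficiency. Therefore, before the simulation operation program is executed, the server randomly generates some parameters and sends them to the mobile terminal to modify the parameters of the simulation operation program, so that the simulation operation programs of the mobile terminals differ and different data can be crawled.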
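One way to picture this exchange is shown below. The sketch assumes the simulation operation program is configured by a small dictionary; the parameter names `visit_order` and `delay_seconds` are hypothetical, since the application does not say which parameters are randomized.

```python
import random


def generate_modification_params(rng: random.Random, app_names: list) -> dict:
    """Server side: randomize the visiting order and the delay between steps."""
    order = list(app_names)
    rng.shuffle(order)
    return {"visit_order": order, "delay_seconds": rng.randint(1, 10)}


def apply_params(program: dict, params: dict) -> dict:
    """Terminal side: return a modified copy; reject parameters the program lacks."""
    unknown = set(params) - set(program)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**program, **params}
```

Returning a modified copy leaves the pushed program configuration untouched, so the same base program can be re-parameterized on a later run.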
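In one embodiment, before the step of receiving the modification parameters sent by the server, the method includes:
sending the current APP status information of the first mobile terminal to the server, wherein the server retrieves the corresponding modification parameters according to the current APP status information of the first mobile terminal.
In this embodiment, the APP status information includes how many APPs are installed on the first mobile terminal, which APPs are running, the historical usage frequency of each APP, and so on, all of which affect the choice of parameters. For example, if five APPs are currently running, the parameters may specify visiting these five APPs in a loop; if the APPs installed on the first mobile terminal are used at different frequencies, the parameters may specify visiting different APPs at different frequencies; and if an APP has been neither updated nor used, it can be visited less often. Data is thus obtained in a targeted way and unnecessary visits are reduced.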
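The examples in this embodiment can be turned into a small parameter-selection routine. The sketch below is an assumption-laden illustration: the `AppState` fields, the interval formula, and the default values are hypothetical, and the choice to visit frequently used APPs more often is one possible reading of "different frequencies".

```python
from dataclasses import dataclass


@dataclass
class AppState:
    name: str
    running: bool
    uses_per_week: int


def running_cycle(apps: list) -> list:
    """APPs to visit in a loop: the ones currently running."""
    return [app.name for app in apps if app.running]


def choose_visit_intervals(apps: list, base_minutes: int = 10,
                           idle_minutes: int = 120) -> dict:
    """Map each APP to a visit interval in minutes.

    Unused APPs get the long idle interval; otherwise the interval shrinks
    as usage grows, but never below base_minutes.
    """
    intervals = {}
    for app in apps:
        if app.uses_per_week <= 0:
            intervals[app.name] = idle_minutes
        else:
            intervals[app.name] = max(base_minutes, idle_minutes // app.uses_per_week)
    return intervals
```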
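In one embodiment, step S5 of sending the target data to the designated server includes:
determining whether a first near field interaction signal carrying the crawler application identifier has been received;
if the first near field interaction signal has been received, obtaining a first data volume of the target data in the second mobile terminal that sent the first near field interaction signal;
determining whether a second data volume of the target data in the first mobile terminal is greater than the first data volume;
if it is greater, receiving the target data of the second mobile terminal by near field interaction;
the first mobile terminal sends its own target data together with the received target data to the server.
In this embodiment, a second mobile terminal that also has the crawler application installed is found through the near field interaction signal; the terminal with the smaller volume of target data then passes its target data by near field interaction to the terminal with the larger volume, which sends all of the target data to the server. Finding the second mobile terminal means determining whether the crawler application identifier in the first near field interaction signal is the same as the identifier of the crawler application installed on the first mobile terminal; if they are the same, the first and second mobile terminals have the same crawler application installed. The crawler application identifier uniquely identifies the crawler application, for example its unique package name. In this embodiment, target data is consolidated on one mobile terminal and then sent to the server in a single upload, which reduces the number of interfaces the server needs for receiving data.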
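The decision of which terminal aggregates the data can be sketched as below. The near field transfer itself is not modeled, and the `Terminal` fields are hypothetical. The application does not say what happens when the two data volumes are equal; the sketch assumes that no merge takes place in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Terminal:
    name: str
    crawler_id: str    # crawler application identifier, e.g. its package name
    target_bytes: int  # volume of target data held by this terminal


def pick_aggregator(first: Terminal, second: Terminal) -> Optional[Terminal]:
    """Return the terminal that should collect and upload both data sets.

    None means no merge: the identifiers differ, or (an assumption) the
    data volumes are equal, so each terminal uploads its own data.
    """
    if first.crawler_id != second.crawler_id:
        return None
    if first.target_bytes > second.target_bytes:
        return first
    if second.target_bytes > first.target_bytes:
        return second
    return None
```

Sending the smaller data set over the near field link keeps the slower local transfer short, which matches the direction chosen in this embodiment.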
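In one embodiment, step S5 of sending the target data to the designated server includes:
determining whether a first near field interaction signal carrying a crawler application identifier has been received;
if a first near field interaction signal carrying a crawler application identifier has been received, obtaining the network environment of the second mobile terminal that sent the first near field interaction signal and the network environment of the first mobile terminal;
if the second mobile terminal is in a wifi environment and the first mobile terminal is not, sending the target data of the first mobile terminal to the second mobile terminal by near field interaction, and sending the target data of both mobile terminals to the server through the second mobile terminal;
if the second mobile terminal is not in a wifi environment and the first mobile terminal is, the first mobile terminal obtains the target data sent by the second mobile terminal by near field interaction, and the target data of both mobile terminals is sent to the server through the first mobile terminal;
if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, comparing the signal quality of the two mobile terminals, sending the target data of the mobile terminal with poorer signal quality to the mobile terminal with better signal quality by near field interaction, and uploading the target data of both mobile terminals to the server through the mobile terminal with better signal quality.
In this embodiment, a mobile terminal in a wifi environment is preferred for uploading the target data, which saves mobile data charges; if neither terminal is in a wifi environment, the terminal with better signal quality uploads the target data. In other embodiments, when neither mobile terminal is in a wifi environment, the free data allowance of the two terminals may be analyzed, and the terminal with the larger free data allowance uploads the target data.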
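The three branches above reduce to a short selection function. In this sketch the `Link` fields and the integer signal-quality scale are hypothetical. Two cases are not covered by the application and are filled with labeled assumptions: when both terminals are on wifi the function returns None (each uploads its own data), and on equal signal quality the first terminal is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    name: str
    on_wifi: bool
    signal_quality: int  # higher is better; the scale is an assumption


def pick_uploader(first: Link, second: Link) -> Optional[Link]:
    """Choose which terminal uploads the target data of both terminals."""
    if first.on_wifi != second.on_wifi:
        # Exactly one terminal is on wifi: it uploads for both.
        return first if first.on_wifi else second
    if not first.on_wifi:
        # Neither is on wifi: prefer the better signal (tie goes to first).
        return first if first.signal_quality >= second.signal_quality else second
    # Both on wifi: not specified in the source, so no merge is assumed.
    return None
```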
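The method for a mobile terminal to crawl data in the embodiments of this application installs the crawler application that crawls data on each mobile terminal and crawls the relevant data of the mobile terminal from inside the mobile terminal, so data is crawled quickly; the mobile terminal's own encryption function need not be considered, so the desired data is easier to obtain; mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the method can be used in more scenarios.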
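Referring to FIG. 2, an embodiment of this application provides a device for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal. The crawler application is an application that can run on the mobile terminal, such as a crawler APP installed on it. The device for crawling data includes:
an obtaining unit 10, configured for a first mobile terminal to obtain a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
an execution unit 20, configured to execute the simulation operation program;
an obtaining and storage unit 30, configured to obtain, through the Fiddler tool, response data generated when the simulation operation program is executed, and to save the response data to a designated database;
an extraction unit 40, configured to extract target data that meets specified requirements from the database;
a sending unit 50, configured to send the target data to a designated server.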
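For the obtaining unit 10, the mobile terminal is a smart mobile device, including a smartphone, a tablet computer, and the like; "first" serves only to distinguish it from other mobile terminals that appear later. The simulation operation program is a program used to simulate a user controlling the operation of the mobile terminal. It can simulate a specified sequence of user operations on a phone, for example preset steps such as opening program A in a first step and, in a second step, controlling program A to send request B.
The execution unit 20 is the device that runs the simulation operation program.
For the obtaining and storage unit 30, Fiddler is an HTTP debugging proxy tool. It can record and inspect all HTTP traffic between the user's computer and the Internet, set breakpoints, and view all data going "in and out" of Fiddler (cookies, HTML, JS, CSS, and other files). Fiddler is simpler than other network debuggers because it not only exposes HTTP traffic but also presents it in a user-friendly format. When the simulation operation program is executed, various request data is sent and response data for that request data is received, for example request data sent to various websites and request data sent to different APPs installed on the mobile terminal; each website and APP feeds back corresponding response data according to the request data. The Fiddler tool captures this response data, which is then stored in the designated database for later use.
For the extraction unit 40, the database stores a large amount of data with differing data types and content, so the data needs to be extracted. The target data is the data extracted according to the specified requirements, for example the usage log data of a certain APP, or the data in the database that relates to geographic location.
For the sending unit 50, the designated server is the server that receives the target data extracted by the mobile terminal. The server is generally connected to multiple mobile terminals, each with the crawler application installed, so that data can be crawled on multiple phones separately and more data obtained. For example, when a data survey of a certain APP is needed, different people use the APP in different ways and therefore produce different data about it; crawling the APP's data from multiple phones and aggregating it on the server allows a better and more comprehensive analysis of the APP.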
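In one embodiment, the obtaining unit 10 includes:
a first receiving module, configured to receive a connection request from the server;
an establishing module, configured to establish a network connection with the server if a command entered by the user to allow the connection request is received;
a second receiving module, configured to receive the simulation operation program pushed by the server.
In this embodiment, the simulation operation program is pushed to the mobile terminal by the server. Before pushing it, the server first establishes a network connection with the first mobile terminal. Because the mobile terminal has an encryption function, the user of the mobile terminal must actively choose to "allow the server's connection request" before the simulation operation program pushed by the server is received.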
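In one embodiment, the obtaining unit 10 further includes:
an identification code sending module, configured to send a system identification code to the server after the connection with the server is established, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.
In this embodiment, the systems currently installed on mobile terminals fall into several camps, such as Android, Apple, Symbian, and Microsoft systems, and each system requires a corresponding version of the simulation operation program. The server stores in advance multiple versions of the simulation operation program that have the same functional content but target different phone systems. After receiving the system identification code uploaded by the mobile terminal, the server identifies the system installed on the mobile terminal, retrieves the version of the simulation operation program that corresponds to that system, and pushes it to the first mobile terminal, which ensures that the first mobile terminal can run the simulation operation program normally.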
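In one embodiment, the device for a mobile terminal to crawl data further includes:
a receiving unit, configured to receive modification parameters sent by the server;
a modification unit, configured to modify the simulation operation program using the modification parameters.
In this embodiment, the crawler applications installed on the mobile terminals are identical. If the simulation operation programs they run are also identical, the crawled data is largely the same, differing only in the data produced by the APPs each mobile terminal happens to run, which lowers crawling efficiency. Therefore, before the simulation operation program is executed, the server may randomly generate some parameters and send them to the mobile terminal to modify the parameters of the simulation operation program, so that the simulation operation programs of the mobile terminals differ and different data can be crawled.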
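In one embodiment, the device for a mobile terminal to crawl data further includes:
a status sending unit, configured to send the current APP status information of the first mobile terminal to the server, wherein the server retrieves the corresponding modification parameters according to the current APP status information of the first mobile terminal.
In this embodiment, the APP status information includes how many APPs are installed on the first mobile terminal, which APPs are running, the historical usage frequency of each APP, and so on, all of which affect the choice of parameters. For example, if five APPs are currently running, the parameters may specify visiting these five APPs in a loop; if the APPs installed on the first mobile terminal are used at different frequencies, the parameters may specify visiting different APPs at different frequencies; and if an APP has been neither updated nor used, it can be visited less often. Data is thus obtained in a targeted way and unnecessary visits are reduced.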
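In one embodiment, the sending unit 50 includes:
a first determining module, configured to determine whether a first near field interaction signal carrying the crawler application identifier has been received;
a first obtaining module, configured to obtain, if the first near field interaction signal has been received, a first data volume of the target data in the second mobile terminal that sent the first near field interaction signal;
a second determining module, configured to determine whether a second data volume of the target data in the first mobile terminal is greater than the first data volume;
a near field receiving module, configured to receive the target data of the second mobile terminal by near field interaction if the second data volume is greater than the first data volume;
a combined sending module, configured for the first mobile terminal to send its own target data together with the received target data to the server.
In this embodiment, a second mobile terminal that also has the crawler application installed is found through the near field interaction signal; the terminal with the smaller volume of target data then passes its target data by near field interaction to the terminal with the larger volume, which sends all of the target data to the server. Finding the second mobile terminal means determining whether the crawler application identifier in the first near field interaction signal is the same as the identifier of the crawler application installed on the first mobile terminal; if they are the same, the first and second mobile terminals have the same crawler application installed. The crawler application identifier uniquely identifies the crawler application, for example its unique package name. In this embodiment, target data is consolidated on one mobile terminal and then sent to the server in a single upload, which reduces the number of interfaces the server needs for receiving data.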
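In one embodiment, the sending unit 50 includes:
a first determining module, configured to determine whether a first near field interaction signal carrying a crawler application identifier has been received;
a second obtaining module, configured to obtain, if a first near field interaction signal carrying a crawler application identifier has been received, the network environment of the second mobile terminal that sent the first near field interaction signal and the network environment of the first mobile terminal;
a third obtaining module, configured to, if the second mobile terminal is in a wifi environment and the first mobile terminal is not, send the target data of the first mobile terminal to the second mobile terminal by near field interaction, and send the target data of both mobile terminals to the server through the second mobile terminal;
a fourth obtaining module, configured so that, if the second mobile terminal is not in a wifi environment and the first mobile terminal is, the first mobile terminal obtains the target data sent by the second mobile terminal by near field interaction, and the target data of both mobile terminals is sent to the server through the first mobile terminal;
a fifth obtaining module, configured to, if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, compare the signal quality of the two mobile terminals, send the target data of the mobile terminal with poorer signal quality to the mobile terminal with better signal quality by near field interaction, and upload the target data of both mobile terminals to the server through the mobile terminal with better signal quality.
In this embodiment, a mobile terminal in a wifi environment is preferred for uploading the target data, which saves mobile data charges; if neither terminal is in a wifi environment, the terminal with better signal quality uploads the target data. In other embodiments, when neither mobile terminal is in a wifi environment, the free data allowance of the two terminals may be analyzed, and the terminal with the larger free data allowance uploads the target data.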
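The device for a mobile terminal to crawl data in the embodiments of this application installs the crawler application that crawls data on each mobile terminal and crawls the relevant data of the mobile terminal from inside the mobile terminal, so data is crawled quickly; the mobile terminal's own encryption function need not be considered, so the desired data is easier to obtain; mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the device can be used in more scenarios.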
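Referring to FIG. 3, an embodiment of this application further provides a mobile terminal, which may be a server, and whose internal structure may be as shown in FIG. 3. The mobile terminal includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the mobile terminal provides computing and control capabilities. The memory of the mobile terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the mobile terminal is used to store the crawler application, the data crawled by the crawler application, and the like. The network interface of the mobile terminal is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, the method for a mobile terminal to crawl data in any of the above embodiments is implemented.
Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of this application, and does not limit the mobile terminal to which the solution of this application is applied.
The mobile terminal in the embodiments of this application installs the crawler application that crawls data on each mobile terminal and crawls the relevant data of the mobile terminal from inside the mobile terminal, so data is crawled quickly; the mobile terminal's own encryption function need not be considered, so the desired data is easier to obtain; mobile terminals can be deployed in a distributed manner, which improves scalability; and different simulation operation programs can be configured to obtain different data, so the solution can be used in more scenarios.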
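An embodiment of this application further provides a computer-readable storage medium, which may be a non-volatile readable storage medium or a volatile readable storage medium, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the method for a mobile terminal to crawl data in any of the above embodiments is implemented.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.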

Claims (20)

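  1. A method for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, characterized in that the method comprises:
    a first mobile terminal obtains a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
    executing the simulation operation program;
    obtaining, through the Fiddler tool, response data generated when the simulation operation program is executed, and saving the response data to a designated database;
    extracting target data that meets specified requirements from the database;
    sending the target data to a designated server.
  2. The method for a mobile terminal to crawl data according to claim 1, characterized in that the step of the first mobile terminal invoking the preset simulation operation program comprises:
    the first mobile terminal receives a connection request from the server;
    if a command entered by the user to allow the connection request is received, establishing a network connection with the server;
    receiving the simulation operation program pushed by the server.
  3. The method for a mobile terminal to crawl data according to claim 2, characterized in that, before the step of receiving the simulation operation program pushed by the server, the method comprises:
    after the connection with the server is established, sending a system identification code to the server, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.
  4. The method for a mobile terminal to crawl data according to claim 3, characterized in that, before the step of executing the simulation operation program, the method comprises:
    receiving modification parameters sent by the server;
    modifying the simulation operation program using the modification parameters.
  5. The method for a mobile terminal to crawl data according to claim 4, characterized in that, before the step of receiving the modification parameters sent by the server, the method comprises:
    sending the current APP status information of the first mobile terminal to the server, wherein the server retrieves the corresponding modification parameters according to the current APP status information of the first mobile terminal.
  6. The method for a mobile terminal to crawl data according to claim 1, characterized in that the step of sending the target data to the designated server comprises:
    determining whether a first near field interaction signal carrying the crawler application identifier has been received;
    if the first near field interaction signal has been received, obtaining a first data volume of the target data in the second mobile terminal that sent the first near field interaction signal;
    determining whether a second data volume of the target data in the first mobile terminal is greater than the first data volume;
    if it is greater, receiving the target data of the second mobile terminal by near field interaction;
    the first mobile terminal sends its own target data together with the received target data to the server.
  7. The method for a mobile terminal to crawl data according to claim 1, characterized in that the step of sending the target data to the designated server comprises:
    determining whether a first near field interaction signal carrying a crawler application identifier has been received;
    if a first near field interaction signal carrying a crawler application identifier has been received, obtaining the network environment of the second mobile terminal that sent the first near field interaction signal and the network environment of the first mobile terminal;
    if the second mobile terminal is in a wifi environment and the first mobile terminal is not, sending the target data of the first mobile terminal to the second mobile terminal by near field interaction, and sending the target data of both mobile terminals to the server through the second mobile terminal;
    if the second mobile terminal is not in a wifi environment and the first mobile terminal is, the first mobile terminal obtains the target data sent by the second mobile terminal by near field interaction, and the target data of both mobile terminals is sent to the server through the first mobile terminal;
    if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, comparing the signal quality of the two mobile terminals, sending the target data of the mobile terminal with poorer signal quality to the mobile terminal with better signal quality by near field interaction, and uploading the target data of both mobile terminals to the server through the mobile terminal with better signal quality.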
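  8. A device for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, characterized in that the device comprises:
    an obtaining unit, configured for a first mobile terminal to obtain a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
    an execution unit, configured to execute the simulation operation program;
    an obtaining and storage unit, configured to obtain, through the Fiddler tool, response data generated when the simulation operation program is executed, and to save the response data to a designated database;
    an extraction unit, configured to extract target data that meets specified requirements from the database;
    a sending unit, configured to send the target data to a designated server.
  9. The device for a mobile terminal to crawl data according to claim 8, characterized in that the obtaining unit comprises:
    a first receiving module, configured to receive a connection request from the server;
    an establishing module, configured to establish a network connection with the server if a command entered by the user to allow the connection request is received;
    a second receiving module, configured to receive the simulation operation program pushed by the server.
  10. The device for a mobile terminal to crawl data according to claim 9, characterized in that the obtaining unit comprises:
    an identification code sending module, configured to send a system identification code to the server after the connection with the server is established, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.
  11. The device for a mobile terminal to crawl data according to claim 10, characterized by further comprising:
    a receiving unit, configured to receive modification parameters sent by the server;
    a modification unit, configured to modify the simulation operation program using the modification parameters.
  12. The device for a mobile terminal to crawl data according to claim 10, characterized by further comprising:
    a status sending unit, configured to send the current APP status information of the first mobile terminal to the server, wherein the server retrieves the corresponding modification parameters according to the current APP status information of the first mobile terminal.
  13. The device for a mobile terminal to crawl data according to claim 8, characterized in that the sending unit comprises:
    a first determining module, configured to determine whether a first near field interaction signal carrying the crawler application identifier has been received;
    a first obtaining module, configured to obtain, if the first near field interaction signal has been received, a first data volume of the target data in the second mobile terminal that sent the first near field interaction signal;
    a second determining module, configured to determine whether a second data volume of the target data in the first mobile terminal is greater than the first data volume;
    a near field receiving module, configured to receive the target data of the second mobile terminal by near field interaction if the second data volume is greater than the first data volume;
    a combined sending module, configured for the first mobile terminal to send its own target data together with the received target data to the server.
  14. The device for a mobile terminal to crawl data according to claim 8, characterized in that the sending unit comprises:
    a first determining module, configured to determine whether a first near field interaction signal carrying a crawler application identifier has been received;
    a second obtaining module, configured to obtain, if a first near field interaction signal carrying a crawler application identifier has been received, the network environment of the second mobile terminal that sent the first near field interaction signal and the network environment of the first mobile terminal;
    a third obtaining module, configured to, if the second mobile terminal is in a wifi environment and the first mobile terminal is not, send the target data of the first mobile terminal to the second mobile terminal by near field interaction, and send the target data of both mobile terminals to the server through the second mobile terminal;
    a fourth obtaining module, configured so that, if the second mobile terminal is not in a wifi environment and the first mobile terminal is, the first mobile terminal obtains the target data sent by the second mobile terminal by near field interaction, and the target data of both mobile terminals is sent to the server through the first mobile terminal;
    a fifth obtaining module, configured to, if neither the first mobile terminal nor the second mobile terminal is in a wifi environment, compare the signal quality of the two mobile terminals, send the target data of the mobile terminal with poorer signal quality to the mobile terminal with better signal quality by near field interaction, and upload the target data of both mobile terminals to the server through the mobile terminal with better signal quality.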
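  15. A mobile terminal, comprising a memory and a processor, the memory storing computer-readable instructions, characterized in that, when executing the computer-readable instructions, the processor implements a method for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, and the method comprises:
    a first mobile terminal obtains a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
    executing the simulation operation program;
    obtaining, through the Fiddler tool, response data generated when the simulation operation program is executed, and saving the response data to a designated database;
    extracting target data that meets specified requirements from the database;
    sending the target data to a designated server.
  16. The mobile terminal according to claim 15, characterized in that the step of the first mobile terminal invoking the preset simulation operation program comprises:
    the first mobile terminal receives a connection request from the server;
    if a command entered by the user to allow the connection request is received, establishing a network connection with the server;
    receiving the simulation operation program pushed by the server.
  17. The mobile terminal according to claim 16, characterized in that, before the step of receiving the simulation operation program pushed by the server, the method comprises:
    after the connection with the server is established, sending a system identification code to the server, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.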
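  18. A computer-readable storage medium on which computer-readable instructions are stored, characterized in that, when executed by a processor, the computer-readable instructions implement a method for a mobile terminal to crawl data, wherein a crawler application that crawls mobile terminal data is installed in the mobile terminal, and the method comprises:
    a first mobile terminal obtains a simulation operation program, wherein the simulation operation program is a program that simulates a user controlling the operation of a mobile terminal;
    executing the simulation operation program;
    obtaining, through the Fiddler tool, response data generated when the simulation operation program is executed, and saving the response data to a designated database;
    extracting target data that meets specified requirements from the database;
    sending the target data to a designated server.
  19. The computer-readable storage medium according to claim 18, characterized in that the step of the first mobile terminal invoking the preset simulation operation program comprises:
    the first mobile terminal receives a connection request from the server;
    if a command entered by the user to allow the connection request is received, establishing a network connection with the server;
    receiving the simulation operation program pushed by the server.
  20. The computer-readable storage medium according to claim 18, characterized in that, before the step of receiving the simulation operation program pushed by the server, the method comprises:
    after the connection with the server is established, sending a system identification code to the server, wherein the system identification code is used by the server to identify the simulation operation program that corresponds to the system installed on the mobile terminal.
PCT/CN2019/118169 2019-01-31 2019-11-13 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium WO2020155765A1 (zh)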

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100412.5 2019-01-31
CN201910100412.5A CN109918553A (zh) 2019-01-31 2019-01-31 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2020155765A1 (zh) 2020-08-06

Family

ID=66961287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118169 WO2020155765A1 (zh) 2019-01-31 2019-11-13 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN109918553A (zh)
WO (1) WO2020155765A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
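CN109918553A (zh) * 2019-01-31 2019-06-21 平安科技(深圳)有限公司 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium
CN111400722B (zh) * 2020-03-25 2023-04-07 深圳市腾讯网域计算机网络有限公司 Method, apparatus, computer device, and storage medium for scanning mini programs
CN112100473A (zh) * 2020-09-21 2020-12-18 工业互联网创新中心(上海)有限公司 Crawler method based on application interface, terminal, and storage medium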

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170185678A1 (en) * 2015-12-28 2017-06-29 Le Holdings (Beijing) Co., Ltd. Crawler system and method
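CN107256276A (zh) * 2017-08-01 2017-10-17 北京合天智汇信息技术有限公司 Cloud platform-based method and device for securely acquiring mobile App content
CN108875368A (zh) * 2017-05-10 2018-11-23 北京金山云网络技术有限公司 Security detection method, apparatus, and system
CN109918553A (zh) * 2019-01-31 2019-06-21 平安科技(深圳)有限公司 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium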

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108089967A (zh) * 2017-12-12 2018-05-29 成都睿码科技有限责任公司 Method for crawling App data of Android phones
CN108804559B (zh) * 2018-05-22 2022-07-12 清华大学 Method and apparatus for acquiring mobile application content

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170185678A1 (en) * 2015-12-28 2017-06-29 Le Holdings (Beijing) Co., Ltd. Crawler system and method
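CN108875368A (zh) * 2017-05-10 2018-11-23 北京金山云网络技术有限公司 Security detection method, apparatus, and system
CN107256276A (zh) * 2017-08-01 2017-10-17 北京合天智汇信息技术有限公司 Cloud platform-based method and device for securely acquiring mobile App content
CN109918553A (zh) * 2019-01-31 2019-06-21 平安科技(深圳)有限公司 Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium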

Also Published As

Publication number Publication date
CN109918553A (zh) 2019-06-21

Similar Documents

Publication Publication Date Title
US11074087B2 (en) System and method for identifying, indexing, and navigating to deep states of mobile applications
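CN109766262B (zh) Interface data processing method, automated testing method, apparatus, device, and medium
US10412176B2 (en) Website access method, apparatus, and website system
CN106503134B (zh) Data synchronization method and apparatus for jumping from a browser to an application
CN108293081B (zh) Deep linking to mobile application states through programmatic replay of user interface events
US11538046B2 (en) Page data acquisition method, apparatus, server, electronic device and computer readable medium
CN102393857B (zh) Method and system for making local calls from a web page
CN110651252A (zh) Content management system extensions
WO2020155765A1 (zh) Method and device for a mobile terminal to crawl data, mobile terminal, and storage medium
US9497248B2 (en) System for enabling rich network applications
CN109688280A (zh) Request processing method, request processing device, browser, and storage medium
CN107391775A (zh) Method and system for implementing a general-purpose web crawler model
CN104317570B (zh) Apparatus and method for dynamically analyzing Web applications
US11210198B2 (en) Distributed web page performance monitoring methods and systems
CN107370628B (zh) Log processing method and system based on embedded tracking points
CN103607454A (zh) Method for setting a private proxy server in an Android system browser
US20140074814A1 (en) Method and apparatus for switching search engine to repeat search
JPH09325906A (ja) Computer system
US11115462B2 (en) Distributed system
CN114253441B (zh) Method and apparatus for enabling a target function, storage medium, and electronic device
CN111338928A (zh) Testing method and apparatus based on the Chrome browser
RU2595763C2 (ru) Method and apparatus for download management based on an Android browser
CN110837612A (zh) Method and apparatus for acquiring uniform resource identifier (URI) data, and storage medium
CN114090936B (zh) Method and apparatus for acquiring cookie data from any system and analyzing and storing it
WO2021226954A1 (zh) Information crawling method and apparatus, electronic device, and storage medium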

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 19913848; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 19913848; Country of ref document: EP; Kind code of ref document: A1