CN112256959A - Method for analyzing information collected by WeChat official accounts and mini programs - Google Patents

Method for analyzing information collected by WeChat official accounts and mini programs

Info

Publication number
CN112256959A
Authority
CN
China
Prior art keywords
module
interface
information
click
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011044049.9A
Other languages
Chinese (zh)
Other versions
CN112256959B (en)
Inventor
窦禹
王一宇
易立
陆希玉
王云荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Publication of CN112256959A
Application granted
Publication of CN112256959B
Current legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention provides a method for analyzing the information collected by WeChat official accounts and mini programs, and belongs to the technical field of network data analysis. The invention uses an automatic information collection tool to determine what user information is collected. The tool comprises an automatic simulated-click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module and a collected-information analysis module. Using an emulator together with interface layout recognition, the invention automatically operates and logs in to WeChat, clicks through and crawls all events and interfaces, recognizes and analyzes the interfaces, and determines what user information is collected. The invention automates the analysis and processing of the information collected by official accounts and mini programs, saves a large amount of manual effort, and classifies the data and finds the collected information efficiently and accurately.

Description

Method for analyzing information collected by WeChat official accounts and mini programs
Technical Field
The invention belongs to the technical field of network data analysis and relates to a method for analyzing the information collected by WeChat official accounts and mini programs.
Background
With the spread of the Internet, many fields have changed greatly. Industries such as education, transportation, healthcare, news and government affairs have undergone informatization, which has promoted social development and change. Large numbers of enterprises have developed business applications to serve users, and a wide variety of applications has emerged. To analyze the information collected by applications efficiently, perform cluster analysis on that information, and label applications with attributes, a technique for analyzing application-collected information is required.
Most existing approaches analyze the collection of user information on websites and rely on mature crawler technology. WeChat official accounts and mini programs, however, are served through the WeChat public platform, so existing crawlers cannot crawl and analyze them directly. Analyzing the information collected by WeChat mini programs therefore requires combining automated simulation of the application with traffic capture and analysis. Emulators suitable for various application systems already exist in the prior art, but techniques for automatically collecting information from WeChat official accounts and mini programs still need further exploration.
Disclosure of Invention
To solve the above problems, the invention provides a method for analyzing the information collected by WeChat official accounts and mini programs. The method builds on existing tools: it actively discovers an application, analyzes and obtains the information the application collects, and labels the application accordingly.
The method uses an automatic information collection tool comprising an automatic simulated-click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module and a collected-information analysis module. The method starts the automatic information collection tool to collect information and comprises the following steps:
(1) The automatic simulated-click module starts an Android emulation environment application, identifies and starts the WeChat application inside it, and starts a packet capture tool, which begins capturing network traffic.
(2) The name of the WeChat official account or mini program to be analyzed is recorded in a text file at a preset path; the automatic simulated-click module reads the file to obtain the name.
(3) The automatic simulated-click module sends the name to the interface recognition module. The interface recognition module identifies the WeChat search box in the opened WeChat application, enters the received name and obtains the search result list. It then identifies the search results, finds the official account or mini program with the corresponding name, and sends its positioning information to the automatic simulated-click module.
(4) The automatic simulated-click module performs a simulated click according to the positioning information and enters the official account's follow page. It then calls the interface recognition module to identify the follow page, and performs a simulated click to follow the official account or mini program.
(5) The interface recognition module identifies and obtains the menus on the main interface of the official account or mini program, and calls the automatic simulated-click module to click each menu.
(6) In the functional interface opened by the click, the interface analysis module analyzes the interface elements, finds events that may involve user information, and calls the automatic simulated-click module to click and trigger each event. The interface analysis module identifies and analyzes the interface of the triggered event, gathers the user information, and sends it to the collected-information analysis module.
(7) It is determined whether every menu on the main interface of the official account or mini program has been clicked. If so, execution continues with step (8); otherwise the next menu is clicked and execution returns to step (6).
(8) The traffic capture and parsing module parses the captured network traffic in real time, extracts the element information in each linked interface and the information contained in its source code, and sends this information to the collected-information analysis module.
(9) The collected-information analysis module cleans and integrates the received information, determines the attribute classification of the data, and outputs the user information collected by the official account or mini program, classified by attribute. The user information includes the user's geographic location, registered name and the like.
In step (6), when the interface analysis module encounters a geographic location authorization interface, it obtains the user's geographic location and sends it to the collected-information analysis module. When it encounters a registration or login interface of the official account or mini program, the simulated login module is called to fill in the form automatically; the filled-in form information is recorded and sent to the collected-information analysis module.
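The two cases described for step (6) can be pictured as a small dispatch on the kind of interface encountered. The following Python sketch is illustrative only: the function name `handle_interface`, the interface labels and the record format are assumptions made for this example and are not taken from the disclosed tool.

```python
from typing import Dict, List, Optional, Tuple


def handle_interface(
    kind: str, form_values: Optional[Dict[str, str]] = None
) -> List[Tuple[str, str]]:
    """Return (attribute, detail) records for one triggered interface.

    `kind` is a hypothetical label assumed to be produced by interface analysis.
    """
    records: List[Tuple[str, str]] = []
    if kind == "location_authorization":
        # The account asks for the user's location: record that fact.
        records.append(("geographic_location", "requested via authorization prompt"))
    elif kind in ("registration", "login"):
        # A form was filled in by the simulated login step: record every field.
        for field_name, value in (form_values or {}).items():
            records.append((field_name, value))
    # Any other interface kind yields no records in this sketch.
    return records
```

Each returned record would then be handed to whatever component aggregates the collected information.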
Compared with the prior art, the invention has the following advantages and positive effects. It automates the analysis and processing of the information collected by official accounts and mini programs through an automated workflow tool. The tool automatically discovers the relevant applications, obtains and analyzes the information contained in each application and in its traffic, and labels and classifies the data. This saves a large amount of manual effort and allows the data to be classified, and the information collected by applications to be found, efficiently and accurately.
Drawings
FIG. 1 is a flow chart of the method for analyzing information collected by WeChat official accounts and mini programs according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and examples.
At present, determining what information WeChat official accounts or mini programs collect is done manually, which is time-consuming, labor-intensive and inefficient. The invention captures user information collection more comprehensively and can uncover user information that a WeChat official account or mini program collects covertly, so that the official account or mini program can be analyzed further.
The invention provides a method for analyzing the information collected by WeChat official accounts and mini programs. The method is implemented by designing an automatic information collection tool that calls an existing emulator and uses interface layout recognition to automatically operate and log in to WeChat. It clicks through and crawls all events and interfaces in the mini program or official account, such as registration and login, and recognizes and analyzes the interfaces to determine what user information is collected. At the same time, the invention captures the network traffic generated by the WeChat official account or mini program during the simulated operation, obtains the related page code, and identifies user information fields. The collected user information is then integrated to obtain a final picture of what user information the WeChat mini program or official account collects.
This embodiment illustrates the method by analyzing the information collected by the official account of the Ping An Bank Credit Card Center. To implement the method, the invention designs an automatic information collection tool comprising an automatic simulated-click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module, a collected-information analysis module and the like, and uses the tool to analyze the collection of user information automatically. The automatic information collection tool is installed on the machine to which the method is applied, and the following steps are then executed; the implementation flow is shown in FIG. 1.
Step 1. The automatic simulated-click module launches an Android emulation environment application; in this embodiment, the Nox (Yeshen) Android emulator is started. The WeChat application is identified and started in the emulator, and a packet capture tool is started in the emulator at the same time to begin capturing network traffic.
In this embodiment, the automatic simulated-click module is implemented with the command-line tools provided in the basic Android development tools. After the automatic information collection tool starts, the module executes command lines to launch each application. Alternatively, the Android emulator and the WeChat application can be started manually with a mouse, in which case the automatic simulated-click module is not used for this step.
In this embodiment, the packet capture tool Fiddler is started in the emulator and a private HTTPS proxy is set up so that the endpoints and request parameters of HTTPS requests can be captured.
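As an illustration of the command-line approach, the sketch below builds adb commands that would launch WeChat and point the emulated device at an HTTP(S) proxy. `com.tencent.mm` is WeChat's Android package name; the device serial, proxy host and port are placeholder assumptions, and the patent does not state which commands the tool actually issues. Intercepting HTTPS traffic generally also requires the proxy's root certificate to be trusted on the device, which is not shown here.

```python
from typing import List

WECHAT_PACKAGE = "com.tencent.mm"  # WeChat's Android package name


def launch_wechat_command(serial: str) -> List[str]:
    """Build an adb command that starts WeChat's launcher activity."""
    return [
        "adb", "-s", serial, "shell", "monkey",
        "-p", WECHAT_PACKAGE,
        "-c", "android.intent.category.LAUNCHER", "1",
    ]


def set_proxy_command(serial: str, host: str, port: int) -> List[str]:
    """Build an adb command that sets the device-wide HTTP proxy."""
    return [
        "adb", "-s", serial, "shell", "settings", "put",
        "global", "http_proxy", f"{host}:{port}",
    ]


# Placeholder values (assumptions): replace with the real emulator serial
# and the address of the machine running the capture proxy.
EXAMPLE_SERIAL = "emulator-5554"
EXAMPLE_PROXY = ("192.168.1.10", 8888)
```

The commands are returned as argument lists so that they can be passed to `subprocess.run` without shell quoting issues.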
Step 2. The automatic simulated-click module automatically reads the official account name from external data: Ping An Bank Credit Card Center.
The external data is recorded in a text file at a preset path, and the automatic simulated-click module reads the contents of the file from that path.
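Reading the target names from a text file can be sketched as follows. The one-name-per-line layout and the function name `read_target_names` are assumptions for illustration; the source only says that the name is recorded in a text at a preset path.

```python
from pathlib import Path
from typing import List


def read_target_names(path: str) -> List[str]:
    """Read account or mini program names, one per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```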
Step 3. The automatic simulated-click module sends the name to the interface recognition module. The interface recognition module identifies the WeChat search box in the opened WeChat application and enters the received name, "Ping An Bank Credit Card Center", to obtain the search result list.
Step 4. The interface recognition module identifies the search results, obtains the text of each result, finds the entry whose official account name is "Ping An Bank Credit Card Center", and sends its exact positioning information to the automatic simulated-click module.
The interface recognition module is a layout recognition tool, implemented for example with the Android element-locating tool uiautomatorviewer.
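One way to obtain positioning information from a layout is to parse a UI Automator hierarchy dump, in which each `node` element carries `text` and `bounds` attributes, with bounds written as `[x1,y1][x2,y2]`. The sketch below finds the first node with a given text and returns the center of its bounds. The sample XML and the account name in it are made up for this example, and exact text matching is an assumption; the patent does not describe the matching rule.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_center_by_text(layout_xml: str, target_text: str) -> Optional[Tuple[int, int]]:
    """Return the center (x, y) of the first node whose text equals target_text."""
    root = ET.fromstring(layout_xml)
    for node in root.iter("node"):
        if node.get("text") == target_text:
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                x1, y1, x2, y2 = map(int, match.groups())
                return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


# Hypothetical, heavily trimmed hierarchy dump used only for this example.
SAMPLE_LAYOUT = """<hierarchy rotation="0">
  <node index="0" text="" class="android.widget.FrameLayout" bounds="[0,0][720,1280]">
    <node index="0" text="Example Account" class="android.widget.TextView" bounds="[100,300][620,360]" />
  </node>
</hierarchy>"""

center = find_center_by_text(SAMPLE_LAYOUT, "Example Account")  # (360, 330)
```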
Step 5. The automatic simulated-click module performs a simulated click according to the positioning information and enters the official account's follow page.
Step 6. The automatic simulated-click module calls the Android element-locating tool to identify the follow page, finds the button for following the official account, performs a simulated click on it, and thereby follows the official account.
In this embodiment, the adb command-line tool is used to obtain the current layout file of the Android application; a deep-learning-based method is applied to the component file to optimize the locating of the target component; the key attribute values in the interface are obtained; and simulated clicks are performed with the adb tool.
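The adb side of this step can be sketched with two standard commands: `uiautomator dump` to write the current layout to a file on the device, and `input tap` to click a coordinate. The deep-learning-based locating step is not reproduced here. The remote file path, the helper names and the `run` wrapper are assumptions for illustration.

```python
import subprocess
from typing import List, Sequence

REMOTE_DUMP_PATH = "/sdcard/window_dump.xml"  # assumed location on the device


def dump_layout_commands(serial: str, local_path: str) -> List[List[str]]:
    """Build the adb commands that dump the current layout and copy it locally."""
    return [
        ["adb", "-s", serial, "shell", "uiautomator", "dump", REMOTE_DUMP_PATH],
        ["adb", "-s", serial, "pull", REMOTE_DUMP_PATH, local_path],
    ]


def tap_command(serial: str, x: int, y: int) -> List[str]:
    """Build the adb command that simulates a tap at screen coordinate (x, y)."""
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


def run(command: Sequence[str]) -> str:
    """Execute one command and return its standard output (requires adb on PATH)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

A caller would run the dump commands, locate the target element in the pulled file, and then run `tap_command` with the element's center coordinates.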
Step 7. On the main interface of the official account, the interface recognition module identifies and obtains the official account's menus.
Step 8. The automatic simulated-click module clicks each menu. This embodiment takes the click on [Online Card Application], which opens the corresponding functional interface, as an example.
Step 9. In the opened functional interface, the interface analysis module analyzes the interface elements and finds events that may involve user information, for example an [Apply] button. It calls the automatic simulated-click module to perform a simulated click, which opens the card application interface.
Step 10. After entering, a geographic location authorization interface appears. By recognizing and analyzing this interface, the interface analysis module finds and records that the user's geographic location information is collected. The automatic simulated-click module is then called to click the [Confirm] button, and the Ping An Bank credit card application interface appears.
Step 11. The interface is recognized and analyzed, and the data in it is parsed and stored. The simulated login module is called to register automatically: it fills in the form elements in the specified format, the automatic simulated-click module clicks the [Apply] button, and the filled-in form information is saved and sent to the collected-information analysis module.
The click event leads to an interface containing embedded HTML5. The interface content is recognized by identifying and monitoring the layout of the HTML interface, the interface analysis module analyzes the elements on the interface using an interface recognition and analysis algorithm, and the public information in the application is discovered.
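The source does not specify the interface recognition and analysis algorithm. As a stand-in, the sketch below uses simple keyword matching over element labels to flag fields that may involve user information. The keyword table, attribute names and the function `match_user_info` are all assumptions for illustration and are not the patented algorithm.

```python
from typing import Dict, Iterable, List

# Hypothetical keyword table: attribute name -> substrings that suggest it.
USER_INFO_KEYWORDS: Dict[str, List[str]] = {
    "name": ["name", "姓名"],
    "phone": ["phone", "mobile", "手机"],
    "id_number": ["id number", "身份证"],
    "location": ["location", "address", "位置", "地址"],
}


def match_user_info(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Group interface labels by the user-information attribute they suggest."""
    found: Dict[str, List[str]] = {}
    for label in labels:
        lowered = label.lower()
        for attribute, keywords in USER_INFO_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found.setdefault(attribute, []).append(label)
    return found
```

Labels that match no keyword, such as a plain button caption, are simply ignored by this heuristic.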
Step 12. Steps 8 to 11 are repeated for all menus of the WeChat official account to discover the other events and user information contained in its interfaces.
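The repetition over menus amounts to a nested loop: open each menu, trigger each candidate event found there, and accumulate the findings. In the sketch below the two callbacks stand in for the click and analysis steps; their names and signatures are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List


def crawl_menus(
    menus: Iterable[str],
    open_menu: Callable[[str], List[str]],
    trigger_event: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Visit every menu, trigger every event it exposes, and collect findings.

    open_menu(menu) is assumed to click the menu and return its candidate events.
    trigger_event(menu, event) is assumed to click the event and return findings.
    """
    collected: Dict[str, List[str]] = {}
    for menu in menus:
        findings: List[str] = []
        for event in open_menu(menu):
            findings.extend(trigger_event(menu, event))
        collected[menu] = findings
    return collected
```

Keeping the device interaction behind callbacks lets the loop be exercised with stub functions before it is connected to an emulator.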
Step 13. The traffic capture and parsing module parses the captured traffic in real time. For HTTP or HTTPS links, such as the Ping An Bank credit card application link, it obtains the linked content through the self-built HTTPS proxy, parses the source code containing the interface HTML, and extracts the name (name) and the corresponding text (text) of the input elements in the interface. The traffic capture and parsing module also extracts the other information contained in the interface source code.
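Extracting input fields from captured page source can be sketched with Python's standard `html.parser`. The source does not say which attribute supplies the "text" of an input element; this example assumes the `placeholder` attribute, falling back to `value`. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from typing import Dict, List


class InputFieldParser(HTMLParser):
    """Collect the name and associated text of every named <input> element."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Assumption: treat placeholder (or value) as the field's text.
            text = attributes.get("placeholder") or attributes.get("value") or ""
            self.fields.append({"name": name, "text": text})


def extract_input_fields(html: str) -> List[Dict[str, str]]:
    """Return one {"name", "text"} record per named input element in the HTML."""
    parser = InputFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Inputs without a `name` attribute, such as a bare submit button, are skipped because they carry no field identifier.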
Step 14. The collected-information analysis module integrates and analyzes the information collected in the preceding steps and compiles the detailed data relating to the official account. The detailed data is classified by attribute and output as each attribute type with its corresponding data.
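A minimal version of the cleaning and integration step is to normalize attribute names, drop empty values, remove duplicates and group the remaining values by attribute. The record format of (attribute, value) pairs and the function name `group_by_attribute` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_attribute(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Clean (attribute, value) records and group unique values by attribute."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for attribute, value in records:
        attribute = attribute.strip().lower()
        value = value.strip()
        # Skip empty entries and values already recorded for this attribute.
        if attribute and value and value not in grouped[attribute]:
            grouped[attribute].append(value)
    return dict(grouped)
```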
The method significantly improves data collection capacity. For example, obtaining the information for the official accounts and mini programs of 100,000 enterprise names manually would take four people, at about 300 accounts per person per day, roughly three months. With the method, tasks can be distributed for cooperative crawling across multiple devices; with 10 terminals at about 1,000 accounts per terminal per day, processing takes about 10 days. The method therefore greatly improves the speed and capacity of data collection and can be customized as needed to meet diverse requirements.
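The estimate can be checked with a short calculation, reading the daily figures as per-worker and per-terminal throughput (an interpretation of the translated text, which gives 300 and 1000 per day).

```python
def days_needed(total_accounts: int, workers: int, per_worker_per_day: int) -> float:
    """Days required when each worker or terminal handles a fixed number per day."""
    return total_accounts / (workers * per_worker_per_day)


# Manual processing: 4 people, about 300 accounts per person per day.
manual_days = days_needed(100_000, workers=4, per_worker_per_day=300)

# Automated processing: 10 terminals, about 1,000 accounts per terminal per day.
automated_days = days_needed(100_000, workers=10, per_worker_per_day=1_000)
```

Under this reading, manual processing takes a little over 83 days, which matches the stated three months, and the automated setup takes 10 days.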

Claims (3)

1. A method for analyzing information collected by WeChat official accounts and mini programs, characterized in that an automatic information collection tool is started to collect information, the tool comprising an automatic simulated-click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module and a collected-information analysis module; the method comprises the following steps:
step 1, the automatic simulated-click module starts an Android emulation environment application, identifies and starts the WeChat application in the emulation environment, and starts a packet capture tool; the packet capture tool begins capturing network traffic;
step 2, the name of the WeChat official account or mini program to be analyzed is recorded in a text file at a preset path, and the automatic simulated-click module reads the text file to obtain the name of the WeChat official account or mini program;
step 3, the automatic simulated-click module sends the read name to the interface recognition module; the interface recognition module identifies the WeChat search box in the opened WeChat application, enters the received name and obtains a search result list; the interface recognition module identifies the search results, finds the official account or mini program with the corresponding name and sends the positioning information to the automatic simulated-click module;
step 4, the automatic simulated-click module performs a simulated click according to the positioning information, enters the official account follow page, then calls the interface recognition module to identify the follow page, and performs a simulated click to follow the official account or mini program;
step 5, on the main interface of the official account or mini program, the interface recognition module identifies and obtains the menus, and calls the automatic simulated-click module to click each menu;
step 6, in the functional interface opened by the click, the interface analysis module analyzes the interface elements, finds an event containing user information, and calls the automatic simulated-click module to click and trigger the event; the interface analysis module identifies and analyzes the interface of the triggered event, collects the user information and sends it to the collected-information analysis module;
step 7, it is determined whether every menu on the main interface of the official account or mini program has been clicked; if so, execution continues with step 8; otherwise the next menu is clicked and execution continues with step 6;
step 8, the traffic capture and parsing module parses the captured network traffic in real time, extracts the element information in the linked interface and the information contained in the source code of the linked interface, and sends the information to the collected-information analysis module;
step 9, the collected-information analysis module cleans and integrates the received information and outputs the user information collected by the official account or mini program.
2. The method according to claim 1, wherein in step 6, when the interface analysis module analyzes a "geographic location authorization" interface, the geographic location of the user is obtained and sent to the collected-information analysis module; when the interface analysis module analyzes a registration or login interface of the official account or mini program, the simulated login module is called, the form is filled in automatically, and the filled-in form information is recorded and sent to the collected-information analysis module.
3. The method according to claim 1, wherein in step 8, the traffic capture and parsing module parses the linked interface, extracts the name and text content of the input elements in the interface, and sends the extracted name and text content to the collected-information analysis module.
CN202011044049.9A 2020-06-11 2020-09-28 Method for analyzing information collected by WeChat public number small program Active CN112256959B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020105303275 2020-06-11
CN202010530327 2020-06-11

Publications (2)

Publication Number Publication Date
CN112256959A true CN112256959A (en) 2021-01-22
CN112256959B CN112256959B (en) 2022-11-08

Family

ID=74233334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011044049.9A Active CN112256959B (en) 2020-06-11 2020-09-28 Method for analyzing information collected by WeChat public number small program

Country Status (1)

Country Link
CN (1) CN112256959B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6078924A (en) * 1998-01-30 2000-06-20 Aeneid Corporation Method and apparatus for performing data collection, interpretation and analysis, in an information platform
CN105320740A (en) * 2015-09-22 2016-02-10 清华大学 WeChat article and official account acquisition method and acquisition system
CN106384249A (en) * 2016-09-13 2017-02-08 四川长虹电器股份有限公司 WeChat official account platform management system
CN108833264A (en) * 2018-06-25 2018-11-16 厦门理工学院 Data acquisition management system, method and application based on wechat small routine
CN110177139A (en) * 2019-05-23 2019-08-27 中国搜索信息科技股份有限公司 A kind of ostensible mobile APP data grab method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822036A (en) * 2021-09-28 2021-12-21 百度在线网络技术(北京)有限公司 Privacy policy content generation method and device and electronic equipment
CN113822036B (en) * 2021-09-28 2022-07-12 百度在线网络技术(北京)有限公司 Privacy policy content generation method and device and electronic equipment

Also Published As

Publication number Publication date
CN112256959B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN103546343B (en) The network traffics methods of exhibiting of network traffic analysis system and system
CN102968494B (en) The system and method for transport information is gathered by microblogging
CN109656792A (en) Applied performance analysis method, apparatus, computer equipment and storage medium based on network call log
CN107894889A (en) Bury point methods, equipment and computer-readable recording medium
CN114817968B (en) Method, device and equipment for tracing path of featureless data and storage medium
WO2021114985A1 (en) Companionship object identification method and apparatus, server and system
CN111628896A (en) IT operation and maintenance management method, device, equipment and computer storage medium
CN112256959B (en) Method for analyzing information collected by WeChat public number small program
CN111355628B (en) Model training method, service identification method, device and electronic device
CN108429747A (en) A kind of extensive Web server information collecting method
CN103399968B (en) A kind of micro-blog information acquisition method and system
CN111882368B (en) On-line advertisement DPI encryption buried point and transparent transmission tracking method
CN116049808B (en) Equipment fingerprint acquisition system and method based on big data
CN111917848A (en) Data processing method based on edge computing and cloud computing cooperation and cloud server
CN111581067A (en) Data acquisition method and device
CN111966339A (en) Method and device for recording buried point parameters, computer equipment and storage medium
CN115296892B (en) Data information service system
CN115309802A (en) User distribution thermodynamic diagram acquisition method and device, electronic equipment and storage medium
CN115357656A (en) Information processing method and device based on big data and storage medium
CN104376021A (en) File recommending system and method
CN112528104A (en) Traceability system and traceability method based on sensitive data
CN113190458A (en) Method and device for automatically analyzing buried point data, computer equipment and storage medium
CN107295087B (en) System and method for realizing data aggregation between network systems
CN112182462A (en) Intelligent network information acquisition system and acquisition method
CN110336777A (en) The communication interface acquisition method and device of Android application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant