CN116244161B

CN116244161B - Data acquisition method based on depth simulation operation

Info

Publication number: CN116244161B
Application number: CN202310530049.7A
Authority: CN
Inventors: 魏传强; 矫娟; 宋耀; 徐哲; 司君波
Original assignee: Shandong Qilu Yidian Media Co ltd
Current assignee: Shandong Qilu Yidian Media Co ltd
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-08-11
Anticipated expiration: 2043-05-12
Also published as: CN116244161A

Abstract

The application provides a data acquisition method based on depth simulation operation, comprising the following steps: collecting a data set of complete operation behaviors on a plurality of APPs; using the data set to train a simulated user operation model built with the DQN algorithm; the simulated user operation model performs interface recognition and simulated user operation on the target APP, determines the corresponding operation type according to the content type recognized on the interface, and collects all data during the simulated user operation. A deep reinforcement learning model is trained on a large amount of behavior data collected from different mobile phones operating different APPs, and an improved training method enables information acquisition by simulating real-person operation.

Description

Data acquisition method based on depth simulation operation

Technical Field

The application belongs to the technical field of data acquisition, and particularly relates to a data acquisition method based on depth simulation operation.

Background

The arrival of the mobile internet era has caused great changes in the habit, mode, scene and channel of news reading. On the one hand, the time for people to browse news in a fixed place is reduced, and the time for acquiring news by using the fragmentation time is increased, so that the typical trend of mobility, fragmentation and convenience is presented. On the other hand, users prefer the news content of 'short, flat and fast', and attach importance to their own participation in the process of reading news, so that the mobile news information platform becomes one of important channels for users to receive news.

To ensure that an APP works normally, the APP must be studied further and relevant data collected during testing, for example whether each APP runs normally on the same phone model, and whether the same APP runs normally on multiple phone models. At present, a mobile phone manufacturer usually has many phone models under its brand, phone models are varied, and APP interfaces are not uniform. Traditional APP data acquisition schemes therefore find automatic acquisition difficult: inflexible programs must be customized for different phones and APPs, which leads to poor universality, monotonous operation and low acquisition efficiency. In addition, current automatic acquisition essentially follows a preset program, so the test data collected lacks authenticity and universality.

Disclosure of Invention

Aiming at the defects in the prior art, the application provides a data acquisition method based on depth simulation operation to solve the technical problems.

The application provides a data acquisition method based on depth simulation operation, which comprises the following steps:

collecting a data set of complete operation behaviors of a plurality of APP;

training a simulated user operation model established by using a DQN algorithm by using a data set;

the simulated user operation model comprises two DQN network models, namely a behavior policy network and a target policy network; a control operated by the user is defined as an action a, and the interface displayed after the action is executed is defined as a state s;

the behavior policy network is used for evaluating each action a_t in the current state s_t and then selecting, by a greedy method, the action a_t with the largest Q(); after receiving action a_t, the environment returns a reward r_t and the next state s_{t+1}, giving a state-transition tuple for the user's operation of the APP at each time step t: {current state s_t, action a_t generated in the current state s_t, reward r_t generated by action a_t, next state s_{t+1} after the action is performed};

the target policy network is used for generating, according to the current state s_t, the action a_t to be executed at the current moment, which represents determining the control to be operated according to the current interface, and for generating, according to the next state s_{t+1}, the action a_{t+1} to be executed at the next moment, which represents determining the control to be operated in the next step according to the interface to be jumped to;

the simulation user operation model carries out interface identification and simulation user operation on the target APP, determines a corresponding operation type according to the content type identified by the interface, and simultaneously collects all data in the simulation user operation process.

Further, the data set is divided into a plurality of subsets by APP, i.e. the data set is APP_Name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]; the APPn subset contains the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors in APPn, Actionm is the m-th operation, and m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position and duration of the control, and the interface change data after each slide and each click of a control;

Further, the method also comprises: during collection of real-person operation, automatically recording the user's misoperation behaviors, and manually classifying and labeling the types and frequencies of the misoperations, the misoperations comprising: screen-swipe jitter, click offset, and accidental mis-clicks.

Further, the training function of Q() is given by a formula whose equation and symbol images are missing from this text. In that formula, one term denotes the Q value output by the behavior policy network and another denotes its updated value; I is the number of iterative updates; the formula also contains a learning parameter and a target parameter of the Q value; t is the time step, a_t is the action at time t, s_t is the state at time t, a_{t+1} is the action at the next time t+1, and s_{t+1} is the state at the next time t+1.

Further, an interface content recognition module of the simulated user operation model is trained; content recognition comprises recognizing a page as a normal page or an abnormal page, the reward for a normal page being positive and the reward for an abnormal page being negative; within a normal page the content type is recognized, the content types comprising articles, comments and pictures, and the corresponding actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font, and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment, and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture, and sliding down on the enlarged picture.

Further, the user operation simulation model performs interface recognition and user operation simulation on the target APP, and includes:

after entering the APP primary page, randomly clicking a control of the primary page;

after each click on a control, entering the secondary page corresponding to that control, and parsing the data of the secondary page using a different acquisition mode according to the type of the control; performing one sliding operation after the secondary-page data has been parsed, and after each slide judging whether the UI structure of the interface before sliding is consistent with that after sliding; if they are inconsistent, judging that the slide has not reached the bottom of the interface, and continuing the automatic simulated clicking and repeating the operation; if they are consistent, judging that the current interface has been slid to the bottom, and clicking to return to the previous interface;

searching a comment module, an article module and a photo module in the secondary page, and respectively performing simulation operation of corresponding actions in interfaces of the comment module, the article module and the photo module;

recording the clicking condition of the control of the primary page, acquiring the data of the current interface, continuing to slide downwards, and repeating the steps.

The beneficial effects of the application are as follows. In the APP data acquisition method based on deep simulated user operation, a large amount of behavior data from different mobile phones operating different APPs is collected to train a deep reinforcement learning model, and an improved training method enables information acquisition by simulating real-person operation. This solves the problems of high difficulty and low automation that arise when customized acquisition is used in existing mobile phone APP testing.

A deeply human-like system is established: misoperation behaviors that occur in real-person operation, such as screen-swipe jitter, click offset and accidental mis-clicks, are recorded, and the types and frequencies of the misoperations are analyzed and summarized into rules. This avoids being blocked because of overly mechanical operation, and thereby solves the problem of monotonous operation.

The number of clicks on each control is recorded, which ensures that no data is missed during automatic acquisition, avoids missed and erroneous acquisition, and solves the problem of poor universality.

The method can automatically recognize whether the interface has reached the bottom of the page and can recognize different news types, so that the next operation is decided in time. This avoids repeated clicking, endless sliding and untargeted acquisition of different types of articles, and solves the problem of low acquisition efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a schematic flow chart of a method of one embodiment of the application.

FIG. 2 is a schematic diagram of how the model internally determines an action from a state in one embodiment of the present application.

FIG. 3 is a schematic flow chart of human-like operation of a news APP using the model in one embodiment of the application.

Detailed Description

In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

The following explains key terms appearing in the present application.

The application uses the existing DQN algorithm model, i.e. a Q-learning algorithm based on deep learning, which combines value function approximation (Value Function Approximation) with neural network techniques and trains the network using a target network and experience replay.

The embodiment of the application provides a data acquisition method based on depth simulation operation, which comprises the following steps:

collecting a data set of complete operation behaviors of a plurality of APP;

the behavior policy network is used for evaluating each action a_t in the current state s_t and then selecting, by a greedy method, the action a_t with the largest Q(); after receiving action a_t, the environment returns a reward r_t and the next state s_{t+1}, giving a state-transition tuple for the user's operation of the APP at each time step t: {current state s_t, action a_t generated in the current state s_t, reward r_t generated by action a_t, next state s_{t+1} after the action is performed};

the target policy network is used for generating, according to the current state s_t, the action a_t to be executed at the current moment, which represents determining the control to be operated according to the current interface, and for generating, according to the next state s_{t+1}, the action a_{t+1} to be executed at the next moment, which represents determining the control to be operated in the next step according to the interface to be jumped to;
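The greedy selection and the state-transition tuple described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `Transition`, `greedy_action` and the toy Q values are hypothetical, and the Q values are supplied as a plain list rather than produced by a neural network.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    """One state-transition tuple {s_t, a_t, r_t, s_{t+1}} (hypothetical container)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the action (control) with the largest Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Toy example: three candidate controls on the current interface.
q_values = [0.1, 0.7, 0.2]        # assumed outputs of the behavior policy network
action = greedy_action(q_values)  # control with the largest Q value
# The reward and next state would come from the environment (the APP); here they are made up.
transition = Transition(state=(0,), action=action, reward=1.0, next_state=(1,))
```

In practice the list of Q values would be the output of the behavior policy network for the current interface, and the tuple would be stored for later training.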

Optionally, as an embodiment of the present application, the data set is divided into a plurality of subsets by APP, i.e. the data set is APP_Name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]; the APPn subset contains the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors in APPn, Actionm is the m-th operation, and m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position and duration of the control, and the interface change data after each slide and each click of a control.
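One possible in-memory layout for this data set is sketched below. The class names, field names and units are assumptions chosen for illustration; the application only lists which quantities are recorded, not how they are stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Action:
    """One recorded operation (field names and units are illustrative)."""
    swipe_length_px: float
    swipe_duration_s: float
    control_type: str
    control_position: Tuple[int, int]
    control_duration_s: float
    interface_change: str  # description of the interface after the slide or click


@dataclass
class AppActions:
    """The operation-behavior subset of one APP."""
    app_name: str
    actions: List[Action] = field(default_factory=list)


def build_dataset(app_names: List[str]) -> Dict[str, AppActions]:
    """Create one empty subset per APP, keyed by APP name."""
    return {name: AppActions(app_name=name) for name in app_names}


dataset = build_dataset(["APP1", "APP2"])
dataset["APP1"].actions.append(
    Action(320.0, 0.4, "button", (540, 1200), 0.08, "opened article detail page")
)
```

Keying the subsets by APP name mirrors the per-APP division described in the text and keeps each APP's actions in recording order.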

The real-person operations used in the application are performed by dedicated testers. The collected data does not include users' personal information; it comprises only operation data stored in the background after APP operation, such as the operation time and response time of controls, and publicly available interface information of the APP, such as comments and pictures.

The data acquisition process uses APP packet capture or a Python crawler, both commonly used in the art, and is not described in detail here.

Optionally, as an embodiment of the present application, the method further includes: during collection of real-person operation, automatically recording the user's misoperation behaviors, and manually classifying and labeling the types and frequencies of the misoperations, the misoperations comprising: screen-swipe jitter, click offset, and accidental mis-clicks.

Every operation performed on the APP has a corresponding standard operation defined. For example, the standard operation corresponding to "screen-swipe jitter" is a swipe whose duration reaches a standard value, and the standard operation corresponding to "click offset" and "accidental mis-click" is a click on a control at the accurate position whose click duration exceeds a standard value. A misoperation is identified when an operation fails to reach the standard value of its corresponding standard operation. All data relating to identified misoperations is recorded in a dedicated database.
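The threshold comparison described above might look like the following sketch. The threshold values, label strings and function names are assumptions; the application does not state concrete standard values, and mapping a too-short click to "accidental mis-click" is one possible reading of the text.

```python
from typing import Optional

# Assumed standard values; the source does not give concrete numbers.
MIN_SWIPE_DURATION_S = 0.15
MIN_CLICK_DURATION_S = 0.05
MAX_CLICK_OFFSET_PX = 20.0


def classify_swipe(duration_s: float) -> Optional[str]:
    """A swipe shorter than the standard duration is treated as swipe jitter."""
    return None if duration_s >= MIN_SWIPE_DURATION_S else "swipe_jitter"


def classify_click(offset_px: float, duration_s: float) -> Optional[str]:
    """A click too far from the control, or too brief, is treated as a misoperation."""
    if offset_px > MAX_CLICK_OFFSET_PX:
        return "click_offset"
    if duration_s < MIN_CLICK_DURATION_S:
        return "accidental_click"
    return None  # meets the standard operation
```

A return value of `None` means the operation met its standard; any other label would be written to the misoperation database together with the raw measurements.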

Optionally, as one embodiment of the present application, the training function of Q() is given by a formula whose equation and symbol images are missing from this text. In that formula, one term denotes the Q value output by the behavior policy network and another denotes its updated value; I is the number of iterative updates; the formula also contains a learning parameter and a target parameter of the Q value; t is the time step, a_t is the action at time t, s_t is the state at time t, a_{t+1} is the action at the next time t+1, and s_{t+1} is the state at the next time t+1.

In the embodiment of the application, a deeply human-like algorithm model is obtained through repeated iterative training, so that the model's simulation strikes a balance with real user operation and failures caused by overly mechanical operation are reduced.

Optionally, as an embodiment of the present application, an interface content recognition module of the simulated user operation model is trained; content recognition comprises recognizing a page as a normal page or an abnormal page, the reward for a normal page being positive and the reward for an abnormal page being negative; within a normal page the content type is recognized, the content types comprising articles, comments and pictures, and the corresponding actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font, and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment, and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture, and sliding down on the enlarged picture.
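The mapping from recognized content type to candidate actions, together with the sign of the reward, can be sketched as below. The reward magnitudes of +1 and -1, the action label strings and the function names are assumptions; the text only states that the reward is positive for a normal page and negative for an abnormal one.

```python
from typing import Dict, List, Tuple

# Actions per content type, as listed in the text (label strings are illustrative).
ACTIONS_BY_CONTENT: Dict[str, List[str]] = {
    "article": ["slide_down", "enlarge_font", "restore_font"],
    "comment": ["expand_details", "like", "dislike"],
    "picture": ["click_to_enlarge", "slide_enlarged_picture"],
}


def page_reward(is_normal: bool) -> float:
    """Positive reward for a normal page, negative for an abnormal page (assumed +/-1)."""
    return 1.0 if is_normal else -1.0


def candidate_actions(is_normal: bool, content_type: str) -> Tuple[float, List[str]]:
    """Return the page reward and the actions allowed for the recognized content type."""
    reward = page_reward(is_normal)
    # Content types are only recognized within normal pages.
    actions = ACTIONS_BY_CONTENT.get(content_type, []) if is_normal else []
    return reward, actions
```

Returning an empty action list for abnormal pages reflects that content-type recognition is described only for normal pages.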

Optionally, as an embodiment of the present application, the simulation user operation model performs interface identification and simulation user operation on the target APP, including:

Specifically, the secondary-page data is parsed using a different acquisition mode according to the type of the control. For example, if the control is a button, the acquisition modes are single click and double click; if the control is a slider, the acquisition mode is sliding; if the control is a picture, the acquisition mode is image capture.
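The control-type dispatch in the preceding paragraph amounts to a small lookup table, sketched here. The dictionary name, the mode labels and the choice to raise an error for an unlisted control type are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Acquisition modes per control type, following the examples in the text.
ACQUISITION_MODES: Dict[str, Tuple[str, ...]] = {
    "button": ("single_click", "double_click"),
    "slider": ("slide",),
    "picture": ("image_capture",),
}


def acquisition_modes(control_type: str) -> Tuple[str, ...]:
    """Look up the acquisition modes for a control type; reject unknown types."""
    try:
        return ACQUISITION_MODES[control_type]
    except KeyError:
        raise ValueError(f"unsupported control type: {control_type!r}") from None
```

Failing loudly on an unknown control type is one design option; silently skipping such controls would be another.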

To facilitate understanding, the APP data acquisition method based on deep simulated user operation is further described below, combining its principle with the data acquisition process for a news APP in the embodiment.

As shown in fig. 1, the method includes:

S1, having real persons operate a plurality of APPs, collecting the real user operation data, and constructing a data set;

S2, using the data set to train a simulated user operation model built with the DQN algorithm;

S3, using the trained simulated user operation model to operate the target APP as a real person would, thereby acquiring data;

S4, cleaning the acquired data, then uniformly packaging and storing it.

Specifically, in S1, behavior data of real persons operating APPs must be collected. The user operation behavior data and the manuscript type data on the APP are combined into the original data set for training the user operation model, and each item of data is defined, packaged and stored. The application can operate APPs installed on multiple mobile phones, which removes the influence of different phone brands, models and sizes on data acquisition.

Real-person behavior data is obtained by having a real person perform M click-and-visit behaviors on N common news APPs. One complete set of operation behaviors runs from clicking to enter an APP, through reading and visiting, to exiting the APP. This ensures that the behavior data acquisition is effective, comprehensive and universal.

The APP set includes the acquired name, icon and user behavior data. For example, the APP set is APP_Name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]. Taking APP1 as an example, the user operation behavior is represented as APP1_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete sets of operation behaviors, Actionm is the m-th operation behavior, and m ∈ [1, M]. The user behavior data includes: the interface sliding length and duration, the type, position and duration of the control, and the interface change data after each slide and each click of a control, i.e. on which page the real person slid or which control the real person clicked, and for how long.

The user behavior data also records the screen-swipe jitter, click offset and accidental mis-clicks that occur in real-person operation, and the types and frequencies of these misoperations are analyzed and summarized into rules.

In addition, the collected data includes manuscript types: normal manuscripts, namely conventional image-text manuscripts and special video, audio and live-broadcast manuscripts, and abnormal manuscripts such as special topics, external links and advertisements.

In S2, real-person operation of the mobile phone APP is deeply simulated based on the DQN algorithm. The data used to train the simulated user operation model are transitions: {current state s_t, generated action a_t, reward r_t generated by the action, next state s_{t+1} after the action is performed}. As shown in FIG. 2, four features may be used to represent the state of the APP currently being operated, for example control position, click speed, sliding distance and sliding speed.
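Since DQN training draws on stored transitions (experience replay), a minimal replay buffer over four-feature states can be sketched as follows. The class name, the capacity, the numeric values and the encoding of the control position as a single normalized number are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# State: (control position, click speed, sliding distance, sliding speed); encoding is assumed.
State = Tuple[float, float, float, float]
Transition = Tuple[State, int, float, State]  # (s_t, a_t, r_t, s_{t+1})


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a random batch without replacement."""
        return random.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


buffer = ReplayBuffer(capacity=2)
s0: State = (0.25, 1.0, 0.0, 0.0)
s1: State = (0.25, 0.0, 300.0, 1.5)
s2: State = (0.80, 1.2, 0.0, 0.0)
buffer.push((s0, 0, 1.0, s1))
buffer.push((s1, 1, 1.0, s2))
buffer.push((s2, 0, -1.0, s0))  # capacity is 2, so the first transition is evicted
batch = buffer.sample(2)
```

Sampling at random rather than in recording order is the usual reason for a replay buffer: it breaks the correlation between consecutive operations.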

The improved training is carried out with a deep reinforcement learning algorithm model, and the training uses a formula whose equation and symbol images are missing from this text.

In that formula, one term denotes the Q value output by the behavior policy network and another denotes its updated value; I is the number of iterative updates; the formula also contains a learning parameter and a target parameter of the Q value; t is the time step, a_t is the action at time t, s_t is the state at time t, a_{t+1} is the action at the next time t+1, and s_{t+1} is the state at the next time t+1.
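Because the application's own formula is not legible here, the sketch below shows only the standard textbook Q-learning/DQN update, which uses the same kinds of quantities (a learning parameter and Q values from a target network). It should not be read as the application's improved training function; the discount factor `gamma` and all function names are assumptions not found in the source.

```python
from typing import Sequence


def td_target(reward: float, next_q_target: Sequence[float],
              gamma: float, done: bool = False) -> float:
    """Standard DQN target: r_t + gamma * max over a_{t+1} of Q_target(s_{t+1}, a_{t+1})."""
    if done:
        return reward  # no future value after a terminal state
    return reward + gamma * max(next_q_target)


def updated_q(q_sa: float, target: float, alpha: float) -> float:
    """Move Q(s_t, a_t) toward the target by a step of size alpha (learning parameter)."""
    return q_sa + alpha * (target - q_sa)


# Worked example with made-up numbers.
target = td_target(reward=1.0, next_q_target=[0.0, 2.0], gamma=0.5)  # 1.0 + 0.5 * 2.0
new_q = updated_q(q_sa=1.0, target=target, alpha=0.5)                # 1.0 + 0.5 * (2.0 - 1.0)
```

In a neural-network setting the second step is replaced by a gradient step on the squared difference between the behavior network's output and the target, with the target network's parameters held fixed between periodic copies.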

The simulated user operation model implemented by the deeply human-like algorithm can recognize the interface. On the one hand, interface recognition judges whether the UI structure information before each slide is consistent with that after the slide; if it is, the interface is regarded as having been slid to the bottom, which avoids invalid sliding and saves time;

on the other hand, content types are recognized for articles, for example image-text manuscripts, video manuscripts, audio manuscripts, live broadcasts, special topics, external links and advertisements, and different manuscripts use different clicking and acquisition modes, which enables targeted data acquisition and improves acquisition efficiency.

Through repeated iterative training, a deeply human-like algorithm model is obtained, in which the model's simulation strikes a balance with real user operation, reducing failures caused by overly mechanical operation.

In S3, based on the automatic operation module of the user operation model, the APP is automatically operated, and all elements of the interface are traversed by using the identification interface and the simulated click control, and the mobile phone is operated by using the existing Python third party library UIAutomator2 tool in this embodiment, as shown in fig. 3, which specifically includes the following procedures.

First, the name of the APP to be acquired is input and the APP is clicked to enter it; the existing watcher technique is used to automatically grant the necessary permissions, agree to protocol requests, and skip splash-screen advertisements, pop-up windows and other permission prompts.

On the article list page, the number of articles in the current list is obtained through UIAutomator2, and the titles in the current list are then clicked in random order, each click being implemented through "d.xpath().click()", until all articles in the current list have been clicked, so that the titles are clicked with full coverage.
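The full-coverage random clicking can be sketched independently of any device by passing the click operation in as a function. In this sketch `click_all_titles` and the sample titles are hypothetical; on a real device the `click` argument might wrap the "d.xpath().click()" call mentioned above, which is not exercised here.

```python
import random
from typing import Callable, List


def click_all_titles(titles: List[str], click: Callable[[str], None],
                     rng: random.Random) -> List[str]:
    """Click every title exactly once, in a random order; return the order used."""
    order = titles[:]       # copy so the caller's list is left untouched
    rng.shuffle(order)      # random order, but still a permutation of all titles
    for title in order:
        click(title)
    return order


clicked: List[str] = []
order = click_all_titles(["title 1", "title 2", "title 3"], clicked.append, random.Random(0))
```

Shuffling a copy of the list, rather than drawing titles at random with replacement, is what guarantees that every title is clicked and none is clicked twice.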

And (3) entering an article detail page, judging news attributes according to a user operation model, acquiring contents by adopting different acquisition modes, and analyzing the detail page. Taking graphic news as an example, the method specifically comprises the following steps:

firstly clicking a sharing button to acquire a sharing link; then acquiring article detail data such as titles, release time, article details and the like through sharing links; and finally clicking to enter the comment page.

On entering the comment interface, the user operation model judges whether the interface contains comments, and the comment page is parsed to obtain useful data such as comment content and comment time. After the comments on the current interface have been obtained, a sliding operation is performed, implemented through "d.swipe()", and the interface is recognized: it is judged whether the UI structure information before each slide is consistent with that after the slide. If they are inconsistent, the slide is judged not to have reached the bottom of the interface, and the automatic simulated clicking continues and the operation is repeated; if they are consistent, the interface is regarded as having been slid to the bottom, the traversal of the comment interface is complete, and the flow returns to the article list page.
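The bottom-of-page check can be sketched as a loop that compares the UI structure before and after each slide. The function name, the `max_swipes` safety limit and the `FakeScreen` stand-in are assumptions; on a real device `dump_ui` and `swipe` might wrap UIAutomator2's `d.dump_hierarchy()` and `d.swipe(...)`, which are not exercised here.

```python
from typing import Callable, List


def scroll_to_bottom(dump_ui: Callable[[], str], swipe: Callable[[], None],
                     collect: Callable[[str], None], max_swipes: int = 50) -> int:
    """Swipe until the UI structure stops changing; return the number of swipes made."""
    swipes = 0
    before = dump_ui()
    collect(before)                 # parse the first screen
    while swipes < max_swipes:
        swipe()
        swipes += 1
        after = dump_ui()
        if after == before:         # structure unchanged: the bottom has been reached
            break
        collect(after)              # new content appeared: parse it and keep going
        before = after
    return swipes


class FakeScreen:
    """Stand-in for a device: a fixed list of screens that stops scrolling at the end."""

    def __init__(self, pages: List[str]) -> None:
        self.pages, self.index = pages, 0

    def dump(self) -> str:
        return self.pages[self.index]

    def swipe(self) -> None:
        self.index = min(self.index + 1, len(self.pages) - 1)


screen = FakeScreen(["comments 1-3", "comments 4-6", "comments 7-8"])
seen: List[str] = []
swipes = scroll_to_bottom(screen.dump, screen.swipe, seen.append)
```

The last swipe is the one that reveals nothing new, so the loop always performs one more swipe than there are additional screens; the `max_swipes` limit guards against feeds that load content indefinitely.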

The number of clicks on each page control is recorded through the user operation model, and the articles that have not yet been clicked are determined from the click counts; an unclicked article is then selected on the article list page and clicked, which improves content acquisition efficiency. If no unclicked article remains on the list page, one screen of article-list data is acquired, the page is slid further down, and the above steps are repeated.
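The click-count bookkeeping might be kept in a small tracker such as the one below. `ClickTracker` and its method names are hypothetical; the text specifies only that click counts are recorded and used to find articles that have not been clicked.

```python
from collections import Counter
from typing import List


class ClickTracker:
    """Counts clicks per article title and reports which visible titles are still unclicked."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, title: str) -> None:
        self.counts[title] += 1

    def unclicked(self, visible_titles: List[str]) -> List[str]:
        # A Counter returns 0 for titles it has never seen.
        return [t for t in visible_titles if self.counts[t] == 0]


tracker = ClickTracker()
tracker.record("article A")
remaining = tracker.unclicked(["article A", "article B", "article C"])
```

When `unclicked` returns an empty list for the current screen, the flow described above would slide down to load the next screen of the list.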

In S4, the collected data are classified and cleaned according to different news category attributes, the cleaned data are uniformly packaged, and the packaged data are stored in a database to obtain a final news APP general content collection set.

The collected data is stored in a dedicated database. Data classification stores the collected data in different data tables according to data type. Data cleaning includes outlier handling, format unification, missing-value handling, deletion of duplicate values and similar operations; it uses processing functions commonly used in the art and is not described in detail here.
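A few of the cleaning operations named above (format unification, missing-value handling and duplicate deletion) can be sketched in plain Python. The record fields `title`, `content` and `category`, and the choice to drop incomplete records rather than fill them in, are assumptions; the application does not specify a schema or a cleaning policy.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def clean(records: List[Record]) -> List[Dict[str, str]]:
    """Trim whitespace, drop records missing a title or content, and remove duplicates."""
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for rec in records:
        title = (rec.get("title") or "").strip()      # unify: strip stray whitespace
        content = (rec.get("content") or "").strip()
        if not title or not content:
            continue                                  # missing value: drop the record
        key = (title, content)
        if key in seen:
            continue                                  # duplicate: keep the first copy only
        seen.add(key)
        category = (rec.get("category") or "unknown").strip().lower()
        cleaned.append({"title": title, "content": content, "category": category})
    return cleaned


raw = [
    {"title": " News A ", "content": "text", "category": "Article"},
    {"title": "News A", "content": "text", "category": "article"},   # duplicate after trimming
    {"title": "News B", "content": None, "category": "comment"},     # missing content
]
result = clean(raw)
```

After cleaning, records could be routed to different tables by their `category` value, in line with the classification step described above.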

Although the present application has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present application is not limited thereto. Those skilled in the art may make various equivalent modifications and substitutions to the embodiments of the present application without departing from its spirit and scope, and all such modifications and substitutions are intended to fall within the scope of the present application as defined by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The data acquisition method based on the depth simulation operation is characterized by comprising the following steps of:

collecting a data set of complete operation behaviors of a plurality of APP;

2. The method of claim 1, wherein the data set is divided into a plurality of subsets by APP, i.e. the data set is APP_Name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]; the APPn subset contains the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors in APPn, Actionm is the m-th operation, and m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position and duration of the control, and the interface change data after each slide and each click of a control.

3. The method as recited in claim 2, further comprising: during collection of real-person operation, automatically recording the user's misoperation behaviors, and manually classifying and labeling the types and frequencies of the misoperations, the misoperations comprising: screen-swipe jitter, click offset, and accidental mis-clicks.

4. The method of claim 1, wherein the training function of Q() is given by a formula whose equation and symbol images are missing from this text; in that formula, one term denotes the Q value output by the behavior policy network and another denotes its updated value; I is the number of iterative updates; the formula also contains a learning parameter and a target parameter of the Q value; t is the time step, a_t is the action at time t, s_t is the state at time t, a_{t+1} is the action at the next time t+1, and s_{t+1} is the state at the next time t+1.

5. The method of claim 1, wherein an interface content recognition module of the simulated user operation model is trained; content recognition comprises recognizing a page as a normal page or an abnormal page, the reward for a normal page being positive and the reward for an abnormal page being negative; within a normal page the content type is recognized, the content types comprising articles, comments and pictures, and the corresponding actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font, and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment, and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture, and sliding down on the enlarged picture.

6. The method of claim 1, wherein the simulated user operation model performs interface recognition and simulated user operation on the target APP, comprising: