CN116244161B - Data acquisition method based on depth simulation operation - Google Patents

Data acquisition method based on depth simulation operation

Publication number
CN116244161B
Authority
CN
China
Prior art keywords
action
interface
app
data
user operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310530049.7A
Other languages
Chinese (zh)
Other versions
CN116244161A (en)
Inventor
魏传强
矫娟
宋耀
徐哲
司君波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Qilu Yidian Media Co ltd
Original Assignee
Shandong Qilu Yidian Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Qilu Yidian Media Co ltd
Priority to CN202310530049.7A
Publication of CN116244161A
Application granted
Publication of CN116244161B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0483 Interaction with page-structured environments, e.g. book metaphor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a data acquisition method based on depth simulation operation, comprising: collecting a data set of complete operation behaviors on a plurality of APPs; using the data set to train a simulated user operation model built with the DQN algorithm; and having the simulated user operation model perform interface recognition and simulated user operations on a target APP, determining the operation type from the content type recognized on the interface while collecting all data produced during the simulated operation. By collecting a large amount of behavior data from different mobile phones operating different APPs to train a deep reinforcement learning model, and by using an improved training method, the application acquires information by simulating the operations of a real person.

Description

Data acquisition method based on depth simulation operation
Technical Field
The application belongs to the technical field of data acquisition, and particularly relates to a data acquisition method based on depth simulation operation.
Background
The arrival of the mobile internet era has greatly changed the habits, modes, scenes and channels of news reading. On the one hand, people spend less time browsing news in a fixed place and more time reading news in fragments of spare time, showing a typical trend toward mobility, fragmentation and convenience. On the other hand, users prefer news content that is short, simple and fast, and value their own participation in the reading process, so mobile news platforms have become one of the important channels through which users receive news.
To ensure that an APP works normally, the APP must be studied further and relevant data collected during testing, for example to check whether every APP runs normally on one mobile phone model and whether one APP runs normally on multiple phone models. At present, a mobile phone manufacturer usually has many phone models under its brand, the models vary widely and APP interfaces are not uniform, so automatic collection with traditional APP data acquisition schemes is difficult: inflexible programs must be customized for each phone and each APP, which leads to poor universality, monotonous operation and low acquisition efficiency. In addition, current automatic acquisition is monotonous and essentially follows a fixed program, so the collected test data are neither realistic nor general.
Disclosure of Invention
To address the above shortcomings of the prior art, the application provides a data acquisition method based on depth simulation operation.
The application provides a data acquisition method based on depth simulation operation, which comprises the following steps:
collecting a data set of complete operation behaviors on a plurality of APPs;
using the data set to train a simulated user operation model built with the DQN algorithm;
the simulated user operation model comprises two DQN network models, namely a behavior policy network and a target policy network; a control operated by the user is defined as an action a, and the interface displayed after the action is executed is defined as a state s;
the behavior policy network is used to evaluate the value Q() of each action $a_t$ in the current state $s_t$, and then to select the action $a_t$ with the largest Q() by a greedy method; on receiving action $a_t$, the environment returns a reward $r_t$ and the next state $s_{t+1}$, so that at each time step t a state transition tuple of the user operating the APP is obtained: {current state $s_t$, action $a_t$ generated in state $s_t$, reward $r_t$ generated by action $a_t$, next state $s_{t+1}$ after the action is performed};
the target policy network is used to generate, from the current state $s_t$, the action $a_t$ to be executed at the current time, which represents determining the control to be operated from the current interface, and to generate, from the next state $s_{t+1}$, the action $a_{t+1}$ to be executed at the next time, which represents determining the control to be operated next from the interface that will be jumped to;
the simulated user operation model performs interface recognition and simulated user operations on the target APP, determines the operation type from the content type recognized on the interface, and collects all data produced during the simulated user operation.
Further, the data set is divided into a plurality of subsets by APP, i.e. the data set is app_name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]; the subset for APPn contains the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors in APPn, Actionm is the m-th operation behavior, and m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position and duration of the control, and the interface change data after each slide and each click on a control;
further, the method also comprises: while collecting real-person operations, automatically recording the user's misoperations, and manually classifying and labeling the types and frequencies of the misoperations, wherein the misoperations comprise: screen-swipe jitter, click offset, and accidental false clicks.
Further, the training function of Q() is: [formula missing in source]; where [symbol missing] denotes the Q value output by the behavior policy network and [symbol missing] is its updated value, I is the number of iterative updates, [symbol missing] is the learning parameter, and [symbol missing] is the target parameter of the Q value; t is the time step, $a_t$ is the action at time t, $s_t$ is the state at time t, $a_{t+1}$ is the action at the next time t+1, and $s_{t+1}$ is the state at the next time t+1.
Further, the interface content recognition module of the simulated user operation model is trained; content recognition comprises recognizing whether a page is a normal page or an abnormal page, the reward for a normal page being positive and the reward for an abnormal page being negative; within a normal page the content type is recognized, the content types comprising articles, comments and pictures, and the corresponding non-displayed actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture and sliding down through the enlarged pictures.
Further, the interface recognition and simulated user operation performed by the simulated user operation model on the target APP comprise:
after entering the primary page of the APP, randomly clicking a control on the primary page;
each time a control is clicked, entering the secondary page corresponding to that control, and parsing the data of the secondary page using an acquisition mode that depends on the type of the control; performing one sliding operation after the secondary page data have been parsed, and after each slide judging whether the UI structure of the interface before the slide is consistent with that after the slide; if not, judging that the interface has not been slid to the bottom, continuing the automatic simulated clicking and repeating the operation; if so, judging that the current interface has been slid to the bottom, and clicking to return to the previous interface;
searching the secondary page for a comment module, an article module and a photo module, and performing the simulated operations of the corresponding actions in the interfaces of the comment module, the article module and the photo module respectively;
recording which controls of the primary page have been clicked, acquiring the data of the current interface, continuing to slide down, and repeating the above steps.
The beneficial effects of the application are as follows: the APP data acquisition method based on depth simulation of user operation trains a deep reinforcement learning model on a large amount of behavior data collected from different mobile phones operating different APPs and, through an improved training method, acquires information by simulating the operations of a real person, thereby solving the problems of high difficulty and low automation that arise when customized acquisition is used in existing mobile phone APP testing.
A deep anthropomorphic system is established: misoperations occurring in real-person operation, such as screen-swipe jitter, click offset and accidental false clicks, are recorded, and the types and frequencies of the misoperations are analyzed and summarized into rules, so that blocking or banning caused by overly mechanical operation is avoided, which solves the problem of monotonous operation.
The number of clicks on each control is recorded, ensuring that no data are missed during automatic acquisition, avoiding missed and erroneous acquisition, and solving the problem of poor universality.
The method automatically recognizes whether the interface has reached the bottom of the page and recognizes different news types, so the next operation is decided promptly; repeated clicking, endless sliding and untargeted collection of different article types are avoided, which solves the problem of low acquisition efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a method of one embodiment of the application.
FIG. 2 is a schematic diagram of how the model internally determines an action from a state in one embodiment of the present application.
FIG. 3 is a schematic flow diagram of anthropomorphic operation of a news APP by the model in one embodiment of the application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
The following explains key terms appearing in the present application.
The application uses the existing DQN algorithm model, i.e. a Q-learning algorithm based on deep learning, which mainly combines value function approximation (Value Function Approximation) with neural network technology and trains the network using a target network and experience replay.
The embodiment of the application provides a data acquisition method based on depth simulation operation, which comprises the following steps:
collecting a data set of complete operation behaviors on a plurality of APPs;
using the data set to train a simulated user operation model built with the DQN algorithm;
the simulated user operation model comprises two DQN network models, namely a behavior policy network and a target policy network; a control operated by the user is defined as an action a, and the interface displayed after the action is executed is defined as a state s;
the behavior policy network is used to evaluate the value Q() of each action $a_t$ in the current state $s_t$, and then to select the action $a_t$ with the largest Q() by a greedy method; on receiving action $a_t$, the environment returns a reward $r_t$ and the next state $s_{t+1}$, so that at each time step t a state transition tuple of the user operating the APP is obtained: {current state $s_t$, action $a_t$ generated in state $s_t$, reward $r_t$ generated by action $a_t$, next state $s_{t+1}$ after the action is performed};
the target policy network is used to generate, from the current state $s_t$, the action $a_t$ to be executed at the current time, which represents determining the control to be operated from the current interface, and to generate, from the next state $s_{t+1}$, the action $a_{t+1}$ to be executed at the next time, which represents determining the control to be operated next from the interface that will be jumped to;
the simulated user operation model performs interface recognition and simulated user operations on the target APP, determines the operation type from the content type recognized on the interface, and collects all data produced during the simulated user operation.
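The action selection and transition recording described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `Transition`, `select_action` and `step`, the `epsilon` exploration parameter, and the `q_fn` and `env_step` callables are all assumptions introduced here. Setting `epsilon` to 0 gives the purely greedy choice the text describes.

```python
import random
from typing import Callable, NamedTuple, Sequence, Tuple


class Transition(NamedTuple):
    """One state transition tuple {s_t, a_t, r_t, s_{t+1}} (hypothetical layout)."""
    state: tuple       # s_t: features describing the displayed interface
    action: int        # a_t: index of the control that was operated
    reward: float      # r_t: reward returned by the environment
    next_state: tuple  # s_{t+1}: interface after the action


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick the action with the largest Q value, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def step(
    state: tuple,
    q_fn: Callable[[tuple], Sequence[float]],
    env_step: Callable[[tuple, int], Tuple[float, tuple]],
    epsilon: float,
    rng: random.Random,
) -> Transition:
    """Evaluate Q for the current state, act, and record the resulting transition."""
    action = select_action(q_fn(state), epsilon, rng)
    reward, next_state = env_step(state, action)
    return Transition(state, action, reward, next_state)
```

In a real setup `q_fn` would be the behavior policy network and `env_step` would drive the phone; here both are plain callables so the sketch stays independent of any device.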
Optionally, as an embodiment of the present application, the data set is divided into a plurality of subsets by APP, i.e. the data set is app_name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]; the subset for APPn contains the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors in APPn, Actionm is the m-th operation behavior, and m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position and duration of the control, and the interface change data after each slide and each click on a control.
The real-person operations used by the application are performed by dedicated testers. The collected data do not include users' personal information; they consist only of operation data stored in the background after the APP is operated, such as the operation time and reaction time of a control, and of published interface information of the APP, such as comments and pictures.
Data acquisition uses techniques commonly used in the art, such as APP packet capture or a Python crawler, which are not described in detail here.
Optionally, as an embodiment of the present application, the method further includes: while collecting real-person operations, automatically recording the user's misoperations, and manually classifying and labeling the types and frequencies of the misoperations, wherein the misoperations comprise: screen-swipe jitter, click offset, and accidental false clicks.
Every operation performed on the APP has a corresponding standard operation. For example, the standard operation corresponding to "screen-swipe jitter" is a swipe whose duration reaches a standard value, and the standard operation corresponding to "click offset" and "accidental false click" is a click on a control at an accurate position whose click time exceeds a standard value. A misoperation is identified when an operation fails to reach the standard value of its corresponding standard operation. All data relating to identified misoperations are recorded in a dedicated database.
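One way to read the rule above is as a set of threshold checks against each standard operation. The sketch below is only an illustration under that reading: the threshold constants, the `Operation` fields and the label strings are hypothetical, since the source says only "standard value" without giving numbers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical standard values; the source does not specify them.
MIN_SWIPE_DURATION_S = 0.15
MIN_CLICK_DURATION_S = 0.05
MAX_CLICK_OFFSET_PX = 20.0


@dataclass
class Operation:
    kind: str               # "swipe" or "click"
    duration_s: float       # how long the gesture lasted
    offset_px: float = 0.0  # distance from the control centre (clicks only)


def classify_misoperation(op: Operation) -> Optional[str]:
    """Return a misoperation label, or None if the operation meets its standard."""
    if op.kind == "swipe":
        # A swipe shorter than the standard duration is treated as jitter.
        return "swipe_jitter" if op.duration_s < MIN_SWIPE_DURATION_S else None
    if op.kind == "click":
        if op.offset_px > MAX_CLICK_OFFSET_PX:
            return "click_offset"
        if op.duration_s < MIN_CLICK_DURATION_S:
            return "accidental_click"
        return None
    raise ValueError(f"unknown operation kind: {op.kind}")
```

Labels produced this way could then be counted per type to obtain the frequencies that the text says are classified and marked manually.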
Optionally, as one embodiment of the present application, the training function of Q() is: [formula missing in source]; where [symbol missing] denotes the Q value output by the behavior policy network and [symbol missing] is its updated value, I is the number of iterative updates, [symbol missing] is the learning parameter, and [symbol missing] is the target parameter of the Q value; t is the time step, $a_t$ is the action at time t, $s_t$ is the state at time t, $a_{t+1}$ is the action at the next time t+1, and $s_{t+1}$ is the state at the next time t+1.
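The formula itself is not legible in this text. For orientation only, the textbook DQN target and loss that fit the quantities named above (a behavior network with learned parameters, a target network with its own parameters, an iteration index) are usually written as below. The discount factor $\gamma$ and the notation $\theta_i$ and $\theta^{-}$ are assumptions taken from the standard algorithm, not from the patent, whose exact training function may differ.

```latex
y_t = r_t + \gamma \max_{a_{t+1}} Q\left(s_{t+1}, a_{t+1}; \theta^{-}\right), \qquad
L(\theta_i) = \mathbb{E}\left[\left(y_t - Q\left(s_t, a_t; \theta_i\right)\right)^2\right]
```

Here $\theta_i$ are the behavior network parameters at iteration $i$, and $\theta^{-}$ are the target network parameters, which in the standard algorithm are copied from the behavior network at fixed intervals.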
Through repeated iterative training, the embodiment of the application obtains a deep anthropomorphic algorithm model that strikes a balance between model-simulated operation and real user operation, reducing failures caused by overly mechanical operation.
Optionally, as an embodiment of the present application, the interface content recognition module of the simulated user operation model is trained; content recognition comprises recognizing whether a page is a normal page or an abnormal page, the reward for a normal page being positive and the reward for an abnormal page being negative; within a normal page the content type is recognized, the content types comprising articles, comments and pictures, and the corresponding non-displayed actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture and sliding down through the enlarged pictures.
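The mapping from recognized content type to candidate actions, together with the sign of the page reward, can be written down as a small lookup. This is a sketch only: the action identifiers and the reward magnitudes of +1.0 and -1.0 are assumptions, because the source states only that the reward is positive for normal pages and negative for abnormal ones.

```python
from typing import Dict, List

# Action names are illustrative labels for the actions listed in the text.
CONTENT_ACTIONS: Dict[str, List[str]] = {
    "article": ["slide_down", "enlarge_font", "restore_font"],
    "comment": ["expand_details", "like", "dislike"],
    "picture": ["click_to_enlarge", "slide_enlarged_picture"],
}


def page_reward(is_normal_page: bool) -> float:
    """Positive reward for a normal page, negative for an abnormal one (assumed magnitudes)."""
    return 1.0 if is_normal_page else -1.0


def candidate_actions(is_normal_page: bool, content_type: str) -> List[str]:
    """Return the actions available for a recognized page; none for abnormal pages."""
    if not is_normal_page:
        return []
    return list(CONTENT_ACTIONS.get(content_type, []))
```

Returning an empty list for abnormal pages is a design choice of this sketch; the source does not say what the model does after receiving the negative reward.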
Optionally, as an embodiment of the present application, the interface recognition and simulated user operation performed by the simulated user operation model on the target APP include:
after entering the primary page of the APP, randomly clicking a control on the primary page;
each time a control is clicked, entering the secondary page corresponding to that control, and parsing the data of the secondary page using an acquisition mode that depends on the type of the control; performing one sliding operation after the secondary page data have been parsed, and after each slide judging whether the UI structure of the interface before the slide is consistent with that after the slide; if not, judging that the interface has not been slid to the bottom, continuing the automatic simulated clicking and repeating the operation; if so, judging that the current interface has been slid to the bottom, and clicking to return to the previous interface;
searching the secondary page for a comment module, an article module and a photo module, and performing the simulated operations of the corresponding actions in the interfaces of the comment module, the article module and the photo module respectively;
recording which controls of the primary page have been clicked, acquiring the data of the current interface, continuing to slide down, and repeating the above steps.
Specifically, the secondary page data are parsed using different acquisition modes according to the type of the control: for example, if the control is a button, the acquisition modes are single click and double click; if the control is a slider, the acquisition mode is sliding; if the control is a picture, the acquisition mode is image capture.
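The bottom-of-page check in the steps above (compare the UI structure before and after each slide, and stop when nothing changes) can be sketched as a small loop. The function name `scroll_to_bottom`, its callable parameters and the `max_swipes` safety limit are assumptions of this sketch. With the uiautomator2 library, `d.dump_hierarchy()` and a `d.swipe(...)` call could plausibly fill the first two roles, but the sketch takes plain callables so it does not need a device.

```python
from typing import Callable


def scroll_to_bottom(
    dump_hierarchy: Callable[[], str],
    swipe_up: Callable[[], None],
    on_screen: Callable[[str], None],
    max_swipes: int = 50,
) -> int:
    """Swipe until the UI structure stops changing; return the number of effective swipes."""
    before = dump_hierarchy()
    on_screen(before)  # collect data from the first screen
    effective = 0
    for _ in range(max_swipes):
        swipe_up()
        after = dump_hierarchy()
        if after == before:
            # Identical structure before and after the swipe: bottom reached.
            break
        effective += 1
        on_screen(after)  # collect data from the newly revealed screen
        before = after
    return effective
```

A plain string comparison is the simplest reading of "consistent UI structure". A page containing a clock, animation or other changing element would defeat it, so a practical version would probably compare a normalized or filtered hierarchy instead.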
To aid understanding, the APP data acquisition method based on depth simulation of user operation is further described below, combining its principle with the data acquisition process for a news APP in the embodiment.
As shown in FIG. 1, the method includes:
S1, having real persons operate a plurality of APPs, collecting the real user operation data, and constructing a data set;
S2, using the data set to train a simulated user operation model built with the DQN algorithm;
S3, using the trained simulated user operation model to operate the target APP as a real person would, thereby acquiring its data;
S4, cleaning the acquired data, then packaging and storing them uniformly.
Specifically, in S1, the original behavior data of real persons operating APPs must be collected; the user operation behavior data and the manuscript type data on the APP are combined into the original training data set for the user operation model, and each item of data is defined, packaged and stored. The application can operate APPs installed on multiple mobile phones, which removes the influence of different phone brands, models and sizes on data acquisition.
Real-person behavior data are obtained by having real persons perform M click-and-visit behaviors on N common news APPs. One complete operation behavior runs from the person clicking to enter an APP, through reading and visiting, to exiting the APP; this ensures that the behavior data collection is valid, comprehensive and general.
The APP set includes the collected name, icon and user behavior data. For example, the APP set is app_name = {APP1, APP2, …, APPn, …, APPN}, where N is the total number of APPs, APPn is the n-th APP, and n ∈ [1, N]. Taking APP1 as an example, the user operation behaviors are represented as APP1_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M is the total number of complete operation behaviors performed, Actionm is the m-th operation behavior, and m ∈ [1, M]. The user behavior data include: the interface sliding length and duration, the type, position and duration of the control, and the interface change data after each slide and each click on a control, i.e. how long the real person spent sliding on which page or clicking which control.
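The nested structure described above (APPs, each with a list of complete operation behaviors, each made of recorded operations) could be held in memory roughly as follows. All class and field names, and the units in them, are hypothetical; the source lists only which quantities are recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OperationRecord:
    """One recorded user operation (field names and units are assumptions)."""
    swipe_length_px: float
    swipe_duration_s: float
    control_type: str
    control_position: Tuple[int, int]
    control_duration_s: float
    interface_change: str  # description of the interface after the operation


@dataclass
class CompleteBehavior:
    """One Action_m: everything from entering an APP to exiting it."""
    records: List[OperationRecord] = field(default_factory=list)


# Maps an APP name (APP1 ... APPN) to its behaviors [Action1, ..., ActionM].
Dataset = Dict[str, List[CompleteBehavior]]


def add_behavior(dataset: Dataset, app: str, behavior: CompleteBehavior) -> int:
    """Append a complete behavior to an APP's subset and return the new count M."""
    dataset.setdefault(app, []).append(behavior)
    return len(dataset[app])
```

Keeping one list per APP mirrors the subset notation APPn_Action in the text, so the value returned by `add_behavior` corresponds to M for that APP.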
Meanwhile, because the user behavior data record real-person operations, screen-swipe jitter, click offset and accidental false clicks may occur; the types and frequencies of these misoperations are analyzed and summarized into rules.
In addition, the collected data include manuscript types: normal manuscripts, comprising conventional image-text manuscripts and special video, audio and live-broadcast manuscripts, and abnormal manuscripts such as special topics, external links and advertisements.
In S2, a real person operating a mobile phone APP is deeply simulated based on the DQN algorithm. The data used to train the simulated user operation model are transitions: {current state $s_t$, generated action $a_t$, reward $r_t$ generated by the action, next state $s_{t+1}$ after the action is performed}. As shown in FIG. 2, four features may be used to represent the state of the APP currently being operated, for example control position, click speed, sliding distance and sliding speed.
The improved training uses a deep reinforcement learning model; the specific training uses the following formula:
[formula missing in source]; where [symbol missing] denotes the Q value output by the behavior policy network and [symbol missing] is its updated value, I is the number of iterative updates, [symbol missing] is the learning parameter, and [symbol missing] is the target parameter of the Q value; t is the time step, $a_t$ is the action at time t, $s_t$ is the state at time t, $a_{t+1}$ is the action at the next time t+1, and $s_{t+1}$ is the state at the next time t+1.
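To make the roles of the two networks concrete, the sketch below applies the standard Q-learning update with a separate, periodically synchronized target estimate. It deliberately swaps the neural networks for lookup tables so it stays short and runnable; the class name `TwoTableQ`, the learning rate `alpha`, the discount `gamma` and the `sync_every` interval are all assumptions and are not taken from the patent, whose exact update rule is not legible in this text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# State as the four example features: control position, click speed,
# sliding distance, sliding speed (encoding is hypothetical).
State = Tuple[float, float, float, float]


class TwoTableQ:
    """Tabular stand-in for a behavior network and a target network."""

    def __init__(self, n_actions: int, alpha: float = 0.5,
                 gamma: float = 0.9, sync_every: int = 2) -> None:
        self._zeros = lambda: [0.0] * n_actions
        self.behavior: Dict[State, List[float]] = defaultdict(self._zeros)
        self.target: Dict[State, List[float]] = defaultdict(self._zeros)
        self.alpha, self.gamma, self.sync_every = alpha, gamma, sync_every
        self.updates = 0

    def update(self, s: State, a: int, r: float, s_next: State) -> float:
        """Move Q(s, a) toward r + gamma * max_a' Q_target(s_next, a')."""
        td_target = r + self.gamma * max(self.target[s_next])
        self.behavior[s][a] += self.alpha * (td_target - self.behavior[s][a])
        self.updates += 1
        if self.updates % self.sync_every == 0:
            # Copy the behavior estimates into the target, as DQN does periodically.
            self.target = defaultdict(
                self._zeros, {k: list(v) for k, v in self.behavior.items()})
        return self.behavior[s][a]
```

The point of the separate target table is that the value being chased stays fixed between synchronizations, which is the stabilizing idea behind the target network in DQN.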
The simulated user operation model realized by the deep anthropomorphic algorithm can recognize the interface. On the one hand, interface recognition checks whether the UI structure information before each slide is consistent with that after the slide; if it is, the page has been slid to the bottom, so invalid sliding is avoided and time is saved;
on the other hand, content recognition identifies the content type, for example image-text manuscripts, video manuscripts, audio manuscripts, live broadcasts, special topics, external links and advertisements; different manuscript types are clicked and collected in different ways, which makes data collection targeted and more efficient.
Through repeated iterative training, a deep anthropomorphic algorithm model is obtained that strikes a balance between model-simulated operation and real user operation, reducing failures caused by overly mechanical operation.
In S3, based on the automatic operation module of the user operation model, the APP is automatically operated, and all elements of the interface are traversed by using the identification interface and the simulated click control, and the mobile phone is operated by using the existing Python third party library UIAutomator2 tool in this embodiment, as shown in fig. 3, which specifically includes the following procedures.
Firstly, inputting the name of the APP to be acquired at the moment, clicking the APP to enter, and automatically giving necessary permission, agreeing to a protocol request, skipping open screen advertisements, popup windows and other permission requirements by utilizing the existing watch technology.
And acquiring the number of articles in the current list through the UIAutometer 2 on the article list page, and then randomly clicking the title of the current list, wherein clicking of the title is realized through 'd.xpath (). Click ()' until all the articles in the current list are clicked, so that full-coverage clicking of the title is realized.
And (3) entering an article detail page, judging news attributes according to a user operation model, acquiring contents by adopting different acquisition modes, and analyzing the detail page. Taking graphic news as an example, the method specifically comprises the following steps:
firstly clicking a sharing button to acquire a sharing link; then acquiring article detail data such as titles, release time, article details and the like through sharing links; and finally clicking to enter the comment page.
Entering a comment interface, judging whether the interface has comments according to a user operation model, and analyzing a comment page; and the analysis page acquires useful data such as comment content, comment time and the like. Performing sliding operation after the current interface comment is obtained, wherein the sliding operation is realized through 'd.swipe ()', the interface is identified, and when the interface identification is performed, whether the UI structure information before each sliding is consistent with the UI structure information after the sliding is judged; if the sliding is inconsistent, judging that the sliding does not slide to the bottom of the interface, continuing to automatically simulate clicking, and repeating the operation; if so, the user can be regarded as sliding to the bottom, complete the comment interface traversal and return to the article list page.
The user operation model records the number of clicks on each page control, identifies the articles that have not yet been clicked from those counts, and selects only unclicked articles on the article list page, which improves content acquisition efficiency. If no unclicked article remains on the list page, one screen of article list data is acquired, the list is slid downward, and the steps above are repeated.
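The click bookkeeping can be illustrated with a small record that counts clicks per title and decides whether to click or slide. This is a minimal sketch; the class and function names are hypothetical, and articles are assumed to be identified by their title text.

```python
from collections import Counter


class ClickRecord:
    """Count clicks per control so that unclicked articles can be picked."""

    def __init__(self):
        self.counts = Counter()

    def mark_clicked(self, title):
        self.counts[title] += 1

    def unclicked(self, visible_titles):
        # Keep screen order; a title with zero recorded clicks is new.
        return [t for t in visible_titles if self.counts[t] == 0]


def next_action(record, visible_titles):
    """Return ("click", title) or ("swipe", None) for the list page."""
    pending = record.unclicked(visible_titles)
    if pending:
        return ("click", pending[0])
    return ("swipe", None)
```

For example, with titles "A" and "B" visible and only "A" recorded as clicked, `next_action` selects "B"; once both are recorded it asks for a slide.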
In S4, the collected data are classified and cleaned according to the news category attributes, the cleaned data are packaged in a uniform format, and the packaged data are stored in a database to obtain the final general content collection set for news APPs.
The collected data are stored in a dedicated database. Data classification stores the collected data in separate data tables according to data type. Data cleaning includes outlier processing, unification processing, missing-value processing, and duplicate removal; it uses processing functions commonly used in the field and is not described in detail here.
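As an illustration only, the classification and cleaning step might look like the sketch below, which uses SQLite from the Python standard library. The field names (`type`, `title`, `content`), the `data_` table-name prefix, and the rule that a record without a title is dropped are assumptions; outlier processing is not shown.

```python
import sqlite3


def clean_records(records):
    """Drop duplicates and rows without a title; normalise text fields."""
    seen = set()
    cleaned = []
    for rec in records:
        title = (rec.get("title") or "").strip()
        if not title:
            continue  # missing-value handling: a title is required here
        row = {
            "type": (rec.get("type") or "article").strip().lower(),
            "title": title,
            "content": " ".join((rec.get("content") or "").split()),
        }
        key = (row["type"], row["title"], row["content"])
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        cleaned.append(row)
    return cleaned


def store_by_type(conn, rows):
    """Store each row in a table named after its data type."""
    for row in rows:
        table = "data_" + "".join(c for c in row["type"] if c.isalnum())
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" (title TEXT, content TEXT)'
        )
        conn.execute(
            f'INSERT INTO "{table}" (title, content) VALUES (?, ?)',
            (row["title"], row["content"]),
        )
    conn.commit()
```

Calling `store_by_type(sqlite3.connect(":memory:"), clean_records(raw))` would place articles and comments in separate tables.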
Although the present application has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present application is not limited thereto. Those skilled in the art may make various equivalent modifications and substitutions to the embodiments of the present application without departing from its spirit and scope, and all such modifications and substitutions are intended to fall within the scope of the present application as defined by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. The data acquisition method based on the depth simulation operation is characterized by comprising the following steps of:
collecting a data set of complete operation behaviors of a plurality of APP;
training a simulated user operation model established by using a DQN algorithm by using a data set;
the simulated user operation model comprises two DQN network models, namely a behavior strategy network and a target strategy network; defining a control operated by a user as an action a, and displaying an interface as a state s after the action is executed;
the behavior policy network is used for evaluating the Q() value of each action a_t in the current state s_t and then selecting the action a_t with the largest Q() by a greedy method; on receiving action a_t, the environment gives a reward r_t and the next state s_{t+1}, so that a state transition array of the user operating the APP is obtained at each time step t: {current state s_t, action a_t generated in the current state s_t, reward r_t generated by action a_t, next state s_{t+1} after the action is executed};
the target policy network is used for generating, according to the current state s_t, the action a_t to be executed at the current moment, which represents determining the control to be operated according to the current interface, and for generating, according to the next state s_{t+1}, the action a_{t+1} to be executed at the next moment, which represents determining the control to be operated in the next step according to the interface to be jumped to;
the simulation user operation model carries out interface identification and simulation user operation on the target APP, determines a corresponding operation type according to the content type identified by the interface, and simultaneously collects all data in the simulation user operation process.
2. The method of claim 1, wherein the dataset is divided into a plurality of subsets by APP, i.e., the dataset is app_name = {APP1, APP2, …, APPn, …, APPN}, where N represents the total number of APPs and APPn represents the n-th APP, n ∈ [1, N]; the APPn subset comprises the user operation data APPn_Action corresponding to APPn, and each APP operation behavior subset is APPn_Action = {Action1, Action2, …, Actionm, …, ActionM}, where M represents the total number of complete operation behaviors in APPn and Actionm represents the m-th operation, m ∈ [1, M]; the user operation data corresponding to an action comprises: the interface sliding length, the interface sliding duration, the type, position, and duration of the control, and the interface change data after each slide and each click of a control.
3. The method of claim 2, further comprising: in the process of collecting the operations of a real person, automatically recording the misoperation behaviors of the user, and manually classifying and marking the types and frequencies of the misoperations, wherein the misoperations comprise: accidental screen swipes, click offsets, and accidental clicks.
4. The method of claim 1, wherein the training function of Q() is: [formula missing from the extracted text]; in the formula, [symbol missing] represents the Q value output by the behavior policy network and [symbol missing] is the updated Q value, where i is the number of iterative updates; [symbol missing] denotes the learning parameters, wherein [symbols missing], and [symbol missing] is the target parameter of the Q value; t is the time step, a_t is the action at time t, s_t is the state at time t, a_{t+1} is the action at the next time t+1, and s_{t+1} is the state at the next time t+1.
5. The method of claim 1, wherein an interface content recognition module of the simulated user operation model is trained; the content recognition comprises identifying a page as a normal page or an abnormal page, wherein the reward for a normal page is positive and the reward for an abnormal page is negative; for a normal page, the content type is further identified, the content types comprising articles, comments, and pictures, and the corresponding non-displayed actions are determined according to the content type; the actions corresponding to an article are: sliding down, enlarging the font, and restoring the font; the actions corresponding to a comment are: expanding the comment details, liking the comment, and disliking the comment; the actions corresponding to a picture are: clicking to enlarge the picture and sliding down on the enlarged picture.
6. The method of claim 1, wherein the simulated user operation model performs interface recognition and simulated user operation on the target APP, comprising:
after entering the APP primary page, randomly clicking a control of the primary page;
entering a secondary page corresponding to one control after clicking the control each time, and analyzing the data of the secondary page by adopting different acquisition modes according to the type of the control; executing a sliding operation once after the secondary page data analysis is completed, judging whether the UI structure of the interface before sliding is consistent with the UI structure of the interface after sliding each time, if not, judging that the sliding does not slide to the bottom of the interface, and continuing to automatically simulate clicking to perform repeated operation; if the current interface is consistent, judging that the current interface slides to the bottom, and clicking to return to the previous interface;
searching a comment module, an article module and a photo module in the secondary page, and respectively performing simulation operation of corresponding actions in interfaces of the comment module, the article module and the photo module;
recording the clicking condition of the control of the primary page, acquiring the data of the current interface, continuing to slide downwards, and repeating the steps.
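For reference, the Q-value update of a DQN with separate behavior and target networks is commonly written in the form below. This is the textbook form, given here as an assumption; the formula of claim 4 is not legible in this text and may differ from it. The symbols \alpha (learning rate), \gamma (discount factor), \theta (behavior network parameters), and \theta^{-} (target network parameters) are the usual ones and are not taken from the claims; s_t, a_t, r_t, s_{t+1}, a_{t+1}, and the iteration index i follow the definitions in claims 1 and 4.

```latex
Q_{i+1}(s_t, a_t; \theta) = Q_i(s_t, a_t; \theta)
  + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q_i(s_{t+1}, a_{t+1}; \theta^{-})
  - Q_i(s_t, a_t; \theta) \right]
```

In this form the bracketed term is the difference between the target value computed with the target network and the current estimate of the behavior network.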
CN202310530049.7A 2023-05-12 2023-05-12 Data acquisition method based on depth simulation operation Active CN116244161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310530049.7A CN116244161B (en) 2023-05-12 2023-05-12 Data acquisition method based on depth simulation operation

Publications (2)

Publication Number Publication Date
CN116244161A CN116244161A (en) 2023-06-09
CN116244161B true CN116244161B (en) 2023-08-11

Family

ID=86633479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310530049.7A Active CN116244161B (en) 2023-05-12 2023-05-12 Data acquisition method based on depth simulation operation

Country Status (1)

Country Link
CN (1) CN116244161B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541006B (en) * 2023-06-28 2024-01-26 壹仟零壹艺网络科技(北京)有限公司 Graphic processing method and device for computer man-machine interaction interface

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135487A (en) * 2019-05-09 2019-08-16 国网山东省电力公司滨州供电公司 A kind of computer user mouse Behavior modeling method
CN110334142A (en) * 2019-06-18 2019-10-15 北京红云融通技术有限公司 Intelligent data acquisition method, terminal, server and interactive system
CN112579852A (en) * 2019-09-30 2021-03-30 厦门邑通软件科技有限公司 Interactive webpage data accurate acquisition method
CN111881620A (en) * 2020-07-15 2020-11-03 哈尔滨工业大学(威海) User software behavior simulation system based on reinforcement learning algorithm and GAN model and working method thereof
CN114328169A (en) * 2020-10-09 2022-04-12 福建天泉教育科技有限公司 Dynamic page testing method and system
CN112685318A (en) * 2021-01-07 2021-04-20 广州三星通信技术研究有限公司 Method and system for generating test script
CN113779540A (en) * 2021-08-17 2021-12-10 广东融合通信股份有限公司 Enterprise public notice information data acquisition method based on RPA
CN114139637A (en) * 2021-12-03 2022-03-04 哈尔滨工业大学(深圳) Multi-agent information fusion method and device, electronic equipment and readable storage medium
CN114610639A (en) * 2022-03-22 2022-06-10 广州虎牙科技有限公司 Method, device and equipment for testing graphical user interface and storage medium
CN115292571A (en) * 2022-08-08 2022-11-04 烟台中科网络技术研究所 App data acquisition method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep directed collection of Web data; Xia Tian; Journal of Shandong University (Natural Science) (No. 05); full text *

Also Published As

Publication number Publication date
CN116244161A (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant