CN109144837A - A user behavior pattern recognition method supporting precise service push - Google Patents

A user behavior pattern recognition method supporting precise service push

Info

Publication number
CN109144837A
Authority
CN
China
Prior art keywords
data
time
user
indicates
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811024517.9A
Other languages
Chinese (zh)
Other versions
CN109144837B (en)
Inventor
窦睿涵
赵烜
戴海鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201811024517.9A
Publication of CN109144837A
Application granted
Publication of CN109144837B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452: Performance evaluation by statistical analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a user behavior pattern recognition method supporting precise service push, comprising: step 1, collecting from an Android mobile device the device energy consumption, CPU usage, memory usage and communication packet information of applications while they run; step 2, applying data cleaning techniques to the collected data to remove noise data and fill in the missing data, and then grouping the data with a sliding window; step 3, training a classifier with random forest, a lightweight machine learning algorithm; step 4, inferring application usage with the classifier trained in step 3; step 5, building a user behavior model from the application usage; step 6, predicting user behavior from the user behavior model in order to provide precise service push.

Description

A user behavior pattern recognition method supporting precise service push
Technical Field
The present invention relates to the field of user behavior analysis on Android devices, and in particular to a user behavior pattern recognition method supporting precise service push.
Background
The rapid development of the mobile Internet has made mobile devices ubiquitous worldwide. Android and iOS, the operating systems developed by the two mobile giants Google and Apple, have almost monopolized the global smartphone market. A Gartner study shows that, as of the third quarter of 2016, 87.8% of smartphones worldwide ran Android. Thanks to being open source, Android has attracted a large number of software developers and is used on smart devices such as mobile phones, tablets, TVs, digital cameras and game consoles. Applications keep emerging, and their functions cover almost every service a user can think of. By 2015, the number of applications in the Android-based Google Play store had reached 1.43 million, surpassing iOS.
Mobile devices have greatly changed people's lives. People increasingly rely on mobile applications to complete all kinds of tasks. In fact, the way a user operates applications usually shows certain regularities that can be used to analyze user behavior. For example, in their spare time, users tend to use mobile applications to shop, watch videos, browse the web, chat or play games; at work, they tend to use mobile applications to read text, check email, edit documents with Office and look up relevant knowledge; when traveling, they tend to use mobile applications to make payments, obtain routes from maps and access other services. Therefore, a user's behavior pattern can be learned by analyzing the user's use of mobile applications.
Identifying user behavior patterns helps provide users with more timely and accurate service push and guides developers in improving service functions. Analyzing application usage provides a basis for building a user behavior model. How, then, can application usage be inferred? At present, the main approach is to infer which application is in use at a given time through traffic analysis; for example, H. F. Alan et al., in "Can android applications be identified using only tcp/ip headers of their launch time traffic?" (In: ACM Conference on Security & Privacy in Wireless and Mobile Networks, 2016, pp. 61-66), infer application usage by analyzing packet information. Other methods infer application usage by analyzing public resources; for example, Y. Chen et al., in "Powerful: Mobile app fingerprinting via power analysis" (In: IEEE INFOCOM, 2017, pp. 1-9), infer application usage by analyzing power consumption, and X. Liu et al., in "Understanding diverse usage patterns from large-scale appstore-service profiles" (In: IEEE Transactions on Software Engineering, 2018, pp. 384-411), infer application usage by analyzing user status. However, current methods usually consider only one kind of data (such as traffic or power), which yields suboptimal results.
Summary of the Invention
Object of the invention: the technical problem to be solved by the present invention is the inaccurate identification of user behavior patterns in the prior art; to address it, the invention provides a user behavior pattern recognition method supporting precise service push.
To solve the above problem, the invention discloses a user behavior pattern recognition method supporting precise service push, comprising the following steps:
Step 1, collect from an Android mobile device the device energy consumption, CPU usage, memory usage and the communication packet information sent over the network while applications are in use;
Step 2, process the data collected in step 1: remove noise data, fill in the missing data, and then group the data with a sliding window;
Step 3, train a classifier;
Step 4, infer the usage of actual applications with the classifier trained in step 3;
Step 5, build a user behavior model from the application usage;
Step 6, predict user behavior from the user behavior model in order to provide precise service push.
Step 1 comprises the following steps:
Step 1-1, collect the CPU usage of the device at each moment from public resources on the Android device. The current CPU utilization information is read from an Android public file, giving a tuple (user, nice, system, idle, iowait, irq, softirq, stealstolen, guest, guest_nice), where user is the time spent in user mode, nice is the time spent in low-priority user mode, system is the time spent in system mode, idle is the time spent in the idle task, iowait is the time spent waiting for input/output to complete, irq is the time spent servicing interrupts, softirq is the time spent servicing soft interrupts, stealstolen is the time spent in other operating systems when running in a virtualized environment, guest is the time spent running a virtual CPU for a guest operating system under the control of the Linux kernel, and guest_nice is the time spent running a niced guest virtual machine. All parameters in the tuple accumulate from system start to the current time;
Step 1-2, collect the memory usage of the device at each moment from public resources on the Android device. The available memory is availableMemory=free+buffers+cached, where free is the remaining unused memory, buffers is the memory used for buffers, and cached is the memory used for caching; the memory usage memoryUsage is then obtained by the following formula:
memoryUsage=(totalMemorySize-availableMemory)/totalMemorySize,
where totalMemorySize is the total memory of the Android device and availableMemory is its available memory. After the memory usage is obtained, the memory data set collected on the Android device is a set of tuples (timestamp1, memoryUsage), where timestamp1 is a timestamp marking the time at which the memory usage data was collected;
Step 1-3, obtain the measured voltage voltage and instantaneous current current of the Android device by querying an Android public resource file, and compute the power Power with the following formula:
Power=voltage*current,
The power consumption data collected on the device is then a set of tuples (timestamp2, Power), where timestamp2 is a timestamp marking the time at which the power data was collected;
Step 1-4, create a VPN connection with the VPNservice plug-in so that all packets pass through this connection, which makes it possible to collect the communication packets of the mobile applications of interest.
In step 1-1, the CPU utilization is computed as follows:
Step 1-1-1, sample the CPU data at two moments t1 and t2 separated by a sufficiently short interval (t1 and t2 are generally set 1 second apart), obtaining the tuple (user1, nice1, system1, idle1, iowait1, irq1, softirq1, stealstolen1, guest1, guest_nice1) at t1 and the tuple (user2, nice2, system2, idle2, iowait2, irq2, softirq2, stealstolen2, guest2, guest_nice2) at t2,
where each quantity is accumulated from system start to t1: user1 is the time spent in user mode, nice1 is the time spent in low-priority user mode, system1 is the time spent in system mode, idle1 is the time spent in the idle task, iowait1 is the time spent waiting for input/output to complete, irq1 is the time spent servicing interrupts, softirq1 is the time spent servicing soft interrupts, stealstolen1 is the time spent in other operating systems when running in a virtualized environment, guest1 is the time spent running a virtual CPU for a guest operating system under the control of the Linux kernel, and guest_nice1 is the time spent running a niced guest virtual machine; the quantities with subscript 2 are defined in the same way at t2;
Step 1-1-2, compute the CPU time CPUTime1 at t1 and the CPU time CPUTime2 at t2 with the following formulas:
CPUTime1=user1+nice1+system1+idle1+iowait1+irq1+softirq1+stealstolen1+guest1+guest_nice1,
CPUTime2=user2+nice2+system2+idle2+iowait2+irq2+softirq2+stealstolen2+guest2+guest_nice2;
Step 1-1-3, compute the CPU utilization cpuUsage with the following formula:
cpuUsage=((CPUTime2-CPUTime1)-(idle2-idle1))/(CPUTime2-CPUTime1).
After cpuUsage is obtained, the CPU data set on the device is a set of tuples (timestamp3, cpuUsage), where timestamp3 is a timestamp marking the time at which the CPU utilization was collected.
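The two-sample calculation of steps 1-1-1 to 1-1-3 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that utilization is the non-idle share of the CPU time elapsed between the two samples, and the sample lines are made-up values in the format of the aggregate `cpu` line of `/proc/stat`. The helper names `parse_cpu_line` and `cpu_usage` are hypothetical.

```python
from typing import Sequence

# Field order follows the tuple named in the text; "idle" sits at index 3.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq",
          "softirq", "stealstolen", "guest", "guest_nice")
IDLE_INDEX = FIELDS.index("idle")


def parse_cpu_line(line: str) -> tuple[int, ...]:
    """Parse the aggregate 'cpu ...' line of /proc/stat into its counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return tuple(int(p) for p in parts[1:1 + len(FIELDS)])


def cpu_usage(sample1: Sequence[int], sample2: Sequence[int]) -> float:
    """Assumed formula: fraction of non-idle time between the two samples."""
    cpu_time1, cpu_time2 = sum(sample1), sum(sample2)  # CPUTime1, CPUTime2
    total = cpu_time2 - cpu_time1
    if total <= 0:
        raise ValueError("the second sample must be taken after the first")
    idle = sample2[IDLE_INDEX] - sample1[IDLE_INDEX]
    return (total - idle) / total


# Made-up counters standing in for two reads taken about 1 second apart.
t1 = parse_cpu_line("cpu 100 0 50 800 20 0 5 0 0 0")
t2 = parse_cpu_line("cpu 130 0 60 850 25 0 5 0 0 0")
print(round(cpu_usage(t1, t2), 3))  # 45 of 95 elapsed ticks were non-idle
```

Working on counter differences rather than absolute values matters because every field accumulates from system start, so a single sample says nothing about the current load.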
Step 1-4 comprises the following steps:
Step 1-4-1, the application sends the corresponding packets to the real network device through a socket;
Step 1-4-2, the Android system forwards all packets to the virtual network device through the IP tables, using network address translation (NAT);
Step 1-4-3, the VPN program obtains all packets delivered to the virtual network by reading the data on the Android device;
Step 1-4-4, the VPN program collects the packets and obtains the timestamp, destination IP address, protocol type and destination host value from each packet, then sends the collected packets to a remote server through the real network device; the server parses the packets and finally obtains multiple tuples (timestamp, IP, protocol, host), where timestamp is the timestamp of the HTTP header, IP is the destination IP address, protocol is the protocol type, and host is the host attribute value.
Step 2 comprises the following steps:
Step 2-1, the data sets of CPU utilization, memory usage and power collected over a period of time in step 1 are denoted D_CPU, D_memory and D_power respectively. The three data sets are processed in the same way, so D stands for any of them below. Noise data is first handled with data cleaning techniques;
Step 2-2, fill missing values with the neighboring-data mean method: let the i-th missing datum be a_i; a_i is then estimated as the mean of the data adjacent to it;
Step 2-3, for a data set D=(d_1, ..., d_n), where d_i (i ∈ [1, n]) is the i-th datum and n is the total number of data, apply to D a sliding window of length W and offset r to generate equal-length sequence samples S_1, ..., S_k, where S_k is the k-th sequence sample and
S_i=(D_{(i-1)rW+1}, ..., D_{(i-1)rW+W}),
for all i=1, ..., k, with k limited by (k-1)rW+W ≤ n. D_{(i-1)rW+1} is the first measurement of the sample S_i drawn from the data set D, and D_{(i-1)rW+W} is its W-th measurement. r is set to 0.1 and W is chosen so that rW ∈ Z. Features are extracted from each sequence sample: the mean S_avg, the 20th percentile S_20pctl, the 50th percentile S_50pctl, the 80th percentile S_80pctl, the standard deviation S_SD, the maximum S_max and the minimum S_min;
Step 2-4, for the packet data of applications communicating over time, process the destination IP address, protocol type and host attribute value in the HTTP header.
Step 2-1 comprises the following steps:
Step 2-1-1, compute the population mean μ and variance σ of each attribute value in the data-set tuples;
Step 2-1-2, compute the confidence interval using Chebyshev's theorem, where n is the number of data in the data set;
Step 2-1-3, judge from the confidence interval whether a datum is noise: if it lies outside the confidence interval, it is noise and is deleted; otherwise it is retained.
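A rough sketch of the noise filter in steps 2-1-1 to 2-1-3 is given below. The exact confidence interval is not legible in this text, so the sketch assumes the common Chebyshev form [μ - kσ, μ + kσ], which by Chebyshev's inequality contains at least 1 - 1/k² of any distribution. The choice k = 3, the use of the standard deviation for σ, and the function name `remove_noise` are all assumptions of this illustration.

```python
import statistics


def remove_noise(values: list[float], k: float = 3.0) -> list[float]:
    """Keep only the values inside [mu - k*sigma, mu + k*sigma].

    The interval form and the default k are assumptions; sigma here is the
    population standard deviation of the values.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    low, high = mu - k * sigma, mu + k * sigma
    return [v for v in values if low <= v <= high]


# Nineteen ordinary readings and one spike (synthetic data).
readings = [0.5] * 19 + [9.0]
cleaned = remove_noise(readings)
print(len(readings), len(cleaned))
```

Chebyshev's bound holds without assuming a normal distribution, which suits power and CPU traces whose shape is not known in advance; the price is a wider interval than a Gaussian rule would give.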
In step 3, all feature vectors extracted from the different samples are aggregated into a training set. A feature vector comprises the data tuple of the communication packets (timestamp, IP, protocol, host) and the data tuple (S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max, S_min) of the CPU utilization, memory usage and instantaneous power. A classifier is trained with random forest, a lightweight machine learning technique, and 10-fold cross-validation is carried out.
In step 4, the classifier trained in step 3 is used to infer the usage of applications over a period of time (generally one week).
Step 5 comprises: building a user behavior model App_user from the application usage, where App_user is a matrix model covering the 24 hours of each of the 7 days of a week:
App_user=(a_{i,j}), a matrix with 24 rows and 7 columns,
where a_{i,j} (1≤i≤24, 1≤j≤7) is the name of the application that the user user is most likely to use in the i-th hour of the j-th day of each week.
In step 6, the behavior model of each user obtained in step 5 is used to predict the application each user is most likely to use in each hour of each day of the week; at that moment the application is set to a state in which messages can be pushed, so that precise service push is provided to the user.
Compared with the prior art, the invention has the following advantages:
(1) packet information, CPU usage, memory usage and power consumption are all taken into account, which greatly improves the accuracy with which the usage of each application is inferred and provides a basis for precise service push;
(2) user behavior patterns are explored through application usage, which ties users and applications together well.
Brief Description of the Drawings
The present invention is further described below with reference to the accompanying drawings and specific embodiments, from which the above and other advantages of the invention will become clearer.
Fig. 1 is a flow chart of the user behavior pattern recognition method of the invention.
Fig. 2 is a basic framework diagram of user behavior model construction in the method of the invention.
Fig. 3 is a statistical chart of one user's app usage behavior.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and embodiments.
The invention discloses a user behavior pattern recognition method supporting precise service push. The flow chart and framework diagram of the method are shown in Fig. 1 and Fig. 2 respectively. The method comprises the following steps:
Step 1, collect from an Android mobile device the device energy consumption, CPU usage, memory usage and the communication packet information sent over the network while applications are in use;
Step 2, apply data cleaning techniques to the collected data: remove noise data, fill in the missing data, and then group the data with a sliding window;
Step 3, train a classifier with random forest, a lightweight machine learning algorithm;
Step 4, infer the usage of actual applications with the classifier trained in step 3;
Step 5, build a user behavior model from the application usage;
Step 6, predict user behavior from the user behavior model in order to provide precise service push.
Step 1 comprises the following steps:
Step 1-1, collect the CPU usage of the device at each moment from public resources on the Android device. The current CPU utilization information is read from the Android public file /proc/stat, which gives a tuple (user, nice, system, idle, iowait, irq, softirq, stealstolen, guest, guest_nice); the specific meanings of the parameters are listed in Table 1. All of these parameters accumulate from system start to the current time.
Table 1: Parameters of the CPU utilization query and their meanings
The CPU utilization is computed as follows:
(1) sample the CPU data at two moments t1 and t2 separated by a sufficiently short interval;
(2) compute the CPU time at t1 and t2, denoted CPUTime1 and CPUTime2 respectively, with the following formula:
CPUTime=user+nice+system+idle+iowait+irq+softirq+stealstolen+guest+guest_nice
(3) compute the CPU utilization, denoted cpuUsage, with the formula
cpuUsage=((CPUTime2-CPUTime1)-(idle2-idle1))/(CPUTime2-CPUTime1).
After cpuUsage is obtained, the CPU data set on the device is a set of tuples (timestamp, cpuUsage).
Step 1-2, collect the memory usage of the device at each moment from public resources on the Android device. Because the memory each application uses during execution has its own characteristics, memory usage is analyzed to infer which application is in use at a given moment. By reading the Android public resource file /proc/meminfo, the total memory and the currently available memory of the device can be obtained, and from them the current memory usage of the device. Linux's philosophy about memory is to "make full use of it", so it caches as much data as possible for later use; when another application needs memory, however, the cached memory can be used immediately. The available memory is therefore availableMemory=free+buffers+cached, and the memory usage is obtained with the following formula:
memoryUsage=(totalMemorySize-availableMemory)/totalMemorySize.
After the memory usage is obtained, the memory data set collected on the device is a set of tuples (timestamp, memoryUsage).
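The memory calculation of step 1-2 can be illustrated with a short sketch. It assumes that memory usage is the share of total memory that is not available, and it parses a made-up excerpt in the format of `/proc/meminfo` (values in kB). The function names `parse_meminfo` and `memory_usage` are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_usage(info: dict[str, int]) -> float:
    # availableMemory = free + buffers + cached, as defined in the text.
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    total = info["MemTotal"]  # totalMemorySize
    # Assumed formula: share of total memory that is not available.
    return (total - available) / total


# Made-up excerpt standing in for the contents of /proc/meminfo.
SAMPLE = """MemTotal:        4000000 kB
MemFree:          500000 kB
Buffers:          100000 kB
Cached:           400000 kB
"""
usage = memory_usage(parse_meminfo(SAMPLE))
print(usage)  # 3,000,000 of 4,000,000 kB are in use
```

Counting buffers and cache as available follows the reasoning above: the kernel can hand that memory to an application as soon as it is needed.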
Step 1-3, because the power consumption of each application follows its own characteristic curve while it runs, power consumption is analyzed to infer which application is in use at a given moment. The voltage and the instantaneous current are obtained by querying the Android public resource file /sys/class/power_supply/battery, and the power is computed with the formula Power=voltage*current. After the instantaneous power is obtained, the power consumption data collected on the device is a set of tuples (timestamp, Power).
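A minimal sketch of the power sampling in step 1-3 follows. The file names `voltage_now` and `current_now` and their microvolt and microampere units follow the usual Linux power-supply sysfs convention, but whether a particular Android device exposes them this way is an assumption, as are the function names.

```python
from pathlib import Path

# Directory named in the text; the two file names below are assumed.
BATTERY_DIR = Path("/sys/class/power_supply/battery")


def instantaneous_power(voltage_uv: int, current_ua: int) -> float:
    """Power = voltage * current, returned in watts.

    Assumes microvolt and microampere inputs. The sign of the current is
    dropped because some devices report discharge as a negative value.
    """
    return (voltage_uv / 1e6) * (abs(current_ua) / 1e6)


def read_power_sample(timestamp: float, battery_dir: Path = BATTERY_DIR):
    """Return one (timestamp, Power) tuple read from the battery directory."""
    voltage = int((battery_dir / "voltage_now").read_text().strip())
    current = int((battery_dir / "current_now").read_text().strip())
    return (timestamp, instantaneous_power(voltage, current))


# 4.0 V at 0.5 A of discharge, using made-up readings.
print(instantaneous_power(4_000_000, -500_000))
```

On a device, `read_power_sample(time.time())` would be called at a fixed sampling period to build the list of (timestamp, Power) tuples.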
Step 1-4, create a VPN connection with the VPNservice plug-in so that all packets pass through this connection, which makes it possible to collect the communication packets of the mobile applications of interest. The process is implemented as follows:
(1) The application sends the corresponding packets to the real network device through a socket.
(2) The Android system forwards all packets to the TUN virtual network device through the IP tables, using NAT.
(3) The VPN program opens the /dev/tun device and reads the data on it to obtain all packets delivered to the TUN virtual network.
(4) The VPN program processes the packets and then sends the processed packets out through the real network device.
According to various sources, most of the communication protocols used by mobile applications are IP, TCP, UDP and HTTP, so packets using these four protocols are the ones collected from the device and sent to the remote server. Packet data is neither changed nor stolen, so users need not worry about leaks of private data; user authorization is, of course, a precondition. Collected traffic packets are sent on immediately and do not affect the user's online experience. After a packet is collected, the timestamp, destination IP address, protocol type and host attribute value of the HTTP header are extracted on the device and sent to the server for analysis. The traffic data collected by the server is a set of tuples (timestamp, IP, protocol, host).
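To make the (timestamp, IP, protocol, host) tuple concrete, the sketch below extracts the host attribute value from a captured HTTP/1.x request payload. The record type `PacketRecord`, the helper `host_from_http` and the sample values are hypothetical; capturing the payload itself (the VPN part) is outside this illustration.

```python
from typing import NamedTuple, Optional


class PacketRecord(NamedTuple):
    """One collected tuple: (timestamp, IP, protocol, host)."""
    timestamp: float
    ip: str
    protocol: str
    host: Optional[str]


def host_from_http(payload: bytes) -> Optional[str]:
    """Return the Host header of an HTTP/1.x request, if present."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


# Made-up request payload and addressing information.
payload = b"GET / HTTP/1.1\r\nHost: www.tencent.com\r\nAccept: */*\r\n\r\n"
record = PacketRecord(timestamp=1536022800.0, ip="203.0.113.7",
                      protocol="TCP", host=host_from_http(payload))
print(record)
```

Only header fields are read here, which matches the statement that packet contents are neither changed nor retained.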
The data collection of step 1 is integrated into a simple, small application. The collected data is transmitted over the network to the server, where follow-up work such as data processing and data mining is carried out.
Step 2 processes the collected data for use in the subsequent steps. The collected data comprises the CPU utilization, memory usage and power consumption, which vary over time and form curves, and the packets that applications exchange over time. The three kinds of curve data are processed in steps 2-1, 2-2 and 2-3; the packets exchanged by applications over time are processed in step 2-4.
Step 2-1, handle noise data with data cleaning techniques. Chebyshev's theorem is used to distinguish noise data from normal data. The identification is carried out in three steps:
(1) compute the population mean μ and variance σ of each attribute value;
(2) compute the confidence interval using Chebyshev's theorem;
(3) judge from the confidence interval whether a datum is noise. If so, the datum is deleted; otherwise it is retained.
Step 2-2, fill missing values with the neighboring-data mean method, which estimates a missing value from the data close to it. Following the modeling of the power data in claim 2, let the missing datum of the i-th attribute value be a_i; a_i is then predicted as the mean of the data adjacent to it.
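The neighboring-data mean method can be sketched as below. Which neighbors enter the mean is not legible in this text, so this illustration assumes the nearest known value on each side of the gap; the function name `fill_missing` is hypothetical.

```python
from typing import Optional


def fill_missing(values: list[Optional[float]]) -> list[float]:
    """Replace each None with the mean of its nearest known neighbours."""
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        # Nearest known value before and after the gap (original data only).
        prev = next((x for x in reversed(values[:i]) if x is not None), None)
        nxt = next((x for x in values[i + 1:] if x is not None), None)
        neighbours = [x for x in (prev, nxt) if x is not None]
        if not neighbours:
            raise ValueError("no known data to estimate from")
        filled.append(sum(neighbours) / len(neighbours))
    return filled


print(fill_missing([1.0, None, 3.0, None, None, 6.0]))
# -> [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
```

A gap at either end of the series falls back to the single neighbor that exists, so the series keeps its original length and sampling grid.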
Step 2-3, for a data set D=(d_1, ..., d_n), where d_i (i ∈ [1, n]) is the i-th measurement and n is the total number of measurements, apply to D a sliding window of length W and offset r to generate equal-length sequence samples S_1, ..., S_k, where
S_i=(D_{(i-1)rW+1}, ..., D_{(i-1)rW+W}),
for all i=1, ..., k, with k limited by (k-1)rW+W ≤ n. r is set empirically to 0.1, and W is chosen so that rW ∈ Z. Next, features are extracted from each sample: the mean, the 20th, 50th and 80th percentiles, the standard deviation, the maximum and the minimum of the sample, denoted S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max and S_min respectively.
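The windowing and feature extraction of step 2-3 can be sketched as follows. Indices are 0-based here, so window i of the text starts at offset (i-1)rW. The nearest-rank percentile definition and the function names are assumptions of this illustration; the data is synthetic.

```python
import statistics


def sliding_windows(data: list[float], W: int, r: float = 0.1) -> list[list[float]]:
    """Split data into windows of length W whose starts are r*W apart."""
    step = r * W
    if abs(step - round(step)) > 1e-9 or round(step) < 1:
        raise ValueError("choose W so that r*W is a positive integer")
    step = round(step)
    return [data[s:s + W] for s in range(0, len(data) - W + 1, step)]


def percentile(sorted_vals: list[float], p: int) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    rank = max(1, (p * len(sorted_vals) + 99) // 100)  # ceil(p*n/100)
    return sorted_vals[rank - 1]


def features(window: list[float]) -> tuple[float, ...]:
    """Return (S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max, S_min)."""
    s = sorted(window)
    return (statistics.fmean(s), percentile(s, 20), percentile(s, 50),
            percentile(s, 80), statistics.pstdev(s), s[-1], s[0])


data = [float(x) for x in range(1, 31)]  # 30 synthetic measurements
windows = sliding_windows(data, W=10)    # r*W = 1: starts are 1 apart
print(len(windows), features(windows[0]))
```

With r = 0.1 consecutive windows overlap by 90%, which yields many training samples from a short trace at the cost of strongly correlated samples.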
Step 2-4, for the packet data of applications communicating over time, process the destination IP address, protocol type and host value in the HTTP header. Many large companies (such as Alibaba and Tencent) own a large number of servers, and several IP address ranges are associated with one company, so the IP address ranges of as many well-known servers as possible are collected. Each collected IP address is compared with these ranges; when it matches, the IP address is replaced with the corresponding server ID. To process the host value, a keyword that represents the server or company is extracted and used as the host value. For example, HTTP packets collected while QQ is in use carry the host value www.tencent.com. Tencent is extracted as the host value because it indicates that an application belonging to Tencent is running, which helps narrow the candidates down to applications such as QQ, WeChat or QQ Mail.
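The normalisation in step 2-4 might look like the sketch below. The range table uses reserved documentation networks rather than any company's real address ranges, and the server IDs, keyword list and function names are all hypothetical.

```python
import ipaddress

# Hypothetical table: reserved documentation ranges, not real company ranges.
KNOWN_RANGES = {
    "server_A": [ipaddress.ip_network("192.0.2.0/24")],
    "server_B": [ipaddress.ip_network("198.51.100.0/24"),
                 ipaddress.ip_network("203.0.113.0/24")],
}
# Keywords assumed to identify a company or server inside a host name.
HOST_KEYWORDS = ("tencent", "alibaba")


def normalise_ip(ip: str) -> str:
    """Return the server ID whose range contains ip, or ip itself if none does."""
    addr = ipaddress.ip_address(ip)
    for server_id, networks in KNOWN_RANGES.items():
        if any(addr in net for net in networks):
            return server_id
    return ip


def normalise_host(host: str) -> str:
    """Return the first known keyword found in host, or host unchanged."""
    lowered = host.lower()
    for keyword in HOST_KEYWORDS:
        if keyword in lowered:
            return keyword
    return host


print(normalise_ip("203.0.113.7"), normalise_host("www.tencent.com"))
```

Collapsing many addresses and host names into a few IDs and keywords keeps the categorical features small enough for the classifier to learn from.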
In step 3, all feature vectors extracted from the different samples are aggregated into a training set. A feature vector comprises the data tuple of the communication packets (timestamp, IP, protocol, host) and the data tuple (S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max, S_min) of the CPU utilization, memory usage and instantaneous power. A classifier is trained with random forest, a lightweight machine learning technique, and 10-fold cross-validation is carried out. Random forest is an ensemble-learning algorithm that combines many trees: it is a classifier consisting of multiple decision trees, and its output class is the mode of the classes output by the individual trees.
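The voting rule described above (the forest outputs the mode of its trees' classes) can be shown in isolation. The three hand-written rules below merely stand in for trained decision trees, and the feature layout and class names are hypothetical; this does not train a forest.

```python
from collections import Counter
from typing import Callable, Sequence

Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: Sequence[Tree], x: Sequence[float]) -> str:
    """Return the mode of the individual trees' predicted classes."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Stand-ins for trained trees; x = (mean power, mean cpuUsage) is assumed.
trees = [
    lambda x: "game" if x[0] > 2.0 else "chat",
    lambda x: "game" if x[1] > 0.6 else "chat",
    lambda x: "video" if x[0] > 3.5 else "chat",
]
print(forest_predict(trees, (2.5, 0.7)))  # votes: game, game, chat
```

In practice each tree would be fitted on a bootstrap sample of the training set with a random subset of features, which is what makes the individual votes differ.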
In step 4, the classifier trained in step 3 is used to infer the usage of applications by the user over a certain period of time.
In step 5, a user behavior model is built from the application usage; it shows clearly which application the user is most likely to use in each period of a day. Building the user behavior model App_user amounts to building a matrix model covering the 24 hours of each of the 7 days of a week,
App_user=(a_{i,j}), a matrix with 24 rows and 7 columns,
where a_{i,j} (1≤i≤24, 1≤j≤7) is the name of the application that the user user is most likely to use in the i-th hour of the j-th day of each week.
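One way to build such a matrix from the inferred usage is sketched below, assuming the classifier output has been reduced to (time, application) pairs and that "most likely" means "most frequently observed". Rows and columns are 0-based here (hour 0 to 23, Monday = 0), unlike the 1-based indices in the text; the function name and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_behaviour_matrix(usage):
    """usage: iterable of (datetime, app_name) predictions.

    Returns a 24 x 7 list of lists; entry [hour][weekday] is the app seen
    most often in that slot, or None if the slot has no observations.
    """
    counts = defaultdict(Counter)
    for ts, app in usage:
        counts[(ts.hour, ts.weekday())][app] += 1
    return [[counts[(h, d)].most_common(1)[0][0] if counts[(h, d)] else None
             for d in range(7)] for h in range(24)]


# Made-up predictions; 3 September 2018 was a Monday.
usage = [
    (datetime(2018, 9, 3, 10, 5), "WeChat"),
    (datetime(2018, 9, 3, 10, 30), "WeChat"),
    (datetime(2018, 9, 3, 10, 45), "QQ"),
    (datetime(2018, 9, 4, 20, 0), "QQ"),
]
matrix = build_behaviour_matrix(usage)
print(matrix[10][0], matrix[20][1])

# Step 6 then reduces to a lookup for the current hour and weekday.
now = datetime(2018, 9, 10, 10, 15)  # the following Monday
print(matrix[now.hour][now.weekday()])
```

Aggregating several weeks of predictions into the same 24 x 7 grid is what turns individual observations into a weekly habit.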
In step 6, user behavior is predicted from the user behavior model in order to provide precise service push. From the behavior model of each user obtained in step 5, the application each user is most likely to use in each hour of each day of the week can be predicted. At that moment the application is therefore set to a state in which messages can be pushed, so that precise service push is provided to the user.
Embodiment
This embodiment uses 50 student volunteers from a school in city A as experimental subjects. The 50 students comprise 35 male and 15 female students aged between 18 and 28. An Android application was developed for this embodiment; its main functions are to obtain the instantaneous current, voltage, current network state, memory usage and CPU utilization, to capture network traffic packets, and then to transmit the data to a server for analysis. The application was installed on the volunteers' Android phones. Each volunteer was required to use 6 different popular applications within 60 minutes, and no other application was allowed to run in the background while one application was running.
The data collected on the volunteers' Android phones was transferred to the server via a data cable for analysis and processing. The data was first processed: data cleaning techniques were applied to remove noise data and fill in the missing data, and the data was then grouped with a sliding window;
Data with identical times was merged according to the timestamps. The resulting feature vectors comprise the data tuple of the communication packets (timestamp, IP, protocol, host) and the data tuple (S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max, S_min) of the CPU utilization, memory usage and instantaneous power. From the processed data set, 80% was randomly selected as the training set and 20% as the test set for accuracy testing. The classifier was trained with random forest, a lightweight machine learning algorithm, with 10-fold cross-validation. Finally, the trained classifier was used to infer the usage of actual applications.
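The 80/20 split and the 10-fold cross-validation mentioned above can be sketched with the standard library alone. The fixed seed, the interleaved fold assignment and the function names are assumptions of this illustration; integers stand in for labelled feature vectors.

```python
import random
from typing import Iterator, Sequence


def train_test_split(samples: Sequence, train_fraction: float = 0.8,
                     seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n: int, k: int = 10) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, validation) index lists; each index validates exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


samples = list(range(100))  # stand-ins for labelled feature vectors
train, test = train_test_split(samples)
folds = list(k_fold_indices(len(train)))
print(len(train), len(test), len(folds))
```

Cross-validation is run on the training part only, so the 20% test set stays untouched until the final accuracy check.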
Then, the data that 50 student volunteers continue one month using application program are had collected.Using these data as Test set simultaneously infers the application program that each period uses using trained classifier.Can provide for every volunteer makes The mapping relations of application program and time are illustrated in figure 3 volunteer system the case where week age is using application Meter.
As the color of bar chart is constantly deepened in Fig. 3, the frequency of use of application program is also increasing.Wechat is in one week Most common application program, almost the daily most of the time is all using;Frequency is similarly used in QQ.User is in 10:00- 12:00 and 16:00-18:00 would generally use be hungry and order.12:00-14:00 and 18:00-24:00 focus more on into Row communication plays game and reads news.The two periods are all the quitting times.These behaviors and the behavior of student are than more consistent Also demonstrate the accuracy of the method for the present invention.
According to the behavior model constructed for this user, the applications the user is most likely to use at 10:00-12:00 and 16:00-18:00 are WeChat, NetEase Cloud Music and Ele.me, so these three applications can be allowed to send service pushes during those periods to meet the user's needs. The applications most likely to be used at 12:00-14:00 and 18:00-24:00 are PUBG: Exhilarating Battlefield, WeChat and iQIYI, so these three applications can be allowed to send service pushes during those periods to meet the user's needs.
The present invention provides a user behavior pattern recognition method supporting precise service push. There are many methods and ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make various improvements and refinements without departing from the principle of the invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with the existing technology.

Claims (10)

1. A user behavior pattern recognition method supporting precise service push, characterized by comprising the following steps:
Step 1, collect from an Android mobile device the device energy consumption, the CPU usage, the memory usage and the information of the communication packets sent over the network while applications are in use;
Step 2, process the data collected in step 1: remove the noise data, fill in the missing values, and then group the data using the sliding window technique;
Step 3, train a classifier;
Step 4, use the classifier trained in step 3 to infer the actual application usage;
Step 5, construct a user behavior model from the application usage;
Step 6, predict user behavior according to the user behavior model in order to provide precise service push.
2. The method according to claim 1, characterized in that step 1 comprises the following steps:
Step 1-1, collect the CPU usage of the device at each moment through public resources on the Android device: read the Android public files to obtain the current CPU utilization information as a tuple (user, nice, system, idle, iowait, irq, softirq, stealstolen, guest, guest_nice), where user is the time spent in user mode, nice is the time spent in low-priority user mode, system is the time spent in system mode, idle is the time spent in the idle task, iowait is the time spent waiting for input/output to complete, irq is the time spent servicing interrupts, softirq is the time spent servicing softirqs, stealstolen is the time spent in other operating systems when running in a virtualized environment, guest is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, and guest_nice is the time spent running a niced guest virtual machine; every parameter in the tuple is accumulated from system start-up to the current moment;
Step 1-2, collect the memory usage of the device at each moment through public resources on the Android device: the available memory is availableMemory = free + buffers + cached, where free is the remaining unused memory, buffers is the memory used for buffers, and cached is the memory used for caching; the memory usage MemoryUsage is then obtained by the following formula:
where totalMemorySize is the total memory of the Android device and availableMemory is the available memory of the Android device; after the memory usage is obtained, the memory data set collected on the Android device consists of multiple tuples (timestamp1, MemoryUsage), where timestamp1 is the timestamp marking when the memory usage data were collected;
Step 1-3, obtain the measured voltage voltage and instantaneous current current of the Android device by querying the Android public resource files, and calculate the power Power according to the following formula:
Power = voltage * current,
the power consumption data collected on the device then consist of multiple tuples (timestamp2, Power), where timestamp2 is the timestamp marking when the power data were collected;
Step 1-4, create a VPN connection using the VPNservice plug-in so that all data packets pass through this connection, thereby collecting the communication packets of the mobile applications of interest.
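Steps 1-2 and 1-3 can be sketched as follows. This is only an illustration under stated assumptions: the memory figures are assumed to arrive as text in the `/proc/meminfo` layout (`Name:   value kB`), the usage ratio is assumed to be (totalMemorySize - availableMemory) / totalMemorySize, and the voltage and current are assumed to be reported in microvolts and microamperes, with a sign that may differ between devices. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a {name: value} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_usage(info):
    """Fraction of memory in use, assuming (total - available) / total."""
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    return (info["MemTotal"] - available) / info["MemTotal"]


def power_watts(voltage_uv, current_ua):
    """Power = voltage * current, converting from micro-units to SI units.

    The absolute value is taken because the sign convention of the current
    reading (charging vs. discharging) is device dependent.
    """
    return (voltage_uv / 1e6) * (abs(current_ua) / 1e6)
```

Each sample would then be stored together with its collection time, giving the tuples (timestamp1, MemoryUsage) and (timestamp2, Power).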
3. The method according to claim 2, characterized in that in step 1-1 the CPU utilization is calculated as follows:
Step 1-1-1, sample the CPU data at two moments $t_1$ and $t_2$ separated by a sufficiently short interval, obtaining the tuple at moment $t_1$ ($user_1$, $nice_1$, $system_1$, $idle_1$, $iowait_1$, $irq_1$, $softirq_1$, $stealstolen_1$, $guest_1$, $guest\_nice_1$) and the tuple at moment $t_2$ ($user_2$, $nice_2$, $system_2$, $idle_2$, $iowait_2$, $irq_2$, $softirq_2$, $stealstolen_2$, $guest_2$, $guest\_nice_2$),
where each value is accumulated from system start-up to moment $t_1$: $user_1$ is the time spent in user mode, $nice_1$ is the time spent in low-priority user mode, $system_1$ is the time spent in system mode, $idle_1$ is the time spent in the idle task, $iowait_1$ is the time spent waiting for input/output to complete, $irq_1$ is the time spent servicing interrupts, $softirq_1$ is the time spent servicing softirqs, $stealstolen_1$ is the time spent in other operating systems when running in a virtualized environment, $guest_1$ is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, and $guest\_nice_1$ is the time spent running a niced guest virtual machine; the values at moment $t_2$ are defined in the same way;
Step 1-1-2, calculate the CPU time $CPUTime_1$ at moment $t_1$ and the CPU time $CPUTime_2$ at moment $t_2$ using the following formulas:
$$CPUTime_1 = user_1 + nice_1 + system_1 + idle_1 + iowait_1 + irq_1 + softirq_1 + stealstolen_1 + guest_1 + guest\_nice_1,$$
$$CPUTime_2 = user_2 + nice_2 + system_2 + idle_2 + iowait_2 + irq_2 + softirq_2 + stealstolen_2 + guest_2 + guest\_nice_2;$$
Step 1-1-3, calculate the CPU utilization cpuUsage using the following formula:
After cpuUsage is obtained, the CPU data set on the device consists of multiple tuples (timestamp3, cpuUsage), where timestamp3 is the timestamp marking when the CPU utilization was collected.
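The two-sample calculation of steps 1-1-1 to 1-1-3 can be sketched as follows. Two things here are assumptions rather than statements of the claimed formula: the counters are taken from an aggregate `cpu` line in the Linux `/proc/stat` layout, and the utilization is computed in the commonly used form 1 - (idle_2 - idle_1) / (CPUTime_2 - CPUTime_1). The function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "stealstolen", "guest", "guest_nice")


def parse_cpu_line(line):
    """Parse an aggregate 'cpu ...' line into a {field: ticks} dict."""
    parts = line.split()
    return dict(zip(FIELDS, (int(p) for p in parts[1:11])))


def cpu_usage(sample1, sample2):
    """Fraction of non-idle CPU time between two cumulative samples."""
    cpu_time1 = sum(sample1.values())  # CPUTime_1
    cpu_time2 = sum(sample2.values())  # CPUTime_2
    total = cpu_time2 - cpu_time1
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    idle = sample2["idle"] - sample1["idle"]
    return 1.0 - idle / total
```

Because all counters accumulate from system start-up, only the differences between the two samples carry information about the sampling interval.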
4. The method according to claim 3, characterized in that step 1-4 comprises the following steps:
Step 1-4-1, the application sends the corresponding data packets toward the real network device through a socket;
Step 1-4-2, the Android system forwards all data packets to the virtual network device through the IP tables using network address translation (NAT);
Step 1-4-3, the VPN program obtains all data packets stored on the virtual network device by reading the data on the Android device;
Step 1-4-4, the VPN program collects the data packets and obtains the timestamp, destination IP address, protocol type and destination host value of each packet, then sends the collected data packets to the remote server through the real network device; the server parses the packets and finally obtains multiple tuples (timestamp, IP, protocol, host), where timestamp is the timestamp in the HTTP header, IP is the destination IP address, protocol is the protocol type, and host is the host attribute value.
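The tuple extraction of step 1-4-4 can be sketched for the simplest case, an unencrypted HTTP request carried in a raw IPv4 packet. This is an illustration only: the function name `extract_tuple` and the `PROTOCOL_NAMES` table are hypothetical, and the timestamp is assumed to be supplied by the capturing program rather than read from the packet.

```python
import socket

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IP protocol numbers


def extract_tuple(timestamp, packet):
    """Return (timestamp, IP, protocol, host) for a raw IPv4 packet."""
    if packet[0] >> 4 != 4:
        raise ValueError("only IPv4 packets are handled in this sketch")
    ip_header_len = (packet[0] & 0x0F) * 4      # IHL field, in bytes
    protocol = packet[9]
    dest_ip = socket.inet_ntoa(packet[16:20])   # destination address field
    host = None
    if protocol == 6 and len(packet) >= ip_header_len + 20:
        # TCP data offset: high nibble of byte 12 of the TCP header
        tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
        payload = packet[ip_header_len + tcp_header_len:]
        # Skip the request line, then scan the headers for "Host"
        for line in payload.split(b"\r\n")[1:]:
            if not line:
                break
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"host":
                host = value.strip().decode("ascii", "replace")
                break
    return (timestamp, dest_ip, PROTOCOL_NAMES.get(protocol, str(protocol)), host)
```

For packets without a readable Host header (for example UDP or encrypted traffic) the last element is `None`, so only the destination address and protocol are available as features.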
5. The method according to claim 4, characterized in that step 2 comprises the following steps:
Step 2-1, the data sets of CPU utilization, memory usage and power collected over a period of time in step 1 are denoted $D_{CPU}$, $D_{memory}$ and $D_{power}$ respectively; the three data sets are processed in the same way, so D is used to stand for any of them in the data processing; the noise data are first handled with the data cleaning technique;
Step 2-2, fill in the missing values using the mean of the neighboring data: let the i-th missing datum be $a_i$; then $a_i$ is calculated as:
Step 2-3, for a data set D whose i-th datum is $d_i$ ($i \in [1, n]$), n being the total number of data, a sliding window of length W and offset r is applied on D to generate sequence samples of equal length $S_1, \ldots, S_k$, where $S_k$ is the k-th sequence sample and
$$S_i = (D_{(i-1)rW+1}, \ldots, D_{(i-1)rW+W}),$$
for all $i = 1, \ldots, k$; here $D_{(i-1)rW+1}$ is the first measured value of the sample $S_i$ drawn from the data set D and $D_{(i-1)rW+W}$ is the W-th measured value of the sample $S_i$; r is set to 0.1 and W is chosen so that $rW \in \mathbb{Z}$; features are extracted from each sequence sample, namely its mean $S_{avg}$, 20th percentile $S_{20pctl}$, 50th percentile $S_{50pctl}$, 80th percentile $S_{80pctl}$, standard deviation $S_{SD}$, maximum $S_{max}$ and minimum $S_{min}$;
Step 2-4, for the packets that applications exchange over time, process the destination IP address, protocol type and host attribute value in the HTTP header of the packet data.
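Steps 2-2 and 2-3 can be sketched as follows. Several details are assumptions made for the example: a missing value is marked with `None` and replaced by the mean of its nearest observed neighbors, percentiles use the nearest-rank definition, and the standard deviation is the population standard deviation. The function names are hypothetical.

```python
import math
import statistics


def fill_missing(values):
    """Replace each None by the mean of its nearest non-missing neighbors."""
    filled = list(values)
    for i in range(len(filled)):
        if filled[i] is None:
            before = [v for v in filled[:i] if v is not None][-1:]
            after = [v for v in filled[i + 1:] if v is not None][:1]
            neighbours = before + after
            if not neighbours:
                raise ValueError("no observed values to fill from")
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


def sliding_windows(data, W, r=0.1):
    """Windows of length W whose start advances by r*W values each time."""
    step = round(r * W)
    if step < 1 or abs(r * W - step) > 1e-9:
        raise ValueError("W must be chosen so that r*W is a positive integer")
    return [data[s:s + W] for s in range(0, len(data) - W + 1, step)]


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def window_features(window):
    """(S_avg, S_20pctl, S_50pctl, S_80pctl, S_SD, S_max, S_min)."""
    s = sorted(window)
    return (statistics.fmean(window), percentile(s, 20), percentile(s, 50),
            percentile(s, 80), statistics.pstdev(window), s[-1], s[0])
```

With r = 0.1 and W = 10, consecutive windows start one measurement apart, so a series of n values yields n - W + 1 samples.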
6. The method according to claim 5, characterized in that step 2-1 comprises the following steps:
Step 2-1-1, calculate the population mean μ and the variance σ of each attribute value in the tuples of the data set;
Step 2-1-2, calculate the confidence interval using Chebyshev's theorem, where n is the number of data in the data set;
Step 2-1-3, judge from the confidence interval whether a datum is noise data: if it lies outside the confidence interval, it is noise data and is deleted; otherwise it is retained.
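The noise filtering of steps 2-1-1 to 2-1-3 can be sketched as follows. The interval used here is an assumption and not necessarily the interval of the claim: Chebyshev's inequality states that at most 1/k² of any distribution lies more than k standard deviations from the mean, so for a tolerated outside fraction `max_outside` the sketch uses k = 1/sqrt(max_outside) and keeps values within [μ - kσ, μ + kσ]. The function names and the default of 0.05 are hypothetical.

```python
import math
import statistics


def chebyshev_interval(values, max_outside=0.05):
    """Interval outside which at most max_outside of the data can lie."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)   # population standard deviation
    k = 1 / math.sqrt(max_outside)      # from P(|X - mu| >= k*sigma) <= 1/k^2
    return mu - k * sigma, mu + k * sigma


def remove_noise(values, max_outside=0.05):
    """Keep only the values that fall inside the Chebyshev interval."""
    low, high = chebyshev_interval(values, max_outside)
    return [v for v in values if low <= v <= high]
```

Because the bound holds for any distribution, the interval is wide; only values far from the bulk of the data are treated as noise.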
7. The method according to claim 6, characterized in that in step 3 all feature vectors extracted from the different samples are combined into the training set, each feature vector comprising the communication packet tuple (timestamp, IP, protocol, host) and the data tuple $(S_{avg}, S_{20pctl}, S_{50pctl}, S_{80pctl}, S_{SD}, S_{max}, S_{min})$ of CPU utilization, memory usage and instantaneous power; the classifier is trained with random forest, a lightweight machine learning technique, and 10-fold cross-validation is performed.
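The 10-fold cross-validation mentioned above can be sketched independently of the classifier. The code below does not implement a random forest; it only shows the fold handling, and it assumes a hypothetical `fit` callable that takes training features and labels and returns a prediction function, so that any classifier (for example a random forest from a machine learning library) could be plugged in. It also assumes there are at least k samples.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)
```

Every sample is used for testing once and for training k - 1 times, which gives a less split-dependent accuracy estimate than a single 80/20 split.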
8. The method according to claim 7, characterized in that in step 4 the classifier trained in step 3 is used to infer the application usage over a period of time.
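9. The method according to claim 8, characterized in that step 5 comprises: constructing the user behavior model $App_{user}$ according to the application usage, where $App_{user}$ is a matrix model covering the 24 hours of each day over the 7 days of a week:
$$App_{user} = \begin{pmatrix} a_{1,1} & \cdots & a_{1,7} \\ \vdots & \ddots & \vdots \\ a_{24,1} & \cdots & a_{24,7} \end{pmatrix}$$
where $a_{i,j}$ ($1 \le i \le 24$, $1 \le j \le 7$) is the name of the application that the user user is most likely to use in the i-th hour of the j-th day of each week.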
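The construction of the 24 x 7 model can be sketched as follows. The input format is an assumption: each inferred usage is taken to be a `(day, hour, app)` triple with day in 1..7 and hour in 1..24, and "most likely" is taken to mean the application inferred most often in that slot. The function name is hypothetical.

```python
from collections import Counter


def build_behavior_model(usages):
    """Return a 24 x 7 matrix; entry [i-1][j-1] is the app seen most often
    in the i-th hour of the j-th day, or None if nothing was observed."""
    counts = [[Counter() for _ in range(7)] for _ in range(24)]
    for day, hour, app in usages:
        counts[hour - 1][day - 1][app] += 1
    return [[slot.most_common(1)[0][0] if slot else None for slot in row]
            for row in counts]
```

Rows correspond to hours and columns to days, matching the index order of $a_{i,j}$ in the claim.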
10. The method according to claim 9, characterized in that in step 6, according to the behavior model of each user obtained in step 5, the application that each user is most likely to use in each hour of each day of the week is predicted, and at that moment the application is set to a mode in which it may push messages, thereby providing the user with precise service push.
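The lookup of step 6 can be sketched as follows. The sketch assumes the model is a 24 x 7 nested list of application names, that the week starts on Monday (day 1), and that the i-th hour of a day covers the clock hour i - 1; the function name `push_allowed` is hypothetical.

```python
from datetime import datetime


def push_allowed(model, app, when):
    """True if app is the predicted application for the hour containing when.

    model[i][j] holds the application predicted for hour i (0-based) of
    weekday j (0 = Monday), i.e. entry a_{i+1, j+1} of the behavior model.
    """
    predicted = model[when.hour][when.isoweekday() - 1]
    return predicted == app
```

A push scheduler could call this before delivering a message and hold back pushes from applications that are not predicted for the current slot.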
CN201811024517.9A 2018-09-04 2018-09-04 User behavior pattern recognition method supporting accurate service push Active CN109144837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811024517.9A CN109144837B (en) 2018-09-04 2018-09-04 User behavior pattern recognition method supporting accurate service push

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811024517.9A CN109144837B (en) 2018-09-04 2018-09-04 User behavior pattern recognition method supporting accurate service push

Publications (2)

Publication Number Publication Date
CN109144837A (en) 2019-01-04
CN109144837B (en) 2021-04-27

Family

ID=64826640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811024517.9A Active CN109144837B (en) 2018-09-04 2018-09-04 User behavior pattern recognition method supporting accurate service push

Country Status (1)

Country Link
CN (1) CN109144837B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109982295A (en) * 2019-03-21 2019-07-05 中国联合网络通信集团有限公司 The method for pushing of service template and the pusher of service template
CN110460502A (en) * 2019-09-10 2019-11-15 西安电子科技大学 Application rs traffic recognition methods under VPN based on distribution characteristics random forest
CN111597947A (en) * 2020-05-11 2020-08-28 浙江大学 Application program inference method for correcting noise based on power supply power factor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140100835A1 (en) * 2012-10-04 2014-04-10 Futurewei Technologies, Inc. User Behavior Modeling for Intelligent Mobile Companions
CN104063467A (en) * 2014-06-26 2014-09-24 北京工商大学 Intra-domain traffic flow pattern discovery method based on improved similarity search technology
CN105023175A (en) * 2015-07-24 2015-11-04 金鹃传媒科技股份有限公司 Online advertisement classified pushing method and system based on consumer behavior data analysis and classification technology
CN107302566A (en) * 2017-05-27 2017-10-27 冯小平 The method and apparatus of pushed information
CN108446176A (en) * 2018-02-07 2018-08-24 平安普惠企业管理有限公司 A kind of method for allocating tasks, computer readable storage medium and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIMIN CHEN et al.: "POWERFUL: Mobile App Fingerprinting via Power Analysis", IEEE INFOCOM 2017 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109982295A (en) * 2019-03-21 2019-07-05 中国联合网络通信集团有限公司 The method for pushing of service template and the pusher of service template
CN109982295B (en) * 2019-03-21 2021-10-15 中国联合网络通信集团有限公司 Service template pushing method and service template pusher
CN110460502A (en) * 2019-09-10 2019-11-15 西安电子科技大学 Application rs traffic recognition methods under VPN based on distribution characteristics random forest
CN110460502B (en) * 2019-09-10 2022-03-04 西安电子科技大学 Application program flow identification method under VPN based on distributed feature random forest
CN111597947A (en) * 2020-05-11 2020-08-28 浙江大学 Application program inference method for correcting noise based on power supply power factor

Also Published As

Publication number Publication date
CN109144837B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN103346957B (en) A kind of system and method according to contact person's message alteration contact head image expression
CN102035698B (en) HTTP tunnel detection method based on decision tree classification algorithm
CN104657428B (en) A kind of the Internet advertising method for pushing and device of unaware
IL275042A (en) Self-adaptive application programming interface level security monitoring
CN104301436B (en) Content to be displayed push, subscription, update method and its corresponding device
US20210035126A1 (en) Data processing method, system and computer device based on electronic payment behaviors
CN107515915B (en) User identification association method based on user behavior data
CN108353090A (en) Edge intelligence platform and internet of things sensors streaming system
US10984452B2 (en) User/group servicing based on deep network analysis
CN103840950A (en) Information pushing method and system
CN109144837A (en) A kind of user behavior pattern recognition methods for supporting precisely to service push
US20200213181A1 (en) System and method for network root cause analysis
CN104394211A (en) Design and implementation method for user behavior analysis system based on Hadoop
US20230232052A1 (en) Machine learning techniques for detecting surges in content consumption
CN110059223A (en) Circulation, image to video computer vision guide in machine
CN103905482B (en) Method, push server and the system of pushed information
TW201841137A (en) Arrangement and method for inferring demographics from application usage statistics
CN103248677A (en) Internet behavior analysis system and working method thereof
CN106559498A (en) Air control data collection platform and its collection method
Poltronieri et al. Phileas: A simulation-based approach for the evaluation of value-based fog services
CN102984242A (en) Automatic identification method and device of application protocols
CN108021607A (en) A kind of wireless city Audit data off-line analysis method based on big data platform
Zhao et al. TrCMP: A dependable app usage inference design for user behavior analysis through cyber-physical parameters
Pan et al. Iterative innovation design methods of internet products in the era of big data
CN102315991A (en) Data collecting method based on Internet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant