CN109669779B - Method and device for determining cleaning path of data and cleaning data - Google Patents

Method and device for determining cleaning path of data and cleaning data

Info

Publication number
CN109669779B
Authority
CN
China
Prior art keywords
data
cleaning
path information
application
cleaning path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811587961.1A
Other languages
Chinese (zh)
Other versions
CN109669779A (en)
Inventor
孔柏林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tanlan Network Technology Co ltd
Original Assignee
Shanghai Tanlan Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tanlan Network Technology Co ltd filed Critical Shanghai Tanlan Network Technology Co ltd
Priority to CN201811587961.1A priority Critical patent/CN109669779B/en
Publication of CN109669779A publication Critical patent/CN109669779A/en
Application granted granted Critical
Publication of CN109669779B publication Critical patent/CN109669779B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application disclose a method and device for determining a cleaning path of data and for cleaning data. One embodiment of the method for determining a cleaning path of data includes: acquiring first behavior data generated by a user using an application installed on a client in a first time period; determining a first characteristic value based on the first behavior data, wherein the first characteristic value is used for representing the use condition of the application; inputting the first characteristic value into a preset data cleaning model to obtain first cleaning path information, wherein the data cleaning model is used for determining cleaning path information; and sending the first cleaning path information to the client so that the client can clean data based on the first cleaning path information. Because the cleaning path is determined from the behavior data of the user, the applications installed on the client do not need to be scanned one by one, which greatly shortens the time for determining the cleaning path and improves the data cleaning efficiency.

Description

Method and device for determining cleaning path of data and cleaning data
Technical Field
The embodiments of the application relate to the technical field of computers, and in particular to a method and device for determining a cleaning path of data and for cleaning data.
Background
Clients such as mobile phones offer more and more functions, bringing convenience to people's life and work. A user can install a wide variety of applications on a client to support different functions of the client. While an application installed on the client is running, it generates temporary data that is stored under a specified path in the storage space of the client. Temporary data accumulated over a long period can occupy a significant amount of the client's storage space and slow the client down. Therefore, the temporary data needs to be cleaned in time to release the storage space of the client.
Existing data cleaning methods typically use a data cleaning application installed on the client to scan the applications installed on the client one by one, so as to determine the storage path and occupied space of the data generated by each application. After the scanning is finished, the space occupied by the data generated by each application is presented to the user. When the user chooses to clean up the data generated by certain applications, the data stored under the storage paths of the data generated by those applications is cleaned up.
Disclosure of Invention
The embodiments of the application provide a method and device for determining a cleaning path of data and for cleaning data.
In a first aspect, some embodiments of the present application provide a method for determining a cleaning path of data, which is applied to a server, and includes: acquiring first behavior data generated by a user using an application installed on a client in a first time period; determining a first characteristic value based on the first behavior data, wherein the first characteristic value is used for representing the use condition of the application; inputting the first characteristic value into a preset data cleaning model to obtain first cleaning path information, wherein the data cleaning model is used for determining cleaning path information; and sending the first cleaning path information to the client so that the client can clean data based on the first cleaning path information.
In some embodiments, the method further comprises: receiving data cleaning information sent by the client after data cleaning is performed based on the first cleaning path information; determining a cleaning ratio based on the data cleaning information; and if the cleaning ratio is greater than a preset ratio threshold, optimizing the data cleaning model by taking the first characteristic value and the first cleaning path information as training samples, to obtain an optimized data cleaning model.
In some embodiments, the method further comprises: if the cleaning ratio is less than or equal to the preset ratio threshold, acquiring second behavior data generated by the user using the application in a second time period; combining the first behavior data and the second behavior data, and determining a combined characteristic value based on the combined behavior data; inputting the combined characteristic value into the data cleaning model to obtain combined cleaning path information; and sending the combined cleaning path information to the client so that the client can clean the data again based on the combined cleaning path information.
In some embodiments, the first behavior data includes an identification of the application and a duration of use.
In some embodiments, the first characteristic value comprises at least one of: the frequency of use of the application, the cleaning path of the application, the speed at which the application generates data, and the space occupied by the data the application generates.
In some embodiments, the first cleaning path information includes a cleaning path of the application, or includes a cleaning confidence of the application and the cleaning path.
In some embodiments, the data cleaning model is trained by: obtaining a training sample, wherein the training sample comprises a sample characteristic value and sample cleaning path information; constructing a linear regression equation by taking the sample characteristic value as an independent variable, wherein each independent variable of the linear regression equation corresponds to a weight coefficient; constructing a logistic regression function by taking the linear regression equation as its independent variable; and taking the sample characteristic value as input and the sample cleaning path information as output, training the logistic regression function to obtain the data cleaning model.
In some embodiments, the values of the weight coefficients are solved by a maximum likelihood estimation method.
In a second aspect, some embodiments of the present application provide a method for cleaning data, applied to a client, including: collecting and reporting first behavior data generated by a user using an application installed on the client in a first time period; and in response to receiving first cleaning path information which is sent by the server and is determined based on the first behavior data, cleaning the data stored under the first cleaning path indicated by the first cleaning path information.
In some embodiments, after cleaning the data stored under the first cleaning path indicated by the first cleaning path information, the method further comprises: generating data cleaning information and sending the data cleaning information to the server.
In a third aspect, some embodiments of the present application provide an apparatus for determining a cleaning path of data, provided at a server, including: an acquisition unit configured to acquire first behavior data generated by a user using an application installed on a client in a first time period; a determining unit configured to determine a first characteristic value based on the first behavior data, wherein the first characteristic value is used for representing the use condition of the application; an input unit configured to input the first characteristic value into a preset data cleaning model to obtain first cleaning path information, wherein the data cleaning model is used for determining cleaning path information; and a sending unit configured to send the first cleaning path information to the client so that the client can clean data based on the first cleaning path information.
In a fourth aspect, some embodiments of the present application provide an apparatus for cleaning data, provided at a client, including: a reporting unit configured to collect and report first behavior data generated by a user using an application installed on the client in a first time period; and a cleaning unit configured to, in response to receiving first cleaning path information which is sent by the server and is determined based on the first behavior data, clean the data stored under the first cleaning path indicated by the first cleaning path information.
In a fifth aspect, some embodiments of the present application provide a computer device comprising: one or more processors; a storage device on which one or more programs are stored; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect or to implement the method as described in any of the implementations of the second aspect.
In a sixth aspect, some embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect or implements a method as described in any of the implementations of the second aspect.
The method and device for determining a cleaning path of data and for cleaning data provided by the embodiments of the application first acquire first behavior data generated by a user using an application installed on a client in a first time period; then determine a first characteristic value based on the first behavior data; then input the first characteristic value into a preset data cleaning model to obtain first cleaning path information; and finally send the first cleaning path information to the client so that the client can clean data based on the first cleaning path information. Because the cleaning path is determined from the behavior data of the user, the time for determining the cleaning path is greatly shortened and the data cleaning efficiency is improved. This also avoids the prior-art situation in which the user, having to wait a long time while the applications installed on the client are scanned, interrupts the scan and the data cleaning fails.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for determining a cleaning path of data according to the present application;
FIG. 3 is a flow chart of yet another embodiment of a method for determining a cleaning path of data according to the present application;
FIG. 4 is a flow chart of one embodiment of a method for cleaning data according to the present application;
FIG. 5 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the method for determining a cleaning path of data and the method for cleaning data of the present application may be applied.
As shown in fig. 1, system architecture 100 may include devices 101, 102 and a network 103. The network 103 is used to provide a medium for communication links between the devices 101, 102. The network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The devices 101, 102 may be hardware devices or software that support network connections to provide various network services. When the device is hardware, it may be a variety of electronic devices including, but not limited to, smartphones, tablets, laptop portable computers, desktop computers, servers, and the like. In this case, the hardware device may be realized as a distributed device group composed of a plurality of devices, or may be realized as a single device. When the device is software, it can be installed in the above-listed devices. In this case, as software, it may be implemented as a plurality of software or software modules for providing distributed services, for example, or as a single software or software module. The present invention is not particularly limited herein.
In practice, a device may provide a corresponding network service by installing a corresponding client application or server application. After the device has installed the client application, it may be embodied as a client in network communication. Accordingly, after the server application is installed, it may be embodied as a server in network communications.
By way of example, in fig. 1, device 101 is embodied as a client and device 102 is embodied as a server. Specifically, device 101 may be a client on which a data cleaning application is installed, and device 102 may be a background server of the data cleaning application. The background server of the data cleaning application can acquire first behavior data generated by a user using an application installed on the client in a first time period; determine a first characteristic value based on the first behavior data; input the first characteristic value into a preset data cleaning model to obtain first cleaning path information; and send the first cleaning path information to the client so that the client can clean data based on the first cleaning path information.
It should be noted that, the method for determining the cleaning path of data provided in the embodiments of the present application may be performed by the device 102, and the method for cleaning data may be performed by the device 101.
It should be understood that the number of networks and devices in fig. 1 is merely illustrative. There may be any number of networks and devices as desired for an implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for determining a cleaning path of data according to the present application is shown. The method for determining the cleaning path of the data is applied to the server and comprises the following steps:
Step 201, obtaining first behavior data generated by a user using an application installed on a client in a first period of time.
In this embodiment, an execution body (e.g., the device 102 shown in fig. 1) of the method for determining a cleaning path of data may acquire behavior data generated by a user using an application installed on a client (e.g., the device 101 shown in fig. 1). In some embodiments, the execution body may acquire the first behavior data only when authorized by the user. In general, when a user needs to clean up temporary data stored in a client, the user may use the client to send a data cleaning request to the execution body. For example, when the user clicks a data cleaning button in a data cleaning application installed on the client, the client may send a data cleaning request to the execution body. On receiving a data cleaning request sent by the client, the execution body can be considered to be authorized by the user and may then acquire the first behavior data. The first behavior data is the behavior data generated during the first time period. The first time period may be any time interval set in advance; for example, it may be the one-week interval preceding the current time.
In general, various applications may be installed on a client, such as a data cleansing application, a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like. For any one of the applications installed on the client, behavior data for the application may be generated when the application is used by the user. Wherein the behavior data of the application may include an identification of the application and a duration of use of the application. Thus, the first behavior data may include an identification of an application and a duration of use of the application used by the user during the first period of time.
In some embodiments, the client may collect behavior data in real-time and upload it to the executing entity in real-time or on a regular basis.
In some embodiments, the client may integrate an SDK (Software Development Kit) module. Thus, each time the client collects behavior data, the behavior data can be reported to the SDK platform in real time. By providing a RESTful reporting API, the SDK platform can send the behavior data reported by the client to the data storage platform for storage in real time. The execution body may obtain the behavior data from the data storage platform through an interface or a message queue. A RESTful API is an API (Application Programming Interface) in the REST (REpresentational State Transfer) style.
Step 202, determining a first characteristic value based on the first behavior data.
In this embodiment, the execution body may analyze the first behavior data to determine the first feature value. Wherein the first characteristic value may be used to characterize the use case of the application, including but not limited to at least one of: the frequency of use of the application, the cleaning path of the application, the speed at which the application generates data, the footprint of the application generating data, etc.
In general, the first behavior data may include an identification of each application used by the user during the first time period and a duration of use. Here, for each application used by the user in the first time period, the execution body may count, based on the first behavior data, the number of times the application was used and the total duration of use, and thereby calculate the frequency of use of the application. The execution body may analyze a plurality of applications in advance, determine for each of them the cleaning path, the speed of generating data, and the space occupied by the generated data, and store these values. For each application used by the user in the first time period, the execution body may then look up the cleaning path of the application, its speed of generating data, and the space occupied by its data from the pre-stored values.
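By way of a non-limiting illustration, the counting described above can be sketched in Python as follows. The record layout (`BehaviorRecord`), the function name `usage_stats`, and the definition of frequency as uses per day are assumptions made for this sketch; the embodiment does not prescribe them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRecord:
    """One use of an application (hypothetical layout)."""
    app_id: str        # identification of the application
    duration_s: float  # duration of this use, in seconds


def usage_stats(records, period_days):
    """Return {app_id: (use_count, total_duration_s, uses_per_day)}."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for record in records:
        counts[record.app_id] += 1
        durations[record.app_id] += record.duration_s
    return {
        app: (counts[app], durations[app], counts[app] / period_days)
        for app in counts
    }


# Example: behavior data collected over a one-week first time period.
records = [
    BehaviorRecord("browser", 600.0),
    BehaviorRecord("browser", 300.0),
    BehaviorRecord("mail", 120.0),
]
stats = usage_stats(records, period_days=7)
```

Other definitions of frequency (for example, total duration divided by the length of the period) would fit the description equally well.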
Step 203, inputting the first feature value to a preset data cleaning model to obtain first cleaning path information.
In this embodiment, the execution body may input the first characteristic value into a preset data cleaning model to obtain the first cleaning path information. In some embodiments, the first cleaning path information may include only the cleaning path of the application. In this case, the cleaning paths included in the first cleaning path information are typically those of the applications installed on the client that the user used most frequently in the first time period. In some embodiments, the first cleaning path information may include both the cleaning path of the application and the cleaning confidence of the corresponding application. In this case, the cleaning paths included in the first cleaning path information are typically those of the applications installed on the client that the user used during the first time period. The cleaning confidence of an application may be the probability that the data generated by the application needs to be cleaned. Generally, the more frequently an application is used, the faster it generates data, and the more space its generated data occupies, the greater the probability that its data needs to be cleaned; conversely, the smaller that probability.
Here, the data cleaning model may be used to determine cleaning path information, characterizing a correspondence between the feature values and the cleaning path information.
In some embodiments, the data cleaning model may be a correspondence table, prepared by a person skilled in the art, that stores a plurality of feature values and their corresponding cleaning path information. In this way, the execution body may calculate the similarity between the first characteristic value and each feature value in the correspondence table, and query the first cleaning path information from the correspondence table based on the similarity result. For example, the cleaning path information corresponding to the feature value with the highest similarity to the first characteristic value is queried from the correspondence table as the first cleaning path information.
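As an illustrative sketch only, the table lookup described above might look as follows in Python. The similarity measure (here derived from Euclidean distance), the layout of the feature vectors, and the example paths are assumptions; the embodiment does not specify how similarity is computed.

```python
import math


def similarity(a, b):
    """Similarity in (0, 1]; equals 1.0 when the two feature vectors coincide."""
    return 1.0 / (1.0 + math.dist(a, b))


def lookup_cleaning_path(table, feature):
    """Return the cleaning path information of the most similar table entry."""
    _, best_info = max(table, key=lambda entry: similarity(entry[0], feature))
    return best_info


# Hypothetical table: (uses per day, MB generated per day, MB occupied) -> path.
table = [
    ((5.0, 2.0, 300.0), "/cache/app_a"),
    ((0.2, 0.1, 5.0), "/cache/app_b"),
]

first_cleaning_path = lookup_cleaning_path(table, (4.0, 2.5, 280.0))
```

In practice the feature values would likely need to be normalized first, since otherwise the component with the largest numeric range dominates the distance.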
In some embodiments, the data cleaning model may be obtained by supervised training of existing machine learning models (e.g., various neural networks) using various machine learning methods and training samples. For example, the data cleaning model may be trained by:
First, a training sample is obtained.
The training samples may include sample feature values and sample cleaning path information, among other things. The sample characteristic values may include, but are not limited to, at least one of: the frequency of use of the sample application, the clean-up path of the sample application, the speed at which the sample application generates data, the footprint of the sample application generating data, and so forth. The sample cleaning path information may include a cleaning confidence of the sample application and a cleaning path of the sample application.
Then, a linear regression equation is constructed by taking the sample characteristic value as an independent variable.
Each independent variable of the linear regression equation may correspond to a weight coefficient. As an example, when the sample feature value includes only one feature value, namely the frequency of use of the sample application, the constructed linear regression equation may be: $z = b + a x$, where $x$ is the independent variable of the linear regression equation, characterizing the frequency of use of the application, $a$ is the weight coefficient corresponding to $x$, and $b$ is also a weight coefficient. As another example, when the sample feature values include n feature values (n is an integer greater than 1), such as the frequency of use of the sample application, the cleaning path of the sample application, the speed at which the sample application generates data, and the space occupied by the data the sample application generates, the constructed linear regression equation may be: $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n$, where $x_1, x_2, x_3, \dots, x_n$ are the independent variables of the linear regression equation, representing the n feature values, $\beta_1, \beta_2, \beta_3, \dots, \beta_n$ are the weight coefficients corresponding to $x_1, x_2, x_3, \dots, x_n$, and $\beta_0$ is also a weight coefficient.
Then, a logistic regression function is constructed with the linear equation as an argument.
Continuing with one example of the above steps, the logistic regression function may be:
[Equation image: Figure BDA0001919578210000081]
Continuing with another example in the above step, the logistic regression function may be:
[Equation image: Figure BDA0001919578210000082]
Further data transformations may be written as:
[Equation image: Figure BDA0001919578210000083]
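Assuming the standard logistic (sigmoid) form, a logistic regression function that takes the linear regression equation $z$ as its argument can be written as below for the single-feature and n-feature examples, together with the usual log-odds rearrangement. The symbol $g$ and the exact layout are assumptions, since the original equations are embedded as images.

```latex
% Single feature: z = b + a x
g(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(b + a x)}}

% n features: z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
g(z) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}

% Rearranged as log-odds, which is linear in the feature values
\ln\frac{g(z)}{1 - g(z)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
```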
And finally, taking the sample characteristic value as input, taking sample cleaning path information as output, and training a logistic regression function to obtain a data cleaning model.
Here, the sample feature values may be substituted into the logistic regression function as the values of the independent variables and the sample cleaning path information as the value of the dependent variable, so as to solve for the numerical values of the weight coefficients in the logistic regression function; substituting the solved weight coefficients back into the logistic regression function yields the data cleaning model. In some embodiments, the values of the weight coefficients may be solved by a maximum likelihood estimation method. Maximum likelihood estimation (MLE) is a widely used method of parameter estimation. It explicitly uses a probability model, and its goal is to find the parameter values under which the observed data are produced with the highest probability. Here, the maximum likelihood estimation method makes the probability of occurrence of the data in the training samples as large as possible.
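A minimal sketch of such training for the single-feature case $z = b + a x$ is given below, assuming that each training sample is labeled 1 if the sample application's data needed cleaning and 0 otherwise, and that the likelihood is maximized by plain gradient ascent. The labels, the learning rate, the iteration count, and the sample values are all assumptions for illustration; the embodiment states only that maximum likelihood estimation may be used.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate (a, b) in z = b + a*x by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # The gradient of the log-likelihood is the sum of residuals (y - p),
        # weighted by x for the slope a and unweighted for the intercept b.
        residuals = [y - sigmoid(b + a * x) for x, y in zip(xs, ys)]
        grad_a = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b


# Hypothetical samples: frequency of use -> whether cleaning was needed (1) or not (0).
frequencies = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 4.5, 5.0]
needs_cleaning = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(frequencies, needs_cleaning)
confidence_high_use = sigmoid(b + a * 5.0)
confidence_low_use = sigmoid(b + a * 0.5)
```

With these samples the fitted slope is positive, so the predicted cleaning confidence rises with the frequency of use, which matches the qualitative behavior described for the cleaning confidence.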
Step 204, the first cleaning path information is sent to the client, so that the client performs data cleaning based on the first cleaning path information.
In this embodiment, the execution body may send the first cleaning path information to the client, for example, send the first cleaning path information to the client through an API. Thus, the client can clean the data according to the first cleaning path information. In some embodiments, when the first cleaning path information includes only the cleaning path of the application, the client may clean data stored under the cleaning path of the application included in the first cleaning path information. In some embodiments, where the first cleaning path information includes both a cleaning path of the application and a cleaning confidence of the application, the client may select an application with a cleaning confidence greater than a preset cleaning confidence threshold (e.g., 60%) and clean data stored under the cleaning path of the selected application.
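The client-side selection described above can be sketched as follows. The representation of the first cleaning path information as (path, confidence) pairs, the function name, and the example paths are assumptions; the 60% threshold is the example value given in this embodiment.

```python
def select_paths(cleaning_path_info, threshold=0.6):
    """Return the cleaning paths whose cleaning confidence exceeds the threshold."""
    return [path for path, confidence in cleaning_path_info if confidence > threshold]


# Hypothetical first cleaning path information received from the server.
info = [
    ("/cache/app_a", 0.92),
    ("/cache/app_b", 0.35),
    ("/cache/app_c", 0.60),  # equal to the threshold, so not selected
]
selected = select_paths(info)
```

The client would then clean the data stored under each selected path; the deletion itself is omitted here because it depends on the client platform.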
The method for determining a cleaning path of data provided in the above embodiment of the present application first acquires first behavior data generated by a user using an application installed on a client in a first time period; then determines a first characteristic value based on the first behavior data; then inputs the first characteristic value into a preset data cleaning model to obtain first cleaning path information; and finally sends the first cleaning path information to the client so that the client can clean data based on the first cleaning path information. Because the cleaning path is determined from the behavior data of the user, the time for determining the cleaning path is greatly shortened and the data cleaning efficiency is improved. This also avoids the prior-art situation in which the user, having to wait a long time while the applications installed on the client are scanned, interrupts the scan and the data cleaning fails.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for determining a cleaning path of data according to the present application is shown. The method for determining the cleaning path of the data is applied to the server and comprises the following steps:
Step 301, obtaining first behavior data generated by a user using an application installed on a client in a first period of time.
Step 302, determining a first characteristic value based on the first behavior data.
Step 303, inputting the first feature value to a preset data cleaning model to obtain first cleaning path information.
Step 304, the first cleaning path information is sent to the client, so that the client performs data cleaning based on the first cleaning path information.
In this embodiment, the specific operations of steps 301 to 304 are substantially the same as those of steps 201 to 204 in the embodiment shown in fig. 2, and will not be described herein.
Step 305, receiving data cleaning information sent by the client after data cleaning based on the first cleaning path information.
In this embodiment, after performing data cleaning based on the first cleaning path information, the client (for example, the device 101 shown in fig. 1) may send the data cleaning information to the execution body (for example, the device 102 shown in fig. 1) of the method for determining a cleaning path of data. The data cleaning information may include the storage space released by the data cleaning and the time spent on the data cleaning. It should be noted that the first cleaning path information is described in detail in the embodiment shown in fig. 2, and is not described herein.
Step 306, a cleaning ratio is determined based on the data cleaning information.
In this embodiment, the execution body may determine the cleaning ratio based on the data cleaning information. For example, the execution body may first obtain the average speed at which the client generated data over a past period, then take the product of that average speed and the length of the first time period as the space occupied by the data generated by the client in the first time period, and finally take the ratio of the storage space released by the data cleaning to the space occupied by the data generated by the client in the first time period as the cleaning ratio.
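The example calculation above can be sketched as follows. This is a minimal illustration only; the function name, parameter names, and the choice of bytes and seconds as units are assumptions and do not appear in the application.

```python
def cleaning_ratio(released_bytes: float,
                   avg_generation_speed: float,
                   period_seconds: float) -> float:
    """Return released storage divided by the estimated space occupied.

    avg_generation_speed is the client's average data generation speed
    (assumed here to be in bytes per second); period_seconds is the
    length of the first time period.
    """
    # Estimated space occupied by data generated during the first time period.
    occupied_bytes = avg_generation_speed * period_seconds
    if occupied_bytes <= 0:
        raise ValueError("estimated occupied space must be positive")
    return released_bytes / occupied_bytes
```

Under these assumptions, a client that generates 5 bytes per second over a 100-second period and releases 450 bytes by cleaning has a cleaning ratio of 0.9.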
Step 307, determining whether the cleaning ratio is greater than a preset ratio threshold.
In this embodiment, the execution body may compare the cleaning ratio with a preset ratio threshold (for example, 80%). If the cleaning ratio is greater than the preset ratio threshold, the cleaning effect has been achieved and the flow proceeds to step 308; if the cleaning ratio is less than or equal to the preset ratio threshold, the cleaning effect has not been achieved and the flow proceeds to step 309.
Step 308, optimizing the data cleaning model by taking the first characteristic value and the first cleaning path information as training samples to obtain an optimized data cleaning model.
In this embodiment, if the cleaning ratio is greater than the preset ratio threshold, the execution body may optimize the data cleaning model with the first feature value as input and the first cleaning path information as output, so as to adjust the values of the weight coefficients of the data cleaning model, thereby obtaining the optimized data cleaning model. Generally, the cleaning path information determined by the optimized data cleaning model has higher accuracy. It should be noted that the first feature value and the data cleaning model are described in detail in the embodiment shown in fig. 2, and are not described herein.
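As an illustration of adjusting the weight coefficients with one new training sample, the following sketch applies a single gradient-ascent step on the log-likelihood of a logistic regression model. The gradient step, the learning rate, and the encoding of the cleaning path information as a 0/1 label are assumptions made for this example; the application states only that the weight coefficients are adjusted.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], features: List[float]) -> float:
    """Apply the logistic function to the weighted sum of the features."""
    z = sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def update_weights(weights: List[float], features: List[float],
                   label: float, learning_rate: float = 0.1) -> List[float]:
    """One gradient-ascent step on the log-likelihood of a single sample.

    label is a hypothetical encoding: 1.0 if the application's path was
    part of the cleaning path information, 0.0 otherwise.
    """
    error = label - predict(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]
```

Starting from zero weights, one positive sample moves every weight in the direction of its feature, so the model's output for that sample rises above 0.5.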
Step 309, obtaining second behavior data generated by the user using the application during a second time period.
In this embodiment, if the cleaning ratio is less than or equal to the preset ratio threshold, the execution body may acquire second behavior data generated by the user using the application in a second time period. The second time period is typically a time period after the first time period, and the behavior data generated during the second time period is the second behavior data. It should be noted that the behavior data is described in detail in the embodiment shown in fig. 2, and will not be described herein.
Step 310, merging the first behavior data and the second behavior data, and determining a merged feature value based on the merged behavior data.
In this embodiment, the execution body may merge the first behavior data and the second behavior data, and determine the merged feature value based on the merged behavior data. Here, the operation of determining the merged feature value is substantially the same as the operation of determining the first feature value, and will not be described here.
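A minimal sketch of the merging step is given below. It assumes each behavior record is a pair of application identifier and duration of use, and it derives only two simple per-application features; the record layout, the function names, and the feature names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (application identifier, duration of use in seconds).
BehaviorRecord = Tuple[str, float]


def merge_behavior_data(first: List[BehaviorRecord],
                        second: List[BehaviorRecord]) -> List[BehaviorRecord]:
    """Concatenate the records of the first and second time periods."""
    return list(first) + list(second)


def usage_features(records: List[BehaviorRecord]) -> Dict[str, Dict[str, float]]:
    """Compute a use count and a total duration of use for each application."""
    features: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"use_count": 0.0, "total_duration": 0.0})
    for app_id, duration in records:
        features[app_id]["use_count"] += 1
        features[app_id]["total_duration"] += duration
    return dict(features)
```

Computing the features on the merged records rather than on each period separately means an application used in both periods contributes one combined feature value.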
Step 311, inputting the combined characteristic value into the data cleaning model to obtain combined cleaning path information.
In this embodiment, the execution body may input the merged feature value to the data cleaning model, so as to obtain merged cleaning path information. Here, the operation of obtaining the merged cleaning path information is substantially the same as the operation of obtaining the first cleaning path information, and will not be described herein.
Step 312, the merged cleaning path information is sent to the client, so that the client performs data cleaning again based on the merged cleaning path information.
In this embodiment, the execution body may send the merged cleaning path information to the client. Thus, the client can clean the data according to the merged cleaning path information. Here, the operation of the client for data cleaning according to the merged cleaning path information is substantially the same as the operation of the client for data cleaning according to the first cleaning path information, which is not described herein.
It should be noted that, after the client performs data cleaning again based on the merged cleaning path information, the method may return to step 305, so as to either continue optimizing the data cleaning model or cause the client to perform data cleaning once more. Repeating this cycle continuously improves the accuracy with which the data cleaning model determines cleaning path information.
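The branching in steps 307 to 312 and the return to step 305 can be sketched as a small loop over the cleaning ratios reported in successive rounds. The function names and action labels are hypothetical, and stopping after the model is optimized is an assumption of this sketch rather than something the application specifies.

```python
from typing import List


def next_action(cleaning_ratio: float, ratio_threshold: float = 0.8) -> str:
    """Step 307: choose between optimizing the model and collecting more data."""
    if cleaning_ratio > ratio_threshold:
        return "optimize_model"      # proceed to step 308
    return "collect_more_data"       # proceed to steps 309 to 312


def run_feedback_loop(reported_ratios: List[float],
                      ratio_threshold: float = 0.8) -> List[str]:
    """Replay the reported cleaning ratios and record the branch taken each round."""
    actions: List[str] = []
    for ratio in reported_ratios:
        action = next_action(ratio, ratio_threshold)
        actions.append(action)
        if action == "optimize_model":
            break  # assumption: the loop ends once the cleaning effect is achieved
    return actions
```

A ratio exactly equal to the threshold takes the "collect more data" branch, matching the "less than or equal to" condition in step 307.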
As can be seen from fig. 3, compared to the embodiment corresponding to fig. 2, the flow 300 of the method for determining a cleaning path of data in this embodiment adds steps 305-312. Thus, when the cleaning effect is achieved, the scheme described in this embodiment uses the first feature value and the first cleaning path information as training samples to further optimize the data cleaning model, so as to improve the accuracy with which the data cleaning model determines cleaning path information. When the cleaning effect is not achieved, more behavior data is collected to re-determine the cleaning path information, so that the client performs data cleaning again. Repeating this cycle continuously improves the accuracy of the cleaning path information determined by the data cleaning model.
With continued reference to FIG. 4, a flow 400 of one embodiment of a method for cleaning data according to the present application is shown. The method for cleaning up the data is applied to the client and comprises the following steps:
Step 401, collecting and reporting first behavior data generated by a user using an application installed on a client during a first period of time.
In this embodiment, the execution body of the method for cleaning up data (for example, the device 101 shown in fig. 1) may collect and report first behavior data generated by the user using the application installed on the client during the first period of time. The first behavior data may be reported directly to the server (e.g., the device 102 shown in fig. 1), or may be reported to an SDK platform, which sends the first behavior data to a data storage platform for storage. In this way, the server may obtain the first behavior data through an interface with the data storage platform or through a message queue. It should be noted that the first period of time and the first behavior data are described in detail in the embodiment shown in fig. 2, and are not described herein.
Step 402, in response to receiving first cleaning path information that is sent by the server and determined based on the first behavior data, cleaning data stored in a first cleaning path corresponding to the first cleaning path information.
In this embodiment, the server may determine the first cleaning path information based on the first behavior data, and send the first cleaning path information to the execution body. After receiving the first cleaning path information, the execution body may clean data stored in the first cleaning path corresponding to the first cleaning path information. In some embodiments, when the first cleaning path information includes only the cleaning path of the application, the client may clean data stored under the cleaning path of the application included in the first cleaning path information. In some embodiments, where the first cleaning path information includes both a cleaning path of the application and a cleaning confidence of the application, the client may select an application with a cleaning confidence greater than a preset cleaning confidence threshold (e.g., 60%) and clean data stored under the cleaning path of the selected application.
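The two client-side cases described above can be sketched as a single selection function. The dictionary layout of the cleaning path information and the example paths are assumptions for illustration; the actual deletion of data under the selected paths is omitted.

```python
from typing import Dict, List


def select_cleaning_paths(path_info: List[Dict],
                          confidence_threshold: float = 0.6) -> List[str]:
    """Return the cleaning paths whose data the client should clean.

    Each entry is a hypothetical dict with a "path" key and an optional
    "confidence" key. Entries without a confidence are always selected
    (the path-only case); otherwise the confidence must exceed the threshold.
    """
    selected: List[str] = []
    for entry in path_info:
        confidence = entry.get("confidence")
        if confidence is None or confidence > confidence_threshold:
            selected.append(entry["path"])
    return selected


example_info = [
    {"path": "/data/app_a/cache", "confidence": 0.9},
    {"path": "/data/app_b/cache", "confidence": 0.4},
    {"path": "/data/app_c/cache"},
]
paths_to_clean = select_cleaning_paths(example_info)
```

With the 60% threshold, only the first and third example entries are selected.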
In some embodiments, after performing data cleaning, the execution body may further generate data cleaning information and send the data cleaning information to the server. It should be noted that the data cleaning information is described in detail in the embodiment shown in fig. 2, and is not described herein.
The method for cleaning up data provided in the foregoing embodiment of the present application first collects and reports first behavior data generated by a user using an application installed on a client in a first period of time; then, upon receiving first cleaning path information that is sent by the server and determined based on the first behavior data, cleans the data stored in the first cleaning path corresponding to the first cleaning path information. Because the server determines the cleaning path based on the behavior data of the user, the time for determining the cleaning path is greatly shortened and the data cleaning efficiency is improved. This also avoids the prior-art situation in which the user, unwilling to wait through a lengthy scan of the applications installed on the client, interrupts the scan and thereby causes the data cleaning to fail.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing a computer device (e.g., device 101 or device 102 of FIG. 1) of an embodiment of the present application. The computer device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments herein.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the method of the present application are performed when the computer program is executed by the Central Processing Unit (CPU) 501. It should be noted that the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, a determination unit, an input unit, and a transmission unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the acquisition unit may also be described as "a unit that acquires first behavior data generated by a user using an application installed on a client in a first period of time". As another example, it can be described as: a processor includes a reporting unit and a cleaning unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the reporting unit may also be described as "a unit that collects and reports first behavior data generated by the user using an application installed on the client during the first period of time".
As another aspect, the present application also provides a computer-readable medium that may be contained in the computer device described in the above embodiments, or may exist alone without being assembled into the computer device. The computer readable medium carries one or more programs which, when executed by the computer device, cause the computer device to: acquire first behavior data generated by a user using an application installed on a client in a first time period; determine a first characteristic value based on the first behavior data, wherein the first characteristic value is used for representing the use condition of the application; input the first characteristic value into a preset data cleaning model to obtain first cleaning path information, wherein the data cleaning model is used for determining the cleaning path information; and send the first cleaning path information to the client so that the client can clean data based on the first cleaning path information. Alternatively, the one or more programs cause the computer device to: collect and report first behavior data generated by a user using an application installed on a client in a first time period; and, in response to receiving first cleaning path information that is sent by the server and determined based on the first behavior data, clean the data stored in the first cleaning path corresponding to the first cleaning path information.
The foregoing description is only of the preferred embodiments of the present application and is presented as a description of the principles of the technology being utilized. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, but is intended to cover other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the invention, for example, embodiments in which the above features are interchanged with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (10)

1. A method for determining a cleaning path of data, applied to a server, comprising:
acquiring first behavior data generated by a user using an application installed on a client in a first time period;
determining a first characteristic value based on the first behavior data, wherein the first characteristic value is used for representing the use condition of the application;
inputting the first characteristic value into a preset data cleaning model to obtain first cleaning path information, wherein the data cleaning model is used for determining the cleaning path information;
sending the first cleaning path information to the client so that the client can clean data based on the first cleaning path information;
wherein the method further comprises:
receiving data cleaning information sent by the client after data cleaning based on the first cleaning path information;
determining a cleaning ratio based on the data cleaning information;
and if the cleaning ratio is greater than a preset ratio threshold, optimizing the data cleaning model by taking the first characteristic value and the first cleaning path information as training samples to obtain an optimized data cleaning model.
2. The method of claim 1, wherein the method further comprises:
if the cleaning ratio is less than or equal to the preset ratio threshold, obtaining second behavior data generated by the user using the application in a second time period;
combining the first behavior data and the second behavior data, and determining a combined feature value based on the combined behavior data;
inputting the combined characteristic values into the data cleaning model to obtain combined cleaning path information;
and sending the combined cleaning path information to the client so that the client can clean data again based on the combined cleaning path information.
3. The method of claim 1, wherein the first behavior data includes an identification and a duration of use of the application.
4. The method of claim 1, wherein the first characteristic value comprises at least one of: a usage frequency of the application, a cleaning path of the application, a speed at which the application generates data, and a space occupied by the data generated by the application.
5. The method of claim 1, wherein the first cleaning path information comprises a cleaning path of the application or comprises a cleaning confidence and a cleaning path of the application.
6. The method according to one of claims 1 to 5, wherein the data cleaning model is trained by:
obtaining a training sample, wherein the training sample comprises a sample characteristic value and sample cleaning path information;
constructing a linear regression equation by taking the sample characteristic value as an independent variable, wherein each independent variable of the linear regression equation corresponds to a weight coefficient;
constructing a logistic regression function by taking the linear regression equation as an independent variable;
and taking the sample characteristic value as input, taking the sample cleaning path information as output, and training the logistic regression function to obtain the data cleaning model.
7. The method of claim 6, wherein the values of the weight coefficients are solved by a maximum likelihood estimation method.
8. A method for cleaning up data, applied to a client, comprising:
collecting and reporting first behavior data generated by a user using an application installed on the client in a first time period;
in response to receiving first cleaning path information which is sent by a server and is determined based on the first behavior data, cleaning data stored in a first cleaning path corresponding to the first cleaning path information, wherein the server determines a first characteristic value based on the first behavior data, and the first characteristic value is used for representing the use condition of the application; and inputs the first characteristic value into a preset data cleaning model to obtain the first cleaning path information, wherein the data cleaning model is used for determining the cleaning path information;
generating data cleaning information and sending the data cleaning information to the server, wherein the server determines a cleaning ratio based on the data cleaning information; and if the cleaning ratio is greater than a preset ratio threshold, the data cleaning model is optimized by taking the first characteristic value and the first cleaning path information as training samples, and the optimized data cleaning model is obtained.
9. A computer device, comprising:
one or more processors;
a storage device on which one or more programs are stored;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7 or the method of claim 8.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-7 or implements the method of claim 8.
CN201811587961.1A 2018-12-25 2018-12-25 Method and device for determining cleaning path of data and cleaning data Active CN109669779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811587961.1A CN109669779B (en) 2018-12-25 2018-12-25 Method and device for determining cleaning path of data and cleaning data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811587961.1A CN109669779B (en) 2018-12-25 2018-12-25 Method and device for determining cleaning path of data and cleaning data

Publications (2)

Publication Number Publication Date
CN109669779A CN109669779A (en) 2019-04-23
CN109669779B true CN109669779B (en) 2023-05-26

Family

ID=66146062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811587961.1A Active CN109669779B (en) 2018-12-25 2018-12-25 Method and device for determining cleaning path of data and cleaning data

Country Status (1)

Country Link
CN (1) CN109669779B (en)

Also Published As

Publication number Publication date
CN109669779A (en) 2019-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant