CN108287860A - Model generation method, garbage file identification method, and device

Publication number
CN108287860A
Authority
CN
China
Prior art keywords
file
garbage files
information
directory information
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710791588.0A
Other languages
Chinese (zh)
Inventor
曹聪
曹一聪
魏雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710791588.0A priority Critical patent/CN108287860A/en
Publication of CN108287860A publication Critical patent/CN108287860A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a model generation method, a garbage file identification method, and devices, belonging to the field of data processing. The method includes: obtaining a first data set and a second data set; calculating a first feature matrix corresponding to the first data set and a second feature matrix corresponding to the second data set, the first feature matrix indicating text features of first directory information and the second feature matrix indicating text features of second directory information; and training a target classification model according to the first feature matrix and the second feature matrix. By training on the feature matrices, the invention obtains a target classification model for determining whether a file to be detected is a garbage file, so that the model can be used directly in subsequent processes to identify garbage files intelligently, which reduces the workload of manually maintaining a configuration file in the related art.

Description

Model generation method, garbage file identification method, and device
Technical field
Embodiments of the present invention relate to the field of data processing, and in particular to a model generation method, a garbage file identification method, and corresponding devices.
Background
With the popularization of intelligent terminals, all kinds of terminal application software keep emerging, and as a result various garbage files, such as browser cache files, Bluetooth files, and image caches, often accumulate on a terminal. These garbage files easily cause the terminal to run slowly or to consume more power. Garbage files therefore need to be identified and cleaned up in a timely manner.
At present, the most widely used garbage file identification method is based on a configuration file. Its core idea is as follows: the garbage files produced while each application program runs are observed manually in advance; the directory information of n garbage files, such as the garbage type and the path information of the path where each file is located, is determined and recorded in a configuration file; a client program then scans the paths listed in the configuration file in turn and, if the type of a scanned file matches a recorded garbage type, identifies that file as a garbage file.
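The configuration-file-based approach described above can be sketched as follows. This is a minimal illustration only: the configuration layout, the example paths, and the function name `is_garbage_by_config` are assumptions made for the sketch and do not come from the patent.

```python
from pathlib import PureWindowsPath

# Hypothetical configuration file content: each entry pairs a path to scan
# with the garbage type (here, a file extension) expected in that path.
CONFIG = [
    {"path": r"c:\app\cache", "ext": "tmp"},
    {"path": r"c:\app\logs", "ext": "log"},
]

def is_garbage_by_config(file_path, config=CONFIG):
    """Return True if the file lies in a configured path and has the configured type."""
    p = PureWindowsPath(file_path)
    ext = p.suffix.lstrip(".").lower()
    parent = str(p.parent).lower()
    return any(
        parent == entry["path"].lower() and ext == entry["ext"]
        for entry in config
    )
```

Every new garbage type requires another hand-written entry in `CONFIG`, which is the maintenance burden the following paragraphs describe.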
In this method, however, the configuration file must be maintained manually and continuously: whenever a new garbage type appears, it must first be determined by manual observation, and the configuration file must then be edited and updated. Only after the update is complete can the client program identify garbage files based on the updated configuration file. As garbage types multiply, manual maintenance of the configuration file becomes both tedious and error-prone, which degrades the identification of garbage files.
Summary of the invention
To solve the problem in the related art that manual maintenance of the configuration file leads to poor identification results for the configuration-file-based garbage file identification method, embodiments of the present invention provide a model generation method, a garbage file identification method, and devices. The technical solutions are as follows:
In a first aspect, a model generation method is provided, the method comprising:
obtaining a first data set and a second data set, the first data set comprising first directory information of at least one garbage file, the second data set comprising second directory information of at least one non-garbage file, and the first data set and the second data set having no intersection;
calculating, according to the first directory information of the at least one garbage file, a first feature matrix corresponding to the first data set, the first feature matrix indicating text features of the first directory information;
calculating, according to the second directory information of the at least one non-garbage file, a second feature matrix corresponding to the second data set, the second feature matrix indicating text features of the second directory information;
training, according to the first feature matrix and the second feature matrix, to obtain a target classification model, the target classification model being used to determine whether a file to be detected is a garbage file.
In a second aspect, a garbage file identification method is provided that uses the target classification model generated by the model generation method of the first aspect, the method comprising:
obtaining directory information of a file to be detected, the directory information comprising extension information of the file to be detected and path information of the path where the file to be detected is located;
obtaining, according to the extension information of the file to be detected and the corresponding path information, an identification result for the file to be detected by using the target classification model, the identification result indicating whether the file to be detected is a garbage file or a non-garbage file.
In a third aspect, a model generation device is provided, the device comprising:
an acquisition module, configured to obtain a first data set and a second data set, the first data set comprising first directory information of at least one garbage file, the second data set comprising second directory information of at least one non-garbage file, and the first data set and the second data set having no intersection;
a first computing module, configured to calculate, according to the first directory information of the at least one garbage file, a first feature matrix corresponding to the first data set, the first feature matrix indicating text features of the first directory information;
a second computing module, configured to calculate, according to the second directory information of the at least one non-garbage file, a second feature matrix corresponding to the second data set, the second feature matrix indicating text features of the second directory information;
a training module, configured to train, according to the first feature matrix and the second feature matrix, to obtain a target classification model, the target classification model being used to determine whether a file to be detected is a garbage file.
In one possible implementation, the acquisition module comprises an acquiring unit and a first determining unit;
the acquiring unit is configured to obtain a garbage configuration file, the garbage configuration file comprising preset first extension information of the at least one garbage file and first path information of the path where each garbage file is located;
the first determining unit is configured to, for each garbage file, determine the first extension information and the first path information of the garbage file as the first directory information of the garbage file, so as to obtain the first data set comprising at least one piece of first directory information.
In one possible implementation, the acquisition module comprises a traversal unit, a second determining unit, and an obtaining unit;
the traversal unit is configured to traverse the directory information corresponding to each of k disk files in the operating system, the directory information comprising the extension information of the disk file and the path information of the path where the disk file is located, k being a positive integer;
the second determining unit is configured to, when it is detected that the i-th disk file is a non-garbage file, determine the directory information of the i-th disk file as second directory information and add the second directory information to the second data set, i being a positive integer and i ≤ k;
the obtaining unit is configured to obtain the second data set comprising at least one piece of second directory information.
In one possible implementation, the first directory information of the at least one garbage file comprises the first extension information and the first path information of the at least one garbage file;
the acquisition module is further configured to determine that the i-th disk file is a non-garbage file when the directory information of the i-th disk file satisfies a first preset condition;
wherein the first preset condition comprises: the extension information of the i-th disk file differs from the at least one piece of first extension information, and/or the path information corresponding to the i-th disk file differs from the at least one piece of first path information.
In one possible implementation, the first computing module comprises a first word segmentation unit, a first computing unit, and a first generating unit;
the first word segmentation unit is configured to perform word segmentation on the at least one piece of first directory information to obtain m first feature words, m being a positive integer;
the first computing unit is configured to calculate a first feature value corresponding to each of the m first feature words, the first feature value indicating how well the first feature word discriminates garbage files;
the first generating unit is configured to generate, according to the first feature values corresponding to the m first feature words, the first feature matrix corresponding to the first data set.
In one possible implementation, the second computing module comprises a second word segmentation unit, a second computing unit, and a second generating unit;
the second word segmentation unit is configured to perform word segmentation on the at least one piece of second directory information to obtain n second feature words, n being a positive integer;
the second computing unit is configured to calculate a second feature value corresponding to each of the n second feature words, the second feature value indicating how well the second feature word discriminates non-garbage files;
the second generating unit is configured to generate, according to the second feature values corresponding to the n second feature words, the second feature matrix corresponding to the second data set.
In one possible implementation, the device further comprises:
a dividing module, configured to divide the first data set into a first training set and a first test set and to divide the second data set into a second training set and a second test set, the first training set and the second training set being used to train the target classification model, and the first test set and the second test set being used to test the target classification model to obtain a classification accuracy;
the training module comprises a third determining unit, a fourth determining unit, and a training unit;
the third determining unit is configured to determine, according to the first feature matrix, a first feature submatrix corresponding to the first training set;
the fourth determining unit is configured to determine, according to the second feature matrix, a second feature submatrix corresponding to the second training set;
the training unit is configured to train, according to the first feature submatrix and the second feature submatrix, to obtain the target classification model.
In one possible implementation, the training unit is further configured to input the first feature submatrix and the second feature submatrix into a logistic regression model and train it to obtain the target classification model.
In a fourth aspect, a garbage file identification device is provided that uses the target classification model generated by the model generation device as described in the first aspect, the device comprising:
an acquisition module, configured to obtain directory information of a file to be detected, the directory information comprising extension information of the file to be detected and path information of the path where the file to be detected is located;
an identification module, configured to obtain, according to the extension information of the file to be detected and the corresponding path information, an identification result for the file to be detected by using the target classification model, the identification result indicating whether the file to be detected is a garbage file.
In a fifth aspect, a model generation apparatus is provided, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the model generation method provided in the first aspect.
In a sixth aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the model generation method provided in the first aspect.
In a seventh aspect, a garbage file identification apparatus is provided, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the garbage file identification method provided in the second aspect.
In an eighth aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the garbage file identification method provided in the second aspect.
The technical solutions provided in the embodiments of the present invention bring the following beneficial effects:
The embodiments obtain a first data set comprising first directory information of at least one garbage file and a second data set comprising second directory information of at least one non-garbage file, calculate a first feature matrix corresponding to the first data set and a second feature matrix corresponding to the second data set, and train a target classification model according to the two feature matrices. On the one hand, because the feature matrices indicate the text features of the directory information, training on them makes the resulting target classification model more reliable and thus improves the accuracy of garbage file identification. On the other hand, because the trained target classification model determines whether a file to be detected is a garbage file, it can be used directly in subsequent processes to identify garbage files intelligently, which reduces the workload of manually maintaining a configuration file in the related art.
Description of the drawings
Fig. 1A is a schematic diagram of the implementation environment involved in the embodiments of the present invention;
Fig. 1B is a flowchart of the model generation method and the garbage file identification method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of the model generation method and the garbage file identification method provided by another embodiment of the present invention;
Fig. 3 is a flowchart of the model generation method and the garbage file identification method provided by another embodiment of the present invention;
Fig. 4 is a flowchart of the model generation method and the garbage file identification method provided by another embodiment of the present invention;
Fig. 5 is a schematic diagram of the principle of the model generation method provided by one embodiment of the present invention;
Fig. 6 is a flowchart of the garbage file identification method provided by another embodiment of the present invention;
Fig. 7 is a schematic diagram of an interface involved in the garbage file identification method provided by one embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the model generation device provided by one embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the model generation device provided by another embodiment of the present invention;
Fig. 10 is a schematic structural diagram of the garbage file identification device provided by one embodiment of the present invention;
Fig. 11 is a schematic structural diagram of the terminal 1100 provided by one embodiment of the present invention;
Fig. 12 is a schematic structural diagram of the server 1200 provided by one embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
First, some terms involved in the embodiments of the present invention are explained:
Term frequency-inverse document frequency (TF-IDF) algorithm: an algorithm for extracting the text features of text content.
The core idea of the TF-IDF algorithm is as follows: the text content is segmented into feature words; for each feature word, the term frequency (TF) with which it occurs in the text content is obtained and its inverse document frequency (IDF) is calculated; the product of TF and IDF is taken as the TF-IDF value, that is, the feature value, of the feature word; the semantics of the text content are then represented by the feature values of its feature words.
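As a minimal sketch of this idea applied to directory information, the following Python code computes TF-IDF values for a small collection of path strings. The tokenizer (splitting on non-alphanumeric characters), the unsmoothed IDF formula log(N / df), and the function names are assumptions chosen for illustration; the patent does not specify them.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lower-case alphanumeric feature words (an assumed tokenizer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def tf_idf(documents):
    """Return one {feature word: TF-IDF value} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # TF = relative frequency in this document; IDF = log(N / df).
        result.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return result
```

With this formula, a word that occurs in every document (for example a drive letter shared by all paths) gets a feature value of zero, while a word that is specific to a few documents gets a larger one.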
Logistic regression model: an LR model built with the LR (Logistic Regression) algorithm. The LR model is a linear classification model with a simple structure and good classification performance, and ready-made libraries are available for it.
The technical solutions provided in the embodiments of the present invention include a model generation method and a garbage file identification method. The model generation method is mainly used to train a target classification model for determining whether a file to be detected is a garbage file. The garbage file identification method is mainly used to input the directory information of a file to be detected into the trained target classification model to obtain an identification result, which indicates whether the file to be detected is a garbage file.
It should be noted that the model generation method is usually performed by a server but may also be performed by a terminal, and that the garbage file identification method is usually performed by a terminal but may also be performed by a server. For ease of description, the following method embodiments are described using only the example in which the server performs the model generation method and the terminal performs the garbage file identification method.
Refer to Fig. 1A, which shows a schematic diagram of the implementation environment involved in the embodiments of the present invention. The implementation environment includes a server 120 and a terminal 140.
The server 120 is a single server, a group of several servers, a virtualization platform, or a cloud computing service center. The server 120 is used to perform the model generation method provided in the embodiments of the present invention.
Optionally, once the server 120 has trained the target classification model through a machine learning algorithm, the target classification model being used to determine whether a file to be detected is a garbage file, the server 120 sends the target classification model to the terminal 140.
The server 120 and the terminal 140 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.
The terminal 140 may be a mobile phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, or the like. The terminal 140 is used to perform the garbage file identification method provided in the embodiments of the present invention.
Optionally, an operating system management application is installed on the terminal 140 and is used to perform the garbage file identification method provided in the embodiments of the present invention; for example, the operating system management application is a computer manager application.
In general, after training the target classification model, the server 120 sends it to the terminal 140; correspondingly, the terminal 140 uses the received target classification model to scan and identify each disk file on the terminal 140 and determine whether any of them is a garbage file.
Refer to Fig. 1B, which shows a flowchart of the model generation method and the garbage file identification method provided by one embodiment of the present invention. The methods include:
Step 101: obtain a first data set and a second data set, the first data set including first directory information of at least one garbage file, the second data set including second directory information of at least one non-garbage file, and the first data set and the second data set having no intersection.
The disk files in an operating system include garbage files and non-garbage files. Garbage files include at least one of system garbage files, software garbage files, Internet garbage files, registry garbage files, and cache files; non-garbage files are the other disk files in the operating system.
The directory information of a disk file includes extension information and the path information of the path where the file is located.
The extension information indicates the file type of the disk file and includes the filename extension of the disk file, that is, the suffix of the file name. For example, if the file name of disk file A is "read me.txt", the extension information of disk file A is "txt", which indicates that disk file A is a plain-text file.
The path information indicates the location of the disk file in the operating system; for example, the path information of disk file A is "c:\windows\xxx".
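As an illustration of these two fields, the following sketch derives the directory information of a disk file from its full path using the Python standard library. The dictionary layout, the function name `directory_info`, and the example file name are assumptions for the sketch.

```python
from pathlib import PureWindowsPath

def directory_info(file_path):
    """Return the extension information and path information of a disk file."""
    p = PureWindowsPath(file_path)
    return {
        "ext": p.suffix.lstrip(".").lower(),  # extension without the leading dot
        "path": str(p.parent),                # path of the containing directory
    }
```

For a hypothetical file `c:\windows\xxx\readme.txt`, this yields the extension information `txt` and the path information `c:\windows\xxx`.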
To distinguish the directory information of garbage files from that of non-garbage files, the following description assumes that the first directory information of a garbage file includes the first extension information of the garbage file and the first path information of the path where it is located, and that the second directory information of a non-garbage file includes the second extension information of the non-garbage file and the second path information of the path where it is located.
Optionally, a garbage configuration file and a non-garbage configuration file are stored on the server in advance; the garbage configuration file includes the first directory information of at least one garbage file, and the non-garbage configuration file includes the second directory information of at least one non-garbage file. The server obtains the first data set and the second data set as follows: it obtains the first data set, which includes at least one piece of first directory information, from the garbage configuration file, and obtains the second data set, which includes at least one piece of second directory information, from the non-garbage configuration file.
Step 102: according to the first directory information of the at least one garbage file, calculate a first feature matrix corresponding to the first data set, the first feature matrix indicating text features of the first directory information.
The server determines the text features of the first directory information according to the first directory information of the at least one garbage file and calculates the first feature matrix corresponding to the first data set. For the calculation of the first feature matrix, see the relevant details in the embodiments below; they are not repeated here.
Text features include word frequency features; the text features of the first directory information indicate how frequently words occur in the at least one piece of first directory information. For example, the text features include the TF-IDF values of the words, and the first feature matrix is a TF-IDF matrix.
Step 103: according to the second directory information of the at least one non-garbage file, calculate a second feature matrix corresponding to the second data set, the second feature matrix indicating text features of the second directory information.
The server determines the text features of the second directory information according to the second directory information of the at least one non-garbage file and calculates the second feature matrix corresponding to the second data set. For the calculation of the first feature matrix and the second feature matrix, see the relevant details in the embodiments below; they are not repeated here.
Step 104: according to the first feature matrix and the second feature matrix, train to obtain a target classification model, the target classification model being used to determine whether a file to be detected is a garbage file.
The server constructs a logistic regression model in advance and inputs the first feature matrix and the second feature matrix into it to train the target classification model. Optionally, the logistic regression model is an LR model.
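The training step can be sketched with a small logistic regression fitted by batch gradient descent, written in plain Python so that it runs without third-party libraries. This is an illustrative sketch, not the patent's implementation: the function names, the learning rate, the number of epochs, and the labeling convention (rows of the first feature matrix labeled 1 for garbage, rows of the second labeled 0) are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(first_matrix, second_matrix, lr=0.5, epochs=500):
    """Fit weights and bias; first_matrix rows are garbage (1), second_matrix rows are not (0)."""
    rows = first_matrix + second_matrix
    labels = [1] * len(first_matrix) + [0] * len(second_matrix)
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this sample under the current parameters.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Average-gradient update of the log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_is_garbage(weights, bias, x):
    """Classify one feature vector: True means garbage file."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5
```

In practice a ready-made library would normally be used for this step, as the term definition above notes; the hand-written loop only makes the mechanics visible.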
The above is the process of generating the model. Optionally, after the model is generated, the server delivers the trained target classification model to the terminal; correspondingly, the terminal obtains the target classification model and uses it to perform the garbage file identification process described below.
Step 105: obtain the directory information of a file to be detected, the directory information including the extension information of the file to be detected and the path information of the path where the file to be detected is located.
The file to be detected is any disk file in the operating system.
In one possible implementation, the terminal stores each disk file and its corresponding directory information in advance; when the terminal determines a file to be detected, it obtains the directory information corresponding to that file. In another possible implementation, the file to be detected carries its own directory information; when the terminal determines a file to be detected, it parses the file to obtain the directory information.
Step 106: according to the extension information of the file to be detected and the corresponding path information, obtain the identification result of the file to be detected by using the target classification model, the identification result indicating whether the file to be detected is a garbage file.
The terminal processes the extension information of the file to be detected and the corresponding path information, inputs the result into the target classification model in the form of a feature vector, and obtains the identification result of the file to be detected.
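The identification step can be sketched as follows, under the assumption that the terminal has received a vocabulary, per-word IDF values, and logistic regression parameters from the server. All of these inputs, the tokenizer, and the function names are hypothetical; the patent only states that the directory information is converted into a feature vector and fed to the model.

```python
import re

def tokenize(text):
    """Split text into lower-case alphanumeric feature words (an assumed tokenizer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def to_feature_vector(ext, path, vocabulary, idf):
    """Build a TF-IDF feature vector over a fixed vocabulary for one file."""
    tokens = tokenize(path + " " + ext)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [tokens.count(word) / total * idf.get(word, 0.0) for word in vocabulary]

def recognize(ext, path, vocabulary, idf, weights, bias):
    """Return the identification result for a file to be detected."""
    x = to_feature_vector(ext, path, vocabulary, idf)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    # sigmoid(z) >= 0.5 is equivalent to z >= 0, so the sign of z decides the class.
    return "garbage" if z >= 0 else "non-garbage"
```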
It should be noted that step 101 to step 104 can be implemented separately as a kind of model generating method, step 105 It can be implemented separately as a kind of garbage files recognition methods with step 106, the present embodiment is not limited this.
In conclusion the embodiment of the present invention is by obtaining the first of the first directory information for including at least one garbage files Second data set of data set and the second directory information including at least one non-junk file, calculates separately the first data set pair The fisrt feature matrix and the corresponding second characteristic matrix of the second data set answered, according to fisrt feature matrix and second feature square Battle array, training obtain object-class model;On the one hand, since eigenmatrix is used to indicate the text spy of at least one directory information Sign, object-class model is obtained by being trained to eigenmatrix so that and the object-class model that training obtains is relatively reliable, And then improve the accuracy rate of identification garbage files;On the other hand, the object-class model that training obtains is to be detected for determining Whether file is garbage files so that the object-class model intelligent recognition garbage files can be directly used in subsequent process, Alleviate the workload of manual maintenance configuration file in the related technology.
Since the quantity of non-junk file in operating system is far longer than the quantity of garbage files, if being stored in server There is the rubbish configuration file of the first directory information including garbage files, and is stored with the second mesh for including all non-junk files The non-junk configuration file for recording information, then can waste a large amount of storage resource, therefore in one possible implementation, step 101 can realize the following steps by substituting, as shown in Figure 2:
Step 201, a rubbish configuration file is obtained; the rubbish configuration file includes the first extension name information of at least one preset garbage file and the first path information of the path where each garbage file is located.
Before the server obtains the rubbish configuration file, at least one garbage file is predefined among the disk files in the operating system, and a rubbish configuration file including the first extension name information and the first path information of the at least one garbage file is stored in the server. Therefore, when garbage file identification is carried out, the server obtains the pre-stored rubbish configuration file.
Step 202, for each garbage file, the first extension name information and the first path information of the garbage file are determined as the first directory information of the garbage file, and a first data set including at least one piece of first directory information is obtained.
The server extracts the first extension name information and the first path information of the at least one garbage file from the rubbish configuration file, and takes the first extension name information and the first path information of each garbage file as the first directory information of that garbage file, thereby obtaining the first data set.
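Steps 201 and 202 can be sketched as follows. This is only an illustration: the format of the rubbish configuration file is not specified in this embodiment, so the JSON layout and the key names "ext" and "path" are assumptions made for the example.

```python
import json


def load_first_data_set(config_text):
    """Build the first data set from the text of a rubbish configuration file.

    Assumption: the file is a JSON list of objects with the hypothetical keys
    "ext" (first extension name information) and "path" (first path information).
    """
    entries = json.loads(config_text)
    # Each (extension, path) pair is the first directory information of one garbage file.
    return [(entry["ext"].lower(), entry["path"]) for entry in entries]


config_text = '[{"ext": ".tmp", "path": "/cache/app"}, {"ext": ".LOG", "path": "/var/log/old"}]'
first_data_set = load_first_data_set(config_text)
```

Lower-casing the extension is a choice of this sketch so that ".LOG" and ".log" are treated as the same extension name information.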
Step 203, the directory information corresponding to each of k disk files in the operating system is traversed; the directory information includes the extension name information of the disk file and the path information of the path where the disk file is located, and k is a positive integer.
After obtaining the rubbish configuration file, the server traverses the directory information of all disk files in the operating system, excludes the disk files whose directory information is first directory information (that is, the garbage files), and obtains the disk files whose directory information is not first directory information (that is, at least one non-junk file).
Optionally, the value of k is the number of all disk files in the operating system, or the value of k is a preset value. This embodiment does not limit this.
Step 204, when the directory information of the i-th disk file meets a first preset condition, the i-th disk file is determined to be a non-junk file; i is a positive integer and i ≤ k.
The first preset condition includes: the extension name information of the i-th disk file is different from the at least one piece of first extension name information, and/or the path information corresponding to the i-th disk file is different from the at least one piece of first path information.
When the server detects a disk file, it judges whether the extension name information of the disk file is the same as the first extension name information of the at least one garbage file, and whether the path information of the disk file is the same as the first path information of the at least one garbage file. When the directory information of the disk file meets at least one of the two conditions, namely that the extension name information is different from the at least one piece of first extension name information and that the path information is different from the at least one piece of first path information, the disk file is determined to be a non-junk file.
Step 205, when the i-th disk file is detected to be a non-junk file, the directory information of the i-th disk file is determined as second directory information and added to the second data set, so that a second data set including at least one piece of second directory information is obtained.
When the i-th disk file is detected to be a non-junk file, the server determines the directory information of the i-th disk file as second directory information, that is, the extension name information of the i-th disk file is determined as second extension name information and the path information corresponding to the i-th disk file is determined as second path information.
For example, k is set to 5,000,000, that is, the operating system includes 5,000,000 disk files; when the 5,000,000th disk file is detected to be a non-junk file, a second data set including 4,150,000 pieces of second directory information is obtained.
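Steps 203 to 205 can be sketched as follows. The sketch interprets the first preset condition as "the (extension, path) pair of the disk file does not match the first directory information of any garbage file"; this reading, the function names and the use of a plain list of file paths instead of a real traversal of the file system are all assumptions of the example.

```python
import posixpath


def directory_info(file_path):
    """Return (extension name information, path information) for one disk file."""
    ext = posixpath.splitext(file_path)[1].lower()
    return ext, posixpath.dirname(file_path)


def build_second_data_set(disk_files, first_data_set):
    """Collect the directory information of every disk file that is not a garbage file."""
    garbage = set(first_data_set)
    second_data_set = []
    for file_path in disk_files:          # traverse the k disk files
        info = directory_info(file_path)
        if info not in garbage:           # first preset condition (as interpreted here)
            second_data_set.append(info)  # second directory information
    return second_data_set


disk_files = ["/cache/app/a.tmp", "/home/u/doc.txt", "/cache/app/b.txt"]
second_data_set = build_second_data_set(disk_files, [(".tmp", "/cache/app")])
```

In a real system the list of disk files could come from a traversal such as `os.walk`; a fixed list keeps the example deterministic.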
In conclusion the embodiment of the present invention includes also the first of at least one garbage files by the acquisition of rubbish configuration file First data set of directory information determines at least one non-junk file by each disk file in traversing operation system, Obtain the second data set of the second directory information for including at least one non-junk file;So that need not stored in server There is non-junk configuration file, and only need in the case of being stored with rubbish configuration file, accurate first data can be got Collection and the second data set, save a large amount of storage resource.
In the following, to above-mentioned step 102 fisrt feature matrix corresponding with the first data set in step 103 and the second data The generating process for collecting corresponding second characteristic matrix is schematically introduced.Step 102 and step 103 can be implemented as by replacement For the following steps, as shown in Figure 3:
Step 301, word segmentation is performed on the at least one piece of first directory information to obtain m first feature words; m is a positive integer.
Word segmentation means dividing the at least one piece of first directory information into several first feature words using a preset word segmentation strategy. The word segmentation strategy includes at least one of a string-matching segmentation method, a word-meaning segmentation method and a statistical segmentation method.
Since the first feature words obtained after word segmentation include some meaningless words, which have no practical significance and are redundant for the extraction of feature words, filtering is required.
Optionally, after the server performs word segmentation on the at least one piece of first directory information and obtains x first feature words, the server filters the x first feature words according to a preset word filtering strategy and obtains m first feature words; x is a positive integer and m ≤ x.
The word filtering strategy includes removing meaningless words from the x first feature words. Meaningless words include stop words and/or words of irrelevant parts of speech. Stop words are meaningless common words or symbols, for example ":". Words of irrelevant parts of speech include conjunctions, descriptive words, modal particles, adjectives, pronouns and the like.
For example, the first directory information is Q. After Q is segmented, 6 first feature words are obtained, denoted o, p, q, r, s and t; after the meaningless words are removed by filtering, 3 first feature words remain, denoted o, r and t.
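The segmentation and filtering of step 301 can be sketched as follows. The sketch uses a simple string-matching strategy (splitting on every non-alphanumeric character); the stop-word list and the rule that purely numeric fragments are meaningless are assumptions of the example, not part of this embodiment.

```python
import re

# Hypothetical stop-word list; a real word filtering strategy would define its own.
STOP_WORDS = {"the", "and", "of"}


def segment(directory_information):
    """Split one piece of directory information into candidate feature words."""
    parts = re.split(r"[^0-9A-Za-z]+", directory_information.lower())
    return [part for part in parts if part]


def filter_words(words, stop_words=STOP_WORDS):
    """Remove stop words and purely numeric fragments (an assumption of this sketch)."""
    return [w for w in words if w not in stop_words and not w.isdigit()]


x_words = segment("/data/the_app/cache/tmp_01.log")  # x first feature words
m_words = filter_words(x_words)                      # m first feature words, m <= x
```

Here `x_words` plays the role of the x words obtained by segmentation and `m_words` the m words that remain after filtering.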
Step 302, the first feature value corresponding to each of the m first feature words is calculated; a first feature value indicates how well the first feature word discriminates garbage files.
That is, the larger the first feature value, the better the first feature word represents a feature of garbage files; in other words, using the first feature word as a feature of garbage files distinguishes them more strongly from non-junk files.
The first feature value of a first feature word can be calculated in advance by an algorithm, or can be obtained by model training.
In one possible implementation, for each first feature word, the server counts the term frequency TF with which the first feature word occurs in the at least one piece of first directory information, calculates the inverse document frequency IDF of the first feature word, and then calculates the product of TF and IDF; this product is taken as the TF-IDF value of the first feature word, that is, the first feature value.
Schematically, for a feature word $t_i$ in a target document $d_j$, the term frequency $TF_{i,j}$ is calculated by the following formula:
$$TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $TF_{i,j}$ is the term frequency of the feature word $t_i$ in the target document $d_j$, $n_{i,j}$ is the number of occurrences of $t_i$ in $d_j$, and $\sum_k n_{k,j}$ is the sum of the occurrences of all words in $d_j$.
Schematically, for a feature word $t_i$ in a target document $d_j$, the inverse document frequency $IDF_i$ is calculated by the following formula:
$$IDF_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $IDF_i$ is the inverse document frequency of the feature word $t_i$, $|D|$ is the total number of documents in the total document library, and $|\{j : t_i \in d_j\}|$ is the number of documents in the total document library that contain the word $t_i$.
Step 303, the first feature matrix corresponding to the first data set is generated according to the first feature values corresponding to the m first feature words.
The server represents the first feature values corresponding to the m first feature words in the form of a matrix and obtains the first feature matrix corresponding to the first data set. For example, the first feature matrix is a TF-IDF matrix.
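Steps 302 and 303 can be sketched as follows, using the common TF-IDF formulation (term frequency as occurrences divided by the number of words in the document, inverse document frequency as the logarithm of the document count divided by the number of documents containing the word). Treating each piece of directory information as one document, and laying the matrix out with one row per document and one column per feature word, are assumptions of the example.

```python
import math
from collections import Counter


def tf_idf_matrix(documents):
    """Return (vocabulary, matrix) for a list of documents, each a list of feature words."""
    vocabulary = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    matrix = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for word in vocabulary:
            tf = counts[word] / total if total else 0.0
            idf = math.log(n_docs / doc_freq[word])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


documents = [["cache", "tmp"], ["cache", "log"]]
vocabulary, matrix = tf_idf_matrix(documents)
```

With this formulation, a word that occurs in every document (here "cache") receives a feature value of zero, which reflects that it does not help to tell the documents apart.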
Step 304, word segmentation is performed on the at least one piece of second directory information to obtain n second feature words; n is a positive integer.
The server performs word segmentation on the at least one piece of second directory information and obtains n second feature words.
Step 305, the second feature value corresponding to each of the n second feature words is calculated; a second feature value indicates how well the second feature word discriminates non-junk files.
The server calculates the second feature value corresponding to each of the n second feature words.
Step 306, the second feature matrix corresponding to the second data set is generated according to the second feature values corresponding to the n second feature words.
The server generates the second feature matrix corresponding to the second data set according to the second feature values corresponding to the n second feature words.
It should be noted that steps 301 to 303 can be executed in parallel with steps 304 to 306. The generation of the second feature matrix corresponding to the second data set can be understood by analogy with the generation of the first feature matrix in steps 301 to 303, and is not described again here.
In order to test the trained object-class model and determine its classification accuracy, in one possible implementation the first data set is divided into a first training set and a first test set, and the second data set is divided into a second training set and a second test set. The first training set and the second training set are used to train the object-class model, and the first test set and the second test set are used to test the object-class model to obtain the classification accuracy. Step 104 can be replaced by the following steps, as shown in Fig. 4:
Step 401, the first data set is divided into a first training set and a first test set.
Optionally, the first data set includes the first directory information of y garbage files. The first data set is randomly divided into the first training set and the first test set according to a preset ratio; the first training set includes the first directory information of y1 garbage files and the first test set includes the first directory information of y2 garbage files, where y = y1 + y2 and y, y1 and y2 are positive integers.
The preset ratio can be y1:y2 = 1:1 or y1:y2 = 2:1; this embodiment does not limit this.
Step 402, the second data set is divided into a second training set and a second test set.
Optionally, the second data set includes the second directory information of w non-junk files. The second data set is randomly divided into the second training set and the second test set according to a preset ratio; the second training set includes w1 pieces of second directory information and the second test set includes w2 pieces of second directory information, where w = w1 + w2 and w, w1 and w2 are positive integers.
The preset ratio can be w1:w2 = 1:1 or w1:w2 = 2:1; this embodiment does not limit this.
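The random division in steps 401 and 402 can be sketched as follows. The function name, the way the preset ratio is passed as two integers and the fixed random seed are assumptions of the example; the same function applies to both the first data set and the second data set.

```python
import random


def split_data_set(data_set, train_parts=2, test_parts=1, seed=0):
    """Randomly divide a data set into a training set and a test set by a preset ratio."""
    items = list(data_set)
    random.Random(seed).shuffle(items)  # fixed seed only to make the sketch repeatable
    n_train = len(items) * train_parts // (train_parts + test_parts)
    return items[:n_train], items[n_train:]


# A 2:1 ratio on nine pieces of directory information gives six for training, three for testing.
training_set, test_set = split_data_set(range(9), train_parts=2, test_parts=1)
```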
Step 403, the first feature submatrix corresponding to the first training set is determined according to the first feature matrix.
The server determines the y1 pieces of first directory information in the first training set, determines, from the first feature values in the first feature matrix, the first feature values corresponding to the y1 pieces of first directory information, and represents these first feature values in the form of a matrix to obtain the first feature submatrix corresponding to the first training set.
Step 404, the second feature submatrix corresponding to the second training set is determined according to the second feature matrix.
According to the second feature matrix, the server determines the second feature values corresponding to the w1 pieces of second directory information in the second training set and obtains the second feature submatrix formed by these second feature values.
Specifically, the server determines the w1 pieces of second directory information in the second training set, determines, from the second feature values in the second feature matrix, the second feature values corresponding to the w1 pieces of second directory information, and represents these second feature values in the form of a matrix to obtain the second feature submatrix corresponding to the second training set.
It should be noted that steps 401 and 403 can be executed in parallel with steps 402 and 404.
Step 405, the object-class model is obtained by training according to the first feature submatrix and the second feature submatrix.
Optionally, the server inputs the first feature submatrix and the second feature submatrix into a logistic regression (LR) model, and the object-class model is obtained by training.
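Step 405 can be sketched with a small logistic regression trained by stochastic gradient descent. This is not the implementation of this embodiment: the function names, the learning rate, the number of epochs and the requirement that both submatrices share the same columns (a common vocabulary) are assumptions of the example. Rows of the first feature submatrix are labelled 1 (garbage file) and rows of the second feature submatrix are labelled 0 (non-junk file).

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(first_submatrix, second_submatrix, epochs=200, learning_rate=0.5):
    """Train a logistic regression model; returns (weights, bias)."""
    rows = [(x, 1.0) for x in first_submatrix] + [(x, 0.0) for x in second_submatrix]
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in rows:
            predicted = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = predicted - label  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, x):
    """Return True when the feature vector x is classified as a garbage file."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5


model = train_lr([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]])
```

In practice an existing library implementation of logistic regression would normally be used; the hand-written version only makes the training step visible.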
In order to check the garbage file identification performance of the trained object-class model, in one possible implementation the object-class model is tested according to the first test set and the second test set to obtain the classification accuracy.
According to the first feature matrix, the third feature submatrix corresponding to the first test set is determined; according to the second feature matrix, the fourth feature submatrix corresponding to the second test set is determined; and according to the third feature submatrix and the fourth feature submatrix, the object-class model is tested to obtain the classification accuracy.
Optionally, the classification accuracy includes an overall classification accuracy and/or an individual classification accuracy. The overall classification accuracy indicates the classification accuracy over all categories together, and the individual classification accuracy indicates the classification accuracy of one category.
For example, the object-class model is tested according to the third feature submatrix and the fourth feature submatrix; the classification accuracy obtained for garbage files is 90%, the classification accuracy obtained for non-junk files is 95%, and the overall classification accuracy is 92%.
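The overall and individual classification accuracies can be sketched as follows. The label strings and the sample counts are made up for the illustration and are not the test data behind the figures quoted above.

```python
def classification_accuracy(labels, predictions):
    """Return (overall accuracy, {category: individual accuracy})."""
    per_category = {}
    for category in set(labels):
        indices = [i for i, label in enumerate(labels) if label == category]
        correct = sum(1 for i in indices if predictions[i] == category)
        per_category[category] = correct / len(indices)
    overall = sum(1 for label, p in zip(labels, predictions) if label == p) / len(labels)
    return overall, per_category


# Hypothetical test run: 10 garbage files (9 recognised) and 20 non-junk files (19 recognised).
labels = ["garbage"] * 10 + ["non-junk"] * 20
predictions = ["garbage"] * 9 + ["non-junk"] + ["non-junk"] * 19 + ["garbage"]
overall, per_category = classification_accuracy(labels, predictions)
```

The overall accuracy is weighted by the number of test samples in each category, so it generally differs from the plain average of the individual accuracies.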
In a schematic example, as shown in Fig. 5, the server obtains the first data set and the second data set, randomly divides the first data set into a first training set and a first test set in proportion, and randomly divides the second data set into a second training set and a second test set in proportion. The server performs text preprocessing and feature calculation on the at least one piece of first directory information in the first training set and on the at least one piece of second directory information in the second training set, and obtains feature submatrix 1 corresponding to the first training set and feature submatrix 2 corresponding to the second training set. It likewise performs text preprocessing and feature calculation on the at least one piece of first directory information in the first test set and on the at least one piece of second directory information in the second test set, and obtains feature submatrix 3 corresponding to the first test set and feature submatrix 4 corresponding to the second test set. Text preprocessing includes word segmentation and filtering. The server trains on feature submatrix 1 and feature submatrix 2 to obtain the object-class model, and tests the trained object-class model on feature submatrix 3 and feature submatrix 4, obtaining a classification accuracy of 90%.
In conclusion, the embodiment of the present invention also divides the first data set into a first training set and a first test set, divides the second data set into a second training set and a second test set, determines the first feature submatrix corresponding to the first training set according to the first feature matrix, determines the second feature submatrix corresponding to the second training set according to the second feature matrix, and trains the object-class model according to the first feature submatrix and the second feature submatrix. In this way, the object-class model is trained with the first training set and the second training set, and is tested with the first test set and the second test set to obtain the classification accuracy, which checks the garbage file identification performance of the object-class model.
Referring to Fig. 6, it shows a flow chart of a garbage files recognition method provided by one embodiment of the present invention. The garbage files recognition method includes:
Step 601, each file to be detected in the operating system is scanned automatically.
In one possible implementation, the user manually opens the automatic scanning program so that the terminal scans each file to be detected in the operating system. That is, after opening the main interface of a specific application, the terminal opens the automatic scanning program, that is, automatically scans each file to be detected in the operating system, when it obtains a scanning trigger operation corresponding to the scan entrance. Optionally, the specific application is an operating system management application.
The scan entrance is an operable control for opening the automatic scanning program. Optionally, the type of the scan entrance includes at least one of a button, an operable entry and a slider. The first cleaning entrance and the second cleaning entrance in the embodiments of the present invention can be understood by analogy with the description of the scan entrance, and are not described again.
The scanning trigger operation is a user operation for triggering the opening of the automatic scanning program corresponding to the scan entrance. Optionally, the scanning trigger operation includes any one or a combination of a click operation, a slide operation, a press operation and a long-press operation. The first trigger operation and the second trigger operation in the embodiments of the present invention can be understood by analogy with the description of the scanning trigger operation, and are not described again.
In another possible implementation, after the system starts, a scanning program is started in the background to automatically scan each file to be detected in the operating system, which simplifies user operation.
Further, in order to clean up garbage files in time, when the scanning program is started to scan for garbage files automatically, the started scanning program can scan for garbage files in real time. Because the scanning is in real time, garbage files can be found and cleaned up promptly, which speeds up the system and saves the resources consumed by garbage files.
In addition, in order to avoid disturbing the user with real-time scanning, garbage files can be scanned at regular intervals, for example every 5 minutes or every 10 minutes by the started scanning program. This embodiment does not limit the interval of the timed scanning; in practical applications, the interval can also be set by the user.
Step 602, for each file to be detected, the directory information of the file to be detected is obtained; the directory information includes the extension name information of the file to be detected and the path information of the path where the file is located.
When the terminal scans a file to be detected, it parses the file and obtains its extension name information and path information.
Step 603, according to the extension name information and the corresponding path information of the file to be detected, the object-class model is used to obtain the recognition result corresponding to the file; the recognition result indicates whether the file to be detected is a garbage file.
The terminal preprocesses the extension name information and the corresponding path information of the file to be detected, including word segmentation and filtering, to obtain at least one feature word; it then calculates the feature value of the at least one feature word and inputs the feature values into the object-class model in the form of a feature vector to obtain the recognition result corresponding to the file.
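Step 603 can be sketched as follows for a model that consists of linear weights and a bias, such as a trained logistic regression. The vocabulary, the inverse document frequency table, the weights and the function names are hypothetical values chosen for the illustration; the filtering step is left out to keep the example short.

```python
import re
from collections import Counter


def feature_vector(extension, path, vocabulary, idf):
    """Turn the directory information of one file into a TF-IDF feature vector."""
    words = [w for w in re.split(r"[^0-9A-Za-z]+", f"{path} {extension}".lower()) if w]
    counts = Counter(words)
    total = len(words) or 1
    return [counts[word] / total * idf.get(word, 0.0) for word in vocabulary]


def is_garbage(extension, path, vocabulary, idf, weights, bias):
    """Recognition result: True when the file to be detected is classified as garbage."""
    x = feature_vector(extension, path, vocabulary, idf)
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return z >= 0  # equivalent to a logistic probability of at least 0.5


# Hypothetical model parameters, as if produced by a previous training run.
vocabulary = ["cache", "docs"]
idf = {"cache": 1.0, "docs": 1.0}
weights, bias = [2.0, -2.0], 0.0
result_tmp = is_garbage(".tmp", "/app/cache", vocabulary, idf, weights, bias)
result_txt = is_garbage(".txt", "/home/docs", vocabulary, idf, weights, bias)
```

The feature vector must use the same vocabulary and the same column order as the feature matrices used for training; otherwise the weights would be applied to the wrong feature words.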
Step 604, a scanning result is displayed according to at least one recognition result indicating that a file to be detected is a garbage file; the scanning result includes the file information of at least one garbage file.
The terminal uses the object-class model to identify the scanned files to be detected one by one and obtains the recognition result corresponding to each file; according to at least one recognition result indicating that a file to be detected is a garbage file, it displays a scanning result including the file information of at least one garbage file. Optionally, the file information of the at least one garbage file includes, but is not limited to, the file types and the number of garbage files.
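The scanning result of step 604 can be sketched as a count of garbage files per file type. Using the extension as the file type and the shape of the input (pairs of file path and recognition result) are assumptions of the example.

```python
import posixpath
from collections import Counter


def build_scan_result(recognition_results):
    """Summarise garbage files by file type.

    `recognition_results` is an iterable of (file_path, is_garbage) pairs.
    """
    counts = Counter(
        posixpath.splitext(path)[1].lower() or "(none)"
        for path, is_garbage in recognition_results
        if is_garbage
    )
    return dict(counts)


scan_result = build_scan_result([
    ("/a/x.tmp", True),
    ("/a/y.tmp", True),
    ("/b/z.log", True),
    ("/c/k.txt", False),
])
```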
Optionally, the scanning result is displayed in the user interface in the form of a floating window or a prompt bar. The floating window or prompt bar is displayed at a fixed position of the user interface, or moves as the user interface slides.
The floating window can be fixed in the middle of the desktop or displayed at any other position on the desktop; if the terminal enters another application interface, the floating window can also move as the page slides. This embodiment does not limit the manner or position in which the scanning result is displayed. Because the garbage files are displayed to the user after they are scanned, the user can clearly see why the system has slowed down and is prompted to confirm in time whether to clean up the scanned garbage files.
Step 605, prompt information is displayed; the prompt information asks whether to clean up the garbage files.
Further, after the scanned garbage files are displayed, in order to clean them up promptly once the user confirms, and to avoid affecting the user by mistakenly cleaning up a file that is currently in use, the method provided by this embodiment further includes, while displaying the scanning result: when a recognition result indicates that a file to be detected is a garbage file, the terminal displays prompt information asking whether to clean up the garbage file.
In one possible implementation, the prompt information includes a first cleaning entrance. When the terminal displays the scanning result and the first cleaning entrance, if the user decides to clean up the garbage file, the user performs a first trigger operation on the first cleaning entrance, so that the terminal obtains the first trigger operation corresponding to the first cleaning entrance; if the user decides not to clean up the garbage file, the user does not perform the first trigger operation on the first cleaning entrance.
For example, the first cleaning entrance is a button displaying the words "clean up immediately"; when the user clicks the "clean up immediately" button, the terminal is instructed to clean up the garbage file.
In another possible implementation, the prompt information includes a first cleaning entrance and a second cleaning entrance. When the terminal displays the scanning result, the first cleaning entrance and the second cleaning entrance, if the user decides to clean up the garbage file, the user performs a first trigger operation on the first cleaning entrance; if the user decides not to clean up the garbage file, the user performs a second trigger operation on the second cleaning entrance.
For example, the first cleaning entrance is a button displaying the word "Yes" and the second cleaning entrance is a button displaying the word "No"; when the user clicks the "Yes" button, the terminal is instructed to clean up the garbage file, and when the user clicks the "No" button, the terminal is instructed not to clean up the garbage file.
Step 606, it is judged whether the obtained trigger operation is the first trigger operation.
When the terminal judges that the obtained trigger operation is the first trigger operation, step 607 is executed; when the terminal judges that the obtained trigger operation is not the first trigger operation, step 608 is executed.
Step 607, when the obtained trigger operation is the first trigger operation, the garbage file is cleaned up.
When the obtained trigger operation is the first trigger operation, the garbage file is cleaned up. Optionally, before the garbage file is cleaned up, its directory information is added to the rubbish configuration file, that is, the extension name information and the corresponding path information of the garbage file are added to the rubbish configuration file.
Step 608, when the obtained trigger operation is not the first trigger operation, the garbage file is not cleaned up.
When the obtained trigger operation is not the first trigger operation, misjudgment information is generated and stored; the misjudgment information indicates that the file to be detected is redefined as a non-junk file.
When the trigger operation obtained by the terminal is not the first trigger operation, the garbage file is not cleaned up. Optionally, the terminal generates misjudgment information, which indicates that the file to be detected is redefined as a non-junk file, and stores the misjudgment information in the terminal, so that the terminal can identify files to be detected more accurately in subsequent processes.
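Steps 606 to 608 can be sketched as a single decision function. The string "first" standing for the first trigger operation, the two in-memory lists standing for the rubbish configuration file and the stored misjudgment information, and the returned status strings are all assumptions of the example; the actual deletion of the file is left out.

```python
def handle_trigger(operation, directory_info, rubbish_config, misjudgments):
    """Apply steps 606 to 608 to one file recognised as a garbage file."""
    if operation == "first":
        # Step 607: record the directory information, then clean up the file.
        rubbish_config.append(directory_info)
        return "cleaned"  # the deletion itself is omitted in this sketch
    # Step 608: keep the file and store misjudgment information for it.
    misjudgments.append(directory_info)
    return "kept"


rubbish_config, misjudgments = [], []
status_a = handle_trigger("first", (".tmp", "/cache/app"), rubbish_config, misjudgments)
status_b = handle_trigger("second", (".log", "/var/log"), rubbish_config, misjudgments)
```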
Optionally, after generating the misjudgment information, when the terminal receives a trigger operation corresponding to the misjudgment information, where the trigger operation instructs the terminal to send the misjudgment information to the background server, the terminal sends the misjudgment information to the server according to the trigger operation; correspondingly, the server receives and stores the misjudgment information. The server performs cluster analysis on the pieces of misjudgment information received, and the misjudgment information whose number of misjudgments is higher than a predetermined threshold is determined manually, so that in subsequent processes the server can improve the object-class model according to that misjudgment information.
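The selection of frequently misjudged files on the server can be sketched as follows. The sketch replaces the cluster analysis with a plain count of identical reports, which is a simplification chosen for the example; the function name and the shape of a report (a pair of extension and path) are likewise assumptions.

```python
from collections import Counter


def frequent_misjudgments(reports, threshold):
    """Return the directory information reported as misjudged more than `threshold` times."""
    counts = Counter(reports)
    return {info: n for info, n in counts.items() if n > threshold}


reports = [
    (".log", "/var/log"),
    (".log", "/var/log"),
    (".log", "/var/log"),
    (".bak", "/home/u"),
]
candidates = frequent_misjudgments(reports, threshold=2)
```

The returned entries are only candidates; as described above, the final decision on which misjudgment information is used to improve the model is made manually.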
Optionally, the terminal displays a scanning result after garbage files are scanned; the scanning result includes a first scanning result and/or a second scanning result. The first scanning result includes the file information of at least one garbage file found by scanning that relies on the configuration file of the related art, and the second scanning result includes the file information of at least one garbage file found by scanning with the garbage files recognition method provided by the embodiments of the present invention.
In a schematic example, as shown in Fig. 7, the first scanning result 71 includes three options, namely system rubbish, software rubbish and Internet rubbish, and the second scanning result 72 includes one option, deep rubbish. The first scanning result 71 is selected by default and the second scanning result 72 is not selected by default; the user then decides which options to clean up and clicks the "clean up immediately" button 73 to clean up the selected options and release disk space.
The following are apparatus embodiments of the present invention, which can be used to perform the method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
Referring to Fig. 8, it shows a schematic structural diagram of a model generating apparatus provided by one embodiment of the present invention. The model generating apparatus can be implemented as all or part of a model generation device by a dedicated hardware circuit or by a combination of software and hardware. The model generating apparatus includes: an acquisition module 810, a first computing module 820, a second computing module 830 and a training module 840.
Acquisition module 810, for realizing above-mentioned steps 101;
First computing module 820, for realizing above-mentioned steps 102;
Second computing module 830, for realizing above-mentioned steps 103;
Training module 840, for realizing above-mentioned steps 104.
In an optional embodiment based on the embodiment shown in Fig. 8, as shown in Fig. 9, the acquisition module 810 includes: an acquiring unit 811 and a first determination unit 812;
Acquiring unit 811, for realizing above-mentioned steps 201;
First determination unit 812, for realizing above-mentioned steps 202.
In an optional embodiment based on the embodiment shown in Fig. 8, as shown in Fig. 9, the acquisition module 810 includes: a traversal unit 813, a second determination unit 814 and an obtaining unit 815;
Traversal Unit 813, for realizing above-mentioned steps 203;
Second determination unit 814, for realizing above-mentioned steps 205;
Unit 815 is obtained, for realizing above-mentioned steps 206.
In an optional embodiment based on the embodiment shown in Fig. 8, as shown in Fig. 9, the first directory information of the at least one garbage file includes the first extension name information and the first path information of the at least one garbage file,
and the acquisition module 810 is further used for realizing above-mentioned step 204.
In an optional embodiment based on the embodiment shown in Fig. 8, as shown in Fig. 9, the first computing module 820 includes: a first word segmentation unit 821, a first computing unit 822 and a first generation unit 823;
First participle unit 821, for realizing above-mentioned steps 301;
First computing unit 822, for realizing above-mentioned steps 302;
First generation unit 823, for realizing above-mentioned steps 303.
In an optional embodiment based on the embodiment shown in Fig. 8, as shown in Fig. 9, the second computing module 830 includes: a second word segmentation unit 831, a second computing unit 832 and a second generation unit 833;
Second participle unit 831, for realizing above-mentioned steps 304;
Second computing unit 832, for realizing above-mentioned steps 305;
Second generation unit 833, for realizing above-mentioned steps 306.
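The word segmentation, feature value calculation, and matrix generation performed by the two computing modules can be sketched as follows. This is only a minimal Python illustration under stated assumptions: the text here does not name the feature value formula, so a smoothed TF-IDF weight is used as a stand-in, and the function names `tokenize` and `tfidf_matrix` as well as the sample directory strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(directory_info):
    """Split a directory string into lowercase feature words
    on every run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", directory_info.lower()) if t]


def tfidf_matrix(directory_infos):
    """Return (vocabulary, matrix): one row per directory string and one
    column per feature word. TF-IDF is an assumed choice of feature value."""
    docs = [tokenize(d) for d in directory_infos]
    vocabulary = sorted({word for doc in docs for word in doc})
    doc_freq = Counter(word for doc in docs for word in set(doc))
    n_docs = len(docs)
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for word in vocabulary:
            tf = counts[word] / len(doc) if doc else 0.0
            # Smoothed inverse document frequency; rarer words weigh more.
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix
```

In this sketch a word that appears in only a few directory strings (for example a distinctive extension) receives a larger value than a word shared by all of them, which is one plausible way to express how strongly a feature word discriminates.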
In an optional embodiment based on the embodiment shown in FIG. 8, as shown in FIG. 9, the device further includes a classification module 850.
The classification module 850 is configured to implement steps 401 and 402 above;
The training module 840 includes a third determination unit 841, a fourth determination unit 842, and a training unit 843;
The third determination unit 841 is configured to implement step 403 above;
The fourth determination unit 842 is configured to implement step 404 above;
The training unit 843 is configured to implement step 405 above.
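The division of a data set into a training set and a test set, and the classification accuracy measured on the test set, might look like the Python sketch below. The split ratio, the random seed, and the names `split_data_set` and `classification_accuracy` are assumptions made for illustration; the embodiments do not prescribe them.

```python
import random


def split_data_set(records, test_ratio=0.2, seed=0):
    """Shuffle the records reproducibly and return (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def classification_accuracy(predicted, actual):
    """Fraction of test records whose predicted label matches the true label."""
    if not actual:
        return 0.0
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

The same function would be applied once to the first data set and once to the second data set, so that each class contributes its own training and test portions.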
In an optional embodiment based on the embodiment shown in FIG. 8, as shown in FIG. 9, the training unit 843 is further configured to input the first feature sub-matrix and the second feature sub-matrix into a logistic regression model and train it to obtain the target classification model.
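To make the logistic regression training concrete, here is a small pure-Python sketch trained by stochastic gradient descent. It is an illustration under assumptions rather than the implementation of the embodiments: both sub-matrices are assumed to share the same feature columns, rows of the first sub-matrix are labeled 1 (garbage) and rows of the second are labeled 0, and the names `train_logistic_regression` and `predict_garbage`, the learning rate, and the epoch count are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(first_rows, second_rows, epochs=500, lr=0.5):
    """Fit weights and bias by stochastic gradient descent.
    first_rows are labeled 1 (garbage), second_rows are labeled 0."""
    rows = list(first_rows) + list(second_rows)
    labels = [1] * len(first_rows) + [0] * len(second_rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_garbage(weights, bias, x):
    """Return True when the predicted garbage probability is at least 0.5."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5
```

A production system would more likely rely on an established machine learning library; the hand-written loop is used here only so that the example has no external dependencies.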
For related details, refer to the method embodiments shown in FIG. 1B to FIG. 7. The acquisition module 810 is further configured to implement any other implicit or disclosed function related to the acquiring steps in the above method embodiments; the first computing module 820 and the second computing module 830 are further configured to implement any other implicit or disclosed function related to the calculating steps in the above method embodiments; the training module 840 is further configured to implement any other implicit or disclosed function related to the training steps in the above method embodiments.
Referring to FIG. 10, it shows a schematic structural diagram of a garbage file identification device provided by an embodiment of the present invention. The garbage file identification device may be implemented, by a dedicated hardware circuit or by a combination of software and hardware, as all or part of garbage file identification equipment, and it uses the target classification model generated by the model generating device of the first aspect. The garbage file identification device includes an acquisition module 1010 and an identification module 1020.
The acquisition module 1010 is configured to implement step 105 above;
The identification module 1020 is configured to implement step 106 above.
For related details, refer to the method embodiments shown in FIG. 1B to FIG. 7. The acquisition module 1010 is further configured to implement any other implicit or disclosed function related to the acquiring steps in the above method embodiments; the identification module 1020 is further configured to implement any other implicit or disclosed function related to the identification steps in the above method embodiments.
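The cooperation of an acquisition module and an identification module can be pictured with the short Python sketch below. It only illustrates the data flow: the function names `get_directory_info` and `identify` are hypothetical, paths are assumed to be POSIX-style, and the trained target classification model is represented by an arbitrary callable `classify(extension, path)` supplied by the caller.

```python
import posixpath


def get_directory_info(file_path):
    """Acquisition step: extract the extension name and containing path
    of the file to be detected."""
    directory, name = posixpath.split(file_path)
    return {"extension": posixpath.splitext(name)[1].lower(), "path": directory}


def identify(file_path, classify):
    """Identification step: pass the directory information to the model
    (any callable taking extension and path) and wrap the result."""
    info = get_directory_info(file_path)
    return {
        "file": file_path,
        "is_garbage": bool(classify(info["extension"], info["path"])),
    }
```

For example, `identify("/sdcard/app/cache/thumb.jpg", model)` would return a dictionary whose `is_garbage` field carries the recognition result produced by whatever `model` the caller provides.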
It should be noted that when the device provided by the above embodiments implements its functions, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the equipment may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
An embodiment of the present invention provides model generation equipment. The model generation equipment includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the model generating method provided by each of the above method embodiments.
Optionally, the model generation equipment is a server.
An embodiment of the present invention provides garbage file identification equipment. The garbage file identification equipment includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the garbage file identification method provided by each of the above method embodiments.
Optionally, the garbage file identification equipment is a server.
Referring to FIG. 11, it shows a schematic structural diagram of a terminal 1100 provided by an embodiment of the present invention. The terminal 1100 may include an RF (Radio Frequency) circuit 1110, a memory 1120 including one or more computer-readable storage media, an input unit 1130, a display unit 1140, a sensor 1150, an audio circuit 1160, a WiFi (wireless fidelity) module 1170, a processor 1180 including one or more processing cores, a power supply 1190, and other components. Those skilled in the art will understand that the structure shown in FIG. 11 does not limit the equipment, which may include more or fewer components than illustrated, combine certain components, or use a different component arrangement. Specifically:
The RF circuit 1110 may be used to receive and send signals during information transmission and reception or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 1180 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 1110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 1110 may communicate with networks and other equipment by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
The memory 1120 may be used to store software programs and modules. The processor 1180 performs various functional applications and data processing by running the software programs and modules stored in the memory 1120. The memory 1120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the terminal 1100 (such as audio data or a phone book), and the like. In addition, the memory 1120 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 1120 may also include a memory controller to provide the processor 1180 and the input unit 1130 with access to the memory 1120.
The input unit 1130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 1130 may include a touch-sensitive surface 1131 and other input equipment 1132. The touch-sensitive surface 1131, also called a touch display screen or trackpad, collects touch operations performed by a user on or near it (such as operations performed on or near the touch-sensitive surface 1131 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 1131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and sends the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 1180, and can receive and execute commands sent by the processor 1180. The touch-sensitive surface 1131 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch-sensitive surface 1131, the input unit 1130 may also include other input equipment 1132. Specifically, the other input equipment 1132 may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 1140 may be used to display information input by the user, information provided to the user, and the various graphical user interfaces of the terminal 1100. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 1140 may include a display panel 1141; optionally, the display panel 1141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 1131 may cover the display panel 1141. When the touch-sensitive surface 1131 detects a touch operation on or near it, it sends the operation to the processor 1180 to determine the type of the touch event, and the processor 1180 then provides a corresponding visual output on the display panel 1141 according to the type of the touch event. Although in FIG. 11 the touch-sensitive surface 1131 and the display panel 1141 implement the input and output functions as two independent components, in some embodiments the touch-sensitive surface 1131 and the display panel 1141 may be integrated to implement the input and output functions.
The terminal 1100 may also include at least one sensor 1150, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1141 and/or the backlight when the terminal 1100 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, can detect the magnitude and direction of gravity. It can be used for applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration), for functions related to vibration recognition (such as a pedometer or tap detection), and the like. Other sensors that may also be configured in the terminal 1100, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
The audio circuit 1160, a loudspeaker 1121, and a microphone 1122 can provide an audio interface between the user and the terminal 1100. The audio circuit 1160 can convert received audio data into an electrical signal and transmit it to the loudspeaker 1121, which converts it into a sound signal for output. Conversely, the microphone 1122 converts a collected sound signal into an electrical signal, which the audio circuit 1160 receives and converts into audio data. After the audio data is output to the processor 1180 for processing, it is sent through the RF circuit 1110 to, for example, another piece of equipment, or it is output to the memory 1120 for further processing. The audio circuit 1160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 1100.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1170, the terminal 1100 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although FIG. 11 shows the WiFi module 1170, it can be understood that the module is not an essential component of the terminal 1100 and may be omitted as needed without changing the essence of the invention.
The processor 1180 is the control center of the terminal 1100. It connects all parts of the equipment through various interfaces and lines, and it performs the various functions of the terminal 1100 and processes data by running or executing the software programs and/or modules stored in the memory 1120 and calling the data stored in the memory 1120, thereby monitoring the equipment as a whole. Optionally, the processor 1180 may include one or more processing cores. Optionally, the processor 1180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1180.
The terminal 1100 further includes a power supply 1190 (such as a battery) that supplies power to all components. Preferably, the power supply may be logically connected to the processor 1180 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 1190 may also include any components such as one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 1100 may also include a camera, a Bluetooth module, and the like, which are not described here.
Referring to FIG. 12, it shows a schematic structural diagram of a server 1200 provided by an embodiment of the present invention. The server 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory (RAM) 1202 and a read-only memory (ROM) 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201. The server 1200 further includes a basic input/output system (I/O system) 1206 that helps transmit information between the devices in the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and input equipment 1209, such as a mouse or keyboard, for the user to input information. The display 1208 and the input equipment 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205. The basic input/output system 1206 may also include the input/output controller 1210 for receiving and processing input from multiple other pieces of equipment such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or other types of output equipment.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable medium provide non-volatile storage for the server 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
According to various embodiments of the present invention, the server 1200 may also operate by being connected, through a network such as the Internet, to a remote computer on the network. That is, the server 1200 may be connected to a network 1212 through a network interface unit 1211 connected to the system bus 1205; in other words, the network interface unit 1211 may also be used to connect to other types of networks or remote computer systems (not shown).
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the model generating method and the garbage file identification method of the above embodiments may be completed by hardware, or may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. In other words, the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the model generating method and/or the garbage file identification method in each of the above method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. A model generating method, characterized in that the method comprises:
acquiring a first data set and a second data set, the first data set including first directory information of at least one garbage file, the second data set including second directory information of at least one non-garbage file, and the first data set and the second data set having no intersection;
calculating, according to the first directory information of the at least one garbage file, a first feature matrix corresponding to the first data set, the first feature matrix being used to indicate text features of the first directory information;
calculating, according to the second directory information of the at least one non-garbage file, a second feature matrix corresponding to the second data set, the second feature matrix being used to indicate text features of the second directory information;
training, according to the first feature matrix and the second feature matrix, to obtain a target classification model, the target classification model being used to determine whether a file to be detected is a garbage file.
2. The method according to claim 1, characterized in that acquiring the first data set comprises:
obtaining a garbage configuration file, the garbage configuration file including preset first extension name information of the at least one garbage file and first path information of the path where the garbage file is located;
for each garbage file, determining the first extension name information and the first path information of the garbage file as the first directory information of the garbage file, to obtain the first data set including at least one piece of the first directory information.
3. The method according to claim 1, characterized in that acquiring the second data set comprises:
traversing directory information respectively corresponding to k disk files in an operating system, the directory information including extension name information of the disk file and path information of the path where the disk file is located, k being a positive integer;
when it is detected that the i-th disk file is a non-garbage file, determining the directory information of the i-th disk file as the second directory information and adding the second directory information to the second data set, to obtain the second data set including at least one piece of the second directory information, i being a positive integer and i ≤ k.
4. The method according to claim 3, characterized in that the first directory information of the at least one garbage file includes first extension name information and first path information of the at least one garbage file,
and before determining the directory information of the i-th disk file as the second directory information when it is detected that the i-th disk file is a non-garbage file, the method further comprises:
when the directory information of the i-th disk file meets a first preset condition, determining the i-th disk file to be a non-garbage file;
wherein the first preset condition includes that the extension name information of the i-th disk file is different from the at least one piece of first extension name information, and/or that the path information corresponding to the i-th disk file is different from the at least one piece of first path information.
5. The method according to any one of claims 1 to 4, characterized in that calculating, according to the first directory information of the at least one garbage file, the first feature matrix corresponding to the first data set comprises:
performing word segmentation on at least one piece of the first directory information to obtain m first feature words, m being a positive integer;
calculating first feature values respectively corresponding to the m first feature words, each first feature value being used to indicate how strongly the first feature word discriminates garbage files;
generating, according to the first feature values respectively corresponding to the m first feature words, the first feature matrix corresponding to the first data set.
6. The method according to any one of claims 1 to 4, characterized in that calculating, according to the second directory information of the at least one non-garbage file, the second feature matrix corresponding to the second data set comprises:
performing word segmentation on at least one piece of the second directory information to obtain n second feature words, n being a positive integer;
calculating second feature values respectively corresponding to the n second feature words, each second feature value being used to indicate how strongly the second feature word discriminates non-garbage files;
generating, according to the second feature values respectively corresponding to the n second feature words, the second feature matrix corresponding to the second data set.
7. The method according to claim 1, characterized in that before training, according to the first feature matrix and the second feature matrix, to obtain the target classification model, the method further comprises:
dividing the first data set into a first training set and a first test set, and dividing the second data set into a second training set and a second test set, the first training set and the second training set being used for training to obtain the target classification model, and the first test set and the second test set being used for testing the target classification model to obtain a classification accuracy;
and training, according to the first feature matrix and the second feature matrix, to obtain the target classification model comprises:
determining, according to the first feature matrix, a first feature sub-matrix corresponding to the first training set;
determining, according to the second feature matrix, a second feature sub-matrix corresponding to the second training set;
training, according to the first feature sub-matrix and the second feature sub-matrix, to obtain the target classification model.
8. The method according to claim 7, characterized in that training, according to the first feature sub-matrix and the second feature sub-matrix, to obtain the target classification model comprises:
inputting the first feature sub-matrix and the second feature sub-matrix into a logistic regression model, and training to obtain the target classification model.
9. A garbage file identification method, characterized in that it uses the target classification model generated by the model generating method according to any one of claims 1 to 8, the method comprising:
obtaining directory information of a file to be detected, the directory information of the file to be detected including extension name information of the file to be detected and path information of the path where the file to be detected is located;
obtaining, according to the extension name information of the file to be detected and the corresponding path information, a recognition result of the file to be detected by using the target classification model, the recognition result being used to indicate whether the file to be detected is a garbage file.
10. A model generating device, characterized in that the device comprises:
an acquisition module, configured to acquire a first data set and a second data set, the first data set including first directory information of at least one garbage file, the second data set including second directory information of at least one non-garbage file, and the first data set and the second data set having no intersection;
a first computing module, configured to calculate, according to the first directory information of the at least one garbage file, a first feature matrix corresponding to the first data set, the first feature matrix being used to indicate text features of the first directory information;
a second computing module, configured to calculate, according to the second directory information of the at least one non-garbage file, a second feature matrix corresponding to the second data set, the second feature matrix being used to indicate text features of the second directory information;
a training module, configured to train, according to the first feature matrix and the second feature matrix, to obtain a target classification model, the target classification model being used to determine whether a file to be detected is a garbage file.
11. A garbage file identification device, characterized in that it uses the target classification model generated by the model generating device as described in the first aspect, the device comprising:
an acquisition module, configured to obtain directory information of a file to be detected, the directory information of the file to be detected including extension name information of the file to be detected and path information of the path where the file to be detected is located;
an identification module, configured to obtain, according to the extension name information of the file to be detected and the corresponding path information, a recognition result of the file to be detected by using the target classification model, the recognition result being used to indicate whether the file to be detected is a garbage file.
12. Model generation equipment, characterized in that the model generation equipment includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the model generating method according to any one of claims 1 to 8.
13. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the model generating method according to any one of claims 1 to 8.
14. Garbage file identification equipment, characterized in that the garbage file identification equipment includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the garbage file identification method according to claim 9.
15. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the garbage file identification method according to claim 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710791588.0A CN108287860A (en) 2017-09-05 2017-09-05 Model generating method, garbage files recognition methods and device

Publications (1)

Publication Number Publication Date
CN108287860A true CN108287860A (en) 2018-07-17

Family

ID=62831503

Country Status (1)

Country Link
CN (1) CN108287860A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646086A (en) * 2013-12-13 2014-03-19 北京奇虎科技有限公司 Junk file cleaning method and device
CN105224574A (en) * 2014-06-30 2016-01-06 北京金山安全软件有限公司 Method and device for automatically identifying junk files
CN105446980A (en) * 2014-06-27 2016-03-30 北京金山安全软件有限公司 Method and device for identifying picture junk files
CN105589941A (en) * 2015-12-15 2016-05-18 北京百分点信息科技有限公司 Emotional information detection method and apparatus for web text
CN106202166A (en) * 2016-06-24 2016-12-07 北京奇虎科技有限公司 The method for cleaning of file, device and corresponding client
CN106708426A (en) * 2016-11-11 2017-05-24 努比亚技术有限公司 Garbage file recognition device and method
CN107066604A (en) * 2017-04-25 2017-08-18 努比亚技术有限公司 A kind of cleaning garbage files method and terminal
CN107086952A (en) * 2017-04-19 2017-08-22 中国石油大学(华东) A kind of Bayesian SPAM Filtering method based on TF IDF Chinese word segmentations

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868093A (en) * 2021-10-13 2021-12-31 平安银行股份有限公司 Junk file monitoring method, device, equipment and storage medium
CN113868093B (en) * 2021-10-13 2024-05-24 平安银行股份有限公司 Junk file monitoring method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106453053B (en) Group message display methods and device
CN110069306A (en) A kind of message display method and terminal device
US20160241589A1 (en) Method and apparatus for identifying malicious website
CN104182488A (en) Search method, server and client
CN106778117B (en) Permission open method, apparatus and system
CN110222212A (en) A kind of display control method and terminal device
CN108337374A (en) A kind of message prompt method and mobile terminal
CN109445894A (en) A kind of screenshot method and electronic equipment
CN108205408B (en) Message display method and device
US20170109756A1 (en) User Unsubscription Prediction Method and Apparatus
CN103501487A (en) Method, device, terminal, server and system for updating classifier
CN109857494A (en) A kind of message prompt method and terminal device
CN109309696A (en) Portfolios method, sender, recipient and storage medium
CN109947650A (en) Script step process methods, devices and systems
CN109871358A (en) A kind of management method and terminal device
CN110069675A (en) A kind of search method and mobile terminal
CN110471589A (en) Information display method and terminal device
CN110457086A (en) A kind of control method of application program, mobile terminal and server
CN110177040A (en) Picture sharing method and mobile terminal
CN114706895A (en) Emergency event plan recommendation method and device, storage medium and electronic equipment
CN108595481A (en) A kind of notification message display methods and terminal device
CN108763544A (en) A kind of display methods and terminal
CN104731782B (en) A kind of method and mobile terminal of information processing
CN109344125A (en) A kind of file name update method and terminal device
CN108628534A (en) A kind of character methods of exhibiting and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180717