CN105159913B - Method and device for determining file to be cleaned - Google Patents

Info

Publication number
CN105159913B
Authority
CN
China
Prior art keywords
keywords
file
white
black
target file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510392012.8A
Other languages
Chinese (zh)
Other versions
CN105159913A (en
Inventor
白锡亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201510392012.8A
Publication of CN105159913A
Application granted
Publication of CN105159913B
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system

Abstract

The embodiments of the invention disclose a method and a device for determining a file to be cleaned. A storage space is scanned and path information of a target file is extracted; black keywords from a preset blacklist and white keywords from a preset white list are searched for in the path information of the target file; and whether the target file is a file to be cleaned is determined according to the search result. Because the blacklist and the white list contain only a number of keywords, they occupy less storage space than the prior-art cleanable file list, which records the characteristics of each file. In addition, whether the target file is a file to be cleaned can be determined merely by checking whether its path information contains a black keyword or a white keyword, so the file matching process has low complexity and the files to be cleaned are determined more comprehensively.

Description

Method and device for determining file to be cleaned
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for determining a file to be cleaned.
Background
Various application programs are installed on a mobile device, and these applications may create files while the device is in use. For example, pictures received by an application may be stored in picture files, and data required while an application runs may be stored in cache files. Such picture or cache files may no longer be needed after use, yet they occupy a large amount of storage space and can leave the mobile device short of storage. It is therefore necessary to clean these files to free up storage space on the mobile device.
In the prior art, cleaning the files of a mobile device requires a preset list of cleanable files, which records characteristic information of a number of known cleanable files (such as the path of the file and the installation package name of the application to which the file belongs). When files are cleaned, the storage space of the mobile device is first scanned and the characteristics of the files in the storage space are extracted. The extracted characteristics are then compared with those recorded in the list, and a file is determined to be cleanable if its characteristics match those of a file in the list.
In the prior-art solution, a file on the mobile device can be determined to be cleanable only if its characteristics match those recorded in the list. However, as the number and variety of application programs grow, so does the number of files to be cleaned. To clean comprehensively, the list of cleanable files must grow ever larger, which on the one hand occupies additional space and on the other hand increases the complexity of the file matching process, slows the determination of cleanable files, and consumes additional system resources.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for determining a file to be cleaned, so as to realize comprehensive cleaning and reduce the additional consumption of system resources.
In order to achieve the above object, an embodiment of the present invention discloses a method for determining a file to be cleaned, including:
scanning the storage space, and extracting path information of a target file;
searching, according to a preset blacklist and a preset white list, the path information of the target file for black keywords in the blacklist and white keywords in the white list; wherein the black keywords are keywords of cleanable file paths contained in the blacklist, and the white keywords are keywords of uncleanable file paths contained in the white list;
and determining whether the target file is a file to be cleaned or not according to the search result.
Optionally, after scanning the storage space and extracting the path information of the target file, before searching for the black keyword in the black list and the white keyword in the white list in the path information of the target file according to a preset black list and a preset white list, the method further includes:
judging whether the extracted path information of the target file has the installation package name of the application program to which the target file belongs;
removing the installation package name from the extracted path information of the target file under the condition that the installation package name exists;
the searching for the black keywords in the black list and the white keywords in the white list in the path information of the target file according to the preset black list and the preset white list includes:
and searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file without the installation package name according to a preset blacklist and a preset white list.
Optionally, the determining, according to the search result, whether the target file is a file to be cleaned includes:
under the condition that the path information of the target file contains the black keyword, further judging whether the black keyword belongs to a first type keyword; the first type keywords are high-frequency keywords obtained by counting path information of the cleanable files; determining that the target file is a file to be cleaned under the condition that the black keywords belong to the first type keywords;
or,
under the condition that the path information of the target file contains the white keyword, further judging whether the white keyword belongs to a second type keyword; the second type keywords are high-frequency keywords obtained by counting path information of the uncleanable files; determining that the target file is not a file to be cleaned under the condition that the white keywords belong to the second type keywords;
or,
determining that the target file is a file to be cleaned under the condition that the path information of the target file contains the black keyword and does not contain the white keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains the white keyword and does not contain the black keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains at least two white keywords;
or,
and under the condition that the path information of the target file contains at least two black keywords, determining that the target file is a file to be cleaned.
Optionally, the determining, according to the search result, whether the target file is a file to be cleaned includes:
under the condition that the path information of the target file comprises the white keywords and the black keywords, judging whether the black keywords belong to first type keywords or not, and judging whether the white keywords belong to second type keywords or not, wherein the first type keywords are high-frequency keywords obtained by counting the path information of the file which can be cleaned, and the second type keywords are high-frequency keywords obtained by counting the path information of the file which can not be cleaned;
determining the target file as a file to be cleaned under the condition that the black keywords belong to a first type of keywords and the white keywords do not belong to a second type of keywords;
determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords;
and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
Optionally, the determining, according to the search result, whether the target file is a file to be cleaned includes:
under the condition that the path information of the target file contains the black keywords and the white keywords, judging whether the white keywords are keywords in a preset keyword group;
if yes, acquiring the application category of the target file;
and determining whether the target file is a file to be cleaned or not according to the category of the application to which the target file belongs.
A to-be-cleaned file determining apparatus comprising: a path information extraction unit, a keyword search unit and a file determination unit,
the path information extraction unit is used for scanning the storage space and extracting the path information of the target file;
the keyword searching unit is used for searching, according to a preset blacklist and a preset white list, the path information of the target file for black keywords in the blacklist and white keywords in the white list; wherein the black keywords are keywords of cleanable file paths contained in the blacklist, and the white keywords are keywords of uncleanable file paths contained in the white list;
and the file determining unit is used for determining whether the target file is a file to be cleaned according to the searching result.
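The division of the apparatus into three units can be pictured as three pluggable components. The following Python sketch is illustrative only: the class name `CleanableFileDevice`, the use of plain callables for the units, and the keywords "cache" and "backup" in the usage note are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# (black keywords found, white keywords found) for one path
SearchResult = Tuple[List[str], List[str]]


@dataclass
class CleanableFileDevice:
    """Hypothetical composition of the three units described above."""

    extract_paths: Callable[[], Iterable[str]]      # path information extraction unit
    search_keywords: Callable[[str], SearchResult]  # keyword searching unit
    decide: Callable[[SearchResult], bool]          # file determining unit

    def run(self) -> List[str]:
        """Return the paths that the file determining unit marks as to be cleaned."""
        return [
            path
            for path in self.extract_paths()
            if self.decide(self.search_keywords(path))
        ]
```

For instance, wiring the device with a fixed list of paths, a substring search for the hypothetical keywords "cache" (black) and "backup" (white), and the simplest rule (a black hit and no white hit) selects only the cache file.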
Optionally, the apparatus further comprises: a package name judging unit and a package name removing unit,
the package name judging unit is used for judging, after the path information extraction unit scans the storage space and extracts the path information of the target file and before the keyword searching unit searches the path information of the target file for the black keywords in the blacklist and the white keywords in the white list according to the preset blacklist and the preset white list, whether the extracted path information of the target file contains an installation package name of the application program to which the target file belongs;
the package name removing unit is used for removing the installation package name from the extracted path information of the target file under the condition that the installation package name exists;
the keyword search unit is specifically configured to: and searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file without the installation package name according to a preset blacklist and a preset white list.
Optionally, the file determining unit is specifically configured to:
under the condition that the path information of the target file contains the black keyword, further judging whether the black keyword belongs to a first type keyword; the first type keywords are high-frequency keywords obtained by counting path information of the cleanable files; determining that the target file is a file to be cleaned under the condition that the black keywords belong to the first type keywords;
or,
under the condition that the path information of the target file contains the white keyword, further judging whether the white keyword belongs to a second type keyword; the second type keywords are high-frequency keywords obtained by counting path information of the uncleanable files; determining that the target file is not a file to be cleaned under the condition that the white keywords belong to the second type keywords;
or,
determining that the target file is a file to be cleaned under the condition that the path information of the target file contains the black keyword and does not contain the white keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains the white keyword and does not contain the black keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains at least two white keywords;
or,
and under the condition that the path information of the target file contains at least two black keywords, determining that the target file is a file to be cleaned.
Optionally, the file determining unit includes: a first judging subunit and a first determining subunit,
the first judging subunit is configured to, when the path information of the target file includes the white keyword and the black keyword, judge whether the black keyword belongs to a first type keyword, and judge whether the white keyword belongs to a second type keyword, where the first type keyword is a high-frequency keyword obtained by counting path information of cleanable files, and the second type keyword is a high-frequency keyword obtained by counting path information of uncleanable files;
the first determining subunit is configured to determine that the target file is a file to be cleaned when the black keyword belongs to a first type of keyword and the white keyword does not belong to a second type of keyword; determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords; and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
Optionally, the file determining unit includes: a second judging subunit, a category acquiring subunit and a second determining subunit,
the second judging subunit is configured to, when the path information of the target file includes both the black keyword and the white keyword, judge whether the white keyword is a keyword in a preset keyword group, and if so, trigger the category acquiring subunit;
the category obtaining subunit is configured to obtain a category of an application to which the target file belongs;
and the second determining subunit is configured to determine whether the target file is a file to be cleaned according to the category of the application to which the target file belongs.
According to the technical scheme provided by the embodiment of the invention, the storage space is scanned and the path information of the target file is extracted; black keywords in a preset blacklist and white keywords in a preset white list are searched for in the path information of the target file; and whether the target file is a file to be cleaned is determined according to the search result. Because the blacklist and the white list contain only a number of keywords, they occupy less storage space than the prior-art cleanable file list, which records the characteristics of each file. In addition, whether the target file is a file to be cleaned can be determined merely by checking whether its path information contains a black keyword or a white keyword, so the file matching process has low complexity and the files to be cleaned are determined more comprehensively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for determining a file to be cleaned according to an embodiment of the present invention;
Fig. 2 is a flowchart of another method for determining a file to be cleaned according to an embodiment of the present invention;
Fig. 3 is a flowchart of a specific execution process of step S300 in another method for determining a file to be cleaned according to an embodiment of the present invention;
Fig. 4 is a flowchart of another method for determining a file to be cleaned according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a to-be-cleaned file determining apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for determining a file to be cleaned, where the method may include the following steps:
S100: scanning the storage space, and extracting path information of a target file;
Files in the storage space of the mobile device are scanned one by one, and each scanned file is a target file. When a target file is scanned, its path and the installation package name of the application program to which it belongs are extracted. Specifically, the path may be extracted from the attribute information of the target file.
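As a rough illustration of step S100, the scan can be sketched as a recursive directory walk. The function name `iter_target_paths` is hypothetical, and the sketch yields only the file path; how the installation package name is obtained is not shown.

```python
import os
from typing import Iterator


def iter_target_paths(storage_root: str) -> Iterator[str]:
    """Yield the full path of every file under storage_root, one by one."""
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

Using a generator means each target file can be examined as soon as it is found, without first building a list of every file in the storage space.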
In the embodiment of the invention, the scan may be triggered by a file cleaning instruction from the user: when such an instruction is received, the storage space is scanned according to it, and the path of the target file and the installation package name of the application program to which the target file belongs are extracted. Alternatively, a time condition may be preset; whether the preset time condition is met is judged and, if it is met, the storage space is scanned and the same information is extracted.
Specifically, taking a preset time condition as an example, judging whether the preset time condition is met may consist of judging whether the current time has reached a preset scanning time point, or judging whether the time elapsed since the last scan has reached a preset duration threshold. For example, 12 pm each day may be set as the scanning time point, and scanning of the storage space of the mobile device starts when 12 pm is reached. Alternatively, the duration threshold between two scans may be preset to one day, and the storage space of the mobile device is scanned again once one day has elapsed since the previous scan finished.
It should be noted that, in the embodiment of the present invention, there is no limitation on how to start scanning, and there is no limitation on a preset time condition, and a user may arbitrarily set a mode of starting scanning, and may also arbitrarily set a time condition.
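The two example time conditions can be sketched as follows. The function names are hypothetical, and treating a missing previous scan as "condition met" is an assumption of this sketch rather than something the embodiment specifies.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def reached_scan_time(now: datetime, scan_time: time) -> bool:
    """True once the current time of day has reached the preset scanning time point."""
    return now.time() >= scan_time


def interval_elapsed(
    now: datetime, last_scan: Optional[datetime], threshold: timedelta
) -> bool:
    """True if no scan has run yet (assumption) or the threshold has passed since the last scan."""
    return last_scan is None or now - last_scan >= threshold
```

With `scan_time=time(12, 0)` the first check corresponds to the daily scanning time point in the example, and `threshold=timedelta(days=1)` corresponds to the one-day interval between scans.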
S200: searching, according to a preset blacklist and a preset white list, the path information of the target file for black keywords in the blacklist and white keywords in the white list; wherein the black keywords are keywords of cleanable file paths contained in the blacklist, and the white keywords are keywords of uncleanable file paths contained in the white list;
In the embodiment of the invention, the path information of the target file is matched against the black keywords in the blacklist and the white keywords in the white list, that is, the black keywords and the white keywords are searched for in the path information of the target file. The black keywords and white keywords are extracted from a large amount of path information. Some of the black keywords in the blacklist may be first type keywords, which are high-frequency keywords obtained by counting path information of cleanable files; as long as a black keyword belonging to the first type keywords appears in the path information of the target file, the target file can be determined to be a file to be cleaned, regardless of whether the path information also contains white keywords. Likewise, some of the white keywords in the white list may be second type keywords, which are high-frequency keywords obtained by counting path information of uncleanable files; as long as a white keyword belonging to the second type keywords appears in the path information of the target file, the target file can be determined not to be a file to be cleaned, regardless of whether the path information also contains black keywords.
It should be noted that a black keyword belonging to the first type keywords and a white keyword belonging to the second type keywords do not appear in the same path information at the same time, whereas a black keyword not belonging to the first type keywords and a white keyword not belonging to the second type keywords may appear together in the path information of the same target file.
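A minimal sketch of the search in step S200 is given below. It assumes that a keyword "appears" in the path when it occurs as a case-insensitive substring; the patent does not fix the matching rule, and the function name `find_keywords` is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_keywords(
    path: str, blacklist: Iterable[str], whitelist: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return the black and white keywords that occur in path.

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    lowered = path.lower()
    black_hits = [kw for kw in blacklist if kw.lower() in lowered]
    white_hits = [kw for kw in whitelist if kw.lower() in lowered]
    return black_hits, white_hits
```

The cost of this search depends only on the length of the two keyword lists and of the path, not on the number of known cleanable files, which is the saving the embodiment relies on.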
S300: and determining whether the target file is the file to be cleaned or not according to the search result.
In the embodiment of the present invention, the path information of the target file may include only the black keyword, may include only the white keyword, or may include both the black keyword and the white keyword.
In an embodiment of the present invention, if the path information of the target file contains only black keywords, the target file may be directly determined to be a file to be cleaned. If the path information contains only white keywords, the target file may be directly determined not to be a file to be cleaned. If the path information contains both black keywords and white keywords, the target file may be directly determined to be a file to be cleaned when the black keyword belongs to the first type keywords, and directly determined not to be a file to be cleaned when the white keyword belongs to the second type keywords. If the black keyword does not belong to the first type keywords and the white keyword does not belong to the second type keywords, the target file is determined to be a file to be cleaned when the black keyword is located behind the white keyword in the path information, and determined not to be a file to be cleaned when the black keyword is located in front of the white keyword. Also in the case where the path information contains both black keywords and white keywords, the target file is directly determined to be a file to be cleaned if the path information contains at least two black keywords, and directly determined not to be a file to be cleaned if it contains at least two white keywords.
It should be noted that, under the condition that the path information of the target file contains both the black keywords and the white keywords, the black keywords belonging to the first type of keywords and the white keywords belonging to the second type of keywords do not appear at the same time; when the path information of the target file contains at least two black keywords, at least two white keywords are not contained at the same time.
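The handling of a path that contains both a black keyword and a white keyword can be sketched as below. The sketch assumes exactly one black and one white keyword, compares the positions of their first occurrences, and uses the hypothetical function name `decide_mixed`; the keywords "save" and "cache" in the usage note are likewise invented for illustration.

```python
from typing import Set


def decide_mixed(
    path: str, black: str, white: str, first_type: Set[str], second_type: Set[str]
) -> bool:
    """Return True if a path containing one black and one white keyword is to be cleaned."""
    if black in first_type:
        # A first type (high-frequency cleanable) keyword decides on its own.
        return True
    if white in second_type:
        # A second type (high-frequency uncleanable) keyword decides on its own.
        return False
    # Neither keyword is decisive: the file is cleaned only when the
    # black keyword is located behind the white keyword in the path.
    return path.find(black) > path.find(white)
```

For example, with neither keyword in the high-frequency sets, `/sdcard/app/save/cache/a` is treated as a file to be cleaned because "cache" follows "save", while `/sdcard/app/cache/save/a` is not.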
According to the method for determining a file to be cleaned provided by the embodiment of the invention, the blacklist and the white list contain only a few keywords and therefore occupy less storage space than a cleanable file list that records the characteristics of each file; in addition, whether the target file is a file to be cleaned can be determined merely by checking whether its path information contains a black keyword or a white keyword, so the file matching process has low complexity and the files to be cleaned are determined more comprehensively.
As shown in fig. 2, in the embodiment of the present invention, in order to achieve a better technical effect, after step S100 and before step S200, step S101 and step S102 may be further included:
S101: judging whether the extracted path information of the target file has the installation package name of the application program to which the target file belongs;
S102: and in the case that the installation package name exists, removing the installation package name from the extracted path information of the target file.
If the installation package name does not exist, the execution of the file to be cleaned determining method of the invention can be ended.
Meanwhile, step S200 may include:
and searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file without the installation package name according to a preset blacklist and a preset white list.
In the case that the installation package name exists in the path information, since the path information may include the same keyword as the installation package name, there may exist a keyword that is hit repeatedly when matching a black keyword or a white keyword, and in order to avoid repeated hits of the keyword in the path information, the installation package name needs to be removed.
Specifically, the installation package name may contain the characters "." or "com"; the installation package name is located in the path information by means of these characters and is then removed.
In the embodiment of the invention, the installation package name is removed from the path information of the target file, so that the situation that when the path information contains the installation package name, the black keyword or the white keyword possibly contained in the installation package name is repeatedly hit when the black keyword or the white keyword is matched is avoided.
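Steps S101 and S102 can be sketched as follows. The heuristic used to recognise a package name (a directory segment made of dot-separated identifiers, such as the hypothetical `com.example.adviewer`) is an assumption of this sketch; the embodiment only states that the name is located by means of characters it contains.

```python
import re

# Heuristic (assumption): a directory segment of dot-separated identifiers,
# e.g. "com.example.adviewer", is treated as an installation package name.
_PACKAGE_SEGMENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)+$")


def strip_package_name(path: str) -> str:
    """Remove directory segments that look like a package name; keep the file name."""
    *dirs, filename = path.split("/")
    kept = [segment for segment in dirs if not _PACKAGE_SEGMENT.match(segment)]
    return "/".join(kept + [filename])
```

In the path `/sdcard/Android/data/com.example.adviewer/cache/a.tmp`, the package name contains the substring "ad"; removing the segment first keeps a black keyword such as ad from being hit inside the package name.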
In one embodiment of the present invention, step S300 may include any one of the following six ways:
In a first mode, under the condition that the path information of the target file contains the black keyword, further judging whether the black keyword belongs to a first type keyword; the first type keywords are high-frequency keywords obtained by counting path information of the cleanable files; determining that the target file is a file to be cleaned under the condition that the black keywords belong to the first type keywords;
Specifically, the first type keywords may include: keywords for temporary-type paths, such as temp, tmp, .temp, etc.; all advertising keywords, such as ad, ads, banner, _chartboost, etc.; keywords for all log-type paths, such as log; and keywords such as emaillicon, /ddad/error, etc.
In the embodiment of the invention, as long as the black keywords belonging to the first type of keywords appear in the path information of the target file, the target file can be directly determined as the file to be cleaned, and whether the path information also contains the white keywords or not is not considered.
In a second mode, under the condition that the path information of the target file contains the white keyword, further judging whether the white keyword belongs to a second type keyword; the second type keywords are high-frequency keywords obtained by counting path information of the uncleanable files; determining that the target file is not a file to be cleaned under the condition that the white keywords belong to the second type keywords;
Specifically, the second type keywords may include keywords such as screen, recycle, /bak/, /shot/, direct, and the like; encryption-type keywords may also be included, such as watermark, key, etc.
In the embodiment of the invention, as long as the white keywords belonging to the second type keywords appear in the path information of the target file, the target file can be directly determined not to be the file to be cleaned, and whether the path information also contains the black keywords or not is not considered.
In a third mode, determining that the target file is a file to be cleaned under the condition that the path information of the target file contains the black keyword and does not contain the white keyword;
In a fourth mode, determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains the white keyword and does not contain the black keyword;
In a fifth mode, determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains at least two white keywords;
In a sixth mode, determining that the target file is a file to be cleaned under the condition that the path information of the target file contains at least two black keywords.
In the embodiment of the invention, as long as the path information of the target file contains at least two black keywords, the target file is directly determined to be the file to be cleaned; as long as it contains at least two white keywords, the target file is directly determined not to be the file to be cleaned.
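The six modes above can be sketched as a single rule chain. This is only an illustration: the function name, the order in which the modes are tried, the sample keyword sets, and the black keywords cache and thumb are assumptions, not values taken from the embodiment.

```python
from typing import Optional, Sequence

# Hypothetical sample sets; real lists would come from path statistics.
FIRST_TYPE = {"temp", "tmp", "log", "ads", "banner"}   # first type (black) keywords
SECOND_TYPE = {"recycle", "/bak/", "/shot/"}           # second type (white) keywords


def decide_by_simple_rules(black_hits: Sequence[str],
                           white_hits: Sequence[str]) -> Optional[bool]:
    """Return True (clean), False (keep), or None when no mode applies."""
    # Modes one and two: a first type or second type hit decides directly.
    if any(k in FIRST_TYPE for k in black_hits):
        return True
    if any(k in SECOND_TYPE for k in white_hits):
        return False
    # Modes three and four: only one kind of keyword is present.
    if black_hits and not white_hits:
        return True
    if white_hits and not black_hits:
        return False
    # Modes five and six: at least two keywords of one kind.
    if len(white_hits) >= 2:
        return False
    if len(black_hits) >= 2:
        return True
    return None  # mixed case with one keyword of each kind


print(decide_by_simple_rules(["tmp"], ["album"]))    # first type keyword wins
print(decide_by_simple_rules(["cache"], ["album"]))  # no simple mode applies
```

A result of None corresponds to the case where one ordinary black keyword and one ordinary white keyword appear together, which the position-based embodiments handle.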
As shown in fig. 3, in an embodiment of the present invention, step S300 may include:
S311: under the condition that the path information of the target file comprises the white keyword and the black keyword, judging whether the black keyword belongs to a first type keyword or not, and judging whether the white keyword belongs to a second type keyword or not, wherein the first type keyword is a high-frequency keyword obtained by counting the path information of the file which can be cleaned, and the second type keyword is a high-frequency keyword obtained by counting the path information of the file which can not be cleaned;
S312: under the condition that the black keywords belong to the first type of keywords and the white keywords do not belong to the second type of keywords, determining the target file as a file to be cleaned; determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords; and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
There may be many white keywords that do not belong to the second type of keyword, for example: album, /bg/, asset, book, collection, commun, cover, reader, resources, save, screen, skin, packer, ucam, storage, patch, photo, /public/, portal, picture, fact, emot, face, fast, head, game, emoji, wrapper, watermark, wallpaper, message, manga, movie, note, the name, backup, etc.
It should be noted that /bg/ and /public/ mean that bg and public are white keywords when they belong to the primary path.
It should be noted that a white keyword that is followed by a black keyword is not a user-related type keyword or a storage type keyword.
In the embodiment of the invention, when the black keywords which do not belong to the first type of keywords and the white keywords which do not belong to the second type of keywords appear in the path information of the target file, the target file can be determined as the file to be cleaned according to the position relationship of the black keywords behind the white keywords.
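Steps S311 and S312 can be sketched as follows. The sketch assumes one black keyword and one white keyword that both occur in the path, and compares the positions of their first occurrences; the function name and the sample keyword sets are hypothetical.

```python
from typing import Optional

FIRST_TYPE = {"temp", "tmp", "log"}    # assumed first type (black) keywords
SECOND_TYPE = {"recycle", "/bak/"}     # assumed second type (white) keywords


def decide_mixed(path: str, black: str, white: str) -> Optional[bool]:
    """Return True (clean) or False (keep) for a path containing both keywords."""
    is_first = black in FIRST_TYPE
    is_second = white in SECOND_TYPE
    if is_first and not is_second:
        return True
    if is_second and not is_first:
        return False
    if not is_first and not is_second:
        # Position rule: clean only when the black keyword comes after the white one.
        return path.find(black) > path.find(white)
    return None  # first type and second type together: not covered by S312


print(decide_mixed("/sdcard/app/album/cache/a.jpg", "cache", "album"))  # black after white
print(decide_mixed("/sdcard/app/cache/album/a.jpg", "cache", "album"))  # black before white
```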
As shown in fig. 4, in an embodiment of the present invention, step S300 in the method shown in fig. 1 may include:
S321: under the condition that the path information of the target file contains both the black keyword and the white keyword, judging whether the white keyword is a keyword in a preset keyword group, if so, executing step S322;
if the white keyword is not a keyword in the preset keyword group, the procedure may end, or step S311 and step S312 shown in fig. 3 may be executed next.
In the embodiment of the present invention, the preset keyword group may be a keyword group with certain characteristics, such as a resource type keyword group, a storage type keyword group, and the like.
The storage type keyword may be: store, etc., which may appear in the path information as a white keyword in the case where the target file belongs to different applications.
The keyword group of the resource type may include: user, head, portrait, resources, date, /res/, etc.
S322: acquiring the category of the application to which the target file belongs;
When the white keyword contained in the path information of the target file is one of user, head, portrait and resources, the category of the application to which the target file belongs is acquired.
S323: and determining whether the target file is a file to be cleaned according to the application category to which the target file belongs.
If the white keyword is one of user, head and portrait, and the category of the application to which the target file belongs is a social category, it is determined that the target file is not a file to be cleaned; if the application is not of the social category, the target file is determined to be a file to be cleaned when the white keyword is in front and the black keyword is behind. If the white keyword is one of resources, date and /res/, it is determined that the target file is not a file to be cleaned when the application category is a game category; if the application category is not the game category, the target file is determined to be a file to be cleaned when the white keyword is in front and the black keyword is behind. If the white keyword is a storage type keyword and the category of the application to which the target file belongs is a reading category, it is determined that the target file is not a file to be cleaned; if the application is not of the reading category, whether the target file is a file to be cleaned is determined according to the positions of the black keyword and the white keyword.
In the embodiment of the invention, when the path information of the target file contains a storage type keyword and the application to which the target file belongs is of the reading category, the target file can be determined not to be the file to be cleaned.
In the embodiment of the present invention, as long as the path information of the target file contains one of user, head, portrait, resources, date and /res/, whether the target file is a file to be cleaned needs to be determined according to the category of the application to which the target file belongs.
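Steps S321 to S323 can be sketched as a lookup from the white keyword to the application category that protects it. The category labels social, game and reading, the function name, and the way the category reaches the function are assumptions; the embodiment does not state how the category is obtained.

```python
from typing import Optional

USER_KEYWORDS = {"user", "head", "portrait"}        # protected for social applications
RESOURCE_KEYWORDS = {"resources", "date", "/res/"}  # protected for game applications
STORAGE_KEYWORDS = {"store"}                        # protected for reading applications


def decide_by_category(path: str, black: str, white: str,
                       app_category: str) -> Optional[bool]:
    """Return True (clean), False (keep), or None if white is in no preset group."""
    if white in USER_KEYWORDS:
        protected = "social"
    elif white in RESOURCE_KEYWORDS:
        protected = "game"
    elif white in STORAGE_KEYWORDS:
        protected = "reading"
    else:
        return None  # S321 fails; another embodiment may take over
    if app_category == protected:
        return False
    # Otherwise fall back to the position rule: clean when black follows white.
    return path.find(black) > path.find(white)


print(decide_by_category("/sdcard/chat/user/cache/1.png", "cache", "user", "social"))
print(decide_by_category("/sdcard/chat/user/cache/1.png", "cache", "user", "tools"))
```

The same path is kept for a social application and cleaned for any other category, which is the behaviour the embodiment describes for user, head and portrait.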
Corresponding to the method embodiment, the invention also provides a device for determining the file to be cleaned.
As shown in fig. 5, an apparatus for determining a file to be cleaned according to an embodiment of the present invention may include: the path information extraction unit 100, the keyword search unit 200 and the file determination unit 300,
a path information extraction unit 100 for scanning the storage space and extracting path information of the target file;
Files in the storage space of the mobile device are scanned one by one, and every scanned file is a target file. When a target file is scanned, its path and the installation package name of the application program to which it belongs are extracted. Specifically, the path may be extracted from the attribute information of the target file.
In the embodiment of the invention, the scan can be triggered by a file cleaning instruction from the user: when such an instruction is received, the storage space is scanned according to the instruction, and the path of the target file and the installation package name of the application program to which the target file belongs are extracted. Alternatively, a time condition can be preset; whether the preset time condition is met is judged, and when it is met, the storage space is scanned and the path of the target file and the installation package name of the application program to which the target file belongs are extracted.
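The scanning behaviour of the path information extraction unit can be sketched with a directory walk. The function names are hypothetical, the preset time condition is assumed to be a fixed interval since the last scan, and the lookup of the installation package name is omitted because the embodiment does not say how it is obtained.

```python
import os
from typing import Iterator


def should_scan(user_requested: bool, seconds_since_last_scan: float,
                interval_seconds: float) -> bool:
    """Scan on a user cleaning instruction or when the assumed interval has elapsed."""
    return user_requested or seconds_since_last_scan >= interval_seconds


def iter_target_paths(root: str) -> Iterator[str]:
    """Yield the path of every file under root; each one is a target file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```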
The keyword searching unit 200 is configured to search, according to a preset blacklist and a preset white list, for a black keyword in the blacklist and a white keyword in the white list in the path information of the target file; wherein, the black keywords are: keywords of cleanable file paths contained in the blacklist; the white keywords are: keywords of uncleanable file paths contained in the white list;
in the embodiment of the invention, the path information of the target file is matched with the black keywords in the black list or the white keywords in the white list, and the black keywords or the white keywords are searched in the path information of the target file. The black and white keywords are extracted from a large amount of path information. Some of the black keywords in the black list can be first-type keywords, the first-type keywords are high-frequency keywords obtained by counting path information of the cleanable file, and the target file can be determined to be the file to be cleaned as long as the black keywords belonging to the first-type keywords appear in the path information of the target file, regardless of whether other white keywords exist in the path information of the target file; and a part of the white keywords in the white list can be second-type keywords, the second-type keywords are high-frequency keywords obtained by counting the path information of the uncleanable file, and as long as the white keywords belonging to the second-type keywords appear in the path information of the target file, the target file can be determined not to be the file to be cleaned, and no matter whether other black keywords exist in the path information of the target file or not.
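The search performed by the keyword searching unit can be sketched as plain substring matching. Case-insensitive matching and the function name are assumptions; the embodiment only states that the keywords are searched for in the path information.

```python
from typing import List, Sequence, Tuple


def find_keywords(path: str, blacklist: Sequence[str],
                  whitelist: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return the black keywords and white keywords that occur in the path."""
    lowered = path.lower()  # assumption: matching ignores letter case
    black_hits = [k for k in blacklist if k in lowered]
    white_hits = [k for k in whitelist if k in lowered]
    return black_hits, white_hits


black, white = find_keywords("/sdcard/App/Album/cache/1.jpg",
                             ["cache", "tmp"], ["album", "/bg/"])
print(black, white)
```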
The file determining unit 300 is configured to determine whether the target file is a file to be cleaned according to the search result.
In the embodiment of the present invention, the path information of the target file may include only the black keyword, may include only the white keyword, or may include both the black keyword and the white keyword.
In an embodiment of the present invention, if the path information of the target file only contains the black keyword, the target file may be directly determined as the file to be cleaned. If the path information of the target file only contains the white keyword, the target file can be directly determined not to be the file to be cleaned. If the path information of the target file contains both the black keywords and the white keywords, the target file can be directly determined to be the file to be cleaned when the black keywords belong to the first type of keywords, and can be directly determined not to be the file to be cleaned when the white keywords belong to the second type keywords. If the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, the target file is determined to be the file to be cleaned when the black keywords are located behind the white keywords in the path information of the target file, and is determined not to be the file to be cleaned when the black keywords are located in front of the white keywords. Under the condition that the path information of the target file contains both black keywords and white keywords, if the path information contains at least two black keywords, the target file is directly determined to be a file to be cleaned; and if the path information contains at least two white keywords, the target file is directly determined not to be a file to be cleaned.
It should be noted that, under the condition that the path information of the target file contains both the black keywords and the white keywords, the black keywords belonging to the first type of keywords and the white keywords belonging to the second type of keywords do not appear at the same time; when the path information of the target file contains at least two black keywords, at least two white keywords are not contained at the same time.
In an embodiment of the present invention, the file determining unit 300 may be specifically configured to:
under the condition that the path information of the target file contains the black keyword, further judging whether the black keyword belongs to a first type keyword; the first type keywords are high-frequency keywords obtained by counting path information of cleanable files; determining that the target file is a file to be cleaned under the condition that the black keyword belongs to the first type keywords;
or,
under the condition that the path information of the target file contains the white keyword, further judging whether the white keyword belongs to a second type keyword; the second type keywords are high-frequency keywords obtained by counting path information of uncleanable files; determining that the target file is not a file to be cleaned under the condition that the white keyword belongs to the second type keywords;
or,
determining that the target file is a file to be cleaned under the condition that the path information of the target file contains the black keyword and does not contain the white keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains the white keyword and does not contain the black keyword;
or,
determining that the target file is not a file to be cleaned under the condition that the path information of the target file contains at least two white keywords;
or,
determining that the target file is a file to be cleaned under the condition that the path information of the target file contains at least two black keywords.
In another embodiment of the present invention, the file determining unit 300 may include: a first judging subunit and a first determining subunit,
the first judging subunit is configured to, when the path information of the target file includes the white keyword and the black keyword, judge whether the black keyword belongs to a first type keyword, and judge whether the white keyword belongs to a second type keyword, where the first type keyword is a high-frequency keyword obtained by counting path information of cleanable files, and the second type keyword is a high-frequency keyword obtained by counting path information of uncleanable files;
the first determining subunit is configured to determine that the target file is a file to be cleaned when the black keyword belongs to a first type of keyword and the white keyword does not belong to a second type of keyword; determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords; and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
There may be many white keywords that do not belong to the second type of keyword, for example: album, /bg/, asset, book, collection, commun, cover, reader, resources, save, screen, skin, packer, ucam, storage, patch, photo, /public/, portal, picture, fact, emot, face, fast, head, game, emoji, wrapper, watermark, wallpaper, message, manga, movie, note, the name, backup, etc.
It should be noted that /bg/ and /public/ mean that bg and public are white keywords when they belong to the primary path.
It should be noted that a white keyword that is followed by a black keyword is not a user-related type keyword or a storage type keyword.
In the embodiment of the invention, when the black keywords which do not belong to the first type of keywords and the white keywords which do not belong to the second type of keywords appear in the path information of the target file, the target file can be determined as the file to be cleaned according to the position relationship of the black keywords behind the white keywords.
In another embodiment of the present invention, the file determining unit 300 may include: a second judging subunit, a category acquiring subunit and a second determining subunit,
the second judging subunit is configured to, when the path information of the target file includes both the black keyword and the white keyword, judge whether the white keyword is a keyword in a preset keyword group, and if so, trigger the category acquiring subunit;
in the embodiment of the present invention, the preset keyword group may be a keyword group with certain characteristics, such as a resource type keyword group, a storage type keyword group, and the like.
The storage type keyword may be: store, etc., which may appear in the path information as a white keyword in the case where the target file belongs to different applications.
The keyword group of the resource type may include: user, head, portrait, resources, date, /res/, etc.
The category obtaining subunit is configured to obtain a category of an application to which the target file belongs;
When the white keyword contained in the path information of the target file is one of user, head, portrait and resources, the category of the application to which the target file belongs is acquired.
And the second determining subunit is configured to determine whether the target file is a file to be cleaned according to the category of the application to which the target file belongs.
If the white keyword is one of user, head and portrait, and the category of the application to which the target file belongs is a social category, it is determined that the target file is not a file to be cleaned; if the application is not of the social category, the target file is determined to be a file to be cleaned when the white keyword is in front and the black keyword is behind. If the white keyword is one of resources, date and /res/, it is determined that the target file is not a file to be cleaned when the application category is a game category; if the application category is not the game category, the target file is determined to be a file to be cleaned when the white keyword is in front and the black keyword is behind. If the white keyword is a storage type keyword and the category of the application to which the target file belongs is a reading category, it is determined that the target file is not a file to be cleaned; if the application is not of the reading category, whether the target file is a file to be cleaned is determined according to the positions of the black keyword and the white keyword.
In the embodiment of the invention, when the path information of the target file contains a storage type keyword and the application to which the target file belongs is of the reading category, the target file can be determined not to be the file to be cleaned.
In other embodiments of the present invention, the apparatus shown in fig. 5 may further include: a package name judging unit and a package name removing unit,
the package name judging unit is configured to, after the path information extraction unit 100 scans the storage space and extracts the path information of the target file, and before the keyword searching unit 200 searches for the black keywords in the blacklist and the white keywords in the white list in the path information of the target file according to the preset blacklist and white list, judge whether the installation package name of the application program to which the target file belongs exists in the extracted path information of the target file;
the package name removing unit is used for removing the installation package name from the extracted path information of the target file under the condition that the installation package name exists;
the keyword search unit 200 is specifically configured to: and searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file without the installation package name according to a preset blacklist and a preset white list.
In the case that the installation package name exists in the path information, since the path information may include the same keyword as the installation package name, there may exist a keyword that is hit repeatedly when matching a black keyword or a white keyword, and in order to avoid repeated hits of the keyword in the path information, the installation package name needs to be removed.
Specifically, the installation package name may contain characters such as "." or "com"; the installation package name can be located in the path information according to these characters and then removed.
In the embodiment of the invention, the installation package name is removed from the path information of the target file. This prevents a black keyword or white keyword that happens to be contained in the installation package name from being hit repeatedly during keyword matching.
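The package name removal can be sketched as follows, assuming the installation package name is already known from the extraction step. The function name and the package name com.example.ads.reader are hypothetical and chosen so that the package name contains the advertising keyword ads.

```python
def strip_package_name(path: str, package_name: str) -> str:
    """Remove the installation package name from the path if it is present."""
    if package_name and package_name in path:
        return path.replace(package_name, "")
    return path


raw = "/sdcard/Android/data/com.example.ads.reader/files/book/1.txt"
cleaned = strip_package_name(raw, "com.example.ads.reader")
print(cleaned)           # the package name segment is now empty
print("ads" in cleaned)  # the keyword inside the package name no longer matches
```

Without this step the path would hit the black keyword ads only because of the package name, although the file itself sits under a book directory.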
According to the device for determining the files to be cleaned provided by the embodiment of the invention, the blacklist and the white list contain only keywords, so they occupy less storage space than a file list that stores the characteristics of files; in addition, whether the target file is a file to be cleaned can be determined merely by judging whether the path information of the target file contains a black keyword or a white keyword, so the complexity of the file matching process is low and the comprehensiveness of determining files to be cleaned is improved.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program to instruct relevant hardware to perform the steps, and the program may be stored in a computer-readable storage medium, which is referred to herein as a storage medium, such as: ROM/RAM, magnetic disk, optical disk, etc.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A method for determining a file to be cleaned is characterized by comprising the following steps:
scanning the storage space, and extracting path information of a target file;
according to a preset blacklist and a preset white list, searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file; wherein, the black keywords are: keywords of cleanable file paths contained in the blacklist; the white keywords are: keywords of uncleanable file paths contained in the white list;
determining whether the target file is a file to be cleaned or not according to the search result;
wherein, the determining whether the target file is a file to be cleaned according to the search result comprises:
under the condition that the path information of the target file comprises the white keywords and the black keywords, judging whether the black keywords belong to first type keywords or not, and judging whether the white keywords belong to second type keywords or not, wherein the first type keywords are high-frequency keywords obtained by counting the path information of the file which can be cleaned, and the second type keywords are high-frequency keywords obtained by counting the path information of the file which can not be cleaned; determining the target file as a file to be cleaned under the condition that the black keywords belong to a first type of keywords and the white keywords do not belong to a second type of keywords; determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords; and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
2. The method according to claim 1, wherein after the scanning the storage space and extracting the path information of the target file, before searching for the black keyword in the black list and the white keyword in the white list in the path information of the target file according to a preset black list and a preset white list, the method further comprises:
judging whether the extracted path information of the target file has the installation package name of the application program to which the target file belongs;
removing the installation package name from the extracted path information of the target file under the condition that the installation package name exists;
the searching for the black keywords in the black list and the white keywords in the white list in the path information of the target file according to the preset black list and the preset white list includes:
and searching for black keywords in the blacklist and white keywords in the white list in the path information of the target file without the installation package name according to a preset blacklist and a preset white list.
3. An apparatus for determining a file to be cleaned, comprising: a path information extraction unit, a keyword search unit and a file determination unit,
the path information extraction unit is used for scanning the storage space and extracting the path information of the target file;
the keyword searching unit is used for searching the black keywords in the black list and the white keywords in the white list in the path information of the target file according to a preset black list and a preset white list; wherein, the black keywords are: keywords of cleanable file paths contained in the blacklist; the white keywords are: keywords of uncleanable file paths contained in the white list;
the file determining unit is used for determining whether the target file is a file to be cleaned or not according to the searching result;
wherein the file determining unit includes: the first judging subunit is used for judging whether the black keyword belongs to a first type keyword or not and judging whether the white keyword belongs to a second type keyword or not under the condition that the path information of the target file comprises the white keyword and the black keyword, wherein the first type keyword is a high-frequency keyword obtained by counting the path information of the cleanable file, and the second type keyword is a high-frequency keyword obtained by counting the path information of the uncleanable file; the first determining subunit is configured to determine that the target file is a file to be cleaned when the black keyword belongs to a first type of keyword and the white keyword does not belong to a second type of keyword; determining that the target file is not a file to be cleaned under the condition that the black keywords do not belong to the first type of keywords and the white keywords belong to the second type of keywords; and under the condition that the black keywords do not belong to the first type of keywords and the white keywords do not belong to the second type of keywords, judging the positions of the black keywords and the white keywords in the path information of the target file, determining that the target file is a file to be cleaned when the black keywords are positioned behind the white keywords, and determining that the target file is not the file to be cleaned when the black keywords are positioned in front of the white keywords.
4. The apparatus of claim 3, further comprising: a package name judging unit and a package name removing unit,
the package name judging unit is used for judging, after the path information extraction unit scans the storage space and extracts the path information of the target file, and before the keyword searching unit searches for the black keywords in the black list and the white keywords in the white list in the path information of the target file according to a preset black list and a preset white list, whether the extracted path information of the target file has an installation package name of an application program to which the target file belongs;
the package name removing unit is used for removing the installation package name from the extracted path information of the target file under the condition that the installation package name exists;
the keyword searching unit is specifically configured to: search, according to the preset blacklist and the preset whitelist, the path information of the target file from which the installation package name has been removed for black keywords in the blacklist and white keywords in the whitelist.
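The interplay of the package name removing unit and the keyword searching unit in claim 4 can be sketched as follows. This is a hedged illustration only. The helper names `strip_package_name` and `find_keywords`, the substring-based matching, and the example package name `com.example.cachetool` are all hypothetical. The sketch also assumes the package name is already known to the caller.

```python
from typing import List, Tuple


def strip_package_name(path: str, package_name: str) -> str:
    """Remove the installation package name from the path, if it is present."""
    if package_name and package_name in path:
        return path.replace(package_name, "")
    return path


def find_keywords(
    path: str, blacklist: List[str], whitelist: List[str]
) -> Tuple[List[str], List[str]]:
    """Return the black and white keywords that occur in the path.

    Plain substring matching is an assumption of this sketch.
    """
    black = [kw for kw in blacklist if kw in path]
    white = [kw for kw in whitelist if kw in path]
    return black, white


# Hypothetical example: the package name itself contains the black keyword "cache".
path = "/sdcard/Android/data/com.example.cachetool/files/save/a.dat"
stripped = strip_package_name(path, "com.example.cachetool")
black_hits, white_hits = find_keywords(stripped, ["cache"], ["save"])
```

Searching the stripped path avoids a keyword match that comes only from the package name rather than from the directory or file name.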
CN201510392012.8A 2015-07-06 2015-07-06 Method and device for determining file to be cleaned Active CN105159913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510392012.8A CN105159913B (en) 2015-07-06 2015-07-06 Method and device for determining file to be cleaned

Publications (2)

Publication Number Publication Date
CN105159913A CN105159913A (en) 2015-12-16
CN105159913B CN105159913B (en) 2020-08-07

Family

ID=54800770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510392012.8A Active CN105159913B (en) 2015-07-06 2015-07-06 Method and device for determining file to be cleaned

Country Status (1)

Country Link
CN (1) CN105159913B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653663B (en) * 2015-12-29 2019-05-10 珠海豹趣科技有限公司 A kind of file clean-up method and device
CN106446682A (en) * 2016-06-24 2017-02-22 北京壹人壹本信息科技有限公司 Security protection method and apparatus
CN106201601B (en) * 2016-06-30 2019-11-26 北京奇虎科技有限公司 A kind of file clean-up method, electronic equipment and server
CN106844619A (en) * 2017-01-17 2017-06-13 深圳市金立通信设备有限公司 A kind of file clean-up method and terminal
CN106980661B (en) * 2017-03-20 2020-10-09 北京金山安全软件有限公司 Method and device for cleaning data files in mobile terminal and electronic equipment
CN106991150B (en) * 2017-03-28 2020-03-10 维沃移动通信有限公司 Webpage data display method and mobile terminal
CN107862214A (en) * 2017-06-16 2018-03-30 平安科技(深圳)有限公司 Prevent the method, apparatus and storage medium of sensitive information leakage
CN112241395B (en) * 2019-07-17 2024-04-23 腾讯科技(深圳)有限公司 Application program file cleaning method and device, terminal equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930209A (en) * 2012-10-16 2013-02-13 北京奇虎科技有限公司 File processing method and file processing device in mobile equipment
CN103369003A (en) * 2012-03-30 2013-10-23 网秦无限(北京)科技有限公司 A method and a system for scanning redundancy files in a mobile device by using cloud computing
CN103559299A (en) * 2013-11-14 2014-02-05 贝壳网际(北京)安全技术有限公司 Method, device and mobile terminal for cleaning up files
CN104239157A (en) * 2014-08-19 2014-12-24 北京奇虎科技有限公司 Method and device for optimizing and cleaning data of mobile terminal

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7895276B2 (en) * 2007-01-29 2011-02-22 Litera Technology Llc Method of managing metadata in attachments to e-mails in a network environment
US20090044134A1 (en) * 2007-08-06 2009-02-12 Apple Inc Dynamic interfaces for productivity applications
CN104267986B (en) * 2014-09-25 2018-07-31 北京金山安全软件有限公司 Method and device for cleaning junk files of game applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant