WO2017202214A1 - File verification method and apparatus - Google Patents

File verification method and apparatus

Info

Publication number
WO2017202214A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
verified
information
feature
generating
Prior art date
Application number
PCT/CN2017/084042
Other languages
English (en)
French (fr)
Inventor
黄武
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017202214A1
Priority to US15/974,241 (US11188635B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44 Program or device authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/51 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals

Definitions

  • The present invention relates to the field of network technologies, and in particular to a file verification method and apparatus.
  • APK: Application Package.
  • A terminal uses the APK to install the application provided by a service provider.
  • Some developers maliciously imitate legitimate or official applications, infringing on user privacy and the interests of service providers. To prevent this, the APK must be verified to determine whether it is a counterfeit APK, thereby protecting user privacy and the interests of the service provider.
  • At present, an APK may be verified as follows: when a user or developer finds that an application may be counterfeiting another application, they report it, and verification personnel manually verify the APK of the reported application according to the report information to obtain a verification result.
  • an embodiment of the present invention provides a file verification method and apparatus.
  • the technical solution is as follows:
  • a file verification method, comprising:
  • extracting file summary data from a file to be verified, where the file to be verified is an installation package of an application to be verified, and the file summary data uniquely identifies the file content of the file to be verified;
  • a file verification apparatus, comprising:
  • a file summary data extraction module configured to extract file summary data from the file to be verified, where the file to be verified is an installation package of the application to be verified, and the file summary data uniquely identifies the file content of the file to be verified;
  • a feature string generating module configured to generate a feature string of the file to be verified according to the file summary data;
  • a target file determining module configured to determine file information of a target file from a feature database according to the feature string of the file to be verified, where the target file is a file whose feature string matches that of the file to be verified, the feature database stores at least file information and feature strings of a plurality of genuine files, and the file information includes at least a certificate feature value;
  • a verification module configured to verify the file to be verified according to the file information of the target file and the file information of the file to be verified.
  • The file verification apparatus may further include at least one functional module for performing the file verification method in the embodiment of the present invention.
  • a file verification apparatus, comprising a memory and one or more processors configured to perform the following method:
  • extracting file summary data from the file to be verified, where the file to be verified is an installation package of the application to be verified, and the file summary data uniquely identifies the file content of the file to be verified;
  • the target file is a file that matches the feature string of the file to be verified;
  • the feature database stores at least file information and feature strings of a plurality of genuine files, where the file information includes at least a certificate feature value.
  • With this solution, files to be verified can be actively collected and verified to determine whether each belongs to a genuine application or is a counterfeit version of a genuine application, and the verification result is stored in the feature database. Counterfeit applications can thus be combated, protecting user information security and the interests of service providers.
  • FIG. 1 is a flowchart of a file verification method according to an embodiment of the present invention.
  • FIG. 2A is a flowchart of a file verification method according to an embodiment of the present invention.
  • FIG. 2B is a flowchart of a method for generating a second feature string according to an embodiment of the present invention.
  • FIG. 2C is a flowchart of verification according to a feature string according to an embodiment of the present invention.
  • FIG. 2D is a flowchart of storing feature strings according to an embodiment of the present invention.
  • FIG. 2E is a flowchart of a query method according to an embodiment of the present invention.
  • FIG. 2F is a flowchart of a file verification method according to an embodiment of the present invention.
  • FIG. 3 is a block diagram of a file verification apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram of an apparatus 400 for file verification according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a file verification method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • The method provided by the embodiment of the present invention extracts the file summary data from the file to be verified, generates a feature string of the file to be verified according to the file summary data, determines the file information of the target file from the feature database according to that feature string, and verifies the file to be verified according to the file information of the target file. Files to be verified can thus be actively collected and checked to determine whether each belongs to a genuine application or is a counterfeit version of a genuine application, and the verification result is stored in the feature database, so that counterfeit applications can be combated to protect user information security and the interests of service providers.
  • the verifying the file to be verified according to the file information of the target file and the file information of the file to be verified includes:
  • determining, according to the feature string of the to-be-verified file, the file information of the target file from the feature database includes:
  • a file corresponding to the feature string whose similarity is within the preset range is determined as the target file of the file to be verified.
  • the similarity is a Hamming distance.
  • the file summary data is a digest file, in which the file names, file types, and summary information of all resource files in the file to be verified are stored; correspondingly, generating the feature string of the file to be verified according to the file summary data includes:
  • generating the feature string of the file to be verified according to the feature text includes:
  • generating the feature text according to the specified rule from the file names, file types, and summary information of all the resource files includes:
  • the file summary data is an application icon of the application to be verified; correspondingly, generating the feature string of the file to be verified according to the file summary data includes:
  • generating the feature string of the file to be verified according to the application icon of the application to be verified includes:
  • the file summary data includes an application icon of the application to be verified and the digest file; correspondingly, generating the feature string of the file to be verified according to the file summary data includes:
  • the feature database further includes a whitelist; correspondingly, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified includes:
  • file information of all genuine files is stored in the whitelist.
  • the method further includes:
  • the feature string and the file information of the file to be verified are stored in the feature database.
  • the feature database further includes file information and feature strings of a plurality of non-genuine files, and a verification result for each of the plurality of genuine files and the plurality of non-genuine files; correspondingly, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified includes:
  • if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that passed verification, the file to be verified passes verification;
  • if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that failed verification, the verification of the file to be verified fails.
  • the method further includes:
  • the feature string, the file information and the verification result of the file to be verified are stored in the feature database.
  • the file information further includes a file name
  • the method further includes:
  • the query result includes at least the file name of the at least one matching file and the corresponding verification result, so that the file name of the at least one matching file and the corresponding verification result are displayed on the interface of the sending end.
  • the feature string information in the feature database is stored in the form of a K-D tree.
  • the non-genuine file described in the embodiment of the present invention refers to an installation package of a counterfeit application
  • the execution body of the embodiment is a server
  • The server may be a server that publishes applications, such as an application store server.
  • An application may be uploaded to the server by a third party, after which the subsequent file verification method is performed; when the verification passes, the application uploaded by the third party is published to the web page provided by the application store server, so that users can view and download it.
  • The server may also verify applications that are already in the application store, so that no counterfeit (phishing) application remains among them.
  • The server may also be a server dedicated to file verification, independent of any application store server, so that it can serve multiple application stores at the same time.
  • FIG. 2A is a flowchart of a file verification method according to an embodiment of the present invention. Referring to FIG. 2A, the method includes:
  • the installation package of the application to be verified in the embodiment of the present invention is an APK (Application Package).
  • When the file to be verified is a compressed file, a decompression operation is performed on it to extract the file summary data, which uniquely identifies the file content of the file to be verified. Of course, the file to be verified need not be a compressed file; the embodiment of the present invention does not limit this.
  • On receiving the file to be verified, the server may detect its file format. When the file format indicates a compressed file, the server performs the decompression operation and the subsequent steps; when the file format indicates that the file is not a compressed file, the server skips the decompression operation and performs the subsequent verification steps directly.
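The format check and decompression described above can be sketched as follows. This is a minimal illustration, assuming the installation package is a ZIP-format archive (as APK files are) and that the digest file sits at the conventional path `META-INF/MANIFEST.MF`; the function name `extract_digest_file` is hypothetical and does not come from the patent.

```python
import io
import zipfile

# Conventional location of the manifest inside a signed APK/JAR (assumption).
MANIFEST_PATH = "META-INF/MANIFEST.MF"


def extract_digest_file(data):
    """Return the MANIFEST.MF text if `data` is a ZIP archive holding one, else None."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        # Not a compressed file: skip the decompression step.
        return None
    with zipfile.ZipFile(buffer) as archive:
        if MANIFEST_PATH not in archive.namelist():
            return None
        return archive.read(MANIFEST_PATH).decode("utf-8")
```

A caller would pass the raw bytes of the collected file and fall back to another source of file summary data when the result is `None`.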
  • The file summary data includes an application icon of the application to be verified and the digest file.
  • The file summary data may also be either one of the application icon of the application to be verified and the digest file, or other data that can uniquely identify the content of the file to be verified, which is not specifically limited in this embodiment of the present invention.
  • The file names, file types, and summary information of all resource files in the file to be verified are stored in the digest file.
  • The digest file may be the MANIFEST.MF file in an APK file. The MANIFEST.MF file records the summary information of all resource files in the APK, and the summary information of each resource file uniquely identifies the corresponding resource file. Alternatively, the MANIFEST.MF file records the file feature values or the file identifiers of all resource files in the APK.
  • Before the file summary data is extracted, the file to be verified must first be collected.
  • Files may be collected from various application stores. An application store is a platform that offers users many applications for download; because different terminal or system developers provide applications for terminals of different brands or systems, collecting files to be verified from various application stores gathers as many applications on the market as possible and maximizes the impact on non-genuine applications.
  • Files to be verified may also be collected from application download links on web pages.
  • Files to be verified may also be collected by other methods, which is not limited by the embodiment of the present invention.
  • Files to be verified may be collected in batches. The file verification method described in the embodiment of the present invention addresses a single file to be verified; the same method may be applied to each of multiple files to be verified.
  • In this way, the installation packages of all applications that users can download can be verified to the greatest extent, determining whether each verified application is genuine or non-genuine, thereby combating counterfeit applications and protecting user information security and the interests of genuine application developers. It should be noted that a non-genuine application in the embodiment of the present invention is an application developed by imitating a genuine APK.
  • The method for generating the feature string of the file to be verified differs according to the specific content of the file summary data.
  • When the file summary data includes the application icon of the application to be verified and the digest file, the feature string may be generated as follows: generate a first feature string of the file to be verified according to the application icon; generate feature text according to the digest file, and generate a second feature string of the file to be verified according to the feature text; then generate the feature string of the file to be verified according to the first feature string and the second feature string.
  • The first feature string may be generated from the application icon of the application to be verified by using a pHash (perceptual hash) algorithm or a SIFT (Scale-Invariant Feature Transform) algorithm.
  • the first feature string of the to-be-verified file may be calculated by using other algorithms, which is not specifically limited in this embodiment of the present invention.
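To make the icon-based feature string concrete, here is a simplified perceptual-hash sketch. It is an illustration under stated assumptions, not the patent's implementation: the icon is assumed to be already decoded and resized to a square grayscale matrix (for example 32x32), and the hash keeps the 8x8 lowest-frequency DCT coefficients and thresholds them at their median. The function name `phash` and the parameter `low` are illustrative.

```python
import math


def phash(gray, low=8):
    """Simplified perceptual hash of a square grayscale matrix (list of rows)."""
    n = len(gray)
    # DCT-II basis vectors for the `low` lowest frequencies.
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(low)]
    coeffs = []
    for v in range(low):
        for u in range(low):
            total = 0.0
            for y in range(n):
                row = gray[y]
                by = basis[v][y]
                for x in range(n):
                    total += row[x] * basis[u][x] * by
            coeffs.append(total)
    # Threshold every coefficient at the median to obtain one bit each.
    ordered = sorted(coeffs)
    half = len(ordered) // 2
    median = (ordered[half - 1] + ordered[half]) / 2
    return "".join("1" if c > median else "0" for c in coeffs)
```

Because every coefficient scales with overall brightness, a uniformly dimmed copy of an icon yields the same bit string, which is the kind of robustness that makes a perceptual hash suitable for spotting imitated icons.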
  • A counterfeit application usually imitates the application icon of the genuine application. Therefore, in the embodiment of the present invention, the application icon is used as a reference for verifying the file to be verified: whether the file to be verified imitates another genuine application is determined according to whether its application icon imitates that of another genuine application.
  • Generating the feature string of the file to be verified according to the file summary data includes: generating feature text according to a specified rule from the file names, file types, and summary information of all the resource files; and generating a second feature string of the file to be verified according to the feature text.
  • The feature text may be generated as follows: according to the file types of all the resource files, obtain the specified summary information, which is the summary information of resource files of a specified type; then generate the feature text from the specified summary information. That is, the specified rule is to select the summary information of resource files of the specified type.
  • The specified rule may also be another rule, which is not limited by the embodiment of the present invention.
  • For example, the files with the suffix ".png" are obtained from all the resource files, their summary information is obtained, and the specified summary information is output in sequence to generate the feature text. The summary information may be ordered alphabetically by the first letter of the file names of the specified type, or by file generation time; the ordering is not specifically limited in the embodiment of the present invention.
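The ".png" example above can be sketched as follows. This is a minimal illustration that assumes the digest file uses the JAR manifest layout, with blank-line-separated sections containing `Name:` and `...-Digest:` attributes; it ignores the manifest's line-continuation rule for long values. The helper names `parse_manifest` and `feature_text` are hypothetical.

```python
def parse_manifest(text):
    """Return (file name, digest) pairs from MANIFEST.MF-style text."""
    entries = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        fields = dict(line.split(": ", 1)
                      for line in block.splitlines() if ": " in line)
        digest = next((value for key, value in fields.items()
                       if key.endswith("-Digest")), None)
        if "Name" in fields and digest is not None:
            entries.append((fields["Name"], digest))
    return entries


def feature_text(entries, suffix=".png"):
    """Join the digests of files of the specified type, ordered by file name."""
    selected = sorted((name, digest) for name, digest in entries
                      if name.endswith(suffix))
    return "\n".join(digest for _, digest in selected)
```

Sorting by file name is one of the two orderings the text mentions; ordering by file generation time would need timestamps that the manifest itself does not carry.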
  • The second feature string of the file to be verified may be generated from the feature text by using the simhash algorithm, a locality-sensitive hash.
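A simhash over the feature text might look like the sketch below. The tokenization (whitespace-separated digests), the equal token weights, the 64-bit width, and the use of MD5 as the per-token hash are all assumptions made for illustration; the patent does not specify them.

```python
import hashlib


def simhash(text, bits=64):
    """Return a `bits`-wide simhash of whitespace-separated tokens as an int."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    value = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            value |= 1 << i
    return value
```

Texts that share most tokens cast mostly the same votes, so their simhash values tend to differ in few bit positions, which is why a Hamming distance is a natural similarity measure for them.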
  • When the file summary data is a digest file, the process of generating the second feature string of the file to be verified may be represented by FIG. 2B.
  • Using the resource files allows the file to be verified to be checked against the specific content of the application.
  • The feature string of the file to be verified may be generated from the first feature string and the second feature string by concatenating the two directly, or by inserting the first feature string at a specified position in the second feature string. Other methods may also be used, which is not specifically limited in this embodiment of the present invention.
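The two combination options can be written down directly. The function names below are illustrative, and the insertion position is a free parameter that the patent leaves unspecified.

```python
def combine_concat(first, second):
    """Concatenate the icon feature string and the digest-file feature string."""
    return first + second


def combine_insert(first, second, position):
    """Insert the first feature string at `position` inside the second one."""
    return second[:position] + first + second[position:]
```

Whichever option is chosen must be applied identically when the database entries are built, otherwise stored and freshly computed feature strings are not comparable position by position.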
  • The step of generating the feature string of the file to be verified is optional, which is not limited by the embodiment of the present invention.
  • When it is not performed, the application icon and the digest file of the application to be verified may be verified separately according to the first feature string and the second feature string, and the verification result of the file to be verified is then determined from the verification results of the application icon and the digest file.
  • When the feature database stores at least the file information and feature strings of the plurality of genuine files, the file to be verified is compared only against the genuine files stored in the feature database. This achieves the verification purpose while reducing the server memory occupied by the feature database; in addition, only the file information of genuine files is fed back when a user queries, which ensures that the application the user installs is genuine.
  • Alternatively, the feature database may store the file information and feature strings of the plurality of genuine files and of a plurality of non-genuine files, together with the verification result of each file, so that the verification result can be fed back quickly when a verification result query request is received from a user. The verification result query request asks whether a file to be queried is a genuine file and carries at least the file information of that file.
  • the method for determining the file information of the target file from the feature database according to the feature string of the to-be-verified file may be: calculating a similarity between the feature string of the file to be verified and each feature string in the feature database; A file corresponding to the feature string whose similarity is within the preset range is determined as the target file of the file to be verified.
  • The preset range differs according to the similarity calculation method. The method for setting the preset range and its specific value are not limited in the embodiment of the present invention.
  • The certificate feature value is a feature value obtained by applying an encryption algorithm to the certificate of the application to be verified. It may be a value calculated with a message-digest (MD) algorithm or with another algorithm, which is not limited in this embodiment of the present invention.
  • the file information may include other information, such as a file name, a feature value of the file, and the like, which are not specifically limited in this embodiment of the present invention.
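As one possible reading of the certificate feature value, the sketch below hashes the raw certificate bytes with MD5 from Python's standard `hashlib`. Choosing MD5 and hashing the whole encoded certificate are assumptions; the text allows any message-digest or other algorithm, and the function name is illustrative.

```python
import hashlib


def certificate_feature_value(cert_bytes):
    """Return a hex MD5 digest of the signing certificate bytes."""
    return hashlib.md5(cert_bytes).hexdigest()
```

Two installation packages signed with the same certificate produce the same value, so comparing these values is a cheap way to check whether two packages come from the same publisher.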
  • When the similarity is a Hamming distance, which is the number of positions at which the corresponding characters of two character strings differ, the file information of the target file may be determined from the feature database as follows: calculate the Hamming distance between the feature string of the file to be verified and each feature string in the feature database, and determine each file whose feature string has a Hamming distance less than a preset distance as a target file of the file to be verified. Such a target file may be considered to have a counterfeit relationship with the file to be verified.
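The Hamming-distance matching described above can be sketched as a linear scan. The database is modeled here as a list of (feature string, file information) pairs, and `max_distance` stands for the preset distance; both are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("feature strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def find_target_files(feature, database, max_distance):
    """Return the file information of every entry closer than `max_distance`."""
    return [info for stored, info in database
            if hamming_distance(feature, stored) < max_distance]
```

For instance, "1011101" and "1001001" differ in two positions, so their Hamming distance is 2.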
  • When the file summary data includes the application icon of the application to be verified and the digest file, and the first feature string corresponding to the application icon has not been combined with the second feature string corresponding to the digest file, the similarity to the feature strings of the corresponding file summary data in the feature database is calculated separately for the first feature string and the second feature string. A first target file with a similar application icon and a second target file with a similar digest file are then determined respectively.
  • In this way, the file information of files that have a counterfeit relationship with the application icon of the application to be verified, and of files that have a counterfeit relationship with its digest file, can be obtained separately. In addition, storing the first feature string and the second feature string separately speeds up acquiring the file information of the target file, thereby improving file verification efficiency.
  • the feature string information in the feature database is stored in the form of a K-D tree.
  • The feature string is divided into multi-dimensional nodes for storage.
  • When the target file of the file to be verified is determined, the feature string of the file to be verified is split into multiple nodes, a similar feature string is retrieved from the K-D tree according to the split result, and the similarity between the feature string of the file to be verified and the similar feature string is calculated. If the similarity is within the preset range, the file corresponding to the similar feature string is determined to be the target file, that is, a file with which the file to be verified may have a phishing relationship.
  • This improves the speed of determining the target file and thus the efficiency of verifying the file to be verified.
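The K-D tree lookup can be illustrated with the sketch below. It assumes 64-character binary feature strings split into 8 chunks, each chunk read as an integer coordinate, and a nearest-neighbour search by squared Euclidean distance in that chunk space; the retrieved candidate is then confirmed with a Hamming distance on the original strings. The chunking scheme and all names are illustrative assumptions, not details given by the patent.

```python
def to_point(feature, dims=8):
    """Split a binary feature string into `dims` integer coordinates."""
    size = len(feature) // dims
    return tuple(int(feature[i * size:(i + 1) * size], 2) for i in range(dims))


def build_tree(items, depth=0):
    """Build a K-D tree from (point, feature string, file info) tuples."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return {"item": items[mid], "axis": axis,
            "left": build_tree(items[:mid], depth + 1),
            "right": build_tree(items[mid + 1:], depth + 1)}


def nearest(node, point, best=None):
    """Return (squared distance, item) of the stored point closest to `point`."""
    if node is None:
        return best
    item = node["item"]
    dist = sum((a - b) ** 2 for a, b in zip(item[0], point))
    if best is None or dist < best[0]:
        best = (dist, item)
    diff = point[node["axis"]] - item[0][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, point, best)
    if diff * diff < best[0]:
        # The splitting plane is closer than the best match: check the far side.
        best = nearest(far, point, best)
    return best


def hamming(a, b):
    """Count differing positions of two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))
```

A query converts the feature string with `to_point`, calls `nearest` on the tree, and accepts the returned file as a target file only if `hamming` on the two original strings falls within the preset range. Note that Euclidean closeness in chunk space only approximates Hamming closeness, so this serves as a candidate filter rather than an exact search.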
  • The file to be verified may be verified according to the file information of the target file and the file information of the file to be verified as follows:
  • If the file information of the target file is consistent with the file information of the file to be verified, the file to be verified passes verification, that is, the file to be verified and the target file belong to the same application. If the file information of the target file is inconsistent with the file information of the file to be verified, the file to be verified fails verification, that is, the application to be verified is confirmed to be a counterfeit version of the application corresponding to the target file.
  • When the feature database further stores a whitelist, the file to be verified may be verified according to the file information of the target file and the file information of the file to be verified as follows: if the file information of the target file is inconsistent with the file information of the file to be verified, query whether the file information of the file to be verified is stored in the whitelist; if it is, the file to be verified passes verification; if it is not, the verification of the file to be verified fails.
  • The whitelist stores file information of all genuine files.
  • In other words, the whitelist is used to further verify whether the file to be verified corresponds to a genuine application: if the whitelist stores the file information of the file to be verified, the application to be verified is confirmed to be genuine; if not, it is confirmed to be a counterfeit version of the application corresponding to the target file.
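The comparison with a whitelist fallback can be sketched as below. For brevity the file information is reduced to the certificate feature value alone and the whitelist is modeled as a set of such values; both simplifications, and the function name, are assumptions.

```python
def verify_with_whitelist(candidate_cert, target_cert, whitelist):
    """Return True when the file to be verified passes verification."""
    if candidate_cert == target_cert:
        # Same certificate as the matched genuine file: same application.
        return True
    # Similar to a genuine file but signed differently: only a
    # whitelisted certificate is accepted as genuine.
    return candidate_cert in whitelist
```

The fallback matters when two legitimate applications resemble each other: without it, the second genuine application would be flagged as a counterfeit of the first.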
  • When the feature database further includes the file information and feature strings of the non-genuine files, and the verification result of each of the plurality of genuine files and the plurality of non-genuine files, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified includes: if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that passed verification, the file to be verified passes verification; if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that failed verification, the verification of the file to be verified fails.
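When stored verification results are available, the rule above amounts to inheriting the result of a matching target file. The record layout (a dict with `certificate` and `passed` keys) is hypothetical, and the sketch returns `None` for the inconsistent case, which the quoted rule does not cover.

```python
def verify_with_results(candidate_cert, target):
    """Inherit the stored verification result of a matching target file."""
    if candidate_cert != target["certificate"]:
        # File information is inconsistent: the rule above does not decide this case.
        return None
    return target["passed"]
```

Reusing stored results lets a known counterfeit be rejected again without repeating the full comparison against genuine files.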
  • Steps 203 and 204 are the verification process performed after the feature string is generated in step 202: a suspicious feature string, that is, a feature string whose similarity to the generated feature string is within the preset range, is determined from the feature database.
  • The file information of the target file is then obtained according to the suspicious feature string, and the file to be verified is verified according to the file information of the target file.
  • if the file to be verified passes verification, the feature string and the file information of the file to be verified are stored in the feature database.
  • when only the file information and the feature strings of the plurality of genuine files are stored in the feature database, if the file to be verified passes verification, that is, it is confirmed to be a genuine file, the feature string and the file information of the file to be verified are stored in the feature database.
  • when the feature database simultaneously stores the file information and feature strings of the plurality of genuine files, the file information and feature strings of the plurality of non-genuine files, and the verification result of each file among the plurality of genuine files and the plurality of non-genuine files, then after the file to be verified is verified according to the file information of the target file and the file information of the file to be verified, the feature string, the file information, and the verification result of the file to be verified are stored in the feature database, to avoid verifying the same file repeatedly.
  • the flow of the feature string of the file to be verified from generation to storage into the feature database can be represented by FIG. 2D.
  • further, if the file to be verified fails verification, the non-genuine file may be recorded by marking the file; marked files may be deleted, or, when the file to be verified is displayed, the user may be prompted based on its mark, so that the user knows that the file is at risk.
  • when the feature database simultaneously stores the file information and feature strings of the plurality of genuine files, the file information and feature strings of the plurality of non-genuine files, and the verification result of each file among the plurality of genuine files and the plurality of non-genuine files, an information query function can also be implemented: when the user needs to find an application, the user inputs the file name of the corresponding file on the query interface, so that the server queries the feature database, according to the file name, for the stored files that match the file name; as shown in FIG. 2E, this specifically includes the following steps:
  • the interface of the query service may be set in an application store, or may be set in an application such as a mobile phone housekeeper, or may be set in another application or a webpage, which is not limited by the embodiment of the present invention.
  • when the interface of the query service is set in the application store and the user wants to download an application, the user inputs the file name of the file corresponding to the application, and the server obtains the query request, which carries at least the file name of the file to be queried, so that the server can query based on the file name.
  • a matching file is a file, among all the files in the feature database, whose file name matches the file name of the file to be queried. The file name and corresponding verification result of at least one matching file may be obtained from the feature database according to the file name of the file to be queried as follows: by using a text recognition technology, obtain from the feature database a file matching the file name of the file to be queried, determine the file as a matching file, and then obtain the verification result of the matching file.
  • for example, when the file name of the file to be queried is “开心消消乐”, the files in the feature database with file names such as “开心对对碰”, “动物消消乐”, and “天天爱消除” are obtained as matching files, and the corresponding verification results are obtained for these matching files.
  • the at least one matching file may contain only genuine files, only non-genuine files, or both genuine files and non-genuine files, depending on the data stored in the feature database and the file name of the file to be queried.
  • the query result is fed back to the sending end of the query request, where the query result includes at least the file name of the at least one matching file and the corresponding verification result, so that the file name and corresponding verification result of the at least one matching file are displayed on the interface of the sending end.
  • when the file name and corresponding verification result of the at least one matching file are obtained, the query result is fed back to the sending end of the query request and displayed on the interface of the sending end, so that the user selects the application to be installed according to the query result.
  • when only the file information and the feature strings of the plurality of genuine files are stored in the feature database, all the matching files obtained when the server processes the query request are genuine files; in this case, the step of obtaining the verification result of the matching file may be omitted.
  • when the feature database stores no file information matching the file name, prompt information is fed back to the sending end of the query request, to be displayed at the sending end of the query request.
  • the prompt message is used to inform the user that no related information was found.
  • in addition to the information for prompting the user, the prompt message may include related information of a plurality of recommended applications or other information, which is not specifically limited in the embodiment of the present invention.
  • the file verification system corresponding to the file verification method provided by the present invention may be composed of four modules, including: a file collection module, a feature calculation module, a comprehensive analysis module, and a query service module.
  • the file collection module is configured to collect the file to be verified, that is, to perform step 201;
  • the feature calculation module is configured to calculate the feature string of the file summary data, that is, to perform step 202;
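  • the comprehensive analysis module is configured to obtain the target file corresponding to the file to be verified and to verify the file to be verified according to the target file, that is, to perform step 202 to step 205;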
  • the query service module is configured to provide the query service, that is, to perform step 206 to step 208.
  • the overall process is shown in Figure 2F.
  • the file verification method extracts the file summary data from the file to be verified, generates a feature string of the file to be verified according to the file summary data, determines the file information of the target file from the feature database according to the feature string of the file to be verified, and verifies the file to be verified according to the file information of the target file, which makes it possible to actively collect files to be verified and verify whether each belongs to a genuine application or to a counterfeit version of a genuine application.
  • the verification result is correspondingly stored in the feature database, so that counterfeit applications can be combated to protect user information security and the service provider's interests; further, storing the feature strings in the form of a K-D tree improves file verification efficiency;
  • when a query request is received, the retrieved APK software names and verification results are sent to the sending end of the query request according to the software name or package name of the APK file carried in the query request, so that the client can learn which genuine files and counterfeit files are related to the APK file and can then choose a genuine file to install the application or take action against the corresponding counterfeit software, further protecting user information security and the interests of service providers.
  • FIG. 3 is a block diagram of a file verification apparatus according to an embodiment of the present invention.
  • the apparatus includes a file summary data extraction module 301, a feature string generation module 302, an object file determination module 303, and a verification module 304.
  • the file summary data extraction module 301 is configured to extract file summary data from the file to be verified, where the file to be verified is an installation package of the application to be verified, and the file summary data is used to uniquely identify the file content of the file to be verified;
  • the feature string generating module 302 is configured to generate a feature string of the file to be verified according to the file summary data.
  • the target file determining module 303 is configured to determine file information of the target file from the feature database according to the feature string of the file to be verified, where the target file is a file that matches the feature string of the file to be verified, the feature database stores at least file information and feature strings of a plurality of genuine files, and the file information includes at least a certificate feature value;
  • the verification module 304 is configured to verify the file to be verified according to the file information of the target file and the file information of the file to be verified.
  • the verification module is used to:
  • the target file determining module is configured to:
  • a file corresponding to the feature string whose similarity is within the preset range is determined as the target file of the file to be verified.
  • the similarity is a Hamming distance.
  • the file summary data is a summary file, where the file name, file type, and summary information of all resource files in the file to be verified are stored in the summary file;
  • the feature string generating module is configured to:
  • the feature string generating module is configured to:
  • the feature string generating module is configured to:
  • the file summary data is an application icon of the to-be-verified application; correspondingly, the feature string generation module is configured to:
  • the feature string generating module is configured to:
  • the file summary data includes an application icon of the application to be verified and the summary file, and correspondingly, the feature string generation module is configured to:
  • a whitelist is also stored in the feature database, and correspondingly, the verification module is configured to:
  • file information of all genuine files is stored in the white list.
  • the device further includes:
  • a storage module configured to store the feature string and the file information of the file to be verified into the feature database if the file to be verified is verified.
  • the feature database further stores file information and feature strings of a plurality of non-genuine files, and the verification result of each of the plurality of genuine files and the plurality of non-genuine files; correspondingly, the verification module is used to:
  • if the file information of the target file is consistent with the file information of the file to be verified and the target file is a file that passed verification, pass verification of the file to be verified;
  • if the file information of the target file is consistent with the file information of the file to be verified and the target file is a file that failed verification, fail verification of the file to be verified.
  • the device further includes:
  • a storage module configured to store the feature string, the file information, and the verification result of the file to be verified into the feature database.
  • the file information further includes a file name
  • the device further includes:
  • a receiving module configured to receive a query request, where the query request carries at least a file name of a file to be queried;
  • a matching file obtaining module configured to obtain, according to the file name, a file name of the at least one matching file and a corresponding verification result from the feature database
  • a sending module configured to feed back a query result to the sending end of the query request, where the query result includes at least a file name of the at least one matching file and a corresponding verification result, so that the file name of the at least one matching file and the corresponding verification result are displayed on the interface of the sending end.
  • the feature string information in the feature database is stored in the form of a K-D tree.
  • when the file verification apparatus provided by the above embodiment verifies a file, the division into the functional modules described above is only an example. In actual applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
  • the file verification apparatus provided by the foregoing embodiment and the file verification method embodiment belong to the same concept; the specific implementation process is described in detail in the method embodiment, and details are not described herein again.
  • FIG. 4 is a block diagram of an apparatus 400 for file verification according to an embodiment of the present invention.
  • device 400 can be provided as a server.
  • apparatus 400 includes a processing component 422 that further includes one or more processors, and memory resources represented by memory 432 for storing instructions executable by processing component 422, such as an application.
  • An application stored in memory 432 may include one or more modules each corresponding to a set of instructions.
  • processing component 422 is configured to execute instructions to perform the methods described above.
  • Device 400 may also include a power supply component 426 configured to perform power management of device 400, a wired or wireless network interface 450 configured to connect device 400 to the network, and an input/output (I/O) interface 458.
  • Device 400 may operate based on an operating system stored in the memory 432, for example, Windows Server TM, Mac OS X TM , Unix TM, Linux TM, FreeBSD TM or the like.
  • in an exemplary embodiment, a computer readable storage medium comprising instructions is also provided, such as a memory comprising instructions executable by a processor in a server to perform the file verification method of the above embodiments.
  • the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Abstract

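A file verification method and apparatus, belonging to the field of network technologies. The method includes: extracting file summary data from a file to be verified (101); generating a feature string of the file to be verified according to the file summary data (102); determining file information of a target file from a feature database according to the feature string of the file to be verified (103); and performing verification according to the file information of the target file and the file information of the file to be verified (104). The file verification method and apparatus can combat counterfeit applications and protect user information security and service providers' interests.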

Description

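File verification method and apparatus
This application claims priority to Chinese Patent Application No. 201610349815X, entitled "File verification method and apparatus" and filed with the Chinese Patent Office on May 24, 2016, which is incorporated herein by reference in its entirety.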
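Technical Field
The present invention relates to the field of network technologies, and in particular, to a file verification method and apparatus.
Background
With the spread of networks and smart terminals and the continuous development of network technologies, service providers deliver services to users through installable APKs (Application Package); that is, a terminal uses the services offered by a service provider by installing an APK. Some developers, however, maliciously imitate legitimate or official applications in order to infringe on user privacy and service providers' interests. To prevent this, APKs need to be verified to determine whether an APK is a counterfeit APK, thereby protecting user privacy and service providers' interests.
In the related art, an APK may currently be verified as follows: when a user or developer finds that an application may be imitating another application, the user or developer reports the application; after receiving the report, a verifier manually verifies the APK of the reported application to obtain a verification result.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problem:
The above file verification method relies too heavily on manual effort, so counterfeit APK files may go undetected, which poses a potential threat to user information security and service providers' interests.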
发明内容
为了解决现有技术的问题,本发明实施例提供了一种文件验证方法及装置。所述技术方案如下:
一方面,提供了一种文件验证方法,所述方法包括:
从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容;
根据所述文件摘要数据,生成所述待验证文件的特征字符串;
根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值;
根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
另一方面,提供了一种文件验证装置,所述装置包括:
文件摘要数据提取模块,用于从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容;
特征字符串生成模块,用于根据所述文件摘要数据,生成所述待验证文件的特征字符串;
目标文件确定模块,用于根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值;
验证模块,用于根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
该文件验证装置还可以包括用于执行本发明实施例中的文件验证方法的至少一个功能模块。
再一方面,提供了一种文件验证装置,所述装置包括:存储器以及一个或多个处理器,所述处理器被配置执行下述方法:
从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容;
根据所述文件摘要数据,生成所述待验证文件的特征字符串;
根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件 信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值;
根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
本发明实施例提供的技术方案带来的有益效果是:
通过从待验证文件中提取文件摘要数据,并根据该文件摘要数据生成该待验证文件的特征字符串,再根据该带验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,以根据该目标文件的文件信息,对该待验证文件进行验证,能够实现主动收集待验证文件,并验证其是属于正版应用还是属于正版应用的仿冒版本,并将验证结果对应存储至特征数据库中,从而能够对仿冒应用进行打击,保障用户信息安全及服务商利益。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的一种文件验证方法流程图;
图2A是本发明实施例提供的一种文件验证方法流程图;
图2B是本发明实施例提供的一种第二特征字符串生成方法流程图;
图2C是本发明实施例提供的一种根据特征字符串的验证流程图;
图2D是本发明实施例提供的一种特征字符串存储流程图;
图2E是本发明实施例提供的一种查询方法流程图;
图2F是本发明实施例提供的一种文件验证方法流程图;
图3是本发明实施例提供的一种文件验证装置框图;
图4是本发明实施例提供的一种用于文件验证的装置400的框图。
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明 实施方式作进一步地详细描述。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。
图1是本发明实施例提供的一种文件验证方法流程图,如图1所示,包括以下步骤:
101、从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容。
102、根据所述文件摘要数据,生成所述待验证文件的特征字符串。
103、根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值。
104、根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
本发明实施例提供的方法,通过从待验证文件中提取文件摘要数据,并根据该文件摘要数据生成该待验证文件的特征字符串,再根据该带验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,以根据该目标文件的文件信息,对该待验证文件进行验证,能够实现主动收集待验证文件,并验证其是属于正版应用还是属于正版应用的仿冒版本,并将验证结果对应存储至特征数据库中,从而能够对仿冒应用进行打击,保障用户信息安全及服务商利益。
在本发明的第一种可能实现方式中,根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证包括:
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,对所述待验证文件验证通过;
如果所述目标文件的文件信息与所述待验证文件的文件信息不一致,对所述待验证文件验证不通过。
在本发明的第二种可能实现方式中,所述根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息包括:
计算所述待验证文件的特征字符串与所述特征数据库中每个特征字符串的相似度;
将相似度在预设范围内的特征字符串对应的文件确定为所述待验证文件的目标文件。
在本发明的第三种可能实现方式中,所述相似度为汉明距离。
在本发明的第四种可能实现方式中,所述文件摘要数据为摘要文件,所述摘要文件中存储有所述待验证文件中所有资源文件的文件名称、文件类型和摘要信息;相应地,所述根据所述文件摘要数据,生成所述待验证文件的特征字符串包括:
根据所述所有资源文件的文件名称、文件类型和摘要信息,按照指定规则生成特征文本;
根据所述特征文本,生成所述待验证文件的特征字符串。
在本发明的第五种可能实现方式中,所述根据所述特征文本,生成所述待验证文件的特征字符串包括:
根据所述特征文本,通过敏感哈希simhash算法生成所述待验证文件的特征字符串。
在本发明的第六种可能实现方式中,所述根据所述所有资源文件的文件名称、文件类型和摘要信息,按照指定规则生成特征文本包括:
根据所述所有资源文件的文件类型,从所述所有资源文件中获取指定摘要信息,所述指定摘要信息为指定类型资源文件的摘要信息;
根据所述指定摘要信息生成所述特征文本。
在本发明的第七种可能实现方式中,所述文件摘要数据为所述待验证应用的应用图标;相应地,所述根据所述文件摘要数据,生成所述待验证文件的特征字符串包括:
根据所述待验证应用的应用图标,生成所述待验证文件的特征字符串。
在本发明的第八种可能实现方式中,根据所述待验证应用的应用图标,生成所述待验证文件的特征字符串包括:
根据所述待验证应用的应用图标,通过感知哈希pHash算法或尺度不变特征变换SIFT算法生成所述待验证文件的特征字符串。
在本发明的第九种可能实现方式中,所述文件摘要数据包括所述待验证应用的应用图标和所述摘要文件,相应地,所述根据所述文件摘要数据,生成所 述待验证文件的特征字符串包括:
根据所述待验证应用的应用图标,生成所述待验证文件的第一特征字符串;
根据所述摘要文件生成特征文本,并根据所述特征文本生成所述待验证文件的第二特征字符串;
根据所述第一特征字符串和所述第二特征字符串,生成所述待验证文件的特征字符串。
在本发明的第十种可能实现方式中,所述特征数据库中还存储有白名单,相应地,所述根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证包括:
如果所述目标文件的文件信息与所述待验证文件的文件信息不一致,查询所述白名单中是否存储有所述待验证文件的文件信息;
如果所述白名单中存储有所述待验证文件的文件信息,对所述待验证文件验证通过;
如果所述白名单中未存储有所述待验证文件的文件信息,对所述待验证文件验证不通过。
在本发明的第十一种可能实现方式中,所述白名单中存储有所有正版文件的文件信息。
在本发明的第十二种可能实现方式中,所述根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证之后,所述方法还包括:
如果所述待验证文件验证通过,将所述待验证文件的特征字符串和文件信息存储至所述特征数据库中。
在本发明的第十三种可能实现方式中,所述特征数据库中还存储有多个非正版文件的文件信息、特征字符串以及所述多个正版文件和所述多个非正版文件中每个文件的验证结果,相应地,所述根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证包括:
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,且所述目标文件为验证通过文件,对所述待验证文件验证通过;
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,且所述目标文件为验证不通过文件,对所述待验证文件验证不通过。
在本发明的第十四种可能实现方式中,根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证之后,所述方法还包括:
将所述待验证文件的特征字符串、文件信息和验证结果存储至所述特征数据库中。
在本发明的第十五种可能实现方式中,所述文件信息还包括文件名称,相应地,所述方法还包括:
接收查询请求,所述查询请求至少携带待查询文件的文件名称;
根据所述文件名称,从所述特征数据库中获取至少一个匹配文件的文件名称及对应验证结果;
向所述查询请求的发送端反馈查询结果,所述查询结果至少包括所述至少一个匹配文件的文件名称及对应验证结果,以使得在所述发送端的界面显示所述至少一个匹配文件的文件名称及对应验证结果。
在本发明的第十六种可能实现方式中,所述特征数据库中的特征字符串信息以K-D树的形式存储。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
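It should be noted that a non-genuine file described in the embodiments of the present invention refers to the installation package of a counterfeit application. The embodiment is executed by a server. The server may be a server used for publishing applications, for example an application store server: a third party may upload an application to the server, which then performs the subsequent file verification method and, when verification passes, publishes the uploaded application on a web page provided by the application store server so that users can view and download it. Of course, the server may also verify applications already present in the application store, to prevent counterfeit applications from slipping in unnoticed. The server may also be a server dedicated to file verification that is independent of the application store servers, so that it can serve multiple application stores at the same time.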
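FIG. 2A is a flowchart of a file verification method according to an embodiment of the present invention. Referring to FIG. 2A, the method includes:
201. Extract file summary data from a file to be verified, where the file to be verified is the installation package of an application to be verified, and the file summary data is used to uniquely identify the file content of the file to be verified.
In the embodiments of the present invention, the installation package of the application to be verified is an APK (Application Package). The file to be verified is usually a compressed file; when it is obtained, a decompression operation is performed on it to extract the file summary data, which uniquely identifies the file content of the file to be verified. Of course, the file to be verified may also not be a compressed file, which is not limited in the embodiments of the present invention.
Specifically, on receiving the file to be verified, the server may detect its file format. When the file format indicates that the file to be verified is a compressed file, the server performs the decompression operation and the subsequent steps; when the file format indicates that the file to be verified is not a compressed file, the server skips the decompression operation and directly performs the subsequent verification steps.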
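The format check and extraction in step 201 can be sketched as follows. This is only an illustration under stated assumptions: the APK is treated as a ZIP archive, the summary file is assumed to live at `META-INF/MANIFEST.MF`, and the icon path `res/drawable/icon.png` is a hypothetical placeholder (the text does not say where the icon is read from). The function name `extract_summary_data` is likewise invented for this sketch.

```python
import io
import zipfile


def extract_summary_data(data: bytes,
                         manifest_path: str = "META-INF/MANIFEST.MF",
                         icon_path: str = "res/drawable/icon.png"):
    """Return (manifest_text, icon_bytes) for a ZIP-format package.

    Returns None when the input is not a ZIP archive, so that the caller
    can skip decompression and continue with the raw file.
    Both paths are assumptions made for this sketch.
    """
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return None
    manifest_text, icon_bytes = None, None
    with zipfile.ZipFile(stream) as archive:
        names = set(archive.namelist())
        if manifest_path in names:
            manifest_text = archive.read(manifest_path).decode("utf-8", "replace")
        if icon_path in names:
            icon_bytes = archive.read(icon_path)
    return manifest_text, icon_bytes
```

Either element of the returned pair may be `None` when the corresponding entry is missing, which matches the case where only one kind of file summary data is available.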
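In the embodiments of the present invention, the file summary data includes the application icon of the application to be verified and the summary file. Of course, the file summary data may also be either one of the application icon and the summary file, or other data that can uniquely identify the file content of the file to be verified, which is not specifically limited in the embodiments of the present invention.
The summary file stores the file names, file types, and summary information of all resource files in the file to be verified. For example, the summary file may be the MANIFEST.MF file in the APK file. The MANIFEST.MF file records the summary information of all resource files in the APK, and the summary information of each resource file uniquely identifies the corresponding resource file. In another example, the MANIFEST.MF file records the file feature values or the file identifiers of all resource files in the APK.
In another embodiment of the present invention, the file to be verified needs to be collected before the file summary data is extracted from it. It may be collected from various application stores, an application store being a platform that offers a variety of applications for users to download. Because different terminal or system developers provide applications for terminals of different brands or different systems, collecting files to be verified from various application stores gathers as many of the applications on the market as possible, so that non-genuine applications can be combated to the greatest extent. The file to be verified may also be collected from application download links on web pages, or by other methods, which is not limited in the embodiments of the present invention. It should be noted that files to be verified may be collected in batches; the file verification method described in the embodiments of the present invention addresses a single file to be verified, and the same method may be applied to each of multiple files to be verified.
Collecting files to be verified makes it possible to verify, to the greatest extent, the installation packages of all applications that users can download, to determine whether each verified application is a genuine or a non-genuine application, and thus to combat counterfeit applications and protect user information security and the interests of genuine application developers. It should be noted that a non-genuine application described in the embodiments of the present invention refers to an application developed by imitating a genuine APK.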
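202. Generate a feature string of the file to be verified according to the file summary data.
The method of generating the feature string depends on the specific content of the file summary data. When the file summary data includes the application icon of the application to be verified and the summary file, the feature string may be generated as follows: generate a first feature string of the file to be verified according to the application icon; generate feature text according to the summary file, and generate a second feature string of the file to be verified according to the feature text; generate the feature string of the file to be verified according to the first feature string and the second feature string.
The feature string may be generated from the application icon as follows: according to the application icon of the application to be verified, generate the first feature string of the file to be verified by using the pHash (Perception Hash) algorithm or the SIFT (Scale Invariant Feature Transform) algorithm. Of course, other algorithms may also be used to compute the first feature string, which is not specifically limited in the embodiments of the present invention.
Because a non-genuine application usually imitates the application icon of the corresponding genuine application, the embodiments of the present invention use the application icon as a reference for verifying the file to be verified: whether the file to be verified imitates another genuine application is determined according to whether its application icon imitates that of another genuine application.
In addition, some malicious developers imitate the resource files in the installation package of a genuine application to develop non-genuine applications, infringing on the interests of genuine application developers. When the file summary data is the summary file, generating the feature string of the file to be verified according to the file summary data includes: generating feature text from the file names, file types, and summary information of all the resource files according to a specified rule; and generating the second feature string of the file to be verified according to the feature text.
The feature text may be generated as follows: according to the file types of all the resource files, obtain specified summary information from all the resource files, where the specified summary information is the summary information of resource files of a specified type; generate the feature text according to the specified summary information. That is, the specified rule is to select the summary information of resource files of a specified type. Of course, the specified rule may also be another rule, which is not limited in the embodiments of the present invention.
For example, when the specified type is png, the files with the suffix “.png” are obtained from all the resource files, the summary information of these resource files of the specified type is obtained, and the specified summary information is arranged in order to generate the feature text. The arrangement may follow the alphabetical order of the first letters of the files of the specified type, or the file creation time, which is not specifically limited in the embodiments of the present invention.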
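The png example above can be sketched as follows. The sketch assumes the summary file follows the JAR manifest layout (blank-line-separated sections with a `Name:` field and a `*-Digest:` field, continuation lines starting with a single space) and that entries are ordered by file name; the function names `parse_manifest` and `build_feature_text` and the newline-joined output format are assumptions, not details from the text.

```python
def parse_manifest(text: str) -> dict:
    """Map each entry name in a JAR-style manifest to its digest value."""
    # Continuation lines start with a single space; join them first.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    digests = {}
    for section in unfolded.split("\n\n"):
        fields = dict(
            line.split(": ", 1) for line in section.splitlines() if ": " in line
        )
        name = fields.get("Name")
        digest = next(
            (value for key, value in fields.items() if key.endswith("-Digest")),
            None,
        )
        if name and digest:
            digests[name] = digest
    return digests


def build_feature_text(manifest_text: str, suffix: str = ".png") -> str:
    """Concatenate the digests of entries with the given suffix, sorted by name."""
    digests = parse_manifest(manifest_text)
    selected = sorted(name for name in digests if name.endswith(suffix))
    return "\n".join(digests[name] for name in selected)
```

Sorting by name keeps the feature text stable when the same resources appear in a different order in two packages.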
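The second feature string may be generated from the feature text as follows: according to the feature text, generate the second feature string of the file to be verified by using the simhash (sensitive hash) algorithm. When the file summary data is the summary file, the process of generating the second feature string of the file to be verified can be represented by FIG. 2B.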
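A minimal simhash sketch is given below to make this step concrete. It assumes whitespace-separated tokens (for instance one digest per line of the feature text), equal token weights, MD5 as the per-token hash, and a 64-bit result rendered as 16 hex characters; none of these choices is specified in the text.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint of whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 for every set bit and -1 for every clear bit.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit of the fingerprint is set when the positive votes dominate.
    return sum(1 << i for i, weight in enumerate(weights) if weight > 0)


def simhash_hex(text: str, bits: int = 64) -> str:
    """Render the fingerprint as a fixed-width hex feature string."""
    return f"{simhash(text, bits):0{bits // 4}x}"
```

Because every token contributes to every bit, changing a few tokens tends to flip only a few bits of the fingerprint, which is what makes a distance-based comparison in step 203 meaningful.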
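By obtaining the specified summary information from the summary file of the file to be verified, generating the feature text from the specified summary information, and generating the second feature string of the file to be verified from the feature text, the resource files in the file to be verified serve as the reference, so that the file to be verified is verified on the basis of the application's actual content.
The feature string of the file to be verified may be generated from the first feature string and the second feature string as follows: directly concatenate the first feature string and the second feature string end to end, or insert the first feature string at a specified position in the second feature string. Of course, other methods may also be used to generate the feature string of the file to be verified, which is not specifically limited in the embodiments of the present invention.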
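The two combination options just described are simple enough to show directly. In this sketch the function name `combine_feature_strings` and the `position` parameter are illustrative; the text does not say how the specified position is chosen.

```python
from typing import Optional


def combine_feature_strings(first: str, second: str,
                            position: Optional[int] = None) -> str:
    """Combine the icon-based and summary-file-based feature strings.

    With position=None the two strings are concatenated end to end;
    otherwise the first string is inserted into the second at `position`.
    """
    if position is None:
        return first + second
    return second[:position] + first + second[position:]
```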
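It should be noted that the step of generating the feature string of the file to be verified from the first feature string and the second feature string is optional, which is not limited in the embodiments of the present invention. When this step is not performed, the application icon and the summary file of the application to be verified may be verified separately according to the first feature string and the second feature string, and the verification result of the file to be verified is then determined from the verification results of the application icon and the summary file.
203. Determine file information of a target file from a feature database according to the feature string of the file to be verified, where the target file is a file matching the feature string of the file to be verified, the feature database stores at least file information and feature strings of a plurality of genuine files, and the file information includes at least a certificate feature value.
When the feature database stores at least the file information and feature strings of a plurality of genuine files, the file to be verified only needs to be compared with the genuine files stored in the feature database, which achieves the purpose of verification while reducing the server memory occupied by the feature database; in addition, only the file information of genuine files is fed back to users when they query, ensuring that the applications users install are genuine applications.
Besides the file information and feature strings of the plurality of genuine files, the feature database may also store file information and feature strings of a plurality of non-genuine files, and the verification result of each of the plurality of genuine files and the plurality of non-genuine files, so that when a verification result query request is received from a user, the verification result can be fed back quickly. The verification result query request is used to query whether a file to be queried is a genuine file, and carries at least the file information of the file to be queried.
The file information of the target file may be determined from the feature database according to the feature string of the file to be verified as follows: compute the similarity between the feature string of the file to be verified and each feature string in the feature database; determine the file corresponding to a feature string whose similarity is within a preset range as the target file of the file to be verified.
The preset range varies with the similarity calculation method; neither the method of setting the preset range nor its specific value is limited in the embodiments of the present invention. The certificate feature value is a feature value obtained by encrypting the certificate of the application to be verified with an encryption algorithm; it may be an MD5 (Message-Digest Algorithm 5) feature value or a feature value computed by another algorithm, which is not limited in the embodiments of the present invention. It should be noted that, besides the certificate feature value, the file information may also include other information, such as the file name and the feature value of the file, which is not specifically limited in the embodiments of the present invention.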
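As a small illustration of the certificate feature value, the sketch below computes an MD5 value over the raw certificate bytes with Python's standard `hashlib`. How the certificate bytes are obtained from the package is outside this sketch, and the function name is an assumption. Note that MD5 is a digest function, so the result identifies the certificate rather than concealing it.

```python
import hashlib


def certificate_feature_value(certificate_bytes: bytes) -> str:
    """Return the MD5 value of the signing certificate as a hex string."""
    return hashlib.md5(certificate_bytes).hexdigest()
```

Two packages signed with the same certificate yield the same value, which is what lets the later comparison of file information separate a genuine package from a look-alike signed by someone else.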
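In another embodiment of the present invention, the similarity is a Hamming distance, which is the number of positions at which the corresponding characters of the two strings differ. Correspondingly, the file information of the target file is determined from the feature database according to the feature string of the file to be verified as follows: compute the Hamming distance between the feature string of the file to be verified and each feature string in the feature database, and determine the file corresponding to a feature string whose Hamming distance is less than a preset distance as the target file of the file to be verified; that is, the target file is considered to be a file that may be in a counterfeit relationship with the file to be verified.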
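The Hamming-distance lookup can be sketched as a linear scan. The database is modelled here as a plain dictionary from feature string to file information, and `max_distance` stands in for the preset distance; both are assumptions made for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("feature strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_target_files(feature: str, database: dict, max_distance: int) -> list:
    """Return file information for stored feature strings closer than max_distance."""
    return [
        info
        for stored_feature, info in database.items()
        if hamming_distance(feature, stored_feature) < max_distance
    ]
```

For example, `hamming_distance("10110", "10011")` is 2, because the strings differ in the third and fifth positions.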
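It should be noted that when the file summary data includes the application icon of the application to be verified and the summary file, and the first feature string corresponding to the application icon is not combined with the second feature string corresponding to the summary file, the similarity to the feature strings of the corresponding file summary data in the feature database is computed separately for the first feature string and for the second feature string, and a first target file similar to the application icon and a second target file similar to the summary file are then determined separately.
When the first feature string and the second feature string are not combined, the file information of files whose application icon may be in a counterfeit relationship with that of the application to be verified, and of files whose summary file may be in a counterfeit relationship with that of the application to be verified, can be obtained separately. In addition, storing the first feature string and the second feature string separately speeds up obtaining the file information of the target file, and thus improves file verification efficiency.
In another embodiment of the present invention, the feature string information in the feature database is stored in the form of a K-D tree; that is, each feature string is split into a multi-dimensional node for storage. When the target file of the file to be verified is determined, the feature string of the file to be verified is split into a multi-dimensional node, similar feature strings are retrieved from the K-D tree according to the split result, and the similarity between the feature string of the file to be verified and each similar feature string is computed. If the similarity is within the preset range, the file corresponding to the similar feature string is determined as the target file, that is, a file that may be in a counterfeit relationship with the file to be verified.
Storing the feature strings in the form of a K-D tree speeds up determination of the target file, and thus improves the efficiency of verifying the file to be verified.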
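One way to picture the K-D tree storage is sketched below. It assumes a hex feature string is split into four equal chunks, each read as one integer coordinate, and that candidates are retrieved by nearest-neighbour search under squared Euclidean distance; the text does not specify how the string is split or which metric drives retrieval, so these are illustrative choices. The retrieved candidate would still be checked against the preset similarity range afterwards, as the paragraph above describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, ...]


@dataclass
class Node:
    point: Point
    value: str
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_point(feature_hex: str, dims: int = 4) -> Point:
    """Split a hex feature string into `dims` equal chunks, one coordinate each."""
    step = len(feature_hex) // dims
    return tuple(int(feature_hex[i * step:(i + 1) * step], 16) for i in range(dims))


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Build a K-D tree by splitting on the median of one axis per level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, value = items[mid]
    return Node(point, value, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node: Optional[Node], target: Point, best=None):
    """Return (squared_distance, value) of the stored point closest to target."""
    if node is None:
        return best
    dist = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or dist < best[0]:
        best = (dist, node.value)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best
```

The pruning test in `nearest` is what avoids comparing the query against every stored feature string, which is the efficiency gain the text attributes to the K-D tree.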
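204. Verify the file to be verified according to the file information of the target file and the file information of the file to be verified.
When the feature database stores at least the file information and feature strings of a plurality of genuine files, the file to be verified may be verified according to the file information of the target file and the file information of the file to be verified as follows: if the file information of the target file is consistent with that of the file to be verified, the file to be verified passes verification, that is, the file to be verified and the target file are confirmed to belong to the same application; if the file information of the target file is inconsistent with that of the file to be verified, verification of the file to be verified fails, that is, the file to be verified is confirmed to be a counterfeit version of the target file, in other words, the application to be verified is a counterfeit version of the application corresponding to the target file.
In another embodiment of the present invention, a whitelist is also stored in the feature database. Correspondingly, the file to be verified may also be verified as follows: if the file information of the target file is inconsistent with that of the file to be verified, query whether the whitelist stores the file information of the file to be verified; if the whitelist stores the file information of the file to be verified, the file to be verified passes verification; if the whitelist does not store the file information of the file to be verified, verification of the file to be verified fails. The whitelist stores the file information of all genuine files.
That is, when the feature string of the file to be verified is found to be inconsistent with the feature string of the target file, whether the file to be verified is a file of a genuine application is further verified by checking whether the whitelist stores the file information of the file to be verified: if so, the application to be verified is confirmed to be a genuine application; if not, the application to be verified is confirmed to be a counterfeit version of the application corresponding to the target file.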
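The decision in step 204, including the whitelist fallback, can be sketched as follows. The sketch assumes "file information is consistent" means the certificate feature values are equal and that the whitelist is a set of certificate feature values; the dictionary key `cert_md5` and the function name are hypothetical.

```python
def verify(candidate_info: dict, target_info: dict,
           whitelist: frozenset = frozenset()) -> bool:
    """Return True when the file to be verified passes verification."""
    # Same certificate as the matched genuine file: same application.
    if candidate_info["cert_md5"] == target_info["cert_md5"]:
        return True
    # Otherwise fall back to the whitelist of known genuine certificates.
    return candidate_info["cert_md5"] in whitelist
```

With an empty whitelist the function reduces to the basic rule (consistent passes, inconsistent fails); the whitelist only rescues a genuine application that merely resembles another genuine one.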
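In yet another embodiment of the present invention, the feature database also stores file information and feature strings of a plurality of non-genuine files, and the verification result of each of the plurality of genuine files and the plurality of non-genuine files. Correspondingly, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified includes: if the file information of the target file is consistent with that of the file to be verified and the target file is a file that passed verification, the file to be verified passes verification; if the file information of the target file is consistent with that of the file to be verified and the target file is a file that failed verification, verification of the file to be verified fails.
When the feature database simultaneously stores the file information and feature strings of the plurality of genuine files, the file information and feature strings of the plurality of non-genuine files, and the verification result of each file among the plurality of genuine files and the plurality of non-genuine files, the subsequent query process can be implemented and repeated verification of already verified files is avoided.
As shown in the flowchart of FIG. 2C, step 203 and step 204 are the subsequent verification process performed on the feature string generated in step 202: a suspicious feature string is determined from the feature database according to the feature string, where a suspicious feature string is a feature string whose similarity to the feature string is within the preset range; the file information of the target file is obtained according to the suspicious feature string, and the file to be verified is then verified according to the file information of the target file.
205. If the file to be verified passes verification, store the feature string and the file information of the file to be verified in the feature database.
When only the file information and feature strings of a plurality of genuine files are stored in the feature database, if the file to be verified passes verification, that is, it is confirmed to be a genuine file, the feature string and the file information of the file to be verified are stored in the feature database.
In another embodiment of the present invention, when the feature database simultaneously stores the file information and feature strings of the plurality of genuine files, the file information and feature strings of the plurality of non-genuine files, and the verification result of each file among the plurality of genuine files and the plurality of non-genuine files, then after the file to be verified is verified according to the file information of the target file and the file information of the file to be verified, the feature string, the file information, and the verification result of the file to be verified are stored in the feature database, to avoid verifying the same file repeatedly. The flow of the feature string of the file to be verified from generation to storage in the feature database can be represented by FIG. 2D.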
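Storing results so that an already verified file is not verified again can be sketched with a small in-memory store. For simplicity the lookup uses an exact feature-string match rather than the similarity search of step 203, and the names `Record`, `FeatureDatabase`, and `known_result` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    file_name: str
    cert_md5: str
    passed: bool  # stored verification result


class FeatureDatabase:
    """In-memory stand-in for the feature database."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def store(self, feature: str, record: Record) -> None:
        self._records[feature] = record

    def known_result(self, feature: str, cert_md5: str) -> Optional[bool]:
        """Reuse the stored result when the file information is consistent."""
        record = self._records.get(feature)
        if record is not None and record.cert_md5 == cert_md5:
            return record.passed
        return None  # not seen before, so full verification is still needed
```

A stored failure is reused just like a stored pass, mirroring the rule that a file consistent with a target that failed verification also fails.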
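Further, if the file to be verified fails verification, the non-genuine file may be recorded by marking the file; marked files may be deleted, or, when the file to be verified is displayed, the user may be prompted based on its mark, so that the user knows that the file is at risk.
It should be noted that when the feature database simultaneously stores the file information and feature strings of the plurality of genuine files, the file information and feature strings of the plurality of non-genuine files, and the verification result of each file among the plurality of genuine files and the plurality of non-genuine files, an information query function can also be implemented: when the user needs to find an application, the user inputs the file name of the corresponding file on the query interface, so that the server queries the feature database, according to the file name, for the stored files that match the file name. As shown in FIG. 2E, this specifically includes the following steps:
206. Receive a query request, where the query request carries at least the file name of a file to be queried.
The interface of the query service may be set in an application store, in an application such as a mobile phone manager, or in another application or web page, which is not limited in the embodiments of the present invention. When the interface of the query service is set in the application store and the user wants to download an application, the user inputs the file name of the file corresponding to the application, and the server obtains the query request, which carries at least the file name of the file to be queried, so that the server can query according to the file name.
207. Obtain the file name and corresponding verification result of at least one matching file from the feature database according to the file name.
A matching file is a file, among all the files in the feature database, whose file name matches the file name of the file to be queried. The file name and corresponding verification result of at least one matching file may be obtained from the feature database according to the file name of the file to be queried as follows: by using a text recognition technology, obtain from the feature database a file matching the file name of the file to be queried, determine the file as a matching file, and then obtain the verification result of the matching file.
For example, when the file name of the file to be queried is “开心消消乐”, the files in the feature database with file names such as “开心对对碰”, “动物消消乐”, and “天天爱消除” are obtained as matching files, and the corresponding verification results are obtained for these matching files.
It should be noted that the at least one matching file may contain only genuine files, only non-genuine files, or both genuine files and non-genuine files, depending on the data stored in the feature database and the file name of the file to be queried.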
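The name lookup in steps 206 and 207 can be sketched with the standard library's `difflib.SequenceMatcher` standing in for the unspecified text recognition technology. The `cutoff` value, the function name, and the dictionary mapping file names to verification results are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher


def query_by_name(name: str, records: dict, cutoff: float = 0.3) -> list:
    """Return (file_name, passed) pairs whose names resemble `name`, best first."""
    scored = []
    for stored_name, passed in records.items():
        score = SequenceMatcher(None, name, stored_name).ratio()
        if score >= cutoff:
            scored.append((score, stored_name, passed))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(stored_name, passed) for _, stored_name, passed in scored]
```

How loosely names should match is a tuning decision: with this particular similarity measure and cutoff, only names sharing a run of at least two characters with a five-character query are returned, so a looser cutoff would be needed to return every name in the example above.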
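208. Feed back a query result to the sending end of the query request, where the query result includes at least the file name and corresponding verification result of the at least one matching file, so that the file name and corresponding verification result of the at least one matching file are displayed on the interface of the sending end.
When the file name and corresponding verification result of the at least one matching file are obtained, the query result is fed back to the sending end of the query request, so that the file name and corresponding verification result of the at least one matching file are displayed on the interface of the sending end and the user selects the application to be installed according to the query result.
It should be noted that when only the file information and feature strings of a plurality of genuine files are stored in the feature database, all the matching files obtained when the server processes the query request are genuine files; in this case, the step of obtaining the verification result of the matching file may be omitted.
In another embodiment of the present invention, when the feature database stores no file information matching the file name, prompt information is fed back to the sending end of the query request, to be displayed at the sending end of the query request; the prompt message is used to inform the user that no related information was found. In addition to the information for prompting the user, the prompt message may include related information of a plurality of recommended applications or other information, which is not specifically limited in the embodiments of the present invention.
The file verification system corresponding to the file verification method provided by the present invention may be composed of four modules: a file collection module, a feature calculation module, a comprehensive analysis module, and a query service module. The file collection module is configured to collect the file to be verified, that is, to perform step 201; the feature calculation module is configured to calculate the feature string of the file summary data, that is, to perform step 202; the comprehensive analysis module is configured to obtain the target file corresponding to the file to be verified and to verify the file to be verified according to the target file, that is, to perform step 202 to step 205; the query service module is configured to provide the query service, that is, to perform step 206 to step 208. The overall process is shown in FIG. 2F.
In the file verification method provided by the embodiments of the present invention, file summary data is extracted from the file to be verified, a feature string of the file to be verified is generated according to the file summary data, the file information of the target file is determined from the feature database according to the feature string of the file to be verified, and the file to be verified is verified according to the file information of the target file. This makes it possible to actively collect files to be verified, verify whether each belongs to a genuine application or to a counterfeit version of a genuine application, and store the verification result correspondingly in the feature database, so that counterfeit applications can be combated and user information security and service providers' interests are protected. Further, storing the feature strings in the form of a K-D tree improves file verification efficiency. When a query request is received, the retrieved APK software names and verification results are sent to the sending end of the query request according to the software name or package name of the APK file carried in the query request, so that the client can learn which genuine files and counterfeit files are related to the APK file and can then choose a genuine file to install the application or take action against the corresponding counterfeit software, further protecting user information security and service providers' interests.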
图3是本发明实施例提供的一种文件验证装置框图。参照图3,该装置包括文件摘要数据提取模块301,特征字符串生成模块302,目标文件确定模块303和验证模块304。
文件摘要数据提取模块301,用于从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容;
特征字符串生成模块302,用于根据所述文件摘要数据,生成所述待验证文件的特征字符串;
目标文件确定模块303,用于根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值;
验证模块304,用于根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
在本发明提供的第一种可能实现方式中,所述验证模块用于:
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,对所述待验证文件验证通过;
如果所述目标文件的文件信息与所述待验证文件的文件信息不一致,对所述待验证文件验证不通过。
在本发明提供的第二种可能实现方式中,所述目标文件确定模块用于:
计算所述待验证文件的特征字符串与所述特征数据库中每个特征字符串的相似度;
将相似度在预设范围内的特征字符串对应的文件确定为所述待验证文件的目标文件。
在本发明提供的第三种可能实现方式中,所述相似度为汉明距离。
在本发明提供的第四种可能实现方式中,所述文件摘要数据为摘要文件,所述摘要文件中存储有所述待验证文件中所有资源文件的文件名称、文件类型和摘要信息;相应地,所述特征字符串生成模块用于:
根据所述所有资源文件的文件名称、文件类型和摘要信息,按照指定规则生成特征文本;
根据所述特征文本,生成所述待验证文件的特征字符串。
在本发明提供的第五种可能实现方式中,所述特征字符串生成模块用于:
根据所述特征文本,通过敏感哈希simhash算法生成所述待验证文件的特征字符串。
在本发明提供的第六种可能实现方式中,所述特征字符串生成模块用于:
根据所述所有资源文件的文件类型,从所述所有资源文件中获取指定摘要信息,所述指定摘要信息为指定类型资源文件的摘要信息;
根据所述指定摘要信息生成所述特征文本。
在本发明提供的第七种可能实现方式中,所述文件摘要数据为所述待验证应用的应用图标;相应地,所述特征字符串生成模块用于:
根据所述待验证应用的应用图标,生成所述待验证文件的特征字符串。
在本发明提供的第八种可能实现方式中,所述特征字符串生成模块用于:
根据所述待验证应用的应用图标,通过感知哈希pHash算法或尺度不变特征变换SIFT算法生成所述待验证文件的特征字符串。
在本发明提供的第九种可能实现方式中,所述文件摘要数据包括所述待验证应用的应用图标和所述摘要文件,相应地,所述特征字符串生成模块用于:
根据所述待验证应用的应用图标,生成所述待验证文件的第一特征字符串;
根据所述摘要文件生成特征文本,并根据所述特征文本生成所述待验证文件的第二特征字符串;
根据所述第一特征字符串和所述第二特征字符串,生成所述待验证文件的特征字符串。
在本发明提供的第十种可能实现方式中,所述特征数据库中还存储有白名单,相应地,所述验证模块用于:
如果所述目标文件的文件信息与所述待验证文件的文件信息不一致,查询所述白名单中是否存储有所述待验证文件的文件信息;
如果所述白名单中存储有所述待验证文件的文件信息,对所述待验证文件验证通过;
如果所述白名单中未存储有所述待验证文件的文件信息,对所述待验证文件验证不通过。
在本发明提供的第十一种可能实现方式中,所述白名单中存储有所有正版文件的文件信息。
在本发明提供的第十二种可能实现方式中,所述装置还包括:
存储模块,用于如果所述待验证文件验证通过,将所述待验证文件的特征字符串和文件信息存储至所述特征数据库中。
在本发明提供的第十三种可能实现方式中,所述特征数据库中还存储有多个非正版文件的文件信息、特征字符串以及所述多个正版文件和所述多个非正版文件中每个文件的验证结果,相应地,所述验证模块用于:
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,且所述目标文件为验证通过文件,对所述待验证文件验证通过;
如果所述目标文件的文件信息与所述待验证文件的文件信息一致,且所述目标文件为验证不通过文件,对所述待验证文件验证不通过。
在本发明提供的第十四种可能实现方式中,所述装置还包括:
存储模块,用于将所述待验证文件的特征字符串、文件信息和验证结果存储至所述特征数据库中。
在本发明提供的第十五种可能实现方式中,所述文件信息还包括文件名称,相应地,所述装置还包括:
接收模块,用于接收查询请求,所述查询请求至少携带待查询文件的文件名称;
匹配文件获取模块,用于根据所述文件名称,从所述特征数据库中获取至少一个匹配文件的文件名称及对应验证结果;
发送模块,用于向所述查询请求的发送端反馈查询结果,所述查询结果至少包括所述至少一个匹配文件的文件名称及对应验证结果,以使得在所述发送端的界面显示所述至少一个匹配文件的文件名称及对应验证结果。
在本发明提供的第十六种可能实现方式中,所述特征数据库中的特征字符串信息以K-D树的形式存储。
需要说明的是:上述实施例提供的文件验证装置在验证文件时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的文件验证装置与文件验证方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图4是本发明实施例提供的一种用于文件验证的装置400的框图。例如,装置400可以被提供为一服务器。参照图4,装置400包括处理组件422,其进一步包括一个或多个处理器,以及由存储器432所代表的存储器资源,用于存储可由处理组件422的执行的指令,例如应用程序。存储器432中存储的应用程序可以包括一个或一个以上的每一个对应于一组指令的模块。此外,处理组件422被配置为执行指令,以执行上述方法。
装置400还可以包括一个电源组件426被配置为执行装置400的电源管理,一个有线或无线网络接口450被配置为将装置400连接到网络,和一个输入输出(I/O)接口458。装置400可以操作基于存储在存储器432的操作系统,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM或类似。
在示例性实施例中,还提供了一种包括指令的计算机可读存储介质,例如包括指令的存储器,上述指令可由服务器中的处理器执行以完成上述实施例中的文件验证方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (20)

  1. 一种文件验证方法,其特征在于,所述方法包括:
    从待验证文件中提取文件摘要数据,所述待验证文件为待验证应用的安装包,所述文件摘要数据用于唯一标识所述待验证文件的文件内容;
    根据所述文件摘要数据,生成所述待验证文件的特征字符串;
    根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息,所述目标文件为与所述待验证文件的特征字符串匹配的文件,所述特征数据库中至少存储有多个正版文件的文件信息和特征字符串,所述文件信息至少包括证书特征值;
    根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证。
  2. 根据权利要求1所述的方法,其特征在于,根据所述目标文件的文件信息和所述待验证文件的文件信息,对所述待验证文件进行验证包括:
    如果所述目标文件的文件信息与所述待验证文件的文件信息一致,对所述待验证文件验证通过;
    如果所述目标文件的文件信息与所述待验证文件的文件信息不一致,对所述待验证文件验证不通过。
  3. 根据权利要求1所述的方法,其特征在于,所述根据所述待验证文件的特征字符串,从特征数据库中确定目标文件的文件信息包括:
    计算所述待验证文件的特征字符串与所述特征数据库中每个特征字符串的相似度;
    将相似度在预设范围内的特征字符串对应的文件确定为所述待验证文件的目标文件。
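  4. The method according to claim 1, characterized in that the file digest data is a digest file, the digest file storing file names, file types and digest information of all resource files in the file to be verified; correspondingly, generating the feature string of the file to be verified according to the file digest data comprises:
    generating a feature text according to a specified rule from the file names, file types and digest information of all the resource files;
    generating the feature string of the file to be verified according to the feature text.
  5. The method according to claim 4, characterized in that generating the feature string of the file to be verified according to the feature text comprises:
    generating the feature string of the file to be verified from the feature text by means of the sensitive hash (simhash) algorithm.
  6. The method according to claim 4, characterized in that generating the feature text according to the specified rule from the file names, file types and digest information of all the resource files comprises:
    obtaining specified digest information from all the resource files according to the file types of all the resource files, the specified digest information being the digest information of resource files of a specified type;
    generating the feature text according to the specified digest information.
  7. The method according to claim 1, characterized in that the file digest data is an application icon of the application to be verified; correspondingly, generating the feature string of the file to be verified according to the file digest data comprises:
    generating the feature string of the file to be verified according to the application icon of the application to be verified.
  8. The method according to claim 7, characterized in that generating the feature string of the file to be verified according to the application icon of the application to be verified comprises:
    generating the feature string of the file to be verified from the application icon of the application to be verified by means of the perceptual hash (pHash) algorithm or the scale-invariant feature transform (SIFT) algorithm.
  9. The method according to claim 1, characterized in that the file digest data comprises an application icon of the application to be verified and the digest file; correspondingly, generating the feature string of the file to be verified according to the file digest data comprises:
    generating a first feature string of the file to be verified according to the application icon of the application to be verified;
    generating a feature text according to the digest file, and generating a second feature string of the file to be verified according to the feature text;
    generating the feature string of the file to be verified according to the first feature string and the second feature string.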
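  10. The method according to claim 1, characterized in that the feature database further stores a whitelist; correspondingly, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified comprises:
    if the file information of the target file is inconsistent with the file information of the file to be verified, querying whether the whitelist stores the file information of the file to be verified;
    if the whitelist stores the file information of the file to be verified, determining that the file to be verified passes verification;
    if the whitelist does not store the file information of the file to be verified, determining that the file to be verified fails verification.
  11. The method according to claim 10, characterized in that the whitelist stores the file information of all genuine files.
  12. The method according to claim 1, characterized in that the feature database further stores file information and feature strings of a plurality of non-genuine files, and a verification result of each of the plurality of genuine files and the plurality of non-genuine files; correspondingly, verifying the file to be verified according to the file information of the target file and the file information of the file to be verified comprises:
    if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that passed verification, determining that the file to be verified passes verification;
    if the file information of the target file is consistent with the file information of the file to be verified, and the target file is a file that failed verification, determining that the file to be verified fails verification.
  13. The method according to claim 12, characterized in that the file information further comprises a file name; correspondingly, the method further comprises:
    receiving a query request, the query request carrying at least a file name of a file to be queried;
    obtaining, from the feature database according to the file name, a file name of at least one matching file and a corresponding verification result;
    returning a query result to a sender of the query request, the query result comprising at least the file name of the at least one matching file and the corresponding verification result, so that the file name of the at least one matching file and the corresponding verification result are displayed on an interface of the sender.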
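By way of illustration only, the feature-string generation of claims 4 to 6 and the similarity matching of claim 3 can be sketched in Python as follows. The claims do not fix a hash width, a token hash, a feature-text layout or a similarity threshold, so the 64-bit width, the use of MD5 per token, the space-joined feature text, the Hamming-distance measure, the `max_distance` default and every function name below are assumptions made for this sketch, not details taken from the application.

```python
import hashlib

SIMHASH_BITS = 64  # assumed width; the claims do not specify one


def build_feature_text(entries, wanted_types=None):
    """Build a feature text from digest-file entries.

    entries: iterable of (file_name, file_type, digest) tuples.
    wanted_types: optional set of file types to keep (claim 6).
    The "specified rule" here is an assumption: sort the kept entries
    and join their digests with single spaces.
    """
    kept = [e for e in entries if wanted_types is None or e[1] in wanted_types]
    return " ".join(digest for _, _, digest in sorted(kept))


def simhash(text):
    """Return a simhash of the whitespace-separated tokens as 16 hex digits."""
    weights = [0] * SIMHASH_BITS
    for token in text.split():
        # First 8 bytes of MD5 give a 64-bit hash per token (assumed choice).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(SIMHASH_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    value = sum(1 << bit for bit in range(SIMHASH_BITS) if weights[bit] > 0)
    return format(value, "016x")


def hamming_distance(a, b):
    """Number of differing bits between two hex-encoded feature strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def find_target_files(feature, database, max_distance=3):
    """Return the file info of every stored feature string within max_distance.

    database: dict mapping feature string -> file info (hypothetical layout).
    """
    return [info for stored, info in database.items()
            if hamming_distance(feature, stored) <= max_distance]
```

Because a simhash changes by only a few bits when a few tokens change, a repackaged installation package that keeps most of the original resources would be expected to land within a small Hamming distance of the genuine package, which is what makes the "similarity within a preset range" test of claim 3 workable.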
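  14. A file verification apparatus, characterized in that the apparatus comprises a memory and one or more processors, the processors being configured to perform the following method:
    extracting file digest data from a file to be verified, the file to be verified being an installation package of an application to be verified, and the file digest data being used to uniquely identify the file content of the file to be verified;
    generating a feature string of the file to be verified according to the file digest data;
    determining file information of a target file from a feature database according to the feature string of the file to be verified, the target file being a file that matches the feature string of the file to be verified, the feature database storing at least file information and feature strings of a plurality of genuine files, and the file information comprising at least a certificate feature value;
    verifying the file to be verified according to the file information of the target file and the file information of the file to be verified.
  15. The apparatus according to claim 14, characterized in that the processors are further configured to perform the following method:
    computing a similarity between the feature string of the file to be verified and each feature string in the feature database;
    determining a file corresponding to a feature string whose similarity is within a preset range as the target file of the file to be verified.
  16. The apparatus according to claim 14, characterized in that the file digest data is a digest file, the digest file storing file names, file types and digest information of all resource files in the file to be verified; correspondingly, the processors are further configured to perform the following method:
    generating a feature text according to a specified rule from the file names, file types and digest information of all the resource files;
    generating the feature string of the file to be verified according to the feature text.
  17. The apparatus according to claim 16, characterized in that the processors are further configured to perform the following method:
    obtaining specified digest information from all the resource files according to the file types of all the resource files, the specified digest information being the digest information of resource files of a specified type;
    generating the feature text according to the specified digest information.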
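  18. The apparatus according to claim 14, characterized in that the file digest data is an application icon of the application to be verified; correspondingly, the processors are further configured to perform the following method:
    generating the feature string of the file to be verified according to the application icon of the application to be verified.
  19. The apparatus according to claim 14, characterized in that the file digest data comprises an application icon of the application to be verified and the digest file; correspondingly, the processors are further configured to perform the following method:
    generating a first feature string of the file to be verified according to the application icon of the application to be verified;
    generating a feature text according to the digest file, and generating a second feature string of the file to be verified according to the feature text;
    generating the feature string of the file to be verified according to the first feature string and the second feature string.
  20. The apparatus according to claim 14, characterized in that the feature database further stores a whitelist; correspondingly, the processors are further configured to perform the following method:
    if the file information of the target file is inconsistent with the file information of the file to be verified, querying whether the whitelist stores the file information of the file to be verified;
    if the whitelist stores the file information of the file to be verified, determining that the file to be verified passes verification;
    if the whitelist does not store the file information of the file to be verified, determining that the file to be verified fails verification.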
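By way of illustration only, the verification decision recited in claims 1, 2, 10 and 20 can be sketched in Python as follows. The `FileInfo` record, its field names, the decision to compare only the certificate feature value, and the behaviour when no target file matched (the whitelist alone is consulted) are assumptions made for this sketch; the claims only require that the file information include at least a certificate feature value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical file-information record."""
    name: str
    cert_feature: str  # certificate feature value, e.g. a certificate hash (assumed encoding)


def verify(candidate, targets, whitelist=frozenset()):
    """Return True if the file to be verified passes verification.

    candidate: FileInfo of the file to be verified.
    targets: FileInfo of the genuine files whose feature string matched.
    whitelist: set of FileInfo accepted even when the comparison fails.
    """
    # Consistent with a matched genuine file: same certificate feature value.
    if any(candidate.cert_feature == t.cert_feature for t in targets):
        return True
    # Inconsistent (or no target matched, an assumption of this sketch):
    # fall back to the whitelist lookup.
    return candidate in whitelist
```

For example, a package whose feature string matches a genuine application but whose certificate feature value differs would fail, unless its file information had been added to the whitelist beforehand.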
PCT/CN2017/084042 2016-05-24 2017-05-12 File verification method and apparatus WO2017202214A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/974,241 US11188635B2 (en) 2016-05-24 2018-05-08 File authentication method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610349815.XA CN106055602A (zh) 2016-05-24 2016-05-24 File verification method and apparatus
CN201610349815.X 2016-05-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/974,241 Continuation US11188635B2 (en) 2016-05-24 2018-05-08 File authentication method and apparatus

Publications (1)

Publication Number Publication Date
WO2017202214A1 true WO2017202214A1 (zh) 2017-11-30

Family

ID=57174303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/084042 WO2017202214A1 (zh) 2016-05-24 2017-05-12 File verification method and apparatus

Country Status (3)

Country Link
US (1) US11188635B2 (zh)
CN (1) CN106055602A (zh)
WO (1) WO2017202214A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
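CN106548092B (zh) * 2016-10-31 2019-07-16 杭州嘉楠耘智信息科技有限公司 File processing method and apparatus
CN106649218A (zh) * 2016-11-16 2017-05-10 中国人民解放军国防科学技术大学 Fast binary file comparison method based on the SimHash algorithm
CN108427733B (zh) * 2018-02-28 2021-08-10 网易(杭州)网络有限公司 Method, apparatus and system for setting audit rules, device, and storage medium
CN108416212A (zh) * 2018-03-01 2018-08-17 腾讯科技(深圳)有限公司 Application identification method and apparatus
CN108491458A (zh) * 2018-03-02 2018-09-04 深圳市联软科技股份有限公司 Sensitive file detection method, medium and device
CN108363580A (zh) * 2018-03-12 2018-08-03 平安普惠企业管理有限公司 Application installation method and apparatus, computer device and storage medium
JP7195796B2 (ja) * 2018-07-23 2022-12-26 キヤノン株式会社 Information processing apparatus, control method for information processing apparatus, and program
US10992703B2 (en) * 2019-03-04 2021-04-27 Malwarebytes Inc. Facet whitelisting in anomaly detection
CN110135149A (zh) * 2019-05-13 2019-08-16 深圳大趋智能科技有限公司 Application installation method and related apparatus
CN110609789A (zh) * 2019-08-29 2019-12-24 烽火通信科技股份有限公司 Method and system for software license verification
TWI730415B (zh) * 2019-09-18 2021-06-11 財團法人工業技術研究院 Detection system, detection method, and update verification method performed by using the detection method
CN112800004B (zh) * 2019-10-28 2023-06-16 浙江宇视科技有限公司 Control method, apparatus, device and medium for a license plate algorithm library
CN111046127B (zh) * 2019-12-18 2023-08-29 秒针信息技术有限公司 Map-based information display method and apparatus, electronic device and storage medium
CN111416863A (zh) * 2020-03-20 2020-07-14 上海圣剑网络科技股份有限公司 Client-based diversified download management method, terminal and medium
CN112597485B (zh) * 2021-03-01 2021-06-08 腾讯科技(深圳)有限公司 Blockchain-based information verification method, apparatus, device and storage medium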

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020516A (zh) * 2013-01-17 2013-04-03 珠海市君天电子科技有限公司 Method and apparatus for defending against online-shopping Trojans
US20130160147A1 (en) * 2011-12-16 2013-06-20 Dell Products L.P. Protected application programming interfaces
CN104486312A (zh) * 2014-12-04 2015-04-01 北京奇虎科技有限公司 Application identification method and apparatus
CN104657634A (zh) * 2015-02-28 2015-05-27 百度在线网络技术(北京)有限公司 Method and apparatus for identifying pirated applications
CN105045633A (zh) * 2015-08-10 2015-11-11 广东欧珀移动通信有限公司 Method and apparatus for scanning an upgrade package

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697997B1 (en) * 1998-08-12 2004-02-24 Nippon Telegraph And Telephone Corporation Recording medium with a signed hypertext recorded thereon signed hypertext generating method and apparatus and signed hypertext verifying method and apparatus
CN101639880A (zh) * 2008-07-31 2010-02-03 华为技术有限公司 File detection method and apparatus
US9003314B2 (en) * 2008-08-06 2015-04-07 Mcafee, Inc. System, method, and computer program product for detecting unwanted data based on an analysis of an icon
US9043919B2 (en) * 2008-10-21 2015-05-26 Lookout, Inc. Crawling multiple markets and correlating
US8756432B1 (en) * 2012-05-22 2014-06-17 Symantec Corporation Systems and methods for detecting malicious digitally-signed applications
CN103067364B (zh) * 2012-12-21 2015-11-25 华为技术有限公司 Virus detection method and device
US9292694B1 (en) * 2013-03-15 2016-03-22 Bitdefender IPR Management Ltd. Privacy protection for mobile devices
CN103324697B (zh) * 2013-06-07 2016-08-24 北京掌汇天下科技有限公司 Method for removing copycat applications from Android application search based on icon comparison
CN109977086B (zh) * 2013-11-29 2023-09-01 华为终端有限公司 Method for sharing applications between terminals, and terminal
US9197662B2 (en) * 2014-02-26 2015-11-24 Symantec Corporation Systems and methods for optimizing scans of pre-installed applications
CN104980816A (zh) * 2014-04-01 2015-10-14 泰学(北京)文化传媒有限公司 SD card for copyright protection of digital video content
KR101720686B1 (ko) * 2014-10-21 2017-03-28 한국전자통신연구원 Apparatus and method for detecting malicious applications based on visualization similarity
US20160142409A1 (en) * 2014-11-18 2016-05-19 Microsoft Technology Licensing, Llc Optimized token-based proxy authentication
KR20160109870A (ko) * 2015-03-13 2016-09-21 한국전자통신연구원 System and method for high-speed search of Android malware
US9916448B1 (en) * 2016-01-21 2018-03-13 Trend Micro Incorporated Detection of malicious mobile apps

Also Published As

Publication number Publication date
US20180253545A1 (en) 2018-09-06
US11188635B2 (en) 2021-11-30
CN106055602A (zh) 2016-10-26

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17802060

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17802060

Country of ref document: EP

Kind code of ref document: A1