CN104657665A - File processing method - Google Patents


Info

Publication number
CN104657665A
Authority
CN
China
Prior art keywords
file
image
similarity
feature
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510108614.6A
Other languages
Chinese (zh)
Other versions
CN104657665B (en)
Inventor
罗阳
陈虹宇
王峻岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Shenhu Technology Co ltd
Original Assignee
SICHUAN SHENHU TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SICHUAN SHENHU TECHNOLOGY Co Ltd
Priority to CN201510108614.6A
Publication of CN104657665A
Application granted
Publication of CN104657665B
Legal status: Active
Anticipated expiration


Abstract

The invention provides a file processing method comprising the following steps: selecting signature files of predefined types from an installation file, wherein the predefined types comprise interface description profiles, audio files and image files; extracting features of the signature files in a feature extraction step; and, based on the features, identifying whether the installation file is camouflaged by comparing the similarity against a preset threshold. The disclosed method performs identification by extracting features from the contents of application installation files; it effectively resists interference caused by camouflage and malicious modification of files and contents; and the feature extraction process reduces the scale of file content features, thereby improving computational efficiency.

Description

File processing method
Technical field
The present invention relates to file processing, and in particular to security processing of application installation files.
Background art
In the mobile application field, developers submit applications to an application market, and users download applications from the market. Even official markets, however, cannot keep malware out entirely; because their security assurance mechanisms are imperfect, the proportion of malware remains high. Embedding known malicious code and camouflaging application installation files are the main threats. Existing solutions use decompilation tools or dynamic behavior analysis tools to obtain an application's behavior sequence, preprocess that sequence into behavior sequence features, and obtain quantitative data on whether the application has been camouflaged by comparing the distance between behavior sequence features. This approach can detect changes to application code, but the extraction of behavior sequence features is easily defeated by code obfuscation, which limits its usefulness on practical problems.
Therefore, no effective solution to the above problems in the related art has yet been proposed.
Summary of the invention
To solve the problems of the prior art described above, the present invention proposes a file processing method, comprising:
selecting signature files of predefined types from an installation file, the predefined types comprising interface description files, audio files and image files; extracting features of the signature files using a feature extraction step; and, based on the features, identifying whether the installation file is camouflaged by comparing a similarity against a preset threshold.
Preferably, the application installation file is described as the set app = {exe; lib; profile; image; audio; etc}, where exe denotes the executable bytecode in the installation file, lib the native code libraries of the program, profile the XML documents used for program data storage and layout description, image the image files of the program, audio the audio files of the program, and etc the other files of the program.
Preferably, in the feature extraction process, features of image files are extracted as follows:
first reduce the size of each image in the installation file and convert color images to grayscale; compute the average gray level; and extract image content features with a similarity hash algorithm, generating for each image, from its brightness pattern, a character string that serves as the image's fingerprint. The similarity between images is determined by how closely their fingerprints match.
Reducing the image size means shrinking each image to K × K pixels. This removes the influence of image resolution on the similarity comparison and eliminates differences in image size and aspect ratio, retaining only essential information such as structure and brightness; K is set to 128. Comparing image content similarity comprises computing the Hamming distance between fingerprints.
In the feature extraction process, features of interface description files, which are stored as XML files, are extracted as follows:
XML similarity comparison covers structural similarity and content similarity. An XML file is converted into a tree; structural differences are obtained by comparing trees, and content differences by comparing tree nodes. Because interface description files are stored in the application according to predefined rules, a structure name list is first obtained from the interface description file specification. Structural features are then extracted according to that list, and the structural features and markup symbols are filtered out of the interface description file to obtain its content information. Finally, hash values are computed over the structure and content information to obtain a structural feature value and content features. Each interface description file thus yields a hash array, which turns the content similarity of interface description files into a comparison of hash arrays.
In the feature extraction process, features of audio files are extracted as follows:
the file hash value is used as the audio file feature. Given an input audio file stream S and a predetermined constant string M, compute the MD5 hash H1 of S; add S and M, and compute the MD5 hash H2 of the result; then sum H1 and H2 to obtain the final hash value, which serves as the content feature of the audio file.
Compared with the prior art, the present invention has the following advantages:
it performs identification by extracting content features from application installation files, effectively resists interference from camouflage and malicious modification of files and directories, and uses the feature extraction process to reduce the scale of file content features, thereby improving computational efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the file processing method according to an embodiment of the present invention.
Detailed description of embodiments
A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawing illustrating the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of them.
Fig. 1 is a flowchart of the file processing method according to an embodiment of the present invention. The method identifies camouflaged application installation files. It analyzes the attributes of an installation file, selects file types, extracts content features using a different content feature extraction algorithm for each file type, and assigns weights to the resulting similarities, which improves both the accuracy and the efficiency of camouflage identification.
An application installation file is a compressed archive that internally organizes, as a directory tree, executable bytecode files, certificate files and resource files. The executable bytecode is stored in class files; the certificate file holds the application's signature; resource files include database files, function library files, XML files, image files and so on.
In one embodiment, the application installation file is described as the set app = {exe; lib; profile; image; audio; etc}, where exe denotes the executable bytecode in the installation file, lib the native code libraries of the program, profile the XML documents used for program data storage and layout description, image the image files of the program, audio the audio files of the program, and etc the other files of the program. Given this description of app, the goal of the invention is to perform camouflage identification of an application installation file from the content features of relevant files such as exe, lib, profile and image.
To determine accurately and efficiently from file content whether an installation file is camouflaged, and to meet practical identification needs, the proposed method pursues three goals: 1) support big-data workloads: application markets hold a large and fast-growing number of applications, so a framework that can process massive data quickly is the foundation; 2) select suitable signature files: installation files contain thousands of kinds of files, and the choice of which files' contents to extract directly affects the efficiency and accuracy of camouflage identification; 3) extract features efficiently and accurately: the speed of extracting file content features determines system efficiency, while the feature algorithm is what guarantees that the system returns correct decisions.
While extracting file content features and computing file similarity, the present invention improves efficiency without losing accuracy in the results.
First, the target of the algorithm must not be too complex; if it is, the target should be reduced by selecting only its key elements for comparison. Second, the algorithm must be efficient. Finally, when implementing the algorithm, its runtime environment should be optimized as far as possible, its intermediate steps reduced, and any parts that may consume large amounts of time or space removed.
The first step is to select suitable signature files. An application installation file contains anywhere from hundreds to several thousand files. Extracting features from the contents of all of them would make the target too complex and the analysis inefficient, and would leave the method vulnerable to the insertion of junk records. The present invention therefore follows the principles of ubiquity, representativeness and measurability and selects only some suitable file types as signature files. This reduces the feature scale, and hence the amount of computation, while ensuring as far as possible that the signature files still represent the application installation file effectively.
Next, the features of the selected files are extracted from the installation file. The file directory of the installation file is read and the signature files are located by their offsets within the compressed archive, which avoids decompressing unrelated files and improves efficiency. The signature files found in applications are first analyzed statistically; different algorithm implementations are compared according to the statistical patterns observed, the algorithms are optimized accordingly, and the most efficient algorithm is adopted provided accuracy is preserved. Multithreading is applied during extraction, functions that do not support multithreading are rewritten, and thread safety is ensured for all operations, further improving efficiency.
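The archive-level selection step can be sketched in Python with the standard zipfile module, which parses the archive's central directory (where each member's offset is recorded) and decompresses only the members that are actually read. This is a minimal sketch under stated assumptions: the extension-to-category table is hypothetical, since the text names the categories but not the file extensions, and the multithreading described above is omitted.

```python
import io
import os
import zipfile

# Hypothetical extension table: the text names the categories
# (image, audio, profile) but not which extensions belong to each.
SIGNATURE_EXTENSIONS = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp3": "audio", ".ogg": "audio", ".wav": "audio",
    ".xml": "profile",
}


def select_signature_files(archive_bytes):
    """Return {category: {member name: bytes}} for signature files only.

    ZipFile parses the central directory, which records each member's
    offset, so members that are never read are never decompressed.
    """
    selected = {"image": {}, "audio": {}, "profile": {}}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            ext = os.path.splitext(info.filename)[1].lower()
            category = SIGNATURE_EXTENSIONS.get(ext)
            if category is not None:
                # Only the selected members are decompressed here.
                selected[category][info.filename] = zf.read(info)
    return selected
```

Unrelated members such as bytecode files are listed in the directory but never decompressed, which is the saving the paragraph describes.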
Finally, camouflage identification is performed based on file content features. In designing the similarity measurement algorithm, hash-table counting is used in line with the statistical characteristics of applications, trading space for time.
To compute file similarity from file content features, suitable signature files must first be chosen from among many complex file types. A suitable signature file type has three properties: it is present in most installation files (if a type exists in only a few applications, its content features cannot support similarity comparison); its content features act as a "signature" that represents the application, so that features extracted from different applications differ; and its content has a distance property, i.e. the contents of similar files are close and those of different files are far apart. In one embodiment, interface description files, image files and audio files are selected as signature files, described as appfile = {image; audio; profile}. The main idea is to compute the similarity of file content features and analyze the applications on that basis, which can be expressed as:
$$\mathrm{com}(app_1, app_2) = \mathrm{com}(appfile_1, appfile_2)$$
The present invention represents the features of an installation file by the content features of these three file types. The content feature set of each type contains the features of all files of that type, where the summation denotes collecting the per-file features:
$$\mathrm{image}_f = \sum_{i=1}^{n} \mathrm{image}_f[i]$$
$$\mathrm{audio}_f = \sum_{i=1}^{n} \mathrm{audio}_f[i]$$
$$\mathrm{profile}_f = \sum_{i=1}^{n} \mathrm{profile}_f[i]$$
Here n denotes the number of files of each type. To compute the content similarity of image, audio and interface description files, each feature of the two applications is compared pairwise. This yields the following file feature similarity formulas, which state that the similarity of one file type between two installation files equals the combined similarity of all files of that type in the two installation files:
$$\mathrm{com\_image}(app_1, app_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{com}(\mathrm{image}_{f1}[i], \mathrm{image}_{f2}[j])$$
$$\mathrm{com\_audio}(app_1, app_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{com}(\mathrm{audio}_{f1}[i], \mathrm{audio}_{f2}[j])$$
$$\mathrm{com\_profile}(app_1, app_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{com}(\mathrm{profile}_{f1}[i], \mathrm{profile}_{f2}[j])$$
Here n and m denote the number of files of each type in app1 and app2, respectively.
Using the content feature similarity of images, audio or interface description files alone to represent installation file similarity gives unsatisfactory results: a high threshold causes missed detections, and a low threshold causes false alarms. The present invention therefore assigns weights to the image, audio and interface description content similarities and represents installation file similarity as the weighted similarity of the three file content features:
$$\mathrm{com}(app_1, app_2) = \mathrm{com}(appfile_1, appfile_2) = \mathrm{com\_image} \times \alpha + \mathrm{com\_audio} \times \beta + \mathrm{com\_profile} \times \gamma$$
This formula states that the similarity of applications app1 and app2 equals the similarity of their internal files, namely the weighted sum of the image, audio and interface description file similarities of the two installation files. The values of α, β and γ change dynamically according to the values of com_image, com_audio and com_profile.
The numbers of image, audio and interface description files differ between installation files, and some applications contain no audio files at all, so fixed values of α, β and γ cannot compute file similarity effectively. Embodiments of the invention therefore use dynamic weights assigned according to the relative sizes of com_image, com_audio and com_profile. The three most suitable weights, determined by learning, are 0.6, 0.3 and 0.1: the largest of com_image, com_audio and com_profile receives weight 0.6, the middle one 0.3, and the smallest 0.1.
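A minimal sketch of the rank-based weighting, using the 0.6 / 0.3 / 0.1 weights stated above; the function name is hypothetical. Ties need no special handling, because tied values are equal and therefore receive the same total contribution whichever order they are sorted in.

```python
# Weights reported in the text, assigned by rank rather than by file type.
RANKED_WEIGHTS = (0.6, 0.3, 0.1)


def weighted_similarity(com_image, com_audio, com_profile):
    """Largest similarity gets 0.6, the middle one 0.3, the smallest 0.1."""
    ranked = sorted((com_image, com_audio, com_profile), reverse=True)
    return sum(w * s for w, s in zip(RANKED_WEIGHTS, ranked))


# Example: image 0.8, no audio files (0.0), interface description 0.7.
example = weighted_similarity(0.8, 0.0, 0.7)  # 0.6*0.8 + 0.3*0.7 + 0.1*0.0
```

In the example, an application without audio files has com_audio = 0, and the rank-based scheme automatically gives that missing component the smallest weight, which is the situation the text uses to motivate dynamic weights.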
The above process yields the similarity of two installation files. Comparing this similarity with a threshold T determines whether the two belong to similar applications, i.e. whether one is a camouflaged file.
The present invention represents application features by file content features and proposes a specific feature extraction method and similarity algorithm for each file type.
Existing image similarity matching algorithms require large space and time overheads and cannot be applied in large-scale computing environments. Camouflaged installation files usually alter images in one of two ways: 1) modifying the original image; 2) changing the original image's resolution. Image content feature extraction therefore needs an algorithm that reduces the interference caused by image modification and eliminates that caused by resolution changes. The present invention first reduces the size of the images in the installation file, converts color images to grayscale, computes the average gray level, and extracts image content features with a similarity hash algorithm: from each image's brightness pattern it generates a character string that serves as the image's "fingerprint", and the more similar two fingerprints are, the more similar the two images. This improves accuracy while reducing computational complexity.
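The fingerprinting pipeline described above closely matches the well-known average-hash technique. The following is a minimal pure-Python sketch, not the patent's implementation, under these assumptions: the image is already decoded into rows of RGB tuples, nearest-neighbour resampling is used for the shrink, and ITU-R BT.601 luma weights convert color to gray, since the text specifies none of these.

```python
def resize_nearest(grid, k):
    """Nearest-neighbour resize of a 2-D grid to k x k (assumed resampling)."""
    h, w = len(grid), len(grid[0])
    return [[grid[y * h // k][x * w // k] for x in range(k)] for y in range(k)]


def to_gray(pixel):
    """RGB -> gray using ITU-R BT.601 luma weights (an assumption)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def average_hash(rgb_rows, k=128):
    """Fingerprint an image given as rows of (r, g, b) tuples.

    Shrink to k x k, convert to gray, compute the average gray level,
    then emit one bit per pixel (1 if the pixel is at least as bright as
    the average), packed eight bits per byte.
    """
    small = resize_nearest(rgb_rows, k)
    gray = [to_gray(p) for row in small for p in row]
    mean = sum(gray) / len(gray)
    bits = [1 if v >= mean else 0 for v in gray]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

Because every pixel is compared with the image's own mean, rescaling an image or uniformly shifting its brightness tends to leave the fingerprint unchanged, which is the robustness the paragraph relies on.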
Reducing the image size means shrinking each image to K × K pixels. This step mainly removes the influence of image resolution on the similarity comparison and eliminates differences in image size and aspect ratio, retaining only essential information such as structure and brightness; K is generally set to 128 here. Images at 40 × 40 resolution are the most common in mobile applications. Comparing image content similarity requires the Hamming distance between fingerprints, i.e. the number of positions at which the two fingerprint strings differ; with K = 40, the string length is K × K / 8 = 200. The present invention simplifies this step by testing whether the fingerprint strings are equal instead of computing the Hamming distance. The cost is that the similarity result can only show whether two image fingerprints are identical; it cannot use the Hamming distance to recognize fingerprints that are merely similar.
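The two comparison options can be contrasted in a short sketch. Following the text, the Hamming distance here counts differing positions in the fingerprint string; in this sketch a fingerprint is a byte string with 8 bits per position, so a 40 × 40 fingerprint has 200 positions. The function names and sample fingerprints are hypothetical.

```python
def fingerprint_length(k):
    """Positions (bytes) in a k x k fingerprint packed 8 bits per byte."""
    return k * k // 8


def hamming_distance(fp1, fp2):
    """Number of positions at which two equal-length fingerprints differ."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    return sum(1 for a, b in zip(fp1, fp2) if a != b)


def same_fingerprint(fp1, fp2):
    """The simplification in the text: exact equality instead of distance."""
    return fp1 == fp2


# Exact equality lets fingerprints serve directly as hash-set keys, so the
# images shared by two apps can be found by a set intersection.
app1_images = {b"\x33\xcc", b"\x0f\xf0"}
app2_images = {b"\x33\xcc", b"\xaa\x55"}
shared = app1_images & app2_images
```

The design trade-off is visible here: exact matching can be answered with hash-table lookups, whereas Hamming distance would require comparing every pair of fingerprints, but near-duplicate images with a one-position difference are no longer recognized.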
Interface description files in the installation file are stored as XML files, so content feature extraction for interface description files is equivalent to XML content feature extraction. XML similarity comparison covers two aspects, structural similarity and content similarity: an XML file is converted into a tree, structural differences are obtained by comparing trees, and content differences by comparing tree nodes.
Interface description files are stored in the application according to predefined rules. Since these rules are known, the present invention uses a simple structure and content feature extraction method: first, obtain a structure name list from the interface description file specification; next, extract structural features according to that list, and filter the structural features and markup symbols out of the interface description file to obtain its content information; finally, compute hash values over the structure and content information to obtain a structural feature value and content features. Each interface description file thus yields a hash array, which turns the content similarity of interface description files into a comparison of hash arrays.
Analysis of the audio files in installation files shows that camouflaged installation packages do not substantially modify audio files, so the present invention uses the file hash value as the audio file feature. In large-scale computation, however, a plain file hash has a relatively small hash space and collides easily. The present invention therefore proposes the following hash method, which greatly reduces hash collisions while preserving computational speed. Given an input audio file stream S and a predetermined constant string M, compute the MD5 hash H1 of S; add S and M, and compute the MD5 hash H2 of the result; then sum H1 and H2 to obtain the final hash value. This secondary hash value is used as the content feature of the audio file.
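The double-hash construction can be sketched with hashlib. Two readings are assumptions, not stated in the text: "adding" S and M is interpreted as appending M to S, and H1 + H2 is interpreted as integer addition kept without truncation. The value of M is not disclosed, so the constant below is hypothetical.

```python
import hashlib

# Hypothetical constant string M; the text does not disclose its value.
CONSTANT_M = b"hypothetical-constant-M"


def audio_feature(stream, m=CONSTANT_M):
    """Secondary hash of an audio byte stream: MD5(S) + MD5(S + M)."""
    h1 = int.from_bytes(hashlib.md5(stream).digest(), "big")
    # Assumption: "adding S and M" means appending M to the stream.
    h2 = int.from_bytes(hashlib.md5(stream + m).digest(), "big")
    # Assumption: plain integer sum, not truncated to 128 bits.
    return h1 + h2
```

The result is an ordinary integer, so audio features can be stored in a set and compared with the set-based content similarity used for the other file types.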
The content features of an application installation file comprise image content features, interface description file content features and audio file content features. Image content features are a set of image "fingerprints"; the content features of one interface description file are a hash set, so the interface description content features of an installation file consist of several hash sets; audio file content features are a hash set. All three kinds of feature sets can be treated as sets of strings. The present invention uses content similarity as the criterion, computed as the number of elements in the intersection of sets A and B divided by the size of the smaller of A and B. This effectively measures the similarity between sets of different lengths. The content similarity L(A, B) is:
$$L(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$
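The measure L(A, B) is what is commonly called the overlap coefficient. A minimal sketch follows; the treatment of an empty set is an assumption, since the text does not define L in that case, and the feature names are placeholders.

```python
def content_similarity(a, b):
    """L(A, B) = |A ∩ B| / min(|A|, |B|) over two collections of features."""
    a, b = set(a), set(b)
    if not a or not b:
        # Assumption: the text does not define L when a set is empty.
        return 0.0
    return len(a & b) / min(len(a), len(b))


original = {"fp1", "fp2", "fp3"}
# A camouflaged copy that keeps the original features but inserts junk files.
padded = original | {"junk1", "junk2", "junk3", "junk4", "junk5"}
```

For the padded copy, L(A, B) stays at 1.0, whereas the Jaccard index |A ∩ B| / |A ∪ B| would drop to 3/8; dividing by the smaller set is what makes the measure resistant to inserted junk files.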
The similarity formula for signature file sets thus follows from the file feature similarity formulas and the content similarity formula: the similarity of a file set equals its content similarity. File similarity follows from the weighted similarity formula: it equals the weighted value of the file set similarities, i.e. the weighted value of the content similarities of the three signature file types.
Because file similarity is obtained from the content similarity of signature files, it is unaffected by changes to the directory structure. And because the chosen similarity measure divides by the size of the smaller of the two sets, it effectively resists the insertion of junk files.
According to a further aspect of the invention, an anti-camouflage system for mobile applications is also proposed. A message digest algorithm is first used to take an initial fingerprint sample of each file on the server, which is stored in a remote security database and in a local security folder. An anti-camouflage module is created to process access requests submitted by clients: it analyzes each request, extracts the access path, compares the fingerprint of the application installation file with the fingerprint in the database, and then responds. The module traces requests directly to the web page files, making it suitable for sites with both dynamic and static pages. Camouflaged page files are recovered by calling local page snapshots and comparing against file checksums.
The system mainly coordinates four actions: access requests, camouflage identification, site file updates and event alarms. When the system receives a web access request, it calls the camouflage identification module to analyze each HTTP request and trace the called file and access path. The anti-camouflage module mounted in the security component computes the digital fingerprint of the application installation file and compares it with the original fingerprint in the secure area to determine whether the file has been camouflaged. If not, the web server answers the user with a normal HTTP response. Otherwise, the emergency recovery module is enabled immediately and a local page snapshot is used to answer the user; the recovery module then replaces the camouflaged file with a local backup to complete the repair. With snapshots enabled, even if a page file is camouflaged or reset by an attacker, the server does not serve the camouflaged page to visitors, which avoids harmful consequences. The system records a camouflage log and notifies administrators by SMS or e-mail. Once anti-camouflage is enabled, every file on the server is locked and cannot be updated without authorization; updates can be made via FTP or SSL after authentication and unlocking. The local fingerprint base, backup files and snapshots are synchronized with the remote library files in a timely manner to keep the data consistent.
The system provides anti-camouflage protection through five functions: protecting the integrity of site files, monitoring and processing HTTP access requests, quickly recovering camouflaged files, alarming, and trusted publishing. Accordingly, it consists of three parts: the server side, the client and the publishing side.
(1) Server side. It uses the database as the hub for communication among multiple clients. For each client it provides storage for file backups and snapshots, processing of the initial digital fingerprints of site files, and preservation of all kinds of logs and camouflage alarm information. While running, the server side opens only the ports used to communicate with clients and the database, to maximize system security; this structure also lays the groundwork for porting the system.
(2) Client. It is installed on the protected server without changing the existing network topology and establishes trusted communication with the server side and the publishing side. The client provides six functions: initialization, monitoring and tracing of accessed files, site locking, camouflage identification, camouflage recovery and local resource backup. It is the core of the whole anti-camouflage system.
When first enabled, the client initializes the server's protected site files: it collects the digital fingerprint of each file, stores the fingerprints in the server side's security database, and backs them up in a local file fingerprint store. To protect local files, the local digital fingerprints, backup files and snapshots are encrypted with the symmetric cipher AES. After an update command is received from the publishing side, the protected directory is unlocked and the digital fingerprints of the updated files are refreshed. When the anti-camouflage module processes a client page access request, it computes the fingerprint from the accessed page file name and access path and compares it with the fingerprint in the secure area; if they match, it responds normally. Otherwise, it enables the camouflage recovery and event alarm modules, carries out the subsequent handling, records the request's source IP, source port and destination port, the camouflaging process ID and the modified content, and builds an alarm message to notify administrators. To answer the client quickly, the recovery module first reads the page snapshot to serve the user; it then decrypts the local backup file and replaces the camouflaged file with it. If the local backup has been destroyed, the server side sends a backup file for recovery, which handles catastrophic events.
(3) Publishing side. It mainly publishes new sites and updates files on existing servers. After the publishing side passes client authentication, the client creates a new site or unlocks the requested site as requested, completing the publish command; when publishing ends, the client locks the site again.
System initialization includes each client submitting its server feature information to the server side, which then creates, in a designated area of the server, a directory named after that feature information. To ensure uniqueness, the feature information is represented by the hash of a string formed from the client machine's IP address, CPU ID and hard disk ID. The server side sets up a unified database that stores each site's file fingerprints, logs and alarm information. After verifying its connection information with the server side, the client configures its local operating conditions: it specifies the sites on the current server to be protected and the file types to protect for each site, and creates a local security directory named with the ciphertext of the site name and creation time. This directory stores backup and snapshot files, as well as XML documents holding fingerprint data, logs and alarm data.
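A minimal sketch of the client naming step. Assumptions: the three values are joined with a "|" separator and hashed with MD5; the text says only that the hash of a string formed from the IP address, CPU ID and hard disk ID is used. The separator keeps different field splits (for example "1.2.3.4" + "5" versus "1.2.3.45" + "") from producing the same string. The function name and sample values are hypothetical.

```python
import hashlib


def client_feature_name(ip, cpu_id, disk_id):
    """Directory name for a client: hash of its IP, CPU ID and disk ID.

    Assumptions: fields are joined with "|" and hashed with MD5.
    """
    feature = "|".join((ip, cpu_id, disk_id))
    return hashlib.md5(feature.encode("utf-8")).hexdigest()
```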
The file preprocessing component first calls the cryptographic engine to generate a key pair with the public-key algorithm RSA. The public key is encrypted with AES and stored on the backup server; the encrypted public key copy is then saved together with the private key in the local security directory, and public keys are exchanged with the backup server in a timely manner, providing the working environment for data synchronization and site recovery. Next, after reading the list of sites to protect and the file types, the component calls the traversal engine to walk the server files with the specified suffixes and computes a unique, irreversible digital fingerprint for each with MD5. The fingerprint results are stored in the database in a defined data structure, and per-site XML documents are generated in the local security folder for the anti-camouflage module's comparisons. Finally, the traversed site data is encrypted with the client's public key and stored in the local security directory to await synchronization with the backup server.
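The traversal and fingerprinting step, without the RSA/AES key handling and database storage, might look like this in Python using only os, hashlib and xml.etree.ElementTree. The XML layout (a site element containing file elements with path and md5 attributes) is a hypothetical choice, since the text does not define the document format.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def fingerprint_site(root_dir, suffixes):
    """Map relative path -> MD5 hex digest for files with the given suffixes."""
    prints = {}
    wanted = tuple(suffixes)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.endswith(wanted):
                continue
            path = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in chunks so large files need not fit in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    md5.update(chunk)
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            prints[rel] = md5.hexdigest()
    return prints


def fingerprints_to_xml(site_name, prints):
    """Serialize one site's fingerprints as an XML document (hypothetical layout)."""
    site = ET.Element("site", name=site_name)
    for rel in sorted(prints):
        ET.SubElement(site, "file", path=rel, md5=prints[rel])
    return ET.tostring(site, encoding="unicode")
```

In the described system the resulting document would then be encrypted before being written to the local security directory; that step is left out here.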
Camouflage identification uses the security component to implement the anti-camouflage module. The module analyzes HTTP requests and the data submitted by HTTP clients, extracts the access path and file name, monitors file integrity in real time, and verifies whether file changes are legitimate. It is developed with in-kernel techniques: mapping relationships are set up in the mapping table through the security component interface, and the module is loaded by the server into its process space, where it computes the fingerprint of each accessed web page and compares it with the original fingerprint.
When the server receives an HTTP request, it first traces the requested application installation file, then computes its hash value, and finally calls the fingerprint comparison component, which reads and decrypts the original fingerprint of the installation file from the local secure area and compares it with the fingerprint just computed. If they match, the HTTP request is answered; otherwise, recovery and emergency response begin. On receiving a comparison-failure command, the emergency response component has the system decrypt the local snapshot file of the same name and generate HTML text to answer the HTTP request, preserving the efficiency and quality of the server's responses. After responding, it fetches the source file from the local secure area as quickly as possible, decrypts it with AES and uses it to replace the camouflaged file, protecting the file as fully as possible. If recovery fails, the current file is disabled and the request is redirected to a specified page. During file recovery, the system records a camouflage log and sends an alarm to administrators by SMS or e-mail, providing a basis for later data analysis and management. Snapshot retrieval and access redirection take only a few milliseconds, so the requester never receives the content of the camouflaged page. The client also periodically calls the underlying file filter driver module to compute, compare and verify the digital fingerprints of the files of the protected sites and specified types on the current server, to ensure that fingerprints remain consistent everywhere.
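The per-request check and recovery flow can be illustrated with an in-memory sketch. Everything concrete here is hypothetical: plain dicts stand in for the served files, the snapshot store and the local backups; encryption, SMS/e-mail notification and the kernel-level module are not modelled; and the redirect page content is a placeholder. In a real server the snapshot response would be sent before the restore runs, which this synchronous sketch cannot show.

```python
import hashlib


class AntiCamouflageGuard:
    """In-memory sketch of the per-request fingerprint check and recovery."""

    # Placeholder for the "specified page" used when recovery fails.
    REDIRECT_PAGE = b"<p>This page is temporarily unavailable.</p>"

    def __init__(self, files):
        self.files = dict(files)  # path -> bytes currently served
        self.fingerprints = {p: self._md5(d) for p, d in files.items()}
        self.snapshots = dict(files)  # page snapshots for fast responses
        self.backups = dict(files)    # local backups used for restoration
        self.alarms = []

    @staticmethod
    def _md5(data):
        return hashlib.md5(data).hexdigest()

    def handle_request(self, path):
        data = self.files.get(path)
        if data is not None and self._md5(data) == self.fingerprints.get(path):
            return data  # fingerprints match: normal response
        # Mismatch: record an alarm (notification itself is not modelled).
        self.alarms.append("camouflaged file detected: " + path)
        snapshot = self.snapshots.get(path)
        backup = self.backups.get(path)
        if backup is not None:
            self.files[path] = backup   # restore from the local backup
        else:
            self.files.pop(path, None)  # recovery failed: disable the file
        # Serve the snapshot if one exists, otherwise redirect.
        return snapshot if snapshot is not None else self.REDIRECT_PAGE
```

The visitor receives either the original content or the snapshot, never the modified bytes, which mirrors the guarantee stated in the paragraph above.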
In summary, the present invention proposes a file processing method that identifies camouflaged application installation files by extracting features of their content. It effectively resists interference caused by camouflage and malicious modification of files and directories, and the feature-extraction process reduces the scale of the file-content features, improving computational efficiency.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented with a general-purpose computing system. They may be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they may be implemented as program code executable by a computing system, and thus stored in a storage system and executed by the computing system. The present invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention are intended only to illustrate or explain its principles and do not limit the invention. Therefore, any modification, equivalent replacement, improvement, or the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or within equivalents of that scope and boundary.

Claims (5)

1. A file processing method for identifying camouflaged application installation files, characterized in that it comprises:
selecting signature files of predefined types from an installation file, the signature files of the predefined types comprising interface description files, audio files, and image files; extracting features of these signature files by a feature-extraction step; and, based on the features, identifying whether the installation file is camouflaged by comparing the similarity with a preset threshold.
2. The method according to claim 1, characterized in that it further comprises:
describing the application installation file as the set app = {exe; lib; profile; image; audio; etc}, where exe denotes the executable bytecode in the installation file, lib denotes the native code libraries of the program, profile denotes the XML documents used for program data storage and layout description, image denotes the image files of the program, audio denotes the audio files of the program, and etc denotes the other files of the program.
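The set description of claim 2 can be illustrated by sorting the entries of an installation package into the six categories and then picking out the signature types named in claim 1. This is a sketch under a loud assumption: the extension-to-category table below is hypothetical (Android-style) and is not defined by the patent.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Hypothetical extension-to-category table; the patent does not define one.
CATEGORY_BY_EXT = {
    ".dex": "exe",
    ".so": "lib",
    ".xml": "profile",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".webp": "image",
    ".mp3": "audio", ".ogg": "audio", ".wav": "audio",
}
CATEGORIES = ("exe", "lib", "profile", "image", "audio", "etc")
SIGNATURE_TYPES = ("profile", "image", "audio")  # types selected in claim 1


def describe_app(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Build app = {exe; lib; profile; image; audio; etc} from entry paths."""
    app: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    for p in paths:
        ext = PurePosixPath(p).suffix.lower()
        app[CATEGORY_BY_EXT.get(ext, "etc")].append(p)  # unknown types go to etc
    return app


def signature_files(app: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the categories whose features are extracted."""
    return {t: app[t] for t in SIGNATURE_TYPES}
```

A real extractor would likely need finer rules than file extensions (for example, a manifest XML would land in `profile` here), which is why the table is labeled hypothetical.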
3. The method according to claim 2, characterized in that, in the feature-extraction process, when extracting features of an image file:
the images in the installation file are first reduced in size and color images are converted to grayscale images; the average gray level is computed; image content features are extracted with a similarity hash algorithm, generating for each image a character string from its brightness pattern that serves as the image's fingerprint; and the similarity between images is determined by the closeness of their fingerprints;
wherein reducing the image size means scaling the image down to K × K pixels, which removes the interference of image resolution with the similarity comparison and eliminates differences in image size and aspect ratio, retaining only essential information such as structure and brightness, K being set to 128; and wherein the image content similarity comparison comprises computing the Hamming distance between fingerprints.
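The steps of claim 3 (shrink to K × K, convert to grayscale, threshold each pixel against the mean gray level, compare fingerprints by Hamming distance) match the well-known average-hash technique, so the sketch below implements average hash. Assumptions not stated in the patent: nearest-neighbor resampling, ITU-R BT.601 luma weights for grayscale, images given as rows of (R, G, B) tuples, and the helper name `fingerprint_similarity`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def downscale(img: List[List[Pixel]], k: int) -> List[List[Pixel]]:
    """Nearest-neighbor resize to k x k (resampling method is an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // k][c * w // k] for c in range(k)] for r in range(k)]


def to_gray(px: Pixel) -> float:
    """Luma with BT.601 weights (assumed; the patent only says 'gray level')."""
    r, g, b = px
    return 0.299 * r + 0.587 * g + 0.114 * b


def average_hash(img: List[List[Pixel]], k: int = 128) -> str:
    """Fingerprint string: '1' where a pixel is at least the mean gray level."""
    gray = [to_gray(p) for row in downscale(img, k) for p in row]
    mean = sum(gray) / len(gray)
    return "".join("1" if g >= mean else "0" for g in gray)


def hamming(a: str, b: str) -> int:
    """Number of differing fingerprint positions."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b))


def fingerprint_similarity(a: str, b: str) -> float:
    """Map Hamming distance to [0, 1]; 1.0 means identical fingerprints."""
    return 1.0 - hamming(a, b) / len(a)
```

With K = 128 each fingerprint has 16,384 bits. Because every image is reduced to the same grid, a rescaled copy of an icon should produce a nearly identical fingerprint, while a new design should not; the acceptance threshold is left to the caller, as in claim 1.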
4. The method according to claim 3, characterized in that, in the feature-extraction process, when extracting features of an interface description file stored in XML format:
XML file similarity comparison comprises structural similarity and content similarity. The XML file is converted into a tree structure; XML structural differences are obtained by comparing differences between the trees, and XML content differences are obtained by comparing differences between tree nodes. Interface description files are stored in the application according to predefined rules, and a list of structure names is obtained from the interface description file specification. Structural features are then extracted according to the structure name list, and the structural features and symbolic information in the interface description file are filtered out to obtain the content information. Finally, hash values are computed over the structure and content information to obtain structural feature values and content feature values. Each interface description file thus yields a hash array, which converts the content similarity of interface description files into a similarity comparison between hash arrays.
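Claim 4 can be sketched with the standard library's XML parser: walk the tree, hash each node's tag path as a structural feature value, hash its attribute values and text (with tag names and markup stripped) as a content feature value, and compare the resulting hash arrays. Several choices here are assumptions, not the patent's: MD5 as the hash, tag paths standing in for the "structure name list", and Jaccard overlap of the hash sets as the array-similarity measure.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple


def _h(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def xml_features(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (structural hash array, content hash array) for one XML file."""
    root = ET.fromstring(xml_text)
    structure: List[str] = []
    content: List[str] = []

    def walk(node: ET.Element, path: str) -> None:
        path = f"{path}/{node.tag}"
        structure.append(_h(path))  # structural feature value: tag path
        # Content information: attribute values and text, without tag names.
        values = [v.strip() for _, v in sorted(node.attrib.items())]
        if node.text and node.text.strip():
            values.append(node.text.strip())
        if values:
            content.append(_h("|".join(values)))
        for child in node:
            walk(child, path)

    walk(root, "")
    return structure, content


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Set overlap of two hash arrays (similarity measure is an assumption)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def xml_similarity(x1: str, x2: str) -> Tuple[float, float]:
    """(structural similarity, content similarity) of two layout files."""
    s1, c1 = xml_features(x1)
    s2, c2 = xml_features(x2)
    return jaccard(s1, s2), jaccard(c1, c2)
```

Two layouts that differ only in a label's text score 1.0 structurally but lower on content, which keeps the two kinds of difference described in the claim separate.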
5. The method according to claim 4, characterized in that, in the feature-extraction process, when extracting features of an audio file:
a file hash value is used as the audio file feature. Given an input audio file stream S and a predefined constant string M, the MD5 hash H1 of the input audio file stream S is computed; the input audio file stream S is then added to the predefined constant string M, and the MD5 hash H2 of the result is computed; H1 and H2 are summed to obtain the final hash value, which serves as the content feature of the audio file.
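A streaming sketch of the audio feature in claim 5 follows. The claim does not say exactly how S is "added" to M or how H1 and H2 are "summed", so this sketch assumes byte concatenation (S followed by M) and plain integer addition of the two 128-bit digests. Because `hashlib` objects can keep absorbing data after `digest()` is called, one pass over the stream yields both H1 and H2.

```python
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, size: int = 65536) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size chunks (helper for large audio files)."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(size), b""):
            yield chunk


def audio_feature(chunks: Iterable[bytes], m: bytes) -> int:
    """H1 + H2 with H1 = MD5(S) and H2 = MD5(S || M) (combination rules assumed)."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    h1 = int.from_bytes(md5.digest(), "big")  # H1 = MD5(S); state is unchanged
    md5.update(m)                             # continue hashing with M appended
    h2 = int.from_bytes(md5.digest(), "big")  # H2 = MD5(S || M)
    return h1 + h2                            # final feature value
```

Usage might look like `audio_feature(read_chunks("res/raw/click.ogg"), b"EXAMPLE-CONSTANT")`, where both the path and the constant are hypothetical. Since the feature is an exact hash, it detects identical audio content rather than perceptually similar audio.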
CN201510108614.6A 2015-03-12 2015-03-12 A file processing method Active CN104657665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510108614.6A CN104657665B (en) 2015-03-12 2015-03-12 A file processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510108614.6A CN104657665B (en) 2015-03-12 2015-03-12 A file processing method

Publications (2)

Publication Number Publication Date
CN104657665A true CN104657665A (en) 2015-05-27
CN104657665B CN104657665B (en) 2017-12-08

Family

ID=53248776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510108614.6A Active CN104657665B (en) 2015-03-12 2015-03-12 A file processing method

Country Status (1)

Country Link
CN (1) CN104657665B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224367A (en) * 2015-09-30 2016-01-06 浪潮电子信息产业股份有限公司 A kind of installation method of software and device
CN105653984A (en) * 2015-12-25 2016-06-08 北京奇虎科技有限公司 File fingerprint check method and apparatus
WO2016197710A1 (en) * 2015-11-27 2016-12-15 中兴通讯股份有限公司 Method and device for identifying fake software interface for mobile terminal
CN107323114A (en) * 2017-06-22 2017-11-07 珠海汇金科技股份有限公司 Intrusion detection method, system and the print control instrument of print control instrument
CN107992599A (en) * 2017-12-13 2018-05-04 厦门市美亚柏科信息股份有限公司 File comparison method and system
CN108123934A (en) * 2017-12-06 2018-06-05 深圳先进技术研究院 A kind of data integrity verifying method towards mobile terminal
CN108491458A (en) * 2018-03-02 2018-09-04 深圳市联软科技股份有限公司 A kind of sensitive document detection method, medium and equipment
CN109564613A (en) * 2016-07-27 2019-04-02 日本电气株式会社 Signature creation equipment, signature creation method, the recording medium for recording signature creation program and software determine system
CN111160123A (en) * 2019-12-11 2020-05-15 桂林长海发展有限责任公司 Airplane target identification method and device and storage medium
CN113590144A (en) * 2021-08-16 2021-11-02 北京字节跳动网络技术有限公司 Dependency processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158967A (en) * 2007-11-16 2008-04-09 北京交通大学 Quick-speed audio advertisement recognition method based on layered matching
CN101369268A (en) * 2007-08-15 2009-02-18 北京书生国际信息技术有限公司 Storage method for document data in document warehouse system
CN102968439A (en) * 2012-10-11 2013-03-13 微梦创科网络科技(中国)有限公司 Method and device for sending microblogs
CN103400076A (en) * 2013-07-30 2013-11-20 腾讯科技(深圳)有限公司 Method, device and system for detecting malicious software on mobile terminal
CN104091152A (en) * 2014-06-30 2014-10-08 南京理工大学 Method for detecting pedestrians in big data environment

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224367A (en) * 2015-09-30 2016-01-06 浪潮电子信息产业股份有限公司 A kind of installation method of software and device
WO2016197710A1 (en) * 2015-11-27 2016-12-15 中兴通讯股份有限公司 Method and device for identifying fake software interface for mobile terminal
CN105653984A (en) * 2015-12-25 2016-06-08 北京奇虎科技有限公司 File fingerprint check method and apparatus
CN105653984B (en) * 2015-12-25 2019-04-19 北京奇虎科技有限公司 File fingerprint method of calibration and device
CN109564613A (en) * 2016-07-27 2019-04-02 日本电气株式会社 Signature creation equipment, signature creation method, the recording medium for recording signature creation program and software determine system
CN107323114B (en) * 2017-06-22 2019-08-16 珠海汇金科技股份有限公司 Intrusion detection method, system and the print control instrument of print control instrument
CN107323114A (en) * 2017-06-22 2017-11-07 珠海汇金科技股份有限公司 Intrusion detection method, system and the print control instrument of print control instrument
CN108123934A (en) * 2017-12-06 2018-06-05 深圳先进技术研究院 A kind of data integrity verifying method towards mobile terminal
CN108123934B (en) * 2017-12-06 2021-02-19 深圳先进技术研究院 Mobile-end-oriented data integrity verification method
CN107992599A (en) * 2017-12-13 2018-05-04 厦门市美亚柏科信息股份有限公司 File comparison method and system
CN108491458A (en) * 2018-03-02 2018-09-04 深圳市联软科技股份有限公司 A kind of sensitive document detection method, medium and equipment
CN111160123A (en) * 2019-12-11 2020-05-15 桂林长海发展有限责任公司 Airplane target identification method and device and storage medium
CN111160123B (en) * 2019-12-11 2023-06-09 桂林长海发展有限责任公司 Aircraft target identification method, device and storage medium
CN113590144A (en) * 2021-08-16 2021-11-02 北京字节跳动网络技术有限公司 Dependency processing method and device

Also Published As

Publication number Publication date
CN104657665B (en) 2017-12-08

Similar Documents

Publication Publication Date Title
CN104657665A (en) File processing method
US11750641B2 (en) Systems and methods for identifying and mapping sensitive data on an enterprise
US11082443B2 (en) Systems and methods for remote identification of enterprise threats
Lee et al. Blockchain based privacy preserving multimedia intelligent video surveillance using secure Merkle tree
Khan et al. Cloud log forensics: Foundations, state of the art, and future directions
US9584543B2 (en) Method and system for web integrity validator
Chen et al. Detecting android malware using clone detection
US9356965B2 (en) Method and system for providing transparent trusted computing
CN112217835B (en) Message data processing method and device, server and terminal equipment
Ahsan et al. Class: cloud log assuring soundness and secrecy scheme for cloud forensics
Wu et al. A countermeasure to SQL injection attack for cloud environment
CN104657504A (en) Fast file identification method
US9390287B2 (en) Secure data scanning method and system
Khan et al. Digital forensics and cyber forensics investigation: security challenges, limitations, open issues, and future direction
CN111291001A (en) Reading method and device of computer file, computer system and storage medium
Ruiz et al. The leakage of passwords from home banking sites: A threat to global cyber security?
WO2023146737A1 (en) Multi-variate anomalous access detection
Bates et al. Secure and trustworthy provenance collection for digital forensics
Daniel et al. ES-DAS: An enhanced and secure dynamic auditing scheme for data storage in cloud environment
Wang et al. Cloud data integrity verification algorithm based on data mining and accounting informatization
CN113127919A (en) Data processing method and device, computing equipment and storage medium
CN117459327B (en) Cloud data transparent encryption protection method, system and device
Cho et al. Guaranteeing the integrity and reliability of distributed personal information access records
Sun et al. On the Development of a Protection Profile Module for Encryption Key Management Components
Resul et al. Cryptolog: A new approach to provide log security for digital forensics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230613

Address after: F13, Building 11, Zone D, New Economic Industrial Park, No. 99, West Section of Hupan Road, Xinglong Street, Tianfu New District, Chengdu, Sichuan, 610000

Patentee after: Sichuan Shenhu Technology Co.,Ltd.

Address before: 610041 No. 5, floor 1, unit 1, building 19, No. 177, middle section of Tianfu Avenue, high tech Zone, Chengdu, Sichuan Province

Patentee before: SICHUAN CINGHOO TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right