CN109816038A - Internet of Things firmware program classification method and device - Google Patents
- Publication number
- CN109816038A (application number CN201910098931.2A)
- Authority
- CN
- China
- Prior art keywords
- tree
- driver
- firmware
- character string
- readable character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an Internet of Things firmware program classification method and device. The method includes: extracting the readable character strings in each firmware; constructing a driver tree for the firmware from the readable character strings, where the root node of the driver tree is the firmware number, the second-layer nodes are program types, the third-layer nodes are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings; computing, in turn, the difference degree value of corresponding nodes between every two driver trees and recording the results, where each result includes the identifiers of the two driver trees, the identifier of the corresponding node, and its difference degree value; and, based on the results, selecting the N driver trees with the highest ambient density as cluster centers and clustering to obtain several firmware categories, which then guide subsequent firmware repair. Because classification takes into account the similarity between all readable character strings, classification accuracy is high and the workload of subsequent firmware repair is reduced.
Description
Technical field
The present invention relates to the technical field of firmware repair, and in particular to an Internet of Things firmware program classification method and device.
Background art
Firmware refers to the device "driver" stored inside a device. Through the firmware, the operating system can drive a specific machine according to a standard device driver; optical drives and CD writers, for example, have internal firmware. Firmware is the software that carries out the most basic, lowest-level work of a system. In a hardware device that contains no software other than its firmware, the firmware therefore determines the function and performance of the device.
Testing during the production of a hardware product can reveal problems in the hardware, but it can only find problems or vulnerabilities that already exist. Once a tested product is put into use and connected to a network, an attacker can attack the hardware system over the network, which may give rise to new vulnerabilities in the Internet of Things firmware that the test process could not find. The conventional remedy for a vulnerable hardware device is to replace it with a new model. However, because hardware devices are deployed in huge numbers and are costly, most companies still simply reformat the device and continue using it. Hardware devices used to run inside a company's intranet, not connected to the Internet, which provided physical isolation. With the development of Internet of Things technology, however, every hardware device now communicates over the network. If a vulnerability exists in the hardware and is discovered by criminals on the network, it poses a serious threat to production safety. Identifying and repairing problematic firmware is therefore critical.
When firmware is repaired, the problematic program has usually been in use for some time, and tens of thousands of hardware products may have been built with it. Checking each one for vulnerabilities and patching it individually is an enormous amount of work. Moreover, because of differences in the hardware platform the firmware runs on, the compiler used, and the compiler options selected, even the same firmware program ends up as different assembly code and machine code. One existing way to reduce the workload is therefore to classify the firmware: the firmware program is divided into smaller file sections, the similarity of each file section between two firmware is compared, and the two firmware are placed in the same class as soon as any one file section is highly similar. Because a firmware has many file sections, however, each resulting class tends to contain many different reused code modules; that is, the code similarity between firmware in the same class is low, and classification accuracy is low.
How to provide an Internet of Things firmware program classification method and device with high classification accuracy is therefore a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide an Internet of Things firmware program classification method and device that organize all readable code sections of a firmware in a product tree, so that classification takes into account the similarity between all readable character strings. This improves classification accuracy and in turn reduces the workload of subsequent firmware repair.
To solve the above technical problem, the present invention provides an Internet of Things firmware program classification method, comprising:
extracting the readable character strings in each firmware to be classified;
constructing a driver tree for the firmware from the readable character strings, where the root node of the driver tree is the firmware number, the second-layer nodes are the program part types to which the readable character strings belong, the third-layer nodes are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings;
computing, in turn, the difference degree value of corresponding nodes between every two driver trees among all driver trees, and recording the results, where each result includes the identifiers of the two driver trees, the identifier of the corresponding node, and its difference degree value;
selecting, according to the results, the N driver trees with the highest ambient density as cluster centers and clustering to obtain several firmware categories, for subsequent firmware analysis and repair according to those categories; N is a positive integer.
Preferably, after extracting the readable character strings in each firmware to be classified and before constructing the driver tree of the firmware from them, the method further comprises:
judging whether a readable character string is platform-related or link-library-related; if so, deleting that readable character string; if not, moving on to the next extracted readable character string, until all extracted readable character strings have been judged;
correspondingly, the driver tree of the firmware is then constructed from the readable character strings that remain after this judgment.
Whether a readable character string is platform-related is judged as follows:
if the information gain of the readable character string is greater than a preset platform-dependence threshold, the readable character string is platform-related; otherwise it is not. The information gain of the readable character string is specifically:
where IG(s) is the information gain; C_i is the i-th target platform; P(C_i) is the proportion of all binary files that belong to target platform C_i; P(s) is the proportion of all binary files that contain the readable character string s; and P(s, C_i) is the proportion of all binary files that belong to target platform C_i and contain the readable character string s.
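The expression for IG(s) is not reproduced in the text above. One form that is consistent with the listed symbols, offered here only as an assumption and not as the patent's actual formula, sums a mutual-information term over the target platforms:

```latex
IG(s) = \sum_{i} P(s, C_i)\,\log\frac{P(s, C_i)}{P(s)\,P(C_i)}
```

Under this reading, a string that occurs almost exclusively in the binaries of one platform gets a large IG(s) and would be filtered out as platform-related, while a string spread evenly across platforms gets a value near zero.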
Preferably, the difference degree value of corresponding nodes between two driver trees is computed as follows:
according to a node distance relation, the difference degree value between each pair of corresponding nodes in the first, second, and third layers of the two driver trees is computed in turn;
the node distance relation is:
where the symbols denote, in order: the driver tree formed from the i-th firmware; the driver tree formed from the j-th firmware; the difference degree value of the corresponding node v at the same position in the two driver trees; and the set of all child nodes of node v in a driver tree.
Preferably, selecting the N driver trees with the highest ambient density as cluster centers according to the results comprises:
determining, from the results, all difference degree values between each driver tree and every other driver tree;
counting, for each driver tree, how many of its difference degree values are smaller than a preset distance threshold, and taking that count as the ambient density of the driver tree;
sorting all driver trees in descending order of ambient density and selecting the first N driver trees as cluster centers.
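The counting and ranking steps above can be sketched in Python as follows. This is a minimal illustration, assuming the pairwise results are already available as a dictionary keyed by pairs of tree identifiers; the function name `select_cluster_centers` and the tie-breaking by identifier are assumptions made for the example.

```python
from collections import defaultdict


def select_cluster_centers(pair_distances, distance_threshold, n):
    """Rank trees by ambient density and return the first n as centers.

    pair_distances: dict mapping (tree_a, tree_b) to a difference degree
    value (hypothetical input format).
    """
    density = defaultdict(int)
    for (a, b), d in pair_distances.items():
        # Touch both keys so trees with no close neighbours still appear.
        density[a] += 0
        density[b] += 0
        if d < distance_threshold:
            density[a] += 1
            density[b] += 1
    # Descending density; ties broken by identifier (an assumption).
    ranked = sorted(density, key=lambda t: (-density[t], t))
    return ranked[:n], dict(density)


pairs = {
    ("A", "B"): 0.1, ("A", "C"): 0.2, ("B", "C"): 0.9,
    ("A", "D"): 0.8, ("B", "D"): 0.7, ("C", "D"): 0.95,
}
centers, density = select_cluster_centers(pairs, 0.5, 2)
```

Here tree A has two neighbours within the threshold, B and C have one each, and D has none, so A ranks first.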
Preferably, after computing in turn the difference degree value between each pair of corresponding nodes in the first, second, and third layers of the two driver trees and recording the results, the method further comprises:
computing, according to a layer distance relation, the layer distance of each corresponding layer between every two driver trees, and saving it;
the layer distance relation is:
where the symbols denote, in order: the layer distance of layer l between the two driver trees; and the set of all nodes in layer l of a driver tree;
and where β_v is the weight corresponding to node v, the child set is the set of all child nodes of node v in a driver tree, and w is the parent node of v.
Preferably, after computing the layer distance of each corresponding layer between every two driver trees according to the layer distance relation, the method further comprises:
computing the tree distance between every two driver trees according to a tree distance relation; both the node distance and the tree distance are difference degree values. The tree distance relation is:
where the result is the tree distance between the two driver trees; γ is a common ratio; H(φ) is the height of a driver tree, taking a value in {1, 2, 3}; and ω_l is the layer distance weight coefficient of layer l.
Preferably, after selecting the first N driver trees as cluster centers, the method further comprises:
judging whether the tree distance between any two cluster centers is greater than a preset tree distance threshold; if not, taking the current (N+1)-th driver tree as a cluster center and moving the one of the two compared cluster centers that has the smaller ambient density to the last position in the ranking, then repeating this process until the tree distance between every two cluster centers is greater than the preset tree distance threshold.
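The center-separation check can be sketched as below. This is only one possible reading of the procedure: the function name `separate_centers`, the callable `tree_distance`, and the iteration cap that guarantees termination are all assumptions made for the example.

```python
from itertools import combinations


def separate_centers(ranked, density, tree_distance, min_distance, n):
    """Demote the weaker of two too-close centers until all are far apart.

    ranked: tree identifiers sorted by descending ambient density.
    tree_distance: callable (a, b) -> tree distance (hypothetical).
    """
    ranked = list(ranked)
    # Cap the number of demotions so the loop always ends (an assumption).
    for _ in range(len(ranked)):
        centers = ranked[:n]
        clash = next(((a, b) for a, b in combinations(centers, 2)
                      if tree_distance(a, b) <= min_distance), None)
        if clash is None:
            return centers
        a, b = clash
        # The center with the smaller ambient density goes to the tail,
        # which promotes the former (N+1)-th tree into the first N.
        weaker = b if density[b] <= density[a] else a
        ranked.remove(weaker)
        ranked.append(weaker)
    return ranked[:n]


def demo_distance(a, b):
    # Toy distance: A and B are nearly identical, everything else is far.
    return 0.1 if {a, b} == {"A", "B"} else 1.0


final_centers = separate_centers(
    ["A", "B", "C", "D"], {"A": 3, "B": 2, "C": 2, "D": 1},
    demo_distance, 0.5, 2)
```

In the toy data, A and B are too close, so B is demoted and C takes its place.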
To solve the above technical problem, the present invention also provides an Internet of Things firmware program classification device, comprising:
an extraction module, for extracting the readable character strings in each firmware to be classified;
a tree construction module, for constructing a driver tree for the firmware from the readable character strings, where the root node of the driver tree is the firmware number, the second-layer nodes are the program part types to which the readable character strings belong, the third-layer nodes are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings;
a distance calculation module, for computing, in turn, the difference degree value of corresponding nodes between every two driver trees among all driver trees, and recording the results, where each result includes the identifiers of the two driver trees, the identifier of the corresponding node, and its difference degree value;
a clustering module, for selecting, according to the results, the N driver trees with the highest ambient density as cluster centers and clustering to obtain several firmware categories, for subsequent firmware analysis and repair according to those categories; N is a positive integer.
The present invention provides an Internet of Things firmware program classification method and device. After the readable character strings of the firmware are extracted, a driver tree is constructed for each firmware from them; the second-layer nodes of the driver tree are the program part types to which the readable character strings belong, and the third-layer nodes are the information types of the readable character strings. The difference degree value of corresponding nodes between every two driver trees is then computed. This value indicates how much the contents of the two corresponding nodes differ: the smaller it is, the more similar the contents, which can also be understood as the two nodes being closer together. Based on the results between each driver tree and the other driver trees, the N driver trees with the highest ambient density are selected as cluster centers and clustering is performed, completing the firmware classification for subsequent firmware analysis and repair. The invention thus uses a product tree structure to organize all readable character strings, so that clustering does not place two firmware in the same class merely because one section is highly similar. Instead it considers the results for all nodes of the product tree structure, that is, the similarity between all readable character strings of the firmware. After classification, firmware in the same class share the same reused code as far as possible, so the code similarity within a class is as high as possible. This improves classification accuracy and reduces the workload of subsequent firmware repair.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural schematic diagram of a driver tree provided by the invention;
Fig. 2 is a flow chart of an Internet of Things firmware program classification method provided by the invention;
Fig. 3 is a flow chart of another Internet of Things firmware program classification method provided by the invention;
Fig. 4 is a structural schematic diagram of an Internet of Things firmware program classification device provided by the invention.
Detailed description of the embodiments
The core of the invention is to provide an Internet of Things firmware program classification method and device that organize all readable code sections of a firmware in a product tree, so that classification takes into account the similarity between all readable character strings. This improves classification accuracy and in turn reduces the workload of subsequent firmware repair.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides an Internet of Things firmware program classification method. Referring to Fig. 2, which is a flow chart of the method, the method comprises:
Step s1: extract the readable character strings in each firmware to be classified;
Across different hardware platforms, compilers, and compiler options, some parts of the program code do not change much, namely the readable character strings in the binary code; they remain similar under different compilation environments. When classifying firmware programs, the present invention therefore relies on these shared character strings.
Step s2: construct the driver tree of the firmware from the readable character strings. The root node of the driver tree is the firmware number; the second-layer nodes are the program part types to which the readable character strings belong; the third-layer nodes are the information types of the readable character strings; and the fourth-layer nodes are the contents of the corresponding readable character strings. In other words, each readable character string (code segment) of the Internet of Things firmware program is placed in turn on the fourth layer (a leaf node) of the tree according to the second-layer and third-layer nodes it falls under.
Product tree (Product Structure Tree, PST): a tree diagram that describes the material composition of a product and the hierarchy of its component parts. It combines the product information held in product data management with the hierarchical relationships among components to form an effective attribute management structure. A product tree organizes the components of a product according to their hierarchical relationships and can clearly describe the relationships between the product's parts. Nodes on the tree represent components, parts, or assemblies, and each node can be associated with attribute information such as drawing number, material, specification, and model, as well as with related documents. In a PST, the root node represents the product, branch nodes represent components or sub-components, and leaf nodes represent parts. The layering of a product tree must reflect the functional division and composition of the product and must take production and business needs into account. Once the overall design of the product is complete, the product tree is used to divide the product by function and turn the design into a physical product. The number of levels in a product structure tree depends on the complexity of the product. It also varies with the management model of the enterprise: some enterprises use one tree for a whole product series, while others use one tree per product.
The present invention uses these characteristics of the product tree to construct the driver tree, which can also be understood as the product tree of a product that is a firmware driver. The programs of the firmware are organized layer by layer according to the constraint and semantic relationships between levels, so that subsequent clustering takes into account all readable character strings the firmware contains and each class of code files presented to the online professional is, as far as possible, generated from only a few reused module sources. When the similarity of the firmware code within a class is low, the class contains a large number of reused module sources; that is, the overlapping part differs from one pair of firmware to another. For example, firmware A and firmware B in the class share reused module source 1, while firmware A and firmware C share reused module source 2, and so on. This makes classification accuracy low.
Referring to Fig. 1, which is a structural schematic diagram of a driver tree provided by the invention: the root node is the number of the Internet of Things firmware program file, which distinguishes different Internet of Things firmware files. The second-layer nodes are the component parts of the LINUX embedded system driver and the WINDOWS CE embedded system driver (this forms the first level of classification, separating the different program parts of the same firmware program). Each third-layer node represents an information category of readable character strings under the different environments. The fourth-layer nodes consist of the program contents in the firmware, which form different leaf nodes according to their second-layer and third-layer nodes. The fourth-layer nodes thus hold the actual Internet of Things firmware program content. A second-layer or third-layer node whose child nodes are all empty is deleted. What results is the driver tree of the corresponding firmware program file.
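The four-layer structure described for Fig. 1 can be sketched as a nested dictionary. This is a minimal illustration, not the patent's implementation: the function name `build_driver_tree`, the input tuple format, and the example part and category labels are all assumptions. Nodes are created only when a string is placed under them, so second-layer and third-layer nodes with no children never appear and need no separate deletion pass.

```python
def build_driver_tree(firmware_id, strings):
    """Build {firmware_id: {part_type: {info_type: set(contents)}}}.

    strings: iterable of (part_type, info_type, content) tuples
    (hypothetical input format).
    """
    tree = {firmware_id: {}}
    layer2 = tree[firmware_id]
    for part_type, info_type, content in strings:
        if not content:
            # Skip empty contents so no empty branch is ever created.
            continue
        layer2.setdefault(part_type, {}).setdefault(info_type, set()).add(content)
    return tree


tree = build_driver_tree("fw-001", [
    ("terminal_service", "E", "open failed"),
    ("terminal_service", "V", "v1.2.3"),
    ("network", "D", ""),  # empty content: the branch is never created
])
```

The single-letter info types follow the category letters used in Fig. 1 (for example E for error message and V for version strings).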
Step s3: compute, in turn, the difference degree value of corresponding nodes between every two driver trees among all driver trees, and record the results; each result includes the identifiers of the two driver trees, the identifier of the corresponding node, and its difference degree value;
Here, corresponding nodes are nodes that occupy the same position in the two driver trees. For example, if the node being computed is the second-layer node representing the terminal service program in the first driver tree, its corresponding node is the second-layer node representing the terminal service program in the second driver tree. The difference degree value (also called the node distance) indicates how much the contents of the two corresponding nodes differ: the smaller it is, the more similar the contents of the two nodes;
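The pairwise computation and the recorded result format can be sketched as follows. The patent's own node distance relation is not reproduced in this text, so the sketch substitutes the Jaccard distance between the string sets under corresponding third-layer nodes; that substitution, the nested-dictionary tree shape `{firmware_id: {part_type: {info_type: set(contents)}}}`, and all function names are assumptions.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Stand-in difference degree: 0.0 for equal sets, 1.0 for disjoint ones."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def third_layer_nodes(tree):
    """Flatten one tree to {(part_type, info_type): set(contents)}."""
    parts = next(iter(tree.values()))
    return {(p, i): s for p, infos in parts.items() for i, s in infos.items()}


def pairwise_node_distances(trees):
    """Return records of (tree_id_1, tree_id_2, node_id, difference)."""
    records = []
    for t1, t2 in combinations(trees, 2):
        id1, id2 = next(iter(t1)), next(iter(t2))
        n1, n2 = third_layer_nodes(t1), third_layer_nodes(t2)
        # A node missing from one tree is compared against an empty set.
        for node in sorted(n1.keys() | n2.keys()):
            d = jaccard_distance(n1.get(node, set()), n2.get(node, set()))
            records.append((id1, id2, node, d))
    return records


t1 = {"fw1": {"svc": {"E": {"a", "b"}}}}
t2 = {"fw2": {"svc": {"E": {"b", "c"}}}}
records = pairwise_node_distances([t1, t2])
```

Each record carries the two tree identifiers, the node identifier, and the difference value, matching the three items the text says a result must include.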
Step s4: select, according to the results, the N driver trees with the highest ambient density as cluster centers and cluster to obtain several firmware categories, for subsequent firmware analysis and repair according to those categories; N is a positive integer.
It is understood that the ambient density of a driver tree is the number of driver trees whose difference degree value with respect to it is below a certain threshold. The larger this number, the higher the density around the driver tree, and the closer the driver tree is to a cluster center.
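The text does not spell out how the remaining driver trees are attached to the chosen centers. A minimal sketch, assuming each tree simply joins its nearest center, is shown below; the nearest-center rule, the function name `assign_to_centers`, and the callable `tree_distance` are assumptions rather than the patent's stated procedure.

```python
def assign_to_centers(trees, centers, tree_distance):
    """Group every tree with the cluster center closest to it."""
    clusters = {c: [] for c in centers}
    for t in trees:
        nearest = min(centers, key=lambda c: tree_distance(t, c))
        clusters[nearest].append(t)
    return clusters


# Toy data: trees placed on a line, distance is the absolute difference.
positions = {"A": 0, "B": 1, "C": 10, "D": 11}
clusters = assign_to_centers(
    list(positions), ["A", "C"],
    lambda x, y: abs(positions[x] - positions[y]))
```

Each resulting group corresponds to one firmware category.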
The present invention provides an Internet of Things firmware program classification method. After the readable character strings of the firmware are extracted, a driver tree is constructed for each firmware from them; the second-layer nodes of the driver tree are the program part types to which the readable character strings belong, and the third-layer nodes are the information types of the readable character strings. The difference degree value of corresponding nodes between every two driver trees is then computed. This value indicates how much the contents of the two corresponding nodes differ: the smaller it is, the more similar the contents, which can also be understood as the two nodes being closer together. Based on the results between each driver tree and the other driver trees, the N driver trees with the highest ambient density are selected as cluster centers and clustering is performed, completing the firmware classification for subsequent firmware analysis and repair. The invention thus uses a product tree structure to organize all readable character strings, so that clustering does not place two firmware in the same class merely because one section is highly similar. Instead it considers the results for all nodes of the product tree structure, that is, the similarity between all readable character strings of the firmware. After classification, firmware in the same class share the same reused code as far as possible, so the code similarity within a class is as high as possible. This improves classification accuracy and reduces the workload of subsequent firmware repair.
The process of step s1 is specifically:
extracting readable character strings from the data segment and the code segment of the firmware to be classified; the readable character strings include variable names, output information, error information, debugging information, version information, and symbol characters.
It is understood that these readable character strings include variable names (banners), output information (output message), error information (error message), debugging information (debugging message), version information (version strings), and symbol characters (symbol table strings, such as an ASCII table). In Fig. 1, O denotes output message; B, banners; E, error message; D, debugging message; V, version strings; and S, symbol table strings. Readable character strings consist of the characters a to z and A to Z and remain similar across different compilation environments. Most of them are stored in the data segment of the binary code, and a small part in the code segment. Most Internet of Things firmware is encoded in ASCII, and the rest in Unicode. The code extraction stage therefore has four main targets, handled as follows:
ASCII in the data segment: this can be extracted with the strings utility on Linux, keeping readable character strings longer than 6 characters. Regarding ASCII: information coding converts the symbols that represent information into another set of symbols that is convenient for a computer or a person to recognize and process; within one system, it is the process of changing one form of information representation into another. For example, people express emotions through simple actions such as gestures, facial expressions, glances, and speech; in ancient warfare, beating drums signaled an advance and a call-off signal meant withdrawing the troops; yellow, green, and red traffic lights mean proceed slowly, go, and stop. These are all simple forms of information coding. In a computer, information is represented in binary, which is very hard for people to read. Computers are therefore equipped with input and output devices whose main purpose is to present information in a form that humans can read. To ensure correct information exchange between people and devices and between devices and computers, a unified information exchange code was established: the ASCII code table. The computer translates input symbols into binary codes of 0s and 1s according to fixed rules, processes the binary codes, and finally converts the result back into symbols that people can recognize and outputs the corresponding information. ASCII is currently the information code generally used inside computers. Standard ASCII uses 7 bits and represents the 26 English letters in upper and lower case together with some additional characters.
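Extraction of ASCII strings from a data segment can be sketched in a few lines of Python, in the spirit of the Linux strings utility mentioned above. This is a minimal illustration: the function name `extract_ascii_strings` is an assumption, and the default minimum length of 7 is one reading of "longer than 6 characters".

```python
import re


def extract_ascii_strings(data: bytes, min_len: int = 7):
    """Return every run of at least min_len printable ASCII bytes in data."""
    # 0x20 to 0x7E covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


blob = b"\x00\x01Firmware v1.2\x00abc\x00error: open failed\xff"
found = extract_ascii_strings(blob)
```

The short run "abc" falls below the minimum length and is dropped, while the two longer runs are kept.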
ASCII in the code segment: code here generally defines and stores local variables, or performs function calls and address relocation. At this stage the code mainly works through the stack, so it suffices to identify each stack and extract its contents. A stack can be identified from a series of pop and push instructions. Where a readable character string has been split into separate parts, the following method can be used. First, identify runs of consecutive push instructions. Then, extract the operands of the push instructions in each run, and assemble the operands extracted for each identified stack structure into a data stream. Finally, these data streams form the readable character strings.
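The push-run method can be sketched over an already disassembled instruction list. This is a simplified illustration under stated assumptions: the `(mnemonic, operand_bytes)` tuple format is hypothetical, operands are assumed to be given in memory byte order, and the stack is assumed to grow downward so that the last value pushed holds the start of the string.

```python
def stack_strings(instructions, min_run=2):
    """Recover strings assembled by runs of consecutive push instructions.

    instructions: list of (mnemonic, operand_bytes) tuples (hypothetical).
    """
    found, run = [], []
    # A sentinel non-push entry flushes a run that reaches the end of the list.
    for mnemonic, operand in list(instructions) + [("end", b"")]:
        if mnemonic == "push":
            run.append(operand)
            continue
        if len(run) >= min_run:
            # Assumption: the last value pushed holds the start of the string.
            data = b"".join(reversed(run)).split(b"\x00")[0]
            if data and all(0x20 <= c <= 0x7E for c in data):
                found.append(data.decode("ascii"))
        run = []
    return found


listing = [
    ("mov", b""),
    ("push", b"o!!\x00"),
    ("push", b"hell"),
    ("call", b""),
    ("push", b"\x01\x02\x03\x04"),
    ("ret", b""),
]
recovered = stack_strings(listing)
```

The two consecutive pushes form one data stream that decodes to a readable string, while the isolated push of non-printable bytes is ignored.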
Unicode strings in the data segment; the position of these strings is determined from the hardware in question, and the corresponding strings are extracted. On identical hardware, this position is relatively fixed. Unicode is a single character set, and Chinese, Japanese and Korean text together occupy the range 0x3000 to 0x9FFF within it. The Unicode encoding in common use at present is UCS-2, which encodes each character with two bytes; for example, the code of the Chinese character "经" is 0x7ECF. Character codes are generally written in hexadecimal, prefixed with 0x to distinguish them from decimal; 0x7ECF converted to decimal is 32463. Because UCS-2 encodes a character with two bytes, that is, 16 binary digits, and 2 to the 16th power is 65536, UCS-2 can encode at most 65536 characters. Codes 0 to 127 are the same characters as in ASCII: for example, the Unicode code of the letter "a" is 0x0061, which is 97 in decimal, and the ASCII code of "a" is 0x61, which is also 97 in decimal. Because there are so many Chinese characters and UCS-2 can represent at most 65536 characters, UCS-2 can only represent the Chinese characters in common use by leaving out some that are almost never used. To represent all Chinese characters, Unicode also has the UCS-4 specification, which encodes each character with 4 bytes; under this specification, most readable characters from different countries and regions can be represented.
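The arithmetic in this paragraph can be checked with a short sketch. The helper name `describe` and the returned field names are illustrative assumptions; the sketch relies on the fact that, for characters representable in two bytes (excluding surrogates), Python's `utf-16-be` encoding yields the same two bytes as UCS-2, and `utf-32-be` corresponds to the 4-byte UCS-4 form.

```python
from typing import Dict, Optional, Union


def describe(ch: str) -> Dict[str, Union[int, str, bool, Optional[bytes]]]:
    """Summarise how a single character is coded in UCS-2 and UCS-4."""
    cp = ord(ch)
    fits_ucs2 = cp <= 0xFFFF
    return {
        "code_point": cp,                       # decimal value
        "hex": f"0x{cp:04X}",                   # hexadecimal, 0x-prefixed
        "fits_ucs2": fits_ucs2,                 # at most 2**16 = 65536 codes
        "in_cjk_range": 0x3000 <= cp <= 0x9FFF,
        "ucs2_be": ch.encode("utf-16-be") if fits_ucs2 else None,
        "ucs4_be": ch.encode("utf-32-be"),
    }


jing = describe("经")   # the example character from the text
letter_a = describe("a")
```

For "经" the decimal code is 32463 and the two-byte form is 7E CF, while "a" has the same value, 97, in both ASCII and Unicode.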
Unicode strings in the code segment; such strings are very rare, so the present invention ignores them.
The process by which a program goes from source code to an executable program is as follows:
One, precompilation: this mainly processes the precompilation directives that begin with "#" in the source code file. The processing rules are as follows:
1. Delete all #define directives and expand all macro definitions.
2. Process all conditional precompilation directives, such as "#if", "#endif", "#ifdef", "#elif" and "#else".
3. Process "#include" precompilation directives by inserting the content of the included file at the position of the directive. This process is recursive, because an included file may itself include other files.
4. Delete all comments, both "//" and "/* */".
5. Retain all #pragma compiler directives, because the compiler needs them; for example, #pragma once prevents a file from being included more than once.
6. Add line numbers and file identifiers, so that the compiler can generate line number information for debugging and can display line numbers when compilation produces errors or warnings.
Two, compilation: the xxx.i or xxx.ii file generated by precompilation goes through lexical analysis, syntax analysis, semantic analysis and optimization, and a corresponding assembly code file is generated. The main steps are as follows:
1. Lexical analysis: using an algorithm similar to a finite state machine, the source program is fed into a scanner, which splits its character sequence into a series of tokens.
2. Syntax analysis: the parser performs syntax analysis on the tokens produced by the scanner and generates a syntax tree. The syntax tree output by the parser is a tree whose nodes are expressions.
3. Semantic analysis: the parser only analyses expressions at the syntactic level, whereas the semantic analyzer judges whether an expression is meaningful. The semantics analysed are static semantics, that is, semantics that can be determined at compile time; the corresponding dynamic semantics can only be determined at run time. Static semantics generally include the matching of declarations and types and the conversion of types, so semantic analysis checks these aspects. For example, if an int value is assigned to an int* variable, the semantic analyzer finds that the types do not match and the compiler reports an error.
4. Optimization: this is optimization at the source code level. The source code optimizer converts the entire syntax tree into intermediate code, a sequential representation of the syntax tree that is already very close to the object code. There are many kinds of intermediate code; the most common are "three-address code" and "P-code". The basic form of three-address code is x = y op z, meaning that the result of applying operation op to variables y and z is assigned to x; op can be addition, subtraction, multiplication, division and so on.
5. Object code generation: the code generator converts the intermediate code into target machine code, producing a sequence of code expressed in assembly language.
6. Object code optimization: the object code optimizer optimizes this target machine code, for example by choosing suitable addressing modes, replacing multiplication with shifts and deleting redundant instructions.
Three, assembly: the assembly code is converted into instructions that the machine can execute (a machine code file).
Compared with the compiler, the assembler's job is simpler: there is no complex syntax and no semantics, and no optimization is needed. It simply translates instructions one by one according to the lookup table between assembly instructions and machine instructions. The assembly process is carried out by the assembler.
Four, linking: the files in the same project are combined into a complete binary program.
Five, loading: the binary program is combined with the hardware so that it can run on the hardware platform.
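To make the three-address form x = y op z from the compilation stage concrete, the following sketch lowers a small arithmetic expression into such lines. It uses Python's standard `ast` module as a stand-in for the syntax tree; the function name `to_three_address` and the temporary names `t1`, `t2` are illustrative assumptions, and this is not how any particular firmware compiler works.

```python
import ast
from typing import List

_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_three_address(expr: str, target: str = "x") -> List[str]:
    """Lower an arithmetic expression into 'temp = y op z' lines plus a final copy."""
    code: List[str] = []
    counter = 0

    def lower(node: ast.expr) -> str:
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left, right = lower(node.left), lower(node.right)
            counter += 1
            temp = f"t{counter}"
            code.append(f"{temp} = {left} {_OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = lower(ast.parse(expr, mode="eval").body)
    code.append(f"{target} = {result}")
    return code
```

For the expression a + b * c the multiplication is emitted first, because it sits deeper in the syntax tree.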
Preferably, after step s1 and before step s2, the method further includes:
Judging whether a readable character string is related to the platform or to a linked library; if so, deleting that readable character string; if not, continuing with the next extracted readable character string, until all extracted readable character strings have been judged;
Correspondingly, the driver tree of the firmware is subsequently built from the readable character strings that remain after this judgment;
Wherein, the process of judging whether a readable character string is related to the platform is:
Judging whether the information gain of the readable character string is greater than a preset platform-dependence threshold (the threshold can be adjusted in actual work; the present invention does not limit its specific value); if so, the readable character string is related to the platform; otherwise, it is not. The information gain of a readable character string is defined in terms of the following quantities:
IG(s) is the information gain; Ci is the i-th target platform; P(Ci) is the proportion of all binary files that belong to target platform Ci; P(s) is the proportion of all binary files that contain readable character string s; P(s,Ci) is the proportion of all binary files that both belong to target platform Ci and contain readable character string s.
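The exact formula is not reproduced in this text, so the sketch below is only an assumption: it uses the usual mutual-information form of information gain over the quantities P(Ci), P(s) and P(s,Ci) defined above, summing over both "s present" and "s absent". The data layout (one set of strings per binary file and a parallel list of platform names) and the function name are hypothetical.

```python
import math
from typing import List, Set


def information_gain(s: str, files: List[Set[str]], platforms: List[str]) -> float:
    """Assumed mutual-information form of IG(s) between 'contains s' and platform."""
    total = len(files)
    p_s = sum(s in f for f in files) / total              # P(s)
    gain = 0.0
    for c in set(platforms):
        p_c = platforms.count(c) / total                  # P(Ci)
        p_s_c = sum(1 for f, p in zip(files, platforms)
                    if p == c and s in f) / total         # P(s, Ci)
        # Terms for "s present" and "s absent"; zero-probability terms add nothing.
        for joint, marginal in ((p_s_c, p_s), (p_c - p_s_c, 1.0 - p_s)):
            if joint > 0:
                gain += joint * math.log2(joint / (marginal * p_c))
    return gain
```

A string that appears only on one platform gets a high value and would exceed a platform-dependence threshold, while a string spread evenly across platforms gets a value near 0 and would be kept.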
It can be understood that, besides the content of the executed program itself, readable character strings also contain some features caused by the hardware platform and by the compiler's own characteristics. These features do not help classification; instead they increase its complexity and reduce its accuracy. It is therefore preferable to filter out this part of the data, improving the accuracy of classification and reducing its computational cost. In addition, all instructions that mark the hardware platform, compiler version and compiler options can also be filtered out. The above is only a preferred embodiment; which content is filtered, and how, can be set according to actual needs.
Specifically, the buildroot tool can be used here to cross-compile all files and create a blacklist for each target platform. The blacklist contains the readable character strings related to the target platform, together with the characteristic strings of kernel-level and system-level libraries (on a Linux system these libraries are generally located in /lib and /usr/lib). Filtering mainly consists of removing from the extracted code the readable character strings that appear in the blacklist. Buildroot is a framework for building embedded Linux systems on the Linux platform. The whole of Buildroot consists of Makefile scripts and Kconfig configuration files. As when compiling the Linux kernel, one can configure Buildroot, modify the configuration through menuconfig, and compile a complete Linux system image that can be flashed directly onto a machine and run (including the boot loader, the kernel, the rootfs and the various libraries and applications in the rootfs). Of course, other tools can also be used to filter readable character strings; the present invention does not limit this.
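The blacklist filtering described above can be sketched as follows. The function names and the example strings are hypothetical; how the platform-related and library strings are collected from a Buildroot cross-compilation is outside the sketch and is simply assumed to have been done.

```python
from typing import Iterable, List, Set


def build_blacklist(platform_strings: Iterable[str],
                    library_strings: Iterable[str]) -> Set[str]:
    """Union of platform-related strings and kernel/system library strings."""
    return set(platform_strings) | set(library_strings)


def filter_strings(extracted: Iterable[str], blacklist: Set[str]) -> List[str]:
    """Keep only extracted strings that are not in the blacklist, preserving order."""
    return [s for s in extracted if s not in blacklist]


# Hypothetical blacklist for one target platform.
blacklist = build_blacklist(["armv7l"], ["libc.so.6", "GLIBC_2.4"])
kept = filter_strings(["armv7l", "admin_login", "libc.so.6", "/etc/passwd"], blacklist)
```

One blacklist per target platform keeps the lookup a simple set membership test.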
In a specific embodiment, in step s3, the process of calculating the difference degree values of corresponding nodes between two driver trees is specifically:
Step s31: according to the node distance relation, successively calculate the difference degree value between each pair of corresponding nodes in the first, second and third layers of the two driver trees, and record the calculation results. In the node distance relation, φi is the driver tree formed from the i-th firmware and φj is the driver tree formed from the j-th firmware; the node distance is the difference degree value of the corresponding node v at the same position in φi and φj, and it is computed from the sets of all child nodes of node v in the two trees.
It can be understood that the corresponding nodes here are the nodes of the first, second and third layers of the driver tree. The leaf nodes in the fourth layer are the program content actually extracted from the Internet of Things firmware; a third-layer node is the common characteristic of its child nodes (that is, of fourth-layer nodes), and a second-layer node is in turn the common characteristic of its child nodes. Like a classification standard, the second-layer and third-layer nodes distribute the code of different functions and different positions in the firmware among the fourth-layer nodes (leaf nodes). Consequently, differences arise not only between corresponding fourth-layer nodes, because the program content differs, but also between second-layer and third-layer nodes. Because the specific programs in the fourth layer differ, a given information type may have no fourth-layer nodes in one tree, in which case the corresponding third-layer node does not exist either. This produces differences between third-layer nodes, so third-layer nodes also have a node distance; the same reasoning applies to the second layer. The node distance relation uses the information in a node's child nodes: the distance of a first-layer node is calculated from second-layer node information, the distance of a second-layer node from third-layer nodes, and the distance of a third-layer node from fourth-layer nodes. That is why distances are calculated for the nodes of the first three layers. The node distance relation uses the Jaccard similarity algorithm: the higher the Jaccard similarity, the smaller the distance. By calculating the difference degree values between the nodes of the first, second and third layers of two driver trees, the final calculation results for the two trees capture, as far as possible, the similarity between all of their readable character strings. When clustering is later performed on the results for every pair of driver trees, the clusters therefore group firmware programs that are as similar as possible, which reduces the workload of staff who repair the firmware. In addition, each time a driver tree is compared with another driver tree, multiple groups of calculation results are obtained. Each group contains the node identifier of one pair of corresponding nodes (indicating which position in the driver tree is being compared), the identifiers of the driver trees involved, and the difference degree value. After all calculations are complete, every driver tree therefore has multiple groups of calculation results containing its own identifier.
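A minimal sketch of a Jaccard-based node distance follows. It assumes that the distance is one minus the Jaccard similarity of the two child-node sets, which matches the statement that a higher Jaccard similarity means a smaller distance but is not confirmed as the exact relation. The nested-dictionary tree layout and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: {program_type: {info_type: {string, ...}}}; the dict itself is the root.
Tree = Dict[str, Dict[str, Set[str]]]


def children(tree: Tree, path: Tuple[str, ...]) -> Set[str]:
    """Child labels of the node at `path`: () is the root, length 1 is layer 2, length 2 is layer 3."""
    node = tree
    for key in path:
        node = node.get(key, {})   # a missing node simply has no children
    return set(node)               # dict -> its keys, set -> its strings


def node_distance(tree_i: Tree, tree_j: Tree, path: Tuple[str, ...]) -> float:
    """Assumed distance: 1 - |A ∩ B| / |A ∪ B| over the child sets of the node."""
    a, b = children(tree_i, path), children(tree_j, path)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


t1: Tree = {"driver": {"path": {"/dev/mtd0", "/etc/passwd"}, "url": {"http://a"}}}
t2: Tree = {"driver": {"path": {"/dev/mtd0"}}, "web": {"url": {"http://b"}}}
```

With these two example trees, the root, the "driver" node and the "driver/path" node each share one of two children, while the "web/url" node exists in only one tree and is therefore at the maximum distance.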
In a preferred embodiment, the calculation results can be recorded using labels. That is, each time a pair of corresponding nodes has been calculated, a label is attached to the driver trees that contain those nodes. The label structure is: <driver tree i, driver tree j, corresponding node, difference degree value>. Since the result is a calculation between two driver trees, the label can be set on both of the driver trees being compared; the labels set on the two trees differ only in the order of the driver tree identifiers and are otherwise identical. The core idea of the present invention is to find Internet of Things firmware programs that reuse the same code modules, cluster them, and submit them to professional staff for vulnerability analysis and repair. Applied to driver trees, this idea amounts to asking whether a branch from the root node to a fourth-layer leaf node has a very small distance, or difference degree value, across different driver trees (that is, a very high similarity). The label structure <driver tree 1, driver tree 2, corresponding node, difference degree value> is designed so that the different branches from root node to leaf node can be told apart. The role of the label is as follows:
If the difference degree value in a label is below a specific threshold (which can be adjusted according to working conditions and is not limited in this patent), the similar program code shared by the firmware corresponding to the two driver trees in the label can be located from the node position in the label. The structure of the driver tree distributes the program code of different positions and different functions in the firmware across different leaf nodes. Therefore, when the difference degree value of corresponding nodes in two different driver trees is very small (that is, the similarity of the nodes is greater than the threshold), it can be concluded that a program module is reused at this node, which facilitates the subsequent clustering.
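The label structure <driver tree i, driver tree j, corresponding node, difference degree value> can be represented, for illustration, as a small record type. The class name, field names and helper functions below are assumptions introduced for the sketch; only the four-field structure and the mirrored pair of labels come from the description.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Label(NamedTuple):
    tree_a: str                 # identifier of the tree the label is attached to
    tree_b: str                 # identifier of the tree it was compared with
    node: Tuple[str, ...]       # position of the corresponding node
    difference: float           # difference degree value


def make_labels(tree_i: str, tree_j: str, node: Tuple[str, ...],
                difference: float) -> Tuple[Label, Label]:
    """One label per tree; the two differ only in the order of the tree identifiers."""
    return (Label(tree_i, tree_j, node, difference),
            Label(tree_j, tree_i, node, difference))


def reused_nodes(labels: Iterable[Label], threshold: float) -> List[Label]:
    """Labels whose difference degree value is below the threshold (likely module reuse)."""
    return [lab for lab in labels if lab.difference < threshold]


labels = list(make_labels("fw1", "fw2", ("driver", "path"), 0.05))
labels += list(make_labels("fw1", "fw3", ("web",), 0.9))
likely_reuse = reused_nodes(labels, threshold=0.1)
```

Because each label carries the node position, a label that passes the threshold points directly at the branch where two firmware images share code.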
Preferably, in step s4, the process of screening out, according to the calculation results, the top N driver trees with the greatest ambient density as cluster centers includes:
Determining all of the difference degree values in the calculation results between each driver tree and all other driver trees;
For each driver tree, counting the number of its difference degree values that are less than a preset distance threshold (which can be adjusted according to working conditions and is not limited in the present invention), and taking this number as the ambient density number of the driver tree;
Sorting all driver trees by ambient density number in descending order and selecting the top N driver trees as cluster centers.
It can be understood that each driver tree has multiple groups of calculation results, or multiple labels. The difference degree value in each group is compared with the preset distance threshold, and the number of the tree's difference degree values that fall below the threshold is recorded, that is, the number of cases in which the difference degree value between a node in this driver tree and a node in another driver tree is below the preset distance threshold. The higher this number, the more similar the driver tree is to other driver trees and the closer it lies to them. After sorting by this number in descending order, a higher rank therefore means a higher density of driver trees around the tree in question, that is, more driver trees similar to it, so such a tree is preferred as a cluster center. A high ambient density around a chosen cluster center indicates program reuse between the firmware corresponding to that cluster center and most other firmware, which means that the firmware grouped with that cluster center is highly similar and the classification is accurate. As mentioned above, a difference degree value below the preset distance threshold indicates reuse between the code segments corresponding to the two nodes concerned. This approach takes into account the similarity between all nodes of a firmware and other firmware, so that different Internet of Things firmware reusing the same module can be gathered into one cluster, which makes the classification results more accurate. Compared with the existing method, which uses minhash and LSH, the method of the present invention produces more clusters with fewer Internet of Things firmware files in each, which makes subsequent firmware analysis and repair easier and reduces the workload of staff.
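The selection of cluster centers by ambient density number can be sketched as below. The sketch assumes that each calculation result is a hypothetical (tree, other tree, difference degree value) triple recorded once per compared node pair, and that ties are broken by tree identifier; neither detail is specified in the description.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Result = Tuple[str, str, float]  # (tree id, other tree id, difference degree value)


def select_centers(results: Iterable[Result], threshold: float, n: int) -> List[str]:
    """Return the n trees with the largest ambient density number."""
    density: Counter = Counter()
    for tree_a, tree_b, diff in results:
        density[tree_a] += 0          # register every tree, even with density 0
        density[tree_b] += 0
        if diff < threshold:          # below the preset distance threshold
            density[tree_a] += 1
            density[tree_b] += 1
    # Descending density; the identifier is an assumed tie-breaker.
    ranked = sorted(density, key=lambda t: (-density[t], t))
    return ranked[:n]


results = [("A", "B", 0.1), ("A", "C", 0.2), ("B", "C", 0.9), ("A", "D", 0.8)]
centers = select_centers(results, threshold=0.5, n=2)
```

In this example tree A has two close neighbours, B and C have one each, and D has none, so A is ranked first.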
In addition to the distance (that is, the similarity) between corresponding nodes in the first, second and third layers of different trees, the total distance over all nodes in the same layer of different trees, called the layer distance, is also considered. The reasoning is that different driver trees may reuse the same program modules in several places: position 1 of a driver tree may share a reused program module with driver tree 1, while position 2 shares a reused program module with driver tree 2. Whether program modules are reused can be judged from the layer distance, which measures the similarity between corresponding layers of different driver trees.
Preferably, after successively calculating the difference degree value between each pair of corresponding nodes in the first, second and third layers of the two driver trees and recording the calculation results, the method further includes:
Step s32: according to the layer distance relation, calculating and saving the layer distance of each corresponding layer between every two driver trees;
Wherein, in the layer distance relation, the layer distance of layer l is taken between φi and φj, the summation runs over the set of all nodes in layer l, and βv is the weight corresponding to node v.
It can be understood that the layer distance of a driver tree is the sum of the difference degree values (that is, the node distances) of all nodes in a layer. For nodes in the first, second and third layers, a larger node distance means a lower Jaccard similarity between the child nodes, and therefore a larger contribution to the distance of the corresponding layer. Conversely, a very small node distance means a higher Jaccard similarity between the child nodes, and the contribution of this node distance to the layer distance is very small (in the present invention its influence on the layer distance is ignored). In practice this is achieved through the weight coefficient βv applied to each node distance in the weighted sum. In addition, the first layer is the root node, which is simply the firmware number, so by itself it does not represent any difference in content between two firmware. However, the node distance relation shows that the distance between corresponding nodes of two driver trees is calculated from the sets of their child nodes. The node distance of the first layer is therefore obtained from second-layer node information, the distance of corresponding nodes in the second layer from third-layer nodes, and the distance of corresponding nodes in the third layer from fourth-layer node information. The fourth layer reflects the differences between the drivers, but those differences are quantified in the layers above.
Wherein, βv is the weight corresponding to node v; the weight relation also uses the set of all child nodes of node v in a driver tree, and w, the parent node of v.
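As a rough sketch, the layer distance can be written as a weighted sum of the node distances of one layer. The weight rule used here is an assumption: βv is taken as 0 when the node distance is below a reuse threshold, so that such nodes are ignored as described, and as 1 otherwise. The actual weight relation, which also involves child and parent nodes, is not reproduced in this text. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]  # hypothetical node position within a layer


def layer_distance(node_distances: Dict[Path, float], reuse_threshold: float) -> float:
    """Weighted sum of the node distances of one layer between two driver trees."""
    total = 0.0
    for _path, dist in node_distances.items():
        beta = 0.0 if dist < reuse_threshold else 1.0   # assumed weight rule
        total += beta * dist
    return total


# Node distances of one layer for a pair of trees (illustrative values).
layer = {("a",): 0.05, ("b",): 0.5, ("c",): 1.0}
distance = layer_distance(layer, reuse_threshold=0.1)
```

Here the node at ("a",) is treated as reused code and contributes nothing, so the layer distance is the sum of the other two node distances.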
With the weight coefficient relation above, the influence on the layer distance of nodes where program reuse exists can be ignored when calculating the layer distance between different driver trees. The threshold is again set according to the actual working situation. A node distance below the threshold indicates program reuse in the fourth-layer programs of the two driver trees, so when the layer distance is calculated the weight coefficient of that node distance is set to 0. As a result, the layer distance between two driver trees with program reuse is very small, which links the reuse of driver program modules to the layer distance. Without program reuse, the layer distance between different driver trees is large; in other words, the more program reuse there is between two driver trees, the smaller their layer distance. Preferably, after step s32, the method further includes:
Step s33: according to the tree distance relation, calculating the tree distance between every two driver trees. In the tree distance relation, the tree distance is taken between φi and φj; γ is the common ratio; H(φ) is the height of the driver tree, with values in {1, 2, 3}; and ωl is the layer distance weight coefficient of layer l.
In addition, in the relation for the layer distance weight coefficient, γ adjusts ωl, so the size of γ can be chosen according to the actual working situation. The effects are as follows:
γ = 0: only the root node contributes to the tree distance, and the contributions of the other layer distances are ignored;
0 < γ < 1: the layer containing a node's parent contributes more layer distance than the layer containing the node itself;
γ = 1: all layers (1, 2, 3) contribute the same amount of layer distance;
γ > 1: the layer containing a node's child nodes contributes more layer distance than the layer containing the node itself.
In principle γ is greater than 1 here, so that the layer distances of lower layers contribute more to the tree distance. From the structure of the tree, every driver tree has a root node (the driver file serial number), so the first layer contributes least to the tree distance. For a second-layer node, the structure of this layer only differs when all child nodes of the second-layer node are empty; otherwise, the contribution of this layer to the tree distance still depends on its child nodes (the third-layer nodes). The node distances of the third layer (which in fact reflect fourth-layer node information) have a decisive influence on the tree distance. These node distances generally differ between driver trees, so if different driver trees have close node distances on this layer, program reuse is present in the readable character strings. Of course, the above is only a preferred scheme, and the present invention does not limit the specific value of γ.
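The effect of γ can be illustrated with a sketch in which the layer weight is assumed to be a geometric sequence, ωl = γ^(l-1). This form is an assumption chosen because it reproduces the four cases listed above (γ is called the common ratio); the exact relation is not reproduced in this text and may, for example, include a normalisation. The function names are illustrative.

```python
from typing import List, Sequence


def layer_weights(gamma: float, height: int = 3) -> List[float]:
    """Assumed weights: omega_l = gamma ** (l - 1) for l = 1..height."""
    return [gamma ** (l - 1) for l in range(1, height + 1)]


def tree_distance(layer_distances: Sequence[float], gamma: float) -> float:
    """Weighted sum of the layer distances (layer 1 first)."""
    weights = layer_weights(gamma, len(layer_distances))
    return sum(w * d for w, d in zip(weights, layer_distances))


only_root = layer_weights(0.0)       # gamma = 0: only layer 1 counts
upper_heavy = layer_weights(0.5)     # 0 < gamma < 1: parents weigh more
uniform = layer_weights(1.0)         # gamma = 1: equal contribution
lower_heavy = layer_weights(2.0)     # gamma > 1: children weigh more
```

With γ greater than 1, as preferred above, the third layer, which reflects the fourth-layer strings, dominates the tree distance.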
It can be understood that the distance between different driver trees (the tree distance) is calculated from the layer distances of the different layers of the driver trees. This method considers not only the similarity of individual nodes (the readable character strings extracted from programs) across different categories of Internet of Things firmware, but also the semantic similarity of the driver tree structure as a whole. The tree distance between two driver trees with program reuse is therefore very small, which links the reuse of driver program modules to the tree distance. Without program reuse, the tree distance between different driver trees is large; in other words, the more program reuse there is between two driver trees, the smaller their tree distance. Compared with the layer distance, the tree distance better reflects the overall similarity between driver trees. The overall similarity between driver trees can therefore be analysed from the calculated tree distances and used to adjust the clustering, making the clustering result more accurate (so that different Internet of Things firmware reusing the same module is gathered into one cluster, which makes analysis easier and greatly reduces the workload of staff). The specific method is described in the following embodiment:
Preferably, in step s4, after the top N driver trees are selected as cluster centers, the following is further included, so that the adjusted step s4 comprises:
Step s41: screening out, according to the calculation results, the top N driver trees with the greatest ambient density as cluster centers;
Step s42: judging whether the tree distance between any two cluster centers is greater than a preset tree distance threshold; if not, taking the current (N+1)-th driver tree as a cluster center, moving whichever of the two cluster centers being judged has the smaller ambient density number to the last position in the ranking, and then repeating the above process, until the tree distance between any two cluster centers is greater than the preset tree distance threshold;
Step s43: clustering according to the N cluster centers obtained, to obtain several firmware categories, so that firmware analysis and repair can subsequently be carried out by firmware category.
It can be understood that, although the clustering described above relies on the layer distance as a quantitative measure, the content of the second-layer nodes of the driver tree must also be considered: because Internet of Things firmware differs, one or more second-layer nodes may be absent from the driver tree generated for a given firmware file. To make full use of the semantics of the overall structure of the driver tree (the influence of the node structure at each level on the whole tree), the present invention builds on the layer distance and defines the tree distance between driver trees to quantify the differences between Internet of Things firmware files. Making full use of the structure of the driver tree at every level improves the accuracy with which different Internet of Things firmware is classified.
After the cluster centers have been preliminarily determined from the node distances, overlap between classes must be avoided, so the cluster centers need to be far apart from one another in tree distance. Therefore, once all Internet of Things firmware has been counted, if the tree distance between two cluster centers among the top N driver trees is less than the preset tree distance threshold, the two cluster centers are too close together and an adjustment is needed. Since driver trees with a larger ambient density number are preferred as cluster centers, the cluster center with the smaller ambient density number is the one replaced. Each time a cluster center is replaced, the above operation is repeated on the new set of N cluster centers, until the tree distance between every pair of the N cluster centers is greater than the preset tree distance threshold. In this way, the N cluster centers finally obtained have ambient density numbers that are as large as possible while being as far apart from one another in tree distance as possible, which ensures that the cluster centers are chosen accurately.
The preset tree distance threshold is a tree distance used to separate two clearly different driver trees, and it is determined by actual work; preferably, it can be taken as the expectation of the tree distances over all driver trees. For practical purposes, the preset tree distance threshold can be lowered appropriately to speed up the selection of cluster centers. Of course, the present invention does not limit the way the preset tree distance threshold is set or its value.
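The adjustment of cluster centers can be sketched as follows. The sketch assumes a hypothetical `tree_dist` callable that returns the tree distance between two tree identifiers and a list `ranked` already sorted by descending ambient density number. The bound on the number of replacements is an added safeguard, not part of the description, and the mean of all pairwise tree distances is used as one possible reading of "the expectation of the tree distances".

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List, Sequence

Dist = Callable[[str, str], float]


def mean_tree_distance(trees: Sequence[str], tree_dist: Dist) -> float:
    """One possible threshold: the mean tree distance over all pairs of trees."""
    return mean(tree_dist(a, b) for a, b in combinations(trees, 2))


def adjust_centers(ranked: Sequence[str], n: int, tree_dist: Dist,
                   threshold: float) -> List[str]:
    """Replace cluster centers that lie too close together (step s42)."""
    rank = {t: i for i, t in enumerate(ranked)}   # smaller index = larger density
    order = list(ranked)                          # first n entries are the centers
    for _ in range(len(order)):                   # assumed bound on replacements
        centers = order[:n]
        clash = next(((a, b) for i, a in enumerate(centers)
                      for b in centers[i + 1:]
                      if tree_dist(a, b) <= threshold), None)
        if clash is None or len(order) <= n:
            break
        weaker = max(clash, key=rank.__getitem__)  # smaller ambient density number
        order.remove(weaker)                       # the (N+1)-th tree moves up
        order.append(weaker)                       # weaker center goes to the end
    return order[:n]
```

For example, if A and B are the two densest trees but lie very close together, B is moved to the end of the ranking and the next tree, C, becomes a center alongside A.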
It should be noted that, referring to Fig. 3, Fig. 3 shows only one specific implementation. Since the N preliminary cluster centers are calculated from the node distances, step s41 only needs to be carried out after step s31; the present invention does not limit the order between step s41 and steps s32 to s33, and the two can also be carried out in parallel. That is, the N cluster centers can be calculated first, followed by steps s32 to s33 and then steps s42 and s43; or steps s31 to s33 can be carried out first, followed by steps s41 to s43; or step s41 and steps s32 to s33 can be carried out side by side, with steps s42 and s43 executed once both have finished. The present invention does not limit which implementation is used.
In the final stage, once the cluster centers have been determined as above, the tree distance is used as the distance criterion for clustering: the tree distance between a driver tree and each of the cluster centers is calculated, and the corresponding Internet of Things firmware file is assigned to the class of the cluster center with the smallest tree distance. The classified files are then handed to professionals for analysis, which reduces their workload.
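This final assignment step amounts to a nearest-center rule, sketched below. The `tree_dist` callable and the tree identifiers are hypothetical, and the one-dimensional positions in the usage example only stand in for real tree distances.

```python
from typing import Callable, Dict, List, Sequence


def assign_clusters(trees: Sequence[str], centers: Sequence[str],
                    tree_dist: Callable[[str, str], float]) -> Dict[str, List[str]]:
    """Assign every driver tree to the cluster center with the smallest tree distance."""
    clusters: Dict[str, List[str]] = {c: [] for c in centers}
    for tree in trees:
        nearest = min(centers, key=lambda c: tree_dist(tree, c))
        clusters[nearest].append(tree)
    return clusters


# Illustrative stand-in for tree distances: positions on a line.
position = {"A": 0.0, "B": 0.2, "C": 1.0, "D": 0.9}
clusters = assign_clusters(
    ["A", "B", "C", "D"], ["A", "C"],
    lambda a, b: abs(position[a] - position[b]),
)
```

Each resulting list is one firmware category that can be handed over for analysis and repair.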
Of course, in embodiments that do not calculate the tree distance, the class to which an Internet of Things firmware file belongs can be judged from the layer distances or node distances between driver trees, thereby completing the clustering operation.
The present invention also provides an Internet of Things firmware program classification device. Referring to Fig. 4, Fig. 4 is a structural schematic diagram of an Internet of Things firmware program classification device provided by the present invention. The device includes:
Extraction module 1, for extracting the readable character strings in each firmware to be classified;
Structure tree construction module 2, for building the driver tree of the firmware from the readable character strings; the root node of the driver tree is the firmware number, the second-layer nodes are the program part types to which the readable character strings belong, the third-layer nodes are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings;
Distance calculation module 3, for successively calculating the difference degree values of corresponding nodes between every two driver trees among all driver trees and recording the calculation results; the calculation results include the identifiers of the two driver trees compared, the identifier of the corresponding node compared, and its difference degree value;
Cluster module 4, for screening out, according to the calculation results, the top N driver trees with the greatest ambient density as cluster centers and clustering to obtain several firmware categories, so that firmware analysis and repair can subsequently be carried out by firmware category; N is a positive integer.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method embodiment, and details are not repeated here.
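The work of the extraction module and the tree construction module can be sketched as follows. This is a rough illustration under stated assumptions rather than the implementation of the invention: the function names, the minimum run length of 4 printable ASCII bytes, and the nested-dictionary representation of the four-layer driver tree are all assumptions, and the program part type and information type of each string are taken as labels supplied by the caller, since how they are determined is not restated here.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: a "readable character string" is a run of at least
# 4 printable ASCII bytes, as in the common `strings` utility.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(blob: bytes) -> List[str]:
    """Return the printable ASCII runs found in a firmware image."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


def build_driver_tree(
    firmware_id: str,
    labelled_strings: Iterable[Tuple[str, str, str]],
) -> dict:
    """Build a four-layer tree from (program_part, info_type, content) triples.

    Layer 1: firmware number; layer 2: program part type;
    layer 3: information type; layer 4: string contents.
    """
    children: Dict[str, Dict[str, Set[str]]] = {}
    for part, info_type, content in labelled_strings:
        children.setdefault(part, {}).setdefault(info_type, set()).add(content)
    return {"id": firmware_id, "children": children}
```

Storing the fourth layer as a set means that a string occurring several times in one firmware is kept once, which is a design choice of this sketch.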
The specific embodiments above are only preferred embodiments of the present invention, and they may be combined in any manner; the embodiments obtained by such combination also fall within the protection scope of the present invention. It should be pointed out that other improvements and variations derived by those of ordinary skill in the art without departing from the spirit and concept of the present invention shall likewise fall within the protection scope of the present invention.
It should also be noted that, in this specification, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
Claims (8)
1. An Internet of Things firmware program classification method, characterized by comprising:
Extracting the readable character strings from each firmware to be classified;
Building the driver tree of the firmware from the readable character strings; the root node of the driver tree is the number of the firmware, the second-layer nodes of the driver tree are the program part types to which the readable character strings belong, the third-layer nodes of the driver tree are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings;
Calculating, in turn, the difference degree values of corresponding nodes between every two driver trees among all driver trees, and recording the calculation results; each calculation result includes the identifiers of the two driver trees compared, the identifier of the corresponding node compared, and its difference degree value;
Selecting, according to the calculation results, the N driver trees with the highest ambient density as cluster centers and performing clustering, obtaining several firmware classes for subsequent firmware analysis and repair by class; N is a positive integer.
2. The method according to claim 1, characterized in that, after extracting the readable character strings from each firmware to be classified and before building the driver tree of the firmware from the readable character strings, the method further comprises:
Judging whether a readable character string is a platform-related readable character string or a link-library-related readable character string; if so, deleting that readable character string; if not, proceeding to judge the next extracted readable character string, until all extracted readable character strings have been judged;
Correspondingly, the driver tree of the firmware is subsequently built from the readable character strings remaining after the judgment;
Wherein the process of judging whether a readable character string is a platform-related readable character string is:
Judging whether the information gain of the readable character string is greater than a preset platform-dependence threshold; if so, the readable character string is a platform-related readable character string; otherwise, the readable character string is not a platform-related readable character string;
The information gain of the readable character string is specifically:
[formula omitted in source]
Wherein IG(s) is the information gain; C_i is the i-th target platform; P(C_i) is the proportion of all binary files that belong to target platform C_i; P(s) is the proportion of all binary files that contain the readable character string s; P(s, C_i) is the proportion of all binary files that belong to target platform C_i and contain the readable character string s.
3. The method according to claim 1, characterized in that the process of calculating the difference degree values of corresponding nodes between two driver trees is specifically:
According to a node distance relation, calculating in turn the difference degree value between each pair of corresponding nodes in the first, second and third layers of the two driver trees;
The node distance relation is:
[formula omitted in source]
Wherein φ_i is the driver tree formed by the i-th firmware; φ_j is the driver tree formed by the j-th firmware; the node distance is the difference degree value of the corresponding node v at the same position in φ_i and φ_j; the child set is the set of all child nodes of node v in φ_i; wherein: [formula omitted in source]
4. The method according to claim 3, characterized in that the process of selecting, according to the calculation results, the N driver trees with the highest ambient density as cluster centers comprises:
Determining, from the calculation results, all difference degree values between each driver tree and all other driver trees;
Counting, among all difference degree values corresponding to each driver tree, the number of difference degree values smaller than a preset distance threshold, and taking that number as the ambient density count of the driver tree;
Sorting all driver trees in descending order of ambient density count, and selecting the top N driver trees as cluster centers.
5. The method according to claim 4, characterized in that, after calculating in turn the difference degree values between corresponding nodes in the first, second and third layers of the two driver trees and recording the calculation results, the method further comprises:
According to a layer distance relation, calculating and saving the layer distance of each corresponding layer between every two driver trees;
Wherein the layer distance relation is:
[formula omitted in source]
Wherein the layer distance is that between the l-th layers of φ_i and φ_j, taken over the set of all nodes in the l-th layer;
Wherein β_v is the weight corresponding to node v, the child set is the set of all child nodes of node v, and w is the parent node of v.
6. The method according to claim 5, characterized in that, after calculating, according to the layer distance relation, the layer distance of each corresponding layer between every two driver trees, the method further comprises:
According to a tree distance relation, calculating the tree distance between every two driver trees; the node distance and the tree distance are the difference degree values; wherein the tree distance relation is:
[formula omitted in source]
Wherein the tree distance is that between φ_i and φ_j; γ is the common ratio; H(φ) is the height of the driver tree, and H(φ) takes values in {1, 2, 3}; ω_l is the layer distance weight coefficient of the l-th layer; wherein: [formula omitted in source]
7. The method according to claim 6, characterized in that, after selecting the top N driver trees as cluster centers, the method further comprises:
Judging whether the tree distance between every two cluster centers is greater than a preset tree distance threshold; if not, taking the current (N+1)-th driver tree as a cluster center, moving the cluster center with the smaller ambient density count of the two cluster centers currently judged to the last position of the ordering, and then repeating the above process, until the tree distance between every two cluster centers is greater than the preset tree distance threshold.
8. An Internet of Things firmware program classification device, characterized by comprising:
An extraction module, configured to extract the readable character strings from each firmware to be classified;
A tree construction module, configured to build the driver tree of the firmware from the readable character strings; the root node of the driver tree is the number of the firmware, the second-layer nodes of the driver tree are the program part types to which the readable character strings belong, the third-layer nodes of the driver tree are the information types of the readable character strings, and the fourth-layer nodes are the contents of the corresponding readable character strings;
A distance calculation module, configured to calculate, in turn, the difference degree values of corresponding nodes between every two driver trees among all driver trees, and to record the calculation results; each calculation result includes the identifiers of the two driver trees compared, the identifier of the corresponding node compared, and its difference degree value;
A clustering module, configured to select, according to the calculation results, the N driver trees with the highest ambient density as cluster centers and to perform clustering, obtaining several firmware classes for subsequent firmware analysis and repair by class; N is a positive integer.
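The center selection recited in claims 4 and 7 can be illustrated with a short sketch. It assumes the pairwise difference degree values are already available as a square matrix `dist` indexed by driver tree; `ambient_density`, `select_centers` and both threshold parameters are hypothetical names. The cap on the number of demotions is a safeguard added in this sketch so that it always terminates, and ties in density are broken by demoting the center that comes later in the ordering; neither detail is stated in the claims.

```python
from itertools import combinations
from typing import List, Sequence


def ambient_density(dist: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """For each tree, count the other trees closer than `threshold`."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < threshold)
        for i in range(n)
    ]


def select_centers(
    dist: Sequence[Sequence[float]],
    n_centers: int,
    density_threshold: float,
    separation_threshold: float,
) -> List[int]:
    """Pick the N densest trees, demoting centers that lie too close together."""
    density = ambient_density(dist, density_threshold)
    # Descending density; sorted() is stable, so ties keep index order.
    order = sorted(range(len(dist)), key=lambda i: -density[i])
    for _ in range(len(order) + 1):  # safeguard of this sketch, not in the claims
        centers = order[:n_centers]
        clash = next(
            ((a, b) for a, b in combinations(centers, 2)
             if dist[a][b] <= separation_threshold),
            None,
        )
        if clash is None:
            return centers
        a, b = clash
        # Move the less dense center of the clashing pair to the end, so the
        # (N+1)-th tree in the ordering becomes a center on the next pass.
        loser = a if density[a] < density[b] else b
        order.remove(loser)
        order.append(loser)
    raise ValueError("no set of sufficiently separated centers was found")
```

For example, with five trees whose pairwise distances form two tight groups, the sketch first proposes two centers from the denser group, finds them too close together, and ends with one center per group.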
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910098931.2A CN109816038B (en) | 2019-01-31 | 2019-01-31 | Internet of things firmware program classification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910098931.2A CN109816038B (en) | 2019-01-31 | 2019-01-31 | Internet of things firmware program classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109816038A (en) | 2019-05-28 |
CN109816038B CN109816038B (en) | 2022-07-29 |
Family
ID=66606193
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910098931.2A Active CN109816038B (en) | 2019-01-31 | 2019-01-31 | Internet of things firmware program classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109816038B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110837642A (en) * | 2019-11-14 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Malicious program classification method, device, equipment and storage medium |
CN111507400A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Application classification method and device, electronic equipment and storage medium |
CN117574457A (en) * | 2024-01-15 | 2024-02-20 | 深圳欧税通技术有限公司 | Data security storage method and system suitable for cross-border payment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729580A (en) * | 2014-01-27 | 2014-04-16 | 国家电网公司 | Method and device for detecting software plagiarism |
CN104537279A (en) * | 2014-12-22 | 2015-04-22 | 中国科学院深圳先进技术研究院 | Sequence clustering method and device |
CN105975392A (en) * | 2016-04-29 | 2016-09-28 | 国家计算机网络与信息安全管理中心 | Duplicated code detection method and device based on abstract syntax tree |
CN106599686A (en) * | 2016-10-12 | 2017-04-26 | 四川大学 | Malware clustering method based on TLSH character representation |
Non-Patent Citations (3)
Title |
---|
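YU Q: "A feature selection approach based on a similarity measure for software defect prediction", 《FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC》 * |
于巧: "Research on software defect prediction methods based on machine learning", 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology》 * |
傅艺绮: "Machine-learning-based software defect prediction methods and tools", 《China Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology》 * |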
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110837642A (en) * | 2019-11-14 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Malicious program classification method, device, equipment and storage medium |
CN110837642B (en) * | 2019-11-14 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Malicious program classification method, device, equipment and storage medium |
CN111507400A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Application classification method and device, electronic equipment and storage medium |
CN111507400B (en) * | 2020-04-16 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Application classification method, device, electronic equipment and storage medium |
CN117574457A (en) * | 2024-01-15 | 2024-02-20 | 深圳欧税通技术有限公司 | Data security storage method and system suitable for cross-border payment |
Also Published As
Publication number | Publication date |
---|---|
CN109816038B (en) | 2022-07-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||