CN112965724A - Method and system for determining loading base address range of firmware - Google Patents

Info

Publication number
CN112965724A
Authority
CN
China
Prior art keywords
address
packet
target
cumulative distribution
distribution frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110300970.3A
Other languages
Chinese (zh)
Other versions
CN112965724B (en
Inventor
朱瑞瑾
张宝峰
毛军捷
谭毓安
高金萍
许源
熊琦
贾炜
孙亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Information Technology Security Evaluation Center
Original Assignee
China Information Technology Security Evaluation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Information Technology Security Evaluation Center filed Critical China Information Technology Security Evaluation Center
Priority to CN202110300970.3A priority Critical patent/CN112965724B/en
Publication of CN112965724A publication Critical patent/CN112965724A/en
Application granted granted Critical
Publication of CN112965724B publication Critical patent/CN112965724B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a system for determining a loading base address range of firmware, wherein the method comprises the following steps: adding the absolute address loaded by each target instruction in the target firmware to a first address multiple set, and performing deduplication processing and sorting processing on the absolute addresses to obtain a second address set; dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute addresses of the second address set, counting the cumulative distribution frequency of the number of absolute addresses of each group, constructing a cumulative distribution frequency graph according to the cumulative distribution frequencies, determining the address group with the largest group slope in the cumulative distribution frequency graph as a first target address group, and determining the next address group as a second target address group; and determining the group value range of the first target address group or the second target address group as the loading base address range of the target firmware. A reverse engineer is not required to determine the loading base address range, the efficiency of determining the loading base address range is improved, and the accuracy of determining the loading base address range is ensured.

Description

Method and system for determining loading base address range of firmware
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a system for determining a loading base address range of firmware.
Background
To ensure the security of an embedded system, the firmware of the embedded system needs to be security-evaluated by means of disassembly, but the loading base address of the firmware cannot be obtained directly by disassembling the firmware.
Currently, the loading base address of the firmware is determined as follows: a reverse engineer scans the firmware with a specific tool, determines the loading base address range of the firmware according to personal experience and intuition, and then searches within that range to obtain the loading base address. However, on the one hand, the reverse engineer needs to spend a lot of time processing the firmware to determine the loading base address range, which is inefficient; on the other hand, the determination of the loading base address range depends on the experience and intuition of the reverse engineer, so the accuracy of the determined loading base address range cannot be ensured.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for determining a loading base address range of a firmware, so as to solve the problems of low efficiency and incapability of ensuring accuracy in the current method for determining a loading base address range.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiment of the present invention discloses a method for determining a loading base address range of a firmware, where the method includes:
searching target instructions used for loading absolute addresses in target firmware, and adding the absolute addresses loaded by each target instruction to a preset first address multiple set;
carrying out deduplication processing and sequencing processing on absolute addresses contained in the first address multiple set to obtain a second address set;
dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set, wherein the plurality of address groups are ordered according to the corresponding group value range;
counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet;
constructing a cumulative distribution frequency graph according to the grouping value range of each address grouping and the cumulative distribution frequency, wherein the abscissa of the cumulative distribution frequency graph is the central point of the grouping value range of each address grouping, and the ordinate of the cumulative distribution frequency graph is the cumulative distribution frequency;
calculating the grouping slope of each address grouping in the cumulative distribution frequency graph based on the cumulative distribution frequency and the grouping value range corresponding to each address grouping;
determining the address packet with the largest packet slope as a first target address packet and determining the address packet next to the first target address packet as a second target address packet;
and determining that the grouping value range of the first target address grouping or the second target address grouping is the loading base address range of the target firmware.
Preferably, the calculating a packet slope of each address packet in the cumulative distribution frequency map based on the cumulative distribution frequency and the packet value range corresponding to each address packet includes:
using G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i) to calculate the packet slope of each address packet in the cumulative distribution frequency map, where G_i is the packet slope of the i-th address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and C_i is the center point of the packet value range of the i-th address packet.
Preferably, the counting the cumulative distribution frequency of the number of absolute addresses included in each address packet includes:
for each address packet, using the number of absolute addresses contained in said address packet, in combination with L_i = S_i + L_{i-1}, determining the cumulative number of absolute addresses of said address packet, where L_i is the cumulative number of absolute addresses of the i-th address packet, S_i is the number of absolute addresses contained in the i-th address packet, and L_0 is 0;
for each address packet, using the cumulative number of absolute addresses of said address packet, in combination with P_i = L_i / K, determining the cumulative distribution frequency of the number of absolute addresses contained in said address packet, where P_i is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and K is the total number of absolute addresses contained in the second address set.
Preferably, the dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address included in the second address set includes:
determining the number V of groups of the second address set by using V = (max - min) / filesize, wherein max is the maximum value of the absolute addresses contained in the second address set, min is the minimum value of the absolute addresses contained in the second address set, and filesize is the file size of the target firmware;
and dividing the second address set into V address groups by taking the file size of the target firmware as the length of a group value range according to the group number.
Preferably, the searching for a target instruction in the target firmware for loading an absolute address and adding the absolute address loaded by each target instruction to a preset first address multiple set includes:
and searching LDR instructions in target firmware, and adding the absolute address loaded by each LDR instruction to a preset first address multiple set.
Preferably, after determining the loading base address range of the target firmware, the method further includes:
and determining the loading base address of the target firmware by using the loading base address range.
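The slope-based selection of the first and second target address packets described in this aspect can be sketched in Python as follows. This is only an illustrative sketch: the function names `packet_slopes` and `pick_target_packets` are hypothetical, indices are 0-based, the last packet has no slope because it has no successor, and ties are resolved in favor of the earliest packet, which the text does not specify.

```python
def packet_slopes(centers, cum_freqs):
    """Return G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i) for each packet with a successor."""
    return [
        (cum_freqs[i + 1] - cum_freqs[i]) / (centers[i + 1] - centers[i])
        for i in range(len(centers) - 1)
    ]


def pick_target_packets(centers, cum_freqs):
    """Return (first, second): the index of the packet with the largest slope
    and the index of the packet that follows it."""
    slopes = packet_slopes(centers, cum_freqs)
    if not slopes:
        raise ValueError("at least two address packets are required")
    # max() keeps the earliest index on ties (an assumption of this sketch).
    first = max(range(len(slopes)), key=slopes.__getitem__)
    return first, first + 1


# Hypothetical input: four packets with made-up center points and frequencies.
first, second = pick_target_packets([10, 30, 50, 70], [0.1, 0.2, 0.8, 1.0])
```

The packet value range of either returned packet would then be taken as the candidate loading base address range.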
A second aspect of the present invention discloses a system for determining a loading base address range of firmware, including:
the searching unit is used for searching target instructions used for loading absolute addresses in target firmware and adding the absolute addresses loaded by each target instruction to a preset first address multiple set;
the processing unit is used for carrying out duplication removal processing and sequencing processing on absolute addresses contained in the first address multiple set to obtain a second address set;
the dividing unit is used for dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set, and the plurality of address groups are ordered according to the corresponding group value range;
a counting unit, configured to count a cumulative distribution frequency of the number of absolute addresses included in each address packet;
a building unit, configured to build a cumulative distribution frequency map according to the packet value range of each address packet and the cumulative distribution frequency, where an abscissa of the cumulative distribution frequency map is a central point of the packet value range of each address packet, and a ordinate of the cumulative distribution frequency map is the cumulative distribution frequency;
a calculating unit, configured to calculate a packet slope of each address packet in the cumulative distribution frequency map based on the cumulative distribution frequency and a packet value range corresponding to each address packet;
a first determination unit configured to determine that the address packet having the largest packet slope is a first target address packet and determine that the address packet next to the first target address packet is a second target address packet;
a second determining unit, configured to determine that a packet value range of the first target address packet or the second target address packet is a loading base address range of the target firmware.
Preferably, the computing unit is specifically configured to: use G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i) to calculate the packet slope of each address packet in the cumulative distribution frequency map, where G_i is the packet slope of the i-th address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and C_i is the center point of the packet value range of the i-th address packet.
Preferably, the statistical unit is specifically configured to: for each address packet, using the number of absolute addresses contained in said address packet, in combination with L_i = S_i + L_{i-1}, determine the cumulative number of absolute addresses of said address packet, where L_i is the cumulative number of absolute addresses of the i-th address packet, S_i is the number of absolute addresses contained in the i-th address packet, and L_0 is 0; for each address packet, using the cumulative number of absolute addresses of said address packet, in combination with P_i = L_i / K, determine the cumulative distribution frequency of the number of absolute addresses contained in said address packet, where P_i is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and K is the total number of absolute addresses contained in the second address set.
Preferably, the dividing unit is specifically configured to: determine the number V of groups of the second address set by using V = (max - min) / filesize, wherein max is the maximum value of the absolute addresses contained in the second address set, min is the minimum value of the absolute addresses contained in the second address set, and filesize is the file size of the target firmware; and divide the second address set into V address groups according to the group number, taking the file size of the target firmware as the length of a group value range.
Based on the method and the system for determining the loading base address range of the firmware provided by the embodiments of the present invention, the method comprises the following steps: searching target instructions used for loading absolute addresses in target firmware, and adding the absolute address loaded by each target instruction to a preset first address multiple set; carrying out deduplication processing and sorting processing on the absolute addresses contained in the first address multiple set to obtain a second address set; dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute addresses contained in the second address set; counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet; constructing a cumulative distribution frequency graph according to the packet value range and the cumulative distribution frequency of each address packet; calculating the packet slope of each address packet in the cumulative distribution frequency graph, determining the address packet with the largest packet slope as a first target address packet, and determining the address packet next to the first target address packet as a second target address packet; and determining the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware.
In this scheme, the second address set is obtained by performing deduplication processing and sorting processing on the absolute addresses loaded by the target instructions in the target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses of each address packet is calculated. A cumulative distribution frequency graph is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the graph is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the address packet next to it is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. The reverse engineer is not required to determine the loading base address range, the efficiency of determining the loading base address range is improved, and the accuracy of determining the loading base address range is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram illustrating firmware loading according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining a loading base address range of firmware according to an embodiment of the present invention;
FIG. 3 is a flowchart of a search target instruction according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a machine code format of an LDR instruction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a cumulative distribution frequency plot provided in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram of a system for determining a loading base address range of firmware according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background art, currently, when determining the loading base address of the firmware, the method mainly depends on the experience and intuition of the reverse engineer itself to determine the loading base address range of the firmware, and then locate the loading base address. However, on one hand, this method requires a lot of time, which results in low efficiency of determining the loading base address range, and on the other hand, whether the determined loading base address range is accurate depends on the experience and intuition of the reverse engineer, and accuracy cannot be guaranteed.
Therefore, an embodiment of the present invention provides a method and a system for determining a loading base address range of firmware, where a second address set is obtained by performing deduplication processing and sorting processing on the absolute addresses loaded by the target instructions in target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses of each address packet is calculated. A cumulative distribution frequency graph is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the graph is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the address packet next to it is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. The reverse engineer is not required to determine the loading base address range, so that the efficiency of determining the loading base address range is improved, and the accuracy of determining the loading base address range is ensured.
It should be noted that the loading base address according to the embodiment of the present invention is the starting position at which the firmware is mapped into the memory of the embedded system; FIG. 1 is a schematic diagram of firmware mapped into the memory of an embedded system.
It should be further noted that, as a result of research by the inventors, in target firmware (such as ARM firmware), when an absolute address is operated on, the absolute address is usually loaded into a register by using a target instruction (such as an LDR instruction), and the loaded absolute address points to a memory-mapped range of the target firmware. Therefore, the loading base address range of the target firmware can be determined by the following method for determining the loading base address range of the firmware according to the embodiment of the present invention, which is described in detail in the following embodiments.
Referring to fig. 2, a flowchart of a method for determining a loading base address range of firmware according to an embodiment of the present invention is shown, where the method for determining the loading base address range of firmware includes:
step S201: target instructions used for loading absolute addresses in the target firmware are searched, and the absolute addresses loaded by the target instructions are added to a preset first address multiple set.
It should be noted that the target instruction is used to load an absolute address. For example, an LDR instruction is used to load an absolute address, which is typically the address of a string, a function, or a structure.
It can be understood that the target firmware contains a large number of instructions, so the target instructions for loading absolute addresses need to be searched for among all instructions of the target firmware. In the process of implementing step S201, starting from the start position of the target firmware, each instruction is checked one by one to determine whether it is a target instruction. When a target instruction is found, the absolute address loaded by the target instruction is calculated and added to the first address multiple set. In this manner, every target instruction of the target firmware is found, and the absolute address loaded by each target instruction is added to the first address multiple set.
In a specific implementation, LDR instructions in target firmware are searched, and an absolute address loaded by each LDR instruction is added to a preset first address multiple set, where the target firmware may be ARM firmware or other types of firmware, and the target instruction may be an LDR instruction or other instructions capable of being used to load an absolute address, and the target firmware and the target instruction are not specifically limited herein.
To better explain how to find the target instruction in the target firmware, the description is made by the contents shown in fig. 3, and it should be noted that fig. 3 is only used for illustration.
Referring to fig. 3, a flowchart of searching for a target instruction according to an embodiment of the present invention is shown, where the target firmware is an ARM firmware and the target instruction is an LDR instruction, as an example, the method includes the following steps:
Step S301: determine whether the current instruction of the ARM firmware is an LDR instruction. If yes, go to step S302; otherwise, go to step S304.
In the process of implementing step S301, starting from the start position of the ARM firmware, each instruction is checked one by one to determine whether it is an LDR instruction, where the current instruction is the instruction currently being examined. If the current instruction is an LDR instruction, step S302 is executed; if the current instruction is not an LDR instruction, step S304 is executed.
Step S302: the absolute address loaded by the LDR instruction is calculated.
Step S303: storing the absolute address loaded by the LDR instruction into the first address multi-set.
Step S304: determine whether the current instruction is the last instruction of the ARM firmware. If the current instruction is the last instruction of the ARM firmware, the process ends; if not, move to the next instruction and execute step S301.
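The scanning loop of steps S301 and S304 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the firmware is treated as a sequence of little-endian 32-bit ARM-state words, and only the PC-relative form `LDR Rd, [PC, #imm12]` with an added offset is recognized, which corresponds to the example machine code discussed in this embodiment. The names `is_ldr_literal` and `find_ldr_offsets` are hypothetical; steps S302 and S303 (computing and storing the loaded address) are left out here.

```python
import struct

# Bits 27..16 of "LDR Rd, [PC, #imm12]" with an added immediate offset:
# 01 I=0 P=1 U=1 B=0 W=0 L=1, Rn=1111 (PC). The condition field is ignored.
LDR_LITERAL_MASK = 0x0FFF0000
LDR_LITERAL_BITS = 0x059F0000


def is_ldr_literal(word):
    """S301: decide whether a 32-bit instruction word is a PC-relative LDR."""
    return (word & LDR_LITERAL_MASK) == LDR_LITERAL_BITS


def find_ldr_offsets(firmware):
    """Walk the firmware word by word and return the file offsets of LDR instructions."""
    offsets = []
    for offset in range(0, len(firmware) - 3, 4):   # S304: stop after the last word
        (word,) = struct.unpack_from("<I", firmware, offset)
        if is_ldr_literal(word):
            offsets.append(offset)
    return offsets


# Three words: MOV R0, R0; LDR R0, [PC, #0xB4]; MOV R1, #1
sample = struct.pack("<3I", 0xE1A00000, 0xE59F00B4, 0xE3A01001)
ldr_offsets = find_ldr_offsets(sample)
```

A real scanner would also have to cope with data interleaved with code and with Thumb-state instructions, which this sketch ignores.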
It can be understood that, when calculating the absolute address of the target instruction, the absolute address loaded by the target instruction is calculated by using the machine code of the target instruction, and to better explain how to calculate the absolute address loaded by the target instruction, taking the target firmware as ARM firmware and the target instruction as LDR instruction as an example, the following description is made by referring to the schematic diagram of the machine code format of the LDR instruction shown in fig. 4.
The machine code format of an LDR instruction that loads an immediate into a register in the ARM state is shown in FIG. 4. Assume that, in the ARM state, an LDR instruction loads the character string "aSystem", the memory address of the LDR instruction is 0x20004864, and the machine code of the LDR instruction is "E59F00B4".
By analyzing the machine code format and the machine code of the LDR instruction, Rd = (0000)_2 = R0 and imm12 = (000010110100)_2 = 0xB4 are obtained. For an ARM-state LDR instruction, the lookup address is (PC & 0xFFFFFFFC) + imm12.
Because the ARM processor adopts a 3-stage pipeline, in the ARM state the value of PC read by an instruction is the address of that instruction plus 8, so the address looked up by the LDR instruction is given by formula (1).
address = ((0x20004864 + 8) & 0xFFFFFFFC) + 0xB4 = 0x20004920 (1)
According to formula (1), the lookup address of the LDR instruction is 0x20004920. Assuming that the 4 bytes starting at memory address 0x20004920 are (E0350020) and that the ARM firmware in this example uses little-endian storage, the absolute address loaded by the LDR instruction is 0x200035E0, and the content stored at address 0x200035E0 is the actual content of the character string "aSystem" mentioned above.
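The arithmetic of this worked example can be checked with a short Python sketch. The helper names `ldr_lookup_address` and `loaded_absolute_address`, the `mem_base` parameter, and the small memory buffer are assumptions made for illustration; the constants are the ones given in the example.

```python
import struct


def ldr_lookup_address(instr_addr, word):
    """Address read by an ARM-state 'LDR Rd, [PC, #imm12]' located at instr_addr."""
    imm12 = word & 0xFFF              # low 12 bits of the machine code
    pc = instr_addr + 8               # PC reads as instruction address + 8
    return (pc & 0xFFFFFFFC) + imm12


def loaded_absolute_address(memory, mem_base, instr_addr, word):
    """Read the little-endian 32-bit value stored at the lookup address.

    memory holds the bytes mapped at mem_base (hypothetical helper arguments)."""
    lookup = ldr_lookup_address(instr_addr, word)
    (value,) = struct.unpack_from("<I", memory, lookup - mem_base)
    return value


# Rebuild just enough memory for the example: the instruction sits at
# 0x20004864 and the 4 bytes E0 35 00 20 sit at 0x20004920.
base = 0x20004864
memory = bytearray(0xC0)
memory[0xBC:0xC0] = bytes.fromhex("E0350020")

lookup = ldr_lookup_address(0x20004864, 0xE59F00B4)
loaded = loaded_absolute_address(memory, base, 0x20004864, 0xE59F00B4)
print(hex(lookup), hex(loaded))   # 0x20004920 0x200035e0
```

Because the lookup is PC-relative, the same arithmetic can be applied to file offsets instead of memory addresses when the base address is not yet known, provided the base address is 4-byte aligned.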
Step S202: and carrying out deduplication processing and sequencing processing on absolute addresses contained in the first address multiple set to obtain a second address set.
As can be seen from the above step S201, the absolute addresses loaded by all target instructions of the target firmware are stored in the first address multiple set. That is, the first address multi-set stores all absolute addresses in the target firmware, where duplicate absolute addresses may occur.
In the process of implementing step S202 specifically, deduplication processing is performed on absolute addresses included in the first address multiple set, only one absolute address is reserved for repeated absolute addresses, and sorting processing (sorting according to the size of the absolute addresses) is performed on the absolute addresses of the first address multiple set after deduplication processing is performed, so as to obtain a second address set.
It will be appreciated that each element in the first address multiple set and in the second address set is an absolute address.
To better explain how to perform the deduplication processing and the sorting processing on the first address multiple set, processes A1 through A3 are described below.
Assume that the first address multiple set is M, which contains the absolute addresses loaded by all target instructions in the target firmware.
A1: set N is initialized to be empty.
For each absolute address m_i in the first address multiple set M, process A2 is performed in a loop.
A2: if the absolute address m_i does not belong to set N, the absolute address m_i is added to set N.
It should be noted that process A2 is used to remove the repeated absolute addresses in the first address multiple set M.
A3: after process A2 has been executed on all absolute addresses in the first address multiple set M, the absolute addresses in N are sorted by value to obtain the second address set.
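Processes A1 through A3 can be sketched in Python as follows. The function name `build_second_address_set` is hypothetical, and the input values are made up for illustration.

```python
def build_second_address_set(first_multiset):
    """Deduplicate and sort the absolute addresses collected from the firmware."""
    unique = set()                     # A1: N starts empty
    for address in first_multiset:     # loop over every m_i in M
        if address not in unique:      # A2: keep only the first occurrence
            unique.add(address)
    return sorted(unique)              # A3: sort by address value


second_set = build_second_address_set([0x200035E0, 0x20001000, 0x200035E0, 0x20000040])
```

In Python the same result can be obtained with `sorted(set(first_multiset))`; the explicit loop is kept only to mirror A1 through A3.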
Step S203: and dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set.
It should be noted that each address group has a corresponding group range, the group range is obtained by dividing based on the value range of the absolute address included in the second address set, and the plurality of address groups are ordered according to their corresponding group value ranges.
As can be known from the content of the foregoing step S202, the absolute addresses in the second address set are the absolute addresses that have been subjected to deduplication and sorting, so that the absolute addresses of the second address set have corresponding value ranges, the value range of the absolute addresses of the second address set is (min, max), max is the maximum value among the absolute addresses included in the second address set, and min is the minimum value among the absolute addresses included in the second address set.
In the process of implementing step S203 specifically, the value range of the absolute address included in the second address set is divided into a plurality of sub-ranges according to the file size of the target firmware, and each sub-range is a group value range of one address group.
In a specific implementation, the grouping number V of the second address set is determined by using formula (2), in formula (2), max is a maximum value of absolute addresses included in the second address set, min is a minimum value of absolute addresses included in the second address set, and filesize is a file size of the target firmware.
V=(max-min)/filesize (2)
And according to the grouping number V, grouping the absolute addresses contained in the second address set by taking the file size of the target firmware as the length of a grouping value range to obtain V address groups, wherein the value of the absolute address in each address group is in the grouping value range of the address group.
For example: assume a second address set N = {0, 1, 2, 3, 5, 11, 23, 44, 55, 67, 79, 99, 100}, which has 13 elements (absolute addresses) in total and whose absolute addresses have the value range (min = 0, max = 100), and assume that the file size of the target firmware is 20. Using formula (2), the number of groups for the second address set N is (100 - 0)/20 = 5.
Dividing the absolute address of the second address set N into 5 groups by taking the length of the group value range as 20 (the file size of the target firmware), so as to obtain 5 groups of address groups, wherein the group value range and the contained absolute address of each group of address groups are shown in table 1.
Table 1:
Group value range    Absolute addresses contained
[0, 20]              0, 1, 2, 3, 5, 11
[21, 40]             23
[41, 60]             44, 55
[61, 80]             67, 79
[81, 100]            99, 100
As can be seen from table 1, after the second address set is divided into groups, the value of the absolute address included in each group of address groups is within the group value range of the address group.
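The grouping of step S203 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_into_groups` is hypothetical, the group count from formula (2) is rounded up, and an address lying exactly on a boundary (such as 20) is placed in the lower group, a rule chosen here so that the result matches Table 1.

```python
import math

def split_into_groups(addresses, filesize):
    """Split sorted, deduplicated addresses into V groups of width `filesize`."""
    lo, hi = addresses[0], addresses[-1]
    # Formula (2), rounded up so that the largest address still falls in a group.
    v = max(1, math.ceil((hi - lo) / filesize))
    groups = [[] for _ in range(v)]
    for a in addresses:
        # Assumed boundary rule (matches Table 1): a boundary address such as 20
        # belongs to the lower group; the minimum address goes to the first group.
        idx = 0 if a == lo else (a - lo - 1) // filesize
        groups[idx].append(a)
    return groups

N = [0, 1, 2, 3, 5, 11, 23, 44, 55, 67, 79, 99, 100]
groups = split_into_groups(N, 20)
```

With the example set N and a file size of 20, `groups` holds the five address lists of Table 1 in order.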
Step S204: counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet.
In the process of implementing step S204, the cumulative number of absolute addresses included in each address packet is determined, and then the cumulative distribution frequency of the number of absolute addresses included in each address packet is calculated by using the cumulative number of absolute addresses included in each address packet and the total number of absolute addresses included in the second address set.
The specific implementation is as follows: for each address packet, the cumulative number of absolute addresses of the address packet is determined from the number of absolute addresses contained in the address packet using formula (3), where L_i is the cumulative number of absolute addresses of the ith address packet, S_i is the number of absolute addresses contained in the ith address packet, and L_0 = 0.
L_i = S_i + L_{i-1} (3)
The cumulative number of absolute addresses of each address packet is calculated by formula (3).
For each address packet, the cumulative distribution frequency of the number of absolute addresses contained in the address packet is determined from the cumulative number of absolute addresses of the address packet using formula (4), where P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet and K is the total number of absolute addresses contained in the second address set.
P_i = L_i / K (4)
For example: continuing the example of Table 1 from step S203, Table 2 shows how formula (3) and formula (4) yield the cumulative number of absolute addresses and the cumulative distribution frequency for each address group, where the total number of absolute addresses contained in the second address set N is 13.
Table 2:
[Table 2 is an image in the original publication and is not reproduced here.]
It should be noted that the contents shown in Table 1 and Table 2 are only for illustration.
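Formulas (3) and (4) can be applied to the group sizes of Table 1 with a short Python sketch. The function name `cumulative_frequency` is hypothetical, and K is taken as the sum of the group sizes, which equals the total number of addresses in the second address set.

```python
from itertools import accumulate

def cumulative_frequency(group_sizes):
    """Return (L, P): cumulative counts per formula (3), frequencies per formula (4)."""
    cumulative = list(accumulate(group_sizes))  # L_i = S_i + L_{i-1}, with L_0 = 0
    total = cumulative[-1]                      # K: total number of absolute addresses
    return cumulative, [count / total for count in cumulative]

S = [6, 1, 2, 2, 2]  # number of absolute addresses in each group of Table 1
L, P = cumulative_frequency(S)
```

For the example, the cumulative counts are 6, 7, 9, 11 and 13, so the frequencies run from 6/13 up to 13/13.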
Step S205: constructing a cumulative distribution frequency graph according to the packet value range and the cumulative distribution frequency of each address packet.
In the process of specifically implementing step S205, a cumulative distribution frequency map is constructed by using the central point of the packet value range of each address packet as an abscissa value and using the cumulative distribution frequency of the number of absolute addresses included in each address packet as an ordinate value.
That is, the abscissa of the cumulative distribution frequency map is the center point of the packet value range of each address packet, and the ordinate of the cumulative distribution frequency map is the cumulative distribution frequency.
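The points of the cumulative distribution frequency graph can be derived as in the following Python sketch, which only computes the coordinates and does not draw anything. The function name `cdf_points` is hypothetical, and the center point is assumed to be the arithmetic midpoint of the two range bounds.

```python
def cdf_points(group_ranges, frequencies):
    """Pair each group's center point (abscissa) with its cumulative frequency (ordinate)."""
    # Assumption: the center point is the midpoint of the lower and upper bound.
    centers = [(lo + hi) / 2 for lo, hi in group_ranges]
    return list(zip(centers, frequencies))

ranges = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]  # Table 1
P = [6 / 13, 7 / 13, 9 / 13, 11 / 13, 13 / 13]               # from formulas (3) and (4)
points = cdf_points(ranges, P)
```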
Step S206: calculating the packet slope of each address packet in the cumulative distribution frequency graph based on the cumulative distribution frequency and the packet value range corresponding to each address packet.
In the process of implementing step S206 specifically, the packet slope of each address packet in the constructed cumulative distribution frequency graph is calculated using formula (5), where G_i is the packet slope of the ith address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet, and C_i is the center point of the packet value range of the ith address packet.
G_i = (P_{i+1} - P_i)/(C_{i+1} - C_i) (5)
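Formula (5) can be sketched in Python as below. The function name `group_slopes` is hypothetical. Because the formula refers to the next group, the sketch assumes that no slope is defined for the last group, so V groups yield V - 1 slopes; the source does not state how the last group is handled.

```python
def group_slopes(centers, frequencies):
    """G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i) for every group that has a successor."""
    return [
        (frequencies[i + 1] - frequencies[i]) / (centers[i + 1] - centers[i])
        for i in range(len(centers) - 1)
    ]

C = [10.0, 30.5, 50.5, 70.5, 90.5]              # midpoints of the Table 1 ranges
P = [6 / 13, 7 / 13, 9 / 13, 11 / 13, 13 / 13]  # cumulative distribution frequencies
G = group_slopes(C, P)
```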
Step S207: the address packet with the largest packet slope is determined as a first destination address packet, and the next address packet of the first destination address packet is determined as a second destination address packet.
In the process of implementing step S207 specifically, after the packet slope of each address packet is calculated, the address packet with the largest packet slope is selected as the first destination address packet.
As can be seen from the content of step S203, the plurality of address groups are sorted according to their corresponding group value ranges, and the next address group of the first destination address group is used as the second destination address group.
Step S208: determining the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware.
In the process of implementing step S208 specifically, the packet value range of the first destination address packet (with the maximum packet slope) or the second destination address packet is used as the loading base address range of the destination firmware.
It can be understood that a corresponding selection condition is preset, and when the first target address packet meets the selection condition, the packet value range of the first target address packet is taken as the loading base address range of the target firmware. And when the first target address grouping does not meet the selection condition, taking the grouping value range of the second target address grouping as the loading base address range of the target firmware.
For example: the selection condition is set to correspond to the normal case. When the first target address packet meets the selection condition, i.e., in the normal case, the packet value range of the first target address packet is taken as the loading base address range of the target firmware. When the first target address packet does not meet the selection condition, i.e., in a special case, the packet value range of the second target address packet is taken as the loading base address range of the target firmware.
For another example: assume that the packet value range of each address packet has a length of 10, that the packet value range of the first target address packet (with the largest packet slope) is [21,30], and that the packet value range of the second target address packet is [31,40]. In the normal case, the packet value range [21,30] of the first target address packet is the loading base address range of the target firmware. Correspondingly, in a special case, the packet value range [31,40] of the second target address packet is the loading base address range of the target firmware.
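Steps S207 and S208 can be sketched in Python as follows. The source does not define the preset selection condition, so the sketch takes its outcome as a boolean argument supplied by the caller. The function name `pick_base_range`, the slope values, and the clamping of the second group to the last group are all assumptions made for illustration.

```python
def pick_base_range(group_ranges, slopes, first_group_ok):
    """Return the range of the steepest group, or of the next group when the
    preset selection condition (evaluated by the caller) is not met."""
    first = max(range(len(slopes)), key=slopes.__getitem__)  # first target group
    second = min(first + 1, len(group_ranges) - 1)           # next group, clamped
    return group_ranges[first if first_group_ok else second]

ranges = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
slopes = [0.001, 0.002, 0.020, 0.003]  # made-up values; index 2 is the steepest

normal_case = pick_base_range(ranges, slopes, True)
special_case = pick_base_range(ranges, slopes, False)
```

With these made-up slopes the sketch reproduces the example above: [21,30] in the normal case and [31,40] in the special case.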
Preferably, after determining the loading base address range of the target firmware, the loading base address range is used in combination with a specified loading base address determination method to determine the loading base address of the target firmware.
Preferably, after determining the loading base address of the target firmware, the correctness of the determined loading base address can be verified by using a corresponding disassembling tool.
For example: assume the determined loading base address of the target firmware is 0xC3421000. The target firmware file is loaded into IDA Pro (a disassembly tool), the processor type corresponding to the target firmware is set, and the loading base address is set to 0xC3421000. When the resulting disassembly meets the preset inspection requirements, the loading base address determined through the above steps is deemed correct. The preset inspection requirements are as follows: the disassembled code structure is clear and meaningful, string cross-references resolve correctly, and the binary functions are structurally complete, with matching prologues and epilogues.
In the embodiment of the present invention, the second address set is obtained by deduplicating and sorting the absolute addresses loaded by target instructions in the target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses in each address packet is calculated. A cumulative distribution frequency graph is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the graph is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the next address packet is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. A reverse engineer is therefore not required to determine the loading base address range manually, which improves the efficiency of determining the loading base address range while ensuring its accuracy.
To better illustrate how the loading base address range of the target firmware is determined, an example is given in processes B1 through B6.
B1, assume that 28191 LDR instructions are identified in the target firmware, i.e., the first address multi-set contains the absolute addresses loaded by the 28191 LDR instructions.
B2, after deduplication processing and sorting processing are carried out on 28191 absolute addresses in the first address multiple set, a second address set containing 12656 absolute addresses is obtained, wherein all the absolute addresses in the second address set are sorted.
B3, assume that the file size of the target firmware is 5679356 bytes (5.41 MB), i.e., the length of the packet value range of each address packet is 5679356, and that the minimum absolute address in the second address set is 1 and the maximum is 4294967295. By formula (2), the number of packets of the second address set is (4294967295 - 1)/5679356 ≈ 756.2, which is rounded up to 757, i.e., the second address set is divided into 757 address packets.
B4, dividing 12656 absolute addresses of the second address set into 757 address groups, and combining the above formula (3) and formula (4), counting the cumulative number of absolute addresses of each address group, and calculating the cumulative distribution frequency of the number of absolute addresses of each address group according to the cumulative number of each address group.
B5, constructing a cumulative distribution frequency graph according to the packet value range and the cumulative distribution frequency of each address packet; a schematic diagram of the resulting graph is shown in fig. 5.
B6, calculating the packet slope of each address packet in the cumulative distribution frequency graph shown in fig. 5, determining the first target address packet with the largest packet slope, and determining the next address packet as the second target address packet. The first target address packet and the second target address packet are the parts circled by ellipses in fig. 5, where the packet value range of the first target address packet is [533859465,539538821] and the packet value range of the second target address packet is [539538821,545218177].
Assuming that the first destination address packet meets the preset selection condition, the packet value range of the first destination address packet is taken as the loading base address range of the destination firmware, and in the specific implementation, [533859465,539538821] is converted into hexadecimal to obtain [0x1FD20C89,0x2028B585], and the loading base address range of the destination firmware is [0x1FD20C89,0x2028B585 ].
Note that "30%" in fig. 5 (533859465,539538821, 30%) is a cumulative distribution frequency.
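The figures quoted in B3, B6 and the hexadecimal conversion can be checked with a few lines of Python. The zero-based group index 94 is not stated in the source; it is inferred here from the range given in B6, under the assumption that group ranges start at the minimum address 1 and each spans one file size. The helper name `group_range` is hypothetical.

```python
import math

filesize = 5679356          # file size of the target firmware, in bytes
lo, hi = 1, 4294967295      # minimum and maximum absolute address (B3)

# Formula (2), rounded up as in B3.
num_groups = math.ceil((hi - lo) / filesize)

def group_range(index):
    """Value range of the group with the given zero-based index (assumed layout)."""
    start = lo + index * filesize
    return start, start + filesize

first = group_range(94)     # inferred index of the first target address packet
second = group_range(95)    # the next address packet
first_hex = tuple(f"0x{value:08X}" for value in first)
```

Under these assumptions the first target range is the 95th of the 757 groups, and its bounds convert to the hexadecimal values given in the text.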
Corresponding to the method for determining the loading base address range of the firmware provided by the embodiment of the present invention, referring to fig. 6, the embodiment of the present invention further provides a structural block diagram of a system for determining the loading base address range of the firmware, where the system for determining the loading base address range of the firmware includes: a searching unit 601, a processing unit 602, a dividing unit 603, a counting unit 604, a constructing unit 605, a calculating unit 606, a first determining unit 607 and a second determining unit 608;
the searching unit 601 is configured to search for target instructions in the target firmware for loading absolute addresses, and add the absolute addresses loaded by each target instruction to a preset first address multiple set.
In a specific implementation, the search unit 601 is specifically configured to: and searching LDR instructions in the target firmware, and adding the absolute address loaded by each LDR instruction to a preset first address multiple set.
The processing unit 602 is configured to perform deduplication processing and sorting processing on absolute addresses included in the first address multi-set, so as to obtain a second address set.
The dividing unit 603 is configured to divide the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address included in the second address set, where the plurality of address groups are ordered according to their corresponding group value ranges.
In a specific implementation, the dividing unit 603 is specifically configured to: determining the number of packets of the second set of addresses using equation (2) above; and dividing the second address set into V address groups by taking the file size of the target firmware as the length of a group value range according to the number of the groups.
The counting unit 604 is configured to count a cumulative distribution frequency of the number of absolute addresses included in each address packet.
In a specific implementation, the statistical unit 604 is specifically configured to: determining the accumulated number of the absolute addresses of the address packets by utilizing the number of the absolute addresses contained in the address packets and combining the formula (3) aiming at each address packet; for each address packet, the cumulative distribution frequency of the number of absolute addresses contained in the address packet is determined using the cumulative number of absolute addresses of the address packet in combination with the above equation (4).
A calculating unit 606, configured to calculate a packet slope of each address packet in the cumulative distribution frequency map based on the cumulative distribution frequency and the packet value range corresponding to each address packet.
In a specific implementation, the calculating unit 606 is specifically configured to: calculate the packet slope of each address packet in the cumulative distribution frequency map using formula (5) above.
A first determining unit 607 for determining the address packet with the largest packet slope as the first destination address packet and determining the next address packet of the first destination address packet as the second destination address packet.
The second determining unit 608 is configured to determine that a packet value range of the first destination address packet or the second destination address packet is a loading base address range of the destination firmware.
In the embodiment of the present invention, the second address set is obtained by deduplicating and sorting the absolute addresses loaded by target instructions in the target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses in each address packet is calculated. A cumulative distribution frequency graph is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the graph is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the next address packet is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. A reverse engineer is therefore not required to determine the loading base address range manually, which improves the efficiency of determining the loading base address range while ensuring its accuracy.
Preferably, in combination with the content shown in fig. 6, the determining system further includes:
and the base address determining unit is used for determining the loading base address of the target firmware by using the loading base address range.
In summary, embodiments of the present invention provide a method and a system for determining a loading base address range of firmware, in which the second address set is obtained by deduplicating and sorting the absolute addresses loaded by target instructions in the target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses in each address packet is calculated. A cumulative distribution frequency graph is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the graph is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the next address packet is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. A reverse engineer is therefore not required to determine the loading base address range manually, which improves the efficiency of determining the loading base address range while ensuring its accuracy.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The above-described system embodiments are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for determining a loading base address range of firmware, the method comprising:
searching target instructions used for loading absolute addresses in target firmware, and adding the absolute addresses loaded by each target instruction to a preset first address multiple set;
carrying out deduplication processing and sequencing processing on absolute addresses contained in the first address multiple set to obtain a second address set;
dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set, wherein the plurality of address groups are ordered according to the corresponding group value range;
counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet;
constructing a cumulative distribution frequency graph according to the grouping value range of each address grouping and the cumulative distribution frequency, wherein the abscissa of the cumulative distribution frequency graph is the central point of the grouping value range of each address grouping, and the ordinate of the cumulative distribution frequency graph is the cumulative distribution frequency;
calculating the grouping slope of each address grouping in the cumulative distribution frequency graph based on the cumulative distribution frequency and the grouping value range corresponding to each address grouping;
determining the address packet with the largest packet slope as a first target address packet and determining the address packet next to the first target address packet as a second target address packet;
and determining that the grouping value range of the first target address grouping or the second target address grouping is the loading base address range of the target firmware.
2. The method of claim 1, wherein the calculating a packet slope for each address packet in the cumulative distribution frequency map based on the cumulative distribution frequency and a packet value range corresponding to each address packet comprises:
calculating the packet slope of each address packet in the cumulative distribution frequency map using G_i = (P_{i+1} - P_i)/(C_{i+1} - C_i), wherein G_i is the packet slope of the ith address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet, and C_i is the center point of the packet value range of the ith address packet.
3. The method of claim 1, wherein said counting the cumulative distribution frequency of the number of absolute addresses included in each of said address packets comprises:
for each address packet, determining the cumulative number of absolute addresses of said address packet from the number of absolute addresses contained in said address packet using L_i = S_i + L_{i-1}, wherein L_i is the cumulative number of absolute addresses of the ith address packet, S_i is the number of absolute addresses contained in the ith address packet, and L_0 is 0;
for each address packet, determining the cumulative distribution frequency of the number of absolute addresses contained in said address packet from the cumulative number of absolute addresses of said address packet using P_i = L_i / K, wherein P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet, and K is the total number of absolute addresses contained in the second address set.
4. The method according to claim 1, wherein the dividing the second address set into a plurality of address groups according to a file size of the target firmware and a value range of an absolute address included in the second address set comprises:
determining the number V of groups of the second address set using V = (max - min)/filesize, wherein max is the maximum value of the absolute addresses contained in the second address set, min is the minimum value of the absolute addresses contained in the second address set, and filesize is the file size of the target firmware;
and dividing the second address set into V address groups by taking the file size of the target firmware as the length of a group value range according to the group number.
5. The method of claim 1, wherein the searching for target instructions in target firmware for loading absolute addresses and adding the absolute address loaded by each target instruction to a preset first address multi-set comprises:
and searching LDR instructions in target firmware, and adding the absolute address loaded by each LDR instruction to a preset first address multiple set.
6. The method of claim 1, wherein after determining the load base address range of the target firmware, further comprising:
and determining the loading base address of the target firmware by using the loading base address range.
7. A system for determining a loading base address range of firmware, the system comprising:
the searching unit is used for searching target instructions used for loading absolute addresses in target firmware and adding the absolute addresses loaded by each target instruction to a preset first address multiple set;
the processing unit is used for carrying out duplication removal processing and sequencing processing on absolute addresses contained in the first address multiple set to obtain a second address set;
the dividing unit is used for dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set, and the plurality of address groups are ordered according to the corresponding group value range;
a counting unit, configured to count a cumulative distribution frequency of the number of absolute addresses included in each address packet;
a building unit, configured to build a cumulative distribution frequency map according to the packet value range of each address packet and the cumulative distribution frequency, where an abscissa of the cumulative distribution frequency map is a central point of the packet value range of each address packet, and an ordinate of the cumulative distribution frequency map is the cumulative distribution frequency;
a calculating unit, configured to calculate a packet slope of each address packet in the cumulative distribution frequency map based on the cumulative distribution frequency and a packet value range corresponding to each address packet;
a first determination unit configured to determine that the address packet having the largest packet slope is a first target address packet and determine that the address packet next to the first target address packet is a second target address packet;
a second determining unit, configured to determine that a packet value range of the first target address packet or the second target address packet is a loading base address range of the target firmware.
8. The system according to claim 7, wherein the computing unit is specifically configured to: calculate the packet slope of each address packet in the cumulative distribution frequency map using G_i = (P_{i+1} - P_i)/(C_{i+1} - C_i), wherein G_i is the packet slope of the ith address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet, and C_i is the center point of the packet value range of the ith address packet.
9. The system according to claim 7, wherein the statistical unit is specifically configured to: for each address packet, determine the cumulative number of absolute addresses of said address packet from the number of absolute addresses contained in said address packet using L_i = S_i + L_{i-1}, wherein L_i is the cumulative number of absolute addresses of the ith address packet, S_i is the number of absolute addresses contained in the ith address packet, and L_0 is 0; and for each address packet, determine the cumulative distribution frequency of the number of absolute addresses contained in said address packet from the cumulative number of absolute addresses of said address packet using P_i = L_i / K, wherein P_i is the cumulative distribution frequency of the number of absolute addresses contained in the ith address packet, and K is the total number of absolute addresses contained in the second address set.
10. The system according to claim 7, wherein the partitioning unit is specifically configured to: determine the number V of groups of the second address set using V = (max - min)/filesize, wherein max is the maximum value of the absolute addresses contained in the second address set, min is the minimum value of the absolute addresses contained in the second address set, and filesize is the file size of the target firmware; and divide the second address set into V address groups by taking the file size of the target firmware as the length of a group value range according to the group number.
CN202110300970.3A 2021-03-22 2021-03-22 Method and system for determining loading base address range of firmware Active CN112965724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300970.3A CN112965724B (en) 2021-03-22 2021-03-22 Method and system for determining loading base address range of firmware

Publications (2)

Publication Number Publication Date
CN112965724A true CN112965724A (en) 2021-06-15
CN112965724B CN112965724B (en) 2024-06-07

Family

ID=76278264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300970.3A Active CN112965724B (en) 2021-03-22 2021-03-22 Method and system for determining loading base address range of firmware

Country Status (1)

Country Link
CN (1) CN112965724B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116880858A (en) * 2023-09-06 2023-10-13 北京华云安信息技术有限公司 Method, device, equipment and storage medium for acquiring actual base address of firmware

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5487146A (en) * 1994-03-08 1996-01-23 Texas Instruments Incorporated Plural memory access address generation employing guide table entries forming linked list
CN101246427A (en) * 2007-02-15 2008-08-20 凌阳科技股份有限公司 Method for relocated loading application program and address relocation device
CN101911024A (en) * 2008-01-11 2010-12-08 国际商业机器公司 Dynamic address translation with frame management
CN103733195A (en) * 2011-07-08 2014-04-16 起元技术有限责任公司 Managing storage of data for range-based searching
CN105278916A (en) * 2014-07-09 2016-01-27 英特尔公司 Apparatuses and methods for generating a suppressed address trace
CN107861729A (en) * 2017-11-08 2018-03-30 中国信息安全测评中心 Method, device and electronic equipment for locating the loading base address of firmware
CN109214149A (en) * 2018-09-11 2019-01-15 中国人民解放军战略支援部队信息工程大学 Automated detection method for MIPS firmware base address

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
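ABNOR: "ARM assembly addressing modes", HTTPS://WWW.CNBLOGS.COM/XIAOJIANG1025/P/5951461.HTML *
RUIQING XIAO et al.: "Recognizing the Data Type of Firmware Data Segments With Deep Learning", HTTPS://IEEEXPLORE.IEEE.ORG/STAMP/STAMP.JSP?ARNUMBER=9060884 *
朱瑞瑾 (Zhu Ruijin): "Research on locating the loading base address of ARM device firmware", China Doctoral Dissertations Full-text Database (Information Science and Technology), pages 137 - 21 *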

Also Published As

Publication number Publication date
CN112965724B (en) 2024-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant