CN112965724B

CN112965724B - Method and system for determining loading base address range of firmware

Info

Publication number: CN112965724B
Application number: CN202110300970.3A
Authority: CN
Inventors: 朱瑞瑾; 张宝峰; 毛军捷; 谭毓安; 高金萍; 许源; 熊琦; 贾炜; 孙亚飞
Original assignee: China Information Technology Security Evaluation Center
Current assignee: China Information Technology Security Evaluation Center
Filing date: 2021-03-22
Publication date: 2024-06-07
Anticipated expiration: 2041-03-22

Abstract

The invention provides a method and a system for determining the loading base address range of firmware. The method comprises the following steps: adding the absolute address loaded by each target instruction in the target firmware to a first address multiset, and de-duplicating and sorting the absolute addresses to obtain a second address set; dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute addresses in the second address set; counting the cumulative distribution frequency of the number of absolute addresses in each group; constructing a cumulative distribution frequency chart from the cumulative distribution frequencies; determining the address group with the largest group slope in the cumulative distribution frequency chart as a first target address group, and determining the next address group as a second target address group; and determining the group value range of the first target address group or the second target address group as the loading base address range of the target firmware. The loading base address range does not have to be determined by a reverse engineer, so the efficiency of determining the loading base address range is improved and the accuracy of the determined range is ensured.

Description

Method and system for determining loading base address range of firmware

Technical Field

The invention relates to the technical field of data processing, in particular to a method and a system for determining a loading base address range of firmware.

Background

In order to ensure the safety of the embedded system, the firmware of the embedded system needs to be subjected to safety evaluation in a disassembling mode, but the loading base address of the firmware cannot be directly obtained by disassembling the firmware.

The current way to determine the loading base of firmware is: the reverse engineer scans the firmware by using a specific tool, and determines the loading base address range of the firmware according to own experience and intuition, so as to obtain the loading base address by positioning. However, on the one hand, the reverse engineer needs to spend a lot of time processing the firmware to determine the loading base range, and the efficiency of determining the loading base range is low, on the other hand, the determination of the loading base range depends on experience and intuition of the reverse engineer, and the accuracy of the determined loading base range cannot be ensured.

Disclosure of Invention

In view of this, the embodiment of the invention provides a method and a system for determining a loading base address range of firmware, so as to solve the problems of low efficiency, failure to guarantee accuracy, and the like in the current method for determining the loading base address range.

In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:

A first aspect of an embodiment of the present invention discloses a method for determining a loading base address range of firmware, the method including:

searching the target firmware for target instructions that load absolute addresses, and adding the absolute address loaded by each target instruction to a preset first address multiset;

de-duplicating and sorting the absolute addresses contained in the first address multiset to obtain a second address set;

dividing the second address set into a plurality of address packets according to the file size of the target firmware and the value range of the absolute addresses contained in the second address set, the address packets being ordered by their corresponding packet value ranges;

counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet;

constructing a cumulative distribution frequency chart according to the packet value range of each address packet and the cumulative distribution frequency, wherein the abscissa of the cumulative distribution frequency chart is the center point of the packet value range of each address packet, and the ordinate of the cumulative distribution frequency chart is the cumulative distribution frequency;

calculating the packet slope of each address packet in the cumulative distribution frequency chart based on the cumulative distribution frequency and the packet value range corresponding to each address packet;

determining the address packet with the largest packet slope as a first target address packet, and determining the next address packet after the first target address packet as a second target address packet;

and determining the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware.

Preferably, the calculating of the packet slope of each address packet in the cumulative distribution frequency chart based on the cumulative distribution frequency and the packet value range corresponding to each address packet includes:

calculating the packet slope of each address packet in the cumulative distribution frequency chart using $G_i = (P_{i+1} - P_i)/(C_{i+1} - C_i)$, where $G_i$ is the packet slope of the i-th address packet, $P_i$ is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and $C_i$ is the center point of the packet value range of the i-th address packet.

Preferably, the counting the cumulative distribution frequency of the absolute address number contained in each address packet includes:

for each address packet, determining the cumulative number of absolute addresses of the address packet from the number of absolute addresses contained in the address packet using $L_i = S_i + L_{i-1}$, where $L_i$ is the cumulative number of absolute addresses of the i-th address packet, $S_i$ is the number of absolute addresses contained in the i-th address packet, and $L_0 = 0$;

for each address packet, determining the cumulative distribution frequency of the number of absolute addresses contained in the address packet from the cumulative number of absolute addresses of the address packet using $P_i = L_i/K$, where $P_i$ is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and $K$ is the total number of absolute addresses contained in the second address set.

Preferably, the dividing the second address set into a plurality of address packets according to the file size of the target firmware and the value range of the absolute address contained in the second address set includes:

determining the number of packets $V$ of the second address set using $V = (max - min)/filesize$, where $max$ is the maximum value of the absolute addresses contained in the second address set, $min$ is the minimum value of the absolute addresses contained in the second address set, and $filesize$ is the file size of the target firmware;

and dividing the second address set into $V$ address packets according to the number of packets, taking the file size of the target firmware as the length of each packet value range.

Preferably, the searching of the target firmware for target instructions that load absolute addresses, and the adding of the absolute address loaded by each target instruction to the preset first address multiset, includes:

searching the target firmware for LDR instructions, and adding the absolute address loaded by each LDR instruction to the preset first address multiset.

Preferably, after determining the loading base address range of the target firmware, the method further includes:

determining the loading base address of the target firmware by using the loading base address range.

A second aspect of an embodiment of the present invention discloses a system for determining a loading base address range of firmware, the system including:

a searching unit, configured to search the target firmware for target instructions that load absolute addresses, and to add the absolute address loaded by each target instruction to a preset first address multiset;

a processing unit, configured to de-duplicate and sort the absolute addresses contained in the first address multiset to obtain a second address set;

a dividing unit, configured to divide the second address set into a plurality of address packets according to the file size of the target firmware and the value range of the absolute addresses contained in the second address set, the address packets being ordered by their corresponding packet value ranges;

a statistics unit, configured to count the cumulative distribution frequency of the number of absolute addresses contained in each address packet;

a building unit, configured to build a cumulative distribution frequency chart according to the packet value range of each address packet and the cumulative distribution frequency, wherein the abscissa of the cumulative distribution frequency chart is the center point of the packet value range of each address packet, and the ordinate of the cumulative distribution frequency chart is the cumulative distribution frequency;

a calculating unit, configured to calculate the packet slope of each address packet in the cumulative distribution frequency chart based on the cumulative distribution frequency and the packet value range corresponding to each address packet;

a first determining unit, configured to determine the address packet with the largest packet slope as a first target address packet, and to determine the next address packet after the first target address packet as a second target address packet;

and a second determining unit, configured to determine the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware.

Preferably, the calculating unit is specifically configured to: calculate the packet slope of each address packet in the cumulative distribution frequency chart using $G_i = (P_{i+1} - P_i)/(C_{i+1} - C_i)$, where $G_i$ is the packet slope of the i-th address packet, $P_i$ is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and $C_i$ is the center point of the packet value range of the i-th address packet.

Preferably, the statistics unit is specifically configured to: for each address packet, determine the cumulative number of absolute addresses of the address packet from the number of absolute addresses contained in the address packet using $L_i = S_i + L_{i-1}$, where $L_i$ is the cumulative number of absolute addresses of the i-th address packet, $S_i$ is the number of absolute addresses contained in the i-th address packet, and $L_0 = 0$; and, for each address packet, determine the cumulative distribution frequency of the number of absolute addresses contained in the address packet from the cumulative number of absolute addresses of the address packet using $P_i = L_i/K$, where $P_i$ is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and $K$ is the total number of absolute addresses contained in the second address set.

Preferably, the dividing unit is specifically configured to: determine the number of packets $V$ of the second address set using $V = (max - min)/filesize$, where $max$ is the maximum value of the absolute addresses contained in the second address set, $min$ is the minimum value of the absolute addresses contained in the second address set, and $filesize$ is the file size of the target firmware; and divide the second address set into $V$ address packets according to the number of packets, taking the file size of the target firmware as the length of each packet value range.

The method and system for determining the loading base address range of firmware provided by the embodiments of the present invention work as follows: searching the target firmware for target instructions that load absolute addresses, and adding the absolute address loaded by each target instruction to a preset first address multiset; de-duplicating and sorting the absolute addresses contained in the first address multiset to obtain a second address set; dividing the second address set into a plurality of address packets according to the file size of the target firmware and the value range of the absolute addresses contained in the second address set; counting the cumulative distribution frequency of the number of absolute addresses contained in each address packet; constructing a cumulative distribution frequency chart according to the packet value range and the cumulative distribution frequency of each address packet; calculating the packet slope of each address packet in the cumulative distribution frequency chart, determining the address packet with the largest packet slope as a first target address packet, and determining the next address packet after the first target address packet as a second target address packet; and determining the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware. In this scheme, the loading base address range does not have to be determined by a reverse engineer, so the efficiency of determining the loading base address range is improved and the accuracy of the determined range is ensured.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a firmware loading diagram according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for determining a loading base address range of firmware according to an embodiment of the present invention;

FIG. 3 is a flowchart of a search target instruction according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an LDR instruction machine code format according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a cumulative distribution frequency chart according to an embodiment of the present invention;

FIG. 6 is a block diagram of a system for determining a loading base address range of firmware according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the present disclosure, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

As described in the background, the loading base address range of firmware is currently determined mainly through the experience and intuition of a reverse engineer, and the loading base address is then located within that range. This approach takes a lot of time, so determining the loading base address range is inefficient; moreover, whether the determined range is accurate depends on the experience and intuition of the reverse engineer, so accuracy cannot be guaranteed.

Therefore, the embodiments of the present invention provide a method and a system for determining the loading base address range of firmware. The absolute addresses loaded by the target instructions in the target firmware are de-duplicated and sorted to obtain a second address set. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses in each address packet is calculated. A cumulative distribution frequency chart is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the chart is calculated, the address packet with the largest packet slope is determined as a first target address packet, and the next address packet after it is determined as a second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. The loading base address range does not have to be determined by a reverse engineer, so the efficiency of determining the loading base address range is improved and the accuracy of the determined range is ensured.

It should be noted that, in the embodiments of the present invention, the loading base address is the starting position at which the firmware is mapped into the memory of the embedded system. The firmware loading diagram in FIG. 1 illustrates this mapping.

It should be further noted that, in the target firmware (such as ARM firmware), an operation on an absolute address generally first loads that absolute address into a register with a target instruction (such as an LDR instruction), and the loaded absolute address points into the memory mapping range of the target firmware. Based on this principle, the loading base address range of the target firmware can be determined by the method described in the following embodiments.

Referring to fig. 2, a flowchart of a method for determining a loading base address range of firmware according to an embodiment of the present invention is shown, where the determining method includes:

Step S201: searching target instructions for loading absolute addresses in target firmware, and adding the absolute addresses loaded by each target instruction into a preset first address multi-set.

It should be noted that, the target instruction is used to load an absolute address, for example: LDR instructions are used to load absolute addresses, typically those of string addresses, function addresses, or structure addresses, etc.

It will be understood that the target firmware contains a large number of instructions, so the target instructions that load absolute addresses must be searched for among all of them. In the process of implementing step S201, each instruction is checked in turn, starting from the start position of the target firmware, to determine whether it is a target instruction. When a target instruction is found, the absolute address it loads is calculated and added to the first address multiset. In this way, every target instruction of the target firmware is found, and the absolute address loaded by each target instruction is added to the first address multiset.

In a specific implementation, the LDR instructions in the target firmware are searched, and the absolute address loaded by each LDR instruction is added to the preset first address multi-set, where the target firmware may be ARM firmware, or may be other types of firmware, and the target instruction may be an LDR instruction, or may be other instructions capable of being used to load the absolute address, and the target firmware and the target instruction are not specifically limited herein.

To better explain how the target instruction is found in the target firmware, the flow shown in FIG. 3 is described below. It should be noted that FIG. 3 is for illustration only.

Referring to FIG. 3, a flowchart of searching for a target instruction provided by an embodiment of the present invention is shown. Taking ARM firmware as the target firmware and the LDR instruction as the target instruction, the flow includes the following steps:

Step S301: judging whether the current instruction of the ARM firmware is an LDR instruction. If yes, go to step S302; if no, go to step S304.

In the specific implementation process of step S301, starting from the start position of the ARM firmware, each instruction is checked in turn to determine whether it is an LDR instruction; the current instruction is the instruction currently being checked. If the current instruction is an LDR instruction, step S302 is executed; otherwise, step S304 is executed.

Step S302: the absolute address loaded by the LDR instruction is calculated.

Step S303: the absolute address loaded by the LDR instruction is stored in a first address multiset.

Step S304: determining whether the current instruction is the last instruction of the ARM firmware. If it is, the flow ends; if it is not, the flow jumps to the next instruction and returns to step S301.
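The loop of FIG. 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes a raw little-endian ARM-state image with 4-byte-aligned instructions, recognizes only the PC-relative form `LDR Rd, [PC, #+imm12]` through a bit mask, and uses the hypothetical names `find_ldr_absolute_addresses` and `firmware`. Because the loaded word is addressed relative to the PC, it can be located by file offset without knowing the loading base address.

```python
import struct

# Assumed pattern for ARM-state "LDR Rd, [PC, #+imm12]":
# bits 27-16 must equal 0101 1001 1111 (P=1, U=1, B=0, W=0, L=1, Rn=PC).
LDR_LITERAL_MASK = 0x0FFF0000
LDR_LITERAL_BITS = 0x059F0000


def find_ldr_absolute_addresses(firmware: bytes) -> list[int]:
    """Return the multiset (as a list) of words loaded by PC-relative LDRs."""
    loaded = []
    # S301/S304: visit every 4-byte instruction slot until the end of the image.
    for offset in range(0, len(firmware) - 3, 4):
        (word,) = struct.unpack_from("<I", firmware, offset)
        if word & LDR_LITERAL_MASK != LDR_LITERAL_BITS:
            continue  # not an LDR of the assumed form: move to the next slot
        imm12 = word & 0xFFF
        # S302: in ARM state the PC reads as the current instruction + 8.
        literal_offset = ((offset + 8) & ~3) + imm12
        if literal_offset + 4 <= len(firmware):
            (value,) = struct.unpack_from("<I", firmware, literal_offset)
            loaded.append(value)  # S303: store in the first address multiset
    return loaded
```

A data word that happens to match the mask is also collected; the sketch makes no attempt to tell code from data.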

It can be understood that the absolute address loaded by a target instruction is calculated from the machine code of that instruction. To better explain the calculation, the following description takes ARM firmware as the target firmware and the LDR instruction as the target instruction, with reference to the LDR machine code format shown in FIG. 4.

The machine code format of an LDR instruction that loads a value into a register in ARM state is shown in FIG. 4. Assume that, in ARM state, an LDR instruction loads the address of the character string "aSystem", the memory address of the LDR instruction is 0x20004864, and its machine code is "E59F00B4".

Parsing the machine code against this format gives Rd = (0000)₂ = R0 and imm12 = (000010110100)₂ = 0xB4. In ARM state, the address that the LDR instruction reads from is (PC & 0xFFFFFFFC) + imm12.

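Since the ARM processor uses a 3-stage pipeline, in ARM state the PC reads as the address of the current instruction plus 8 (PC = current + 8), so the address that the LDR instruction reads from is given by formula (1).

$$address = ((current + 8) \mathbin{\&} \mathrm{0xFFFFFFFC}) + imm12 \tag{1}$$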

According to formula (1), the address read by the LDR instruction is 0x20004920. Assuming that the 4 bytes starting at memory address 0x20004920 are E0 35 00 20 and that the ARM firmware in this example is stored in little-endian order, the absolute address loaded by the LDR instruction is 0x200035E0, and the content stored at address 0x200035E0 is the actual content of the character string "aSystem".
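The arithmetic of this worked example can be checked with a short Python sketch. The function names `ldr_literal_address` and `read_le32` are hypothetical, and the four bytes at the literal address are supplied directly rather than read from a real image.

```python
def ldr_literal_address(instr_addr: int, machine_code: int) -> int:
    """Address read by an ARM-state LDR Rd, [PC, #imm12] located at instr_addr."""
    imm12 = machine_code & 0xFFF          # low 12 bits of the machine code
    pc = instr_addr + 8                   # PC = current + 8 in ARM state
    return (pc & 0xFFFFFFFC) + imm12      # formula (1)


def read_le32(data: bytes) -> int:
    """Interpret the first 4 bytes as a little-endian 32-bit word."""
    return int.from_bytes(data[:4], "little")


rd = (0xE59F00B4 >> 12) & 0xF             # destination register field Rd
literal = ldr_literal_address(0x20004864, 0xE59F00B4)
loaded = read_le32(bytes.fromhex("E0350020"))
print(f"R{rd}", hex(literal), hex(loaded))
```

With the values from the example, this prints `R0 0x20004920 0x200035e0`, matching the literal address and the loaded absolute address given in the text.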

Step S202: and performing de-duplication processing and sequencing processing on the absolute addresses contained in the first address multi-set to obtain a second address set.

As can be seen from the above content of step S201, the absolute addresses loaded by all the target instructions of the target firmware are stored in the first address multiset. That is, the first address multiset stores all absolute addresses in the target firmware, where duplicate absolute addresses may occur.

In the specific implementation process of step S202, the absolute addresses contained in the first address multiset are de-duplicated, so that only one copy of each repeated absolute address is kept, and the remaining absolute addresses are then sorted by value to obtain the second address set.

It will be appreciated that each element in the first address multiset and the second address set is an absolute address.

To better explain how the first address multiset is de-duplicated and sorted, processes A1 to A3 are described below.

Let the first address multiset be M, which contains the absolute addresses loaded by all target instructions in the target firmware.

A1, emptying the set N.

Process A2 is performed in a loop for each absolute address m_i in the first address multiset M.

A2, if the absolute address m_i does not belong to the set N, adding the absolute address m_i to the set N.

It should be noted that process A2 removes the absolute addresses that appear repeatedly in the first address multiset M.

A3, after process A2 has been executed for all the absolute addresses in the first address multiset M, sorting the absolute addresses in N by value to obtain the second address set.
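Processes A1 to A3 map directly onto a few lines of Python. The sketch below is illustrative only; the function name `dedupe_and_sort` is hypothetical, and the multiset M is modelled as a plain list.

```python
def dedupe_and_sort(first_multiset: list[int]) -> list[int]:
    """Turn the first address multiset M into the sorted second address set."""
    n: set[int] = set()            # A1: start from an empty set N
    for m_i in first_multiset:     # A2: visit every absolute address m_i in M
        if m_i not in n:
            n.add(m_i)             # keep only the first copy of each address
    return sorted(n)               # A3: sort the addresses in N by value
```

In idiomatic Python the same result is obtained with `sorted(set(first_multiset))`; the explicit loop is kept here only to mirror A1 to A3.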

Step S203: and dividing the second address set into a plurality of address groups according to the file size of the target firmware and the value range of the absolute address contained in the second address set.

It should be noted that each address packet has a corresponding packet value range, obtained by dividing the value range of the absolute addresses contained in the second address set, and that the address packets are ordered by their packet value ranges.

As can be seen from step S202, the absolute addresses in the second address set have been de-duplicated and sorted, so they span a value range [min, max], where max is the maximum and min is the minimum absolute address in the second address set.

In the specific implementation process of step S203, according to the file size of the target firmware, the value range of the absolute address included in the second address set is divided into a plurality of sub-ranges, and each sub-range is a packet value range of one address packet.

In a specific implementation, the number V of packets of the second address set is determined by using a formula (2), where in the formula (2), max is a maximum value of absolute addresses contained in the second address set, min is a minimum value of absolute addresses contained in the second address set, and filesize is a file size of the target firmware.

$$V = \frac{\mathit{max} - \mathit{min}}{\mathit{filesize}} \tag{2}$$

And according to the number V of the packets, taking the file size of the target firmware as the length of a packet value range, and grouping the absolute addresses contained in the second address set to obtain V address packets, wherein the value of the absolute address in each address packet is in the packet value range of the address packet.

For example, let the second address set be N = {0, 1, 2, 3, 5, 11, 23, 44, 55, 67, 79, 99, 100}, which has 13 elements (absolute addresses) with a value range of min = 0 to max = 100, and let the file size of the target firmware be 20. Using formula (2), the number of packets of the second address set N is (100 - 0)/20 = 5.

The absolute addresses of the second address set N are divided into 5 address packets, each with a packet value range of length 20 (the file size of the target firmware). The packet value range of each address packet and the absolute addresses it contains are shown in Table 1.

Table 1:

Packet value range	Absolute addresses contained
[0, 20]	0, 1, 2, 3, 5, 11
[21, 40]	23
[41, 60]	44, 55
[61, 80]	67, 79
[81, 100]	99, 100

As can be seen from table 1, after the second address set is divided into packets, the absolute address included in each address packet is within the packet value range of the address packet.
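The grouping in this example can be reproduced with the following Python sketch. The function name `split_into_packets` is hypothetical, and two details are assumptions rather than statements of the source: the quotient in formula (2) is rounded up when it is not an integer, and an address lying on a boundary is assigned to the lower packet, which is the convention Table 1 appears to follow (20 falls in the first packet).

```python
def split_into_packets(addresses: list[int], filesize: int):
    """Split a sorted, de-duplicated address list into packets of width filesize."""
    lo, hi = addresses[0], addresses[-1]
    # Formula (2); ceiling division and the minimum of 1 are assumptions.
    v = max(1, -(-(hi - lo) // filesize))
    # Packet value ranges as in Table 1: the first packet includes its
    # lower bound, later packets start one past the previous upper bound.
    ranges = [(lo + i * filesize + (1 if i else 0), lo + (i + 1) * filesize)
              for i in range(v)]
    packets = [[] for _ in range(v)]
    for a in addresses:
        # ceil((a - lo) / filesize) - 1, clamped so that a == lo lands in packet 0.
        idx = max(0, -(-(a - lo) // filesize) - 1)
        packets[idx].append(a)
    return ranges, packets
```

Called with the 13 addresses of the example and a file size of 20, the function returns the five ranges and the five address lists of Table 1.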

Step S204: and counting the cumulative distribution frequency of the absolute address number contained in each address packet.

In the specific implementation process of step S204, the cumulative number of absolute addresses contained in each address packet is determined first, and then the cumulative distribution frequency of the absolute address numbers contained in each address packet is calculated by using the cumulative number of absolute addresses contained in each address packet and the total number of absolute addresses contained in the second address set.

The specific implementation is as follows: for each address packet, the cumulative number of absolute addresses of the address packet is determined from the number of absolute addresses contained in the address packet using formula (3), where $L_i$ is the cumulative number of absolute addresses of the i-th address packet, $S_i$ is the number of absolute addresses contained in the i-th address packet, and $L_0 = 0$.

$$L_i = S_i + L_{i-1} \tag{3}$$

The cumulative number of absolute addresses of each address packet is calculated by formula (3).

For each address packet, the cumulative distribution frequency of the number of absolute addresses contained in the address packet is determined from the cumulative number of absolute addresses of the address packet using formula (4), where $P_i$ is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and $K$ is the total number of absolute addresses contained in the second address set.

$$P_i = \frac{L_i}{K} \tag{4}$$

For example, continuing the example of Table 1 in step S203, Table 2 illustrates how formulas (3) and (4) are used to calculate the cumulative number of absolute addresses and the cumulative distribution frequency of each address packet, where the total number of absolute addresses contained in the second address set N is 13.

Table 2:

It should be noted that the contents shown in Tables 1 and 2 above are for illustration only.
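
As a worked illustration of formulas (3) and (4), the following Python sketch applies them to the per-packet address counts that can be read off Table 1 (6, 1, 2, 2, 2). The function name `cumulative_frequencies` is hypothetical, and exact fractions are used only to keep the arithmetic easy to check.

```python
from fractions import Fraction


def cumulative_frequencies(packet_sizes):
    """Return (L, P): cumulative counts and cumulative distribution frequencies."""
    total = sum(packet_sizes)      # K: total number of absolute addresses
    cumulative = 0                 # L_0 = 0
    counts, freqs = [], []
    for size in packet_sizes:      # size is S_i
        cumulative += size         # formula (3): L_i = S_i + L_{i-1}
        counts.append(cumulative)
        freqs.append(Fraction(cumulative, total))  # formula (4): P_i = L_i / K
    return counts, freqs


# Address counts per packet taken from Table 1; K = 13.
counts, freqs = cumulative_frequencies([6, 1, 2, 2, 2])
# counts == [6, 7, 9, 11, 13]
# freqs  == [6/13, 7/13, 9/13, 11/13, 1]
```

The last cumulative distribution frequency is always 1, which gives a quick sanity check on the counts.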

Step S205: construct a cumulative distribution frequency chart according to the packet value range and the cumulative distribution frequency of each address packet.

In the specific implementation process of step S205, a cumulative distribution frequency chart is constructed by taking the central point of the packet value range of each address packet as an abscissa value and the cumulative distribution frequency of the absolute address number contained in each address packet as an ordinate value.

That is, the abscissa of the cumulative distribution frequency chart is the center point of the packet value range of each address packet, and the ordinate of the cumulative distribution frequency chart is the cumulative distribution frequency.

Step S206: calculate the packet slope of each address packet in the cumulative distribution frequency chart based on the cumulative distribution frequency and the packet value range corresponding to each address packet.

In the specific implementation of step S206, the packet slope of each address packet in the constructed cumulative distribution frequency chart is calculated using formula (5), where G_i is the packet slope of the i-th address packet, P_i is the cumulative distribution frequency of the number of absolute addresses contained in the i-th address packet, and C_i is the center point of the packet value range of the i-th address packet.

G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i)    (5)
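
Formula (5) can be sketched in Python as follows. The function name `packet_slopes` and the sample ranges and frequencies are hypothetical illustration values, not data from the embodiment. Because formula (5) refers to packet i+1, the sketch returns one slope fewer than there are packets; how the last packet is treated is not stated in the text, so it is simply left without a slope here.

```python
def packet_slopes(ranges, freqs):
    """Compute G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i) for consecutive packets."""
    # C_i: center point of each packet value range.
    centers = [(low + high) / 2 for low, high in ranges]
    return [
        (freqs[i + 1] - freqs[i]) / (centers[i + 1] - centers[i])
        for i in range(len(centers) - 1)
    ]


# Hypothetical packets of width 10 with made-up cumulative frequencies.
ranges = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [0.1, 0.2, 0.8, 1.0]
slopes = packet_slopes(ranges, freqs)
# slopes is approximately [0.01, 0.06, 0.02]; the steepest rise is at index 1.
```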

Step S207: the address packet with the largest packet slope is determined to be the first target address packet, and the next address packet of the first target address packet is determined to be the second target address packet.

In the specific implementation process of step S207, after the packet slope of each address packet is calculated, the address packet with the largest packet slope is selected as the first target address packet.

As can be seen from step S203 above, the address packets are sorted by their packet value ranges, so the address packet that immediately follows the first target address packet is taken as the second target address packet.

Step S208: determine the packet value range of the first target address packet or the second target address packet as the loading base address range of the target firmware.

In the specific implementation of step S208, the packet value range of either the first target address packet (the one with the largest packet slope) or the second target address packet is taken as the loading base address range of the target firmware.

It is understood that the corresponding selection condition is preset, and when the first target address packet meets the selection condition, the packet value range of the first target address packet is used as the loading base address range of the target firmware. When the first target address packet does not meet the selection condition, the packet value range of the second target address packet is taken as the loading base address range of the target firmware.

For example, the selection condition is set to a condition corresponding to the normal case. When the first target address packet satisfies the selection condition, that is, in the normal case, the packet value range of the first target address packet is taken as the loading base address range of the target firmware. When the first target address packet does not satisfy the selection condition, that is, in a special case, the packet value range of the second target address packet is taken as the loading base address range of the target firmware.

As a further example, assume that the length of the packet value range of each address packet is 10, that the packet value range of the first target address packet (the one with the largest packet slope) is [21,30], and that the packet value range of the second target address packet is [31,40]. In the normal case, the packet value range [21,30] of the first target address packet is the loading base address range of the target firmware. Correspondingly, in the special case, the packet value range [31,40] of the second target address packet is the loading base address range of the target firmware.
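
Steps S207 and S208 can be sketched in Python as below. The function name `select_base_range` and the boolean parameter `first_meets_condition` are hypothetical: the text only says that a selection condition is preset, so the condition itself is left to the caller. The slope values are made up, and the sketch assumes one slope per pair of consecutive packets, so the packet after the steepest one always exists.

```python
def select_base_range(ranges, slopes, first_meets_condition=True):
    """Pick the packet value range used as the loading base address range."""
    # First target packet: the one with the largest packet slope
    # (on ties, the earliest such packet is taken).
    first = max(range(len(slopes)), key=lambda i: slopes[i])
    # Second target packet: the packet immediately after the first.
    second = first + 1
    return ranges[first] if first_meets_condition else ranges[second]


# Packet value ranges of length 10, matching the example above.
ranges = [(11, 20), (21, 30), (31, 40), (41, 50)]
slopes = [0.01, 0.06, 0.02]  # hypothetical slopes; the largest is at index 1

normal_case = select_base_range(ranges, slopes, first_meets_condition=True)
special_case = select_base_range(ranges, slopes, first_meets_condition=False)
# normal_case == (21, 30), special_case == (31, 40)
```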

Preferably, after determining the load base address range of the target firmware, the load base address of the target firmware is determined by using the load base address range in combination with a specified load base address determination method.

Preferably, after the loading base address of the target firmware is determined, the correctness of the determined loading base address can be verified using a suitable disassembly tool.

For example, assume that the determined loading base address of the target firmware is 0xC3421000. The file of the target firmware is loaded into IDA Pro (a disassembly tool), the processor type corresponding to the target firmware is set, and the loading base address is set to 0xC3421000. When the data obtained after loading meets the preset checking requirements, the loading base address determined by the above steps is judged to be correct. The preset checking requirements are as follows: the disassembled code structure is clear and meaningful, the string cross-references are determined to match correctly, the disassembled code has complete binary function structures, and each binary function has a matching prologue and epilogue.

In the embodiment of the invention, the second address set is obtained by performing de-duplication and sorting on the absolute addresses loaded by the target instructions in the target firmware. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses of each address packet is calculated. A cumulative distribution frequency chart is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the chart is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the next address packet is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. Because the loading base address range no longer has to be determined manually by a reverse engineer, the efficiency of determining the range is improved and its accuracy is ensured.

To better explain how the load base range of the target firmware is determined, it is illustrated by the contents shown in processes B1 to B6.

B1, it is assumed that 28191 LDR instructions are identified in the target firmware, i.e., the first address multiset contains the 28191 absolute addresses loaded by these LDR instructions.

B2, de-duplication and sorting are performed on the 28191 absolute addresses in the first address multiset to obtain a second address set containing 12656 sorted absolute addresses.

B3, it is assumed that the file size of the target firmware is 5679356 bytes (5.41 MB), so that the length of the packet value range of each address packet is 5679356, and that the minimum absolute address in the second address set is 1 and the maximum is 4294967295. Using formula (2) above, the number of packets of the second address set is (4294967295 - 1)/5679356 ≈ 756.2, which is rounded up to 757, i.e., the second address set is divided into 757 address packets.

B4, dividing 12656 absolute addresses of the second address set into 757 address groups, combining the formula (3) and the formula (4), counting the accumulated number of the absolute addresses of each address group, and calculating the accumulated distribution frequency of the absolute address number of each address group according to the accumulated number of each address group.

B5, constructing a cumulative distribution frequency chart according to the packet value range and the cumulative distribution frequency of each address packet, as shown schematically in FIG. 5.

B6, calculating the packet slope of each address packet in the cumulative distribution frequency chart shown in FIG. 5, determining the first target address packet with the largest packet slope, and determining the next address packet as the second target address packet. The first and second target address packets are the parts marked by ellipses in FIG. 5; the packet value range of the first target address packet is [533859465,539538821], and the packet value range of the second target address packet is [539538821,545218177].

Assuming that the first target address packet meets the preset selection condition, the packet value range of the first target address packet is taken as the loading base address range of the target firmware. In a specific implementation, [533859465,539538821] is converted to hexadecimal to obtain [0x1fd20c89,0x2028b585], which is the loading base address range of the target firmware.

In fig. 5, "30%" in (533859465,539538821, 30%) is the cumulative distribution frequency.
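
The arithmetic in processes B3 and B6 can be checked with a few lines of Python. All values are taken from the example above; only the variable names are introduced here for illustration.

```python
FILE_SIZE = 5679356                      # file size of the target firmware in bytes
ADDR_MIN, ADDR_MAX = 1, 4294967295       # min and max of the second address set

# B3: number of packets, (max - min) / filesize rounded up.
packet_count = -(-(ADDR_MAX - ADDR_MIN) // FILE_SIZE)

# B6: packet value range of the first target address packet.
first_range = (533859465, 539538821)
width = first_range[1] - first_range[0]                    # equals FILE_SIZE
packets_before = (first_range[0] - ADDR_MIN) // FILE_SIZE  # whole packets below it

# Hexadecimal form of the loading base address range.
hex_range = tuple(hex(value) for value in first_range)

print(packet_count, width, packets_before, hex_range)
# prints: 757 5679356 94 ('0x1fd20c89', '0x2028b585')
```

The lower bound 533859465 equals 1 + 94 × 5679356, so the first target address packet is the 95th of the 757 packets when counting from the minimum address.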

Corresponding to the method for determining the loading base address range of firmware provided in the above embodiment of the present invention, and referring to FIG. 6, an embodiment of the present invention further provides a structural block diagram of a system for determining the loading base address range of firmware. The determining system includes: a search unit 601, a processing unit 602, a dividing unit 603, a statistics unit 604, a construction unit 605, a calculation unit 606, a first determination unit 607, and a second determination unit 608.

The search unit 601 is configured to search the target firmware for target instructions that load absolute addresses, and to add the absolute address loaded by each target instruction to a preset first address multiset.

In a specific implementation, the search unit 601 is specifically configured to: search for LDR instructions in the target firmware, and add the absolute address loaded by each LDR instruction to the preset first address multiset.

The processing unit 602 is configured to perform de-duplication and sorting on the absolute addresses contained in the first address multiset to obtain a second address set.
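
One possible reading of the search unit and the processing unit is sketched below in Python. It rests on several assumptions that this passage does not confirm: the firmware is 32-bit little-endian ARM (A32) code, the target instructions are PC-relative `LDR Rt, [PC, #imm12]` loads, and the 32-bit word fetched from the literal pool is the "absolute address loaded". The function names and bit-mask constants are introduced here for illustration only.

```python
import struct

# A32 "LDR (literal)" pattern: bits 27..24 = 0101, B = 0, W = 0, L = 1, Rn = PC.
# The U bit (bit 23) is masked out because the offset may be added or subtracted.
LDR_LITERAL_MASK = 0x0F7F0000
LDR_LITERAL_BITS = 0x051F0000


def collect_ldr_literals(firmware: bytes):
    """Return the multiset (as a list) of words loaded by PC-relative LDRs."""
    loaded = []
    for offset in range(0, len(firmware) - 3, 4):
        (word,) = struct.unpack_from("<I", firmware, offset)
        if word & LDR_LITERAL_MASK != LDR_LITERAL_BITS:
            continue
        imm12 = word & 0xFFF
        pc = offset + 8  # in ARM state the PC reads as instruction address + 8
        literal = pc + imm12 if word & (1 << 23) else pc - imm12
        if 0 <= literal <= len(firmware) - 4:
            loaded.append(struct.unpack_from("<I", firmware, literal)[0])
    return loaded


def to_address_set(multiset):
    """De-duplicate and sort, as the processing unit does."""
    return sorted(set(multiset))


# Two LDRs (0xE59F0004 at offset 0, 0xE59F0000 at offset 4) that both load
# the literal word stored at offset 12.
sample = struct.pack("<4I", 0xE59F0004, 0xE59F0000, 0, 0xC3421000)
first_multiset = collect_ldr_literals(sample)
second_set = to_address_set(first_multiset)
# first_multiset == [0xC3421000, 0xC3421000], second_set == [0xC3421000]
```

A real implementation would normally rely on a disassembler rather than a raw bit mask, since data words can match the pattern by chance and Thumb code uses different encodings.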

The dividing unit 603 is configured to divide the second address set into a plurality of address packets according to the file size of the target firmware and the value range of the absolute address included in the second address set, where the plurality of address packets are ordered according to the packet value range corresponding to the plurality of address packets.

In a specific implementation, the dividing unit 603 is specifically configured to: determine the number of packets V of the second address set using formula (2) above; and, according to the number of packets, divide the second address set into V address packets with the file size of the target firmware as the length of the packet value range.

The statistics unit 604 is configured to count the cumulative distribution frequency of the number of absolute addresses contained in each address packet.

In a specific implementation, the statistics unit 604 is specifically configured to: for each address packet, determine the cumulative number of absolute addresses of the address packet from the number of absolute addresses contained in the address packet using formula (3) above; and, for each address packet, determine the cumulative distribution frequency of the number of absolute addresses contained in the address packet from the cumulative number of absolute addresses of the address packet using formula (4) above.

The construction unit 605 is configured to construct a cumulative distribution frequency chart according to the packet value range and the cumulative distribution frequency of each address packet.

The calculation unit 606 is configured to calculate the packet slope of each address packet in the cumulative distribution frequency chart based on the cumulative distribution frequency and the packet value range corresponding to each address packet.

In a specific implementation, the calculation unit 606 is specifically configured to: calculate the packet slope of each address packet in the cumulative distribution frequency chart using formula (5) above.

The first determination unit 607 is configured to determine the address packet with the largest packet slope as the first target address packet, and to determine the next address packet after the first target address packet as the second target address packet.

A second determining unit 608, configured to determine a packet value range of the first target address packet or the second target address packet as a loading base address range of the target firmware.

Preferably, in combination with the content shown in fig. 6, the determining system further comprises:

a base address determining unit, configured to determine the loading base address of the target firmware by using the loading base address range.

In summary, the embodiments of the invention provide a method and a system for determining the loading base address range of firmware. De-duplication and sorting are performed on the absolute addresses loaded by the target instructions in the target firmware to obtain a second address set. The second address set is divided into a plurality of address packets, and the cumulative distribution frequency of the number of absolute addresses of each address packet is calculated. A cumulative distribution frequency chart is constructed from the packet value range and the cumulative distribution frequency of each address packet, the packet slope of each address packet in the chart is calculated, the address packet with the largest packet slope is determined as the first target address packet, and the next address packet is determined as the second target address packet. The packet value range of the first target address packet or the second target address packet is taken as the loading base address range of the target firmware. Because the loading base address range no longer has to be determined manually by a reverse engineer, the efficiency of determining the range is improved and its accuracy is ensured.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for determining a loading base range of firmware, the method comprising:

Determining a packet value range of the first target address packet or the second target address packet as a loading base address range of the target firmware;

The counting of the cumulative distribution frequency of the absolute address number contained in each address packet comprises the following steps:

2. The method of claim 1, wherein calculating a packet slope of each of the address packets in the cumulative distribution frequency map based on the cumulative distribution frequency and a packet value range corresponding to each of the address packets comprises:

3. The method of claim 1, wherein the dividing the second set of addresses into a plurality of address packets according to the file size of the target firmware and the range of absolute addresses contained in the second set of addresses comprises:

4. The method of claim 1, wherein searching for target instructions in target firmware for loading absolute addresses and adding the absolute address loaded by each of the target instructions to a preset first address multi-set comprises:

5. The method of claim 1, further comprising, after determining the load base range of the target firmware:

6. A system for determining a loading base range of firmware, the system comprising:

a second determining unit, configured to determine a packet value range of the first target address packet or the second target address packet as a loading base address range of the target firmware;

The statistical unit is specifically configured to: for each address packet, determining the accumulated number of the absolute addresses of the address packet by using the number of the absolute addresses contained in the address packet and combining L_i = S_i + L_{i-1}, wherein L_i is the accumulated number of the absolute addresses of the i-th address packet, S_i is the number of the absolute addresses contained in the i-th address packet, and L_0 is 0; for each address packet, the cumulative distribution frequency of the absolute address number contained in the address packet is determined by using the cumulative number of the absolute addresses of the address packet and combining P_i = L_i / K, wherein P_i is the cumulative distribution frequency of the absolute address number contained in the i-th address packet, and K is the total number of the absolute addresses contained in the second address set.

7. The system according to claim 6, wherein the computing unit is specifically configured to: using G_i = (P_{i+1} - P_i) / (C_{i+1} - C_i), calculating the packet slope of each address packet in the cumulative distribution frequency chart, wherein G_i is the packet slope of the i-th address packet, P_i is the cumulative distribution frequency of the absolute address number contained in the i-th address packet, and C_i is the center point of the packet value range of the i-th address packet.

8. The system according to claim 6, wherein the dividing unit is specifically configured to: determining the packet number V of the second address set by using V = (max - min) / filesize, wherein max is the maximum value of absolute addresses contained in the second address set, min is the minimum value of absolute addresses contained in the second address set, and filesize is the file size of the target firmware; and dividing the second address set into V address groups by taking the file size of the target firmware as the length of a group value range according to the group number.