CN116069338A - Function library reference detection method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN116069338A
Authority
CN
China
Prior art keywords
function
header file
library
item
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310211634.0A
Other languages
Chinese (zh)
Other versions
CN116069338B (en)
Inventor
朱劲松
万振华
王颉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seczone Technology Co Ltd
Original Assignee
Seczone Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seczone Technology Co Ltd
Priority to CN202310211634.0A
Publication of CN116069338A
Application granted
Publication of CN116069338B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4434 Reducing the memory space required by the program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/152 File search processing using file content signatures, e.g. hash values
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a function library reference detection method, apparatus, device, and readable storage medium. The method comprises: acquiring first characteristic information of a project to be detected, the first characteristic information including first header file information and bottommost functions; matching the first characteristic information against second header file information and export functions contained in a preset database corresponding to binary function libraries; and determining, according to the matching result, the target function library in the database that the project to be detected references. Because the header file information and the bottommost functions of the project are matched against the header file information and the export functions in the preset database, respectively, the referenced target function library can be determined from the matching result. This improves detection accuracy, and because only a small amount of data needs to be prepared, it also saves storage space.

Description

Function library reference detection method, device, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting function library references.
Background
To detect the function libraries referenced by a C language source code project, the industry currently uses two schemes: source code similarity detection and package manager identification.
Source code similarity detection collects the source code of a large number of open source projects, extracts the contents of all functions in those projects, normalizes each function (for example, by deleting whitespace and normalizing variable names), computes a hash value of the normalized function content, and stores the hash value in a repository together with the corresponding function library name and version. To detect which components a project references, the project is processed in the same way to obtain the hash value of each function, the hash values are looked up in the repository, and the names and versions of the referenced function libraries are inferred from the number, proportion, and so on of the matching hash values.
Package manager identification recognizes the contents of package management files, such as those of vcpkg, which describe in text form the libraries that need to be downloaded or referenced; some of these files also contain version information.
However, source code similarity detection can only detect function libraries whose source code is included in the project. For example, if a C language project contains the source code of the open source library zlib, zlib can in theory be detected. If a binary library is referenced instead, it cannot be detected, because the C language project only calls zlib functions and contains no functions identical or similar to those of the zlib project. The scheme also needs a large amount of stored data, so its space consumption is high, and its detection accuracy is low because open source function libraries reference one another: if open source project A includes open source project B in source code form, then both A and B are detected when a C language project references A. Package manager identification requires a package management file to exist in the C language project; without one, the referenced function libraries cannot be detected. Package managers are not widely used in C language projects, so package management files are often unavailable for detection.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a function library reference detection method, apparatus, device, and storage medium that can at least solve the problems of low detection accuracy and high space consumption when detecting referenced function libraries in the related art.
To achieve the above object, a first aspect of the embodiments of the present application provides a function library reference detection method, including:
acquiring first characteristic information of a project to be detected, wherein the first characteristic information includes first header file information and bottommost functions;
matching the first characteristic information with second header file information and export functions contained in a preset database corresponding to binary function libraries;
and determining, according to the matching result, the target function library in the database that the project to be detected references.
A second aspect of the embodiments of the present application provides a function library reference detection apparatus, including:
an acquisition module, configured to acquire first characteristic information of a project to be detected, wherein the first characteristic information includes first header file information and bottommost functions;
a matching module, configured to match the first characteristic information with second header file information and export functions contained in a preset database corresponding to binary function libraries;
and a determining module, configured to determine, according to the matching result, the target function library in the database that the project to be detected references.
A third aspect of the embodiments of the present application provides an electronic device, including a memory and a processor, wherein the processor is configured to execute a computer program stored in the memory, and when the processor executes the computer program, the steps of the function library reference detection method provided by the first aspect of the embodiments of the present application are implemented.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the function library reference detection method provided by the first aspect of the embodiments of the present application.
As can be seen from the above, in the function library reference detection method, apparatus, device, and storage medium provided by the present application, first characteristic information of a project to be detected is obtained, the first characteristic information including first header file information and bottommost functions; the first characteristic information is matched against second header file information and export functions contained in a preset database corresponding to binary function libraries; and the target function library in the database that the project to be detected references is determined according to the matching result. Because the header file information and the bottommost functions of the project are matched against the header file information and the export functions in the preset database, respectively, the referenced target function library can be determined from the matching result. This improves detection accuracy, and because only a small amount of data needs to be prepared, it also saves storage space.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other drawings may be obtained from them without inventive effort for a person skilled in the art.
FIG. 1 is a basic flow diagram of a method for detecting reference to a function library according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a refinement flow of a method for detecting reference to a function library according to a second embodiment of the present application;
FIG. 3 is a schematic block diagram of a function library reference detection device according to a third embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
To address the low detection accuracy and high space consumption of related-art approaches to detecting referenced function libraries, a first embodiment of the present application provides a function library reference detection method applied to C language projects. FIG. 1 is a basic flow diagram of the method provided in this embodiment, which includes the following steps:
Step 101, obtaining first characteristic information of a project to be detected.
Specifically, in this embodiment, the first characteristic information includes first header file information and bottommost functions. The function libraries referenced by the project to be detected are derived from the header files that the project includes and from the bottommost functions among the functions it calls.
In some implementations of this embodiment, before the step of obtaining the first characteristic information of the project to be detected, the method further includes: acquiring the first header file names included by all source code files in the project to be detected; performing a hash operation on the header file corresponding to each first header file name to obtain a first hash value; performing a hash operation on all sub-header files nested in that header file to obtain a second hash value; and generating the first header file information based on the first header file name, the first hash value, and the second hash value.
Specifically, in this embodiment, the header file information of the project to be detected includes the header file name, the first hash value, and the second hash value. The second hash value is obtained by hashing all sub-header files nested in the header file after comments, whitespace, and similar content have been deleted from them. A sub-header file is a header file nested inside another header file. For example, the header file zlib.h begins with the following lines of code:
#ifndef ZLIB_H
#define ZLIB_H
#include "zconf.h"
From this it follows that zlib.h references the sub-header file zconf.h. Sub-header files may in turn include their own sub-header files, so all nested sub-header files are located, irrelevant content such as comments and whitespace is deleted, and a hash operation is performed to obtain the second hash value. All of them need to be located because nested sub-header files are ultimately pasted into the source code and compiled as part of it. The header files should also remain identifiable after minor modifications, although header files normally match those provided by the corresponding library and are not modified.
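The header file information described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: Python is used only for illustration, the hash algorithm (SHA-256), the choice to hash normalized file contents, the in-memory HEADERS dictionary, and the names normalize, nested_headers, and header_info are all assumptions, and the comment stripping is deliberately naive (it does not handle comment markers inside string literals).

```python
import hashlib
import re

# Hypothetical in-memory project: header file name -> header file text.
HEADERS = {
    "zlib.h": '''#ifndef ZLIB_H
#define ZLIB_H
#include "zconf.h"
/* hypothetical declaration */
int compress(void);
#endif
''',
    "zconf.h": '''#ifndef ZCONF_H
#define ZCONF_H
// hypothetical configuration header
#endif
''',
}

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def normalize(text):
    """Delete /* */ and // comments, then all whitespace (a simplification)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    return re.sub(r"\s+", "", text)


def nested_headers(name, headers, seen=None):
    """Collect every sub-header reachable from `name`, in discovery order."""
    seen = [] if seen is None else seen
    for child in INCLUDE_RE.findall(headers.get(name, "")):
        if child in headers and child not in seen:
            seen.append(child)
            nested_headers(child, headers, seen)
    return seen


def header_info(name, headers):
    """Build the (name, first hash, second hash) record for one header."""
    first = hashlib.sha256(normalize(headers[name]).encode()).hexdigest()
    subs = nested_headers(name, headers)
    joined = "".join(normalize(headers[sub]) for sub in sorted(subs))
    second = hashlib.sha256(joined.encode()).hexdigest()
    return {"name": name, "first_hash": first, "second_hash": second, "subs": subs}


print(header_info("zlib.h", HEADERS)["subs"])  # prints ['zconf.h']
```

Sorting the sub-header names before concatenation keeps the second hash independent of the order in which the includes are discovered, which is a design choice of this sketch and not something the description specifies.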
Further, in some implementations of the present embodiment, the step of obtaining the first header file names included in all the source code files in the item to be detected includes: reading a header file list in an item to be detected; wherein the header file list includes: header file name and header file path; and acquiring the first header file names contained in all the source code files in the item to be detected according to the header file list.
Specifically, in this embodiment, the first header file names in a source code file, such as a .c file, are found with the help of the header file list. The header files that a .c source file includes are read from its source code, for example:
#include "zlib.h"
int main(void){}
In this code, zlib.h is the header file included by the .c source file. Comparing zlib.h against the header file list yields its real path, and thus the correspondence between the source code file and the header file. Note that in a C language project, a .c file that references functions of a library such as zlib includes zlib.h. Starting from the header file names included by the .c files guards against header files that are present in the project but never actually referenced, for example when zlib.h exists in the project although no source code calls a zlib function. After this step, all header files actually used by the source code files in the project have been found, which avoids misjudgments caused by a header file being present while no function of its library is referenced in the source code. If a header file cannot be found, there are two possible reasons. First, the project itself is incomplete and the header file is genuinely missing. Second, the header file lies outside the project. Such header files usually belong to a specific environment, such as GCC or MSVC, and are installed together with it; they are generally header files of system libraries. System libraries normally do not need to be detected, so header files that cannot be found are ignored. If the system libraries referenced by the project do need to be analyzed, the header files and corresponding libraries of these environments can be parsed directly and stored in the database. If only referenced third-party libraries need to be analyzed, the header files that cannot be found may be ignored.
In this way, the header files of each source code file are found through the header file list, and the sub-header files are found through the header files, which yields both the correspondence between source code files and header files and the correspondence between header files and sub-header files. For example, if header file A references header file B and header file B references header file C, the relationship A-B-C exists. If source code file F includes header file A, the relationship F-A-B-C exists, so header files A, B, and C are all referenced by source code file F.
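The F-A-B-C relationship can be illustrated with a short sketch. It assumes the direct include relationships have already been extracted into a dictionary; the names referenced_headers, includes, and header_list, as well as the paths, are hypothetical. Header files missing from the header file list (such as system headers) are skipped, as the description suggests.

```python
from collections import deque


def referenced_headers(source, includes, header_list):
    """Return {header name: path} for every project header reachable from `source`."""
    resolved = {}
    queue = deque(includes.get(source, []))
    while queue:
        name = queue.popleft()
        if name in resolved or name not in header_list:
            continue  # already visited, or outside the project (e.g. stdio.h)
        resolved[name] = header_list[name]
        queue.extend(includes.get(name, []))
    return resolved


# Hypothetical direct include relationships: F.c -> A.h -> B.h -> C.h.
includes = {"F.c": ["A.h", "stdio.h"], "A.h": ["B.h"], "B.h": ["C.h"]}
# Hypothetical header file list: header file name -> header file path.
header_list = {"A.h": "include/A.h", "B.h": "include/B.h", "C.h": "include/C.h"}

print(sorted(referenced_headers("F.c", includes, header_list)))
# prints ['A.h', 'B.h', 'C.h']
```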
In other implementations of this embodiment, before the step of obtaining the first characteristic information of the project to be detected, the method further includes: acquiring, from the source code files of the project to be detected, the functions that have a code implementation and the called sub-functions; comparing the functions that have a code implementation with the called sub-functions; and determining each called sub-function that is not among the functions with a code implementation to be a bottommost function.
Specifically, in this embodiment, the ctags tool may be used to find all functions in a .c file, for example:
C:\Users\Seczone\Desktop\TEST\3>ctags -f - --kinds-C=f --fields=neK 1.c
main 1.c /^int main( void )$/;" function line:15 end:25
read_me 1.c /^ void read_me( void )$/;" function line:4 end:13
Here the first column (main and read_me) is the identified function name, the second column is the file name, the third column is the search pattern that matches the function header, the fourth column (function) is the kind of the tag, the number after line: is the line on which the function starts, and the number after end: is the line on which it ends.
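As an illustration of how such output might be consumed, the following sketch parses one line of ctags output into its columns. It assumes the tab-separated tags format that ctags writes with "-f -" and that the line and end fields are present; the function name parse_ctags_line and the returned dictionary layout are hypothetical.

```python
def parse_ctags_line(line):
    """Split one tab-separated ctags line into name, file, kind, start and end line."""
    head, _, extension = line.rstrip("\n").partition(';"')
    name, path = head.split("\t")[:2]
    kind, fields = None, {}
    for item in extension.split("\t"):
        key, sep, value = item.partition(":")
        if sep:
            fields[key] = value  # e.g. line:15, end:25
        elif item:
            kind = item  # the bare kind column, e.g. "function"
    return {
        "name": name,
        "file": path,
        "kind": kind,
        "line": int(fields["line"]),
        "end": int(fields["end"]),
    }


sample = 'main\t1.c\t/^int main( void )$/;"\tfunction\tline:15\tend:25'
print(parse_ctags_line(sample)["name"])  # prints main
```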
For a called sub-function, for example, in the following code:
#include "stdio.h"
int fun( void )
{
    return 1;
}
int main( void )
{
    int a = fun();
    printf("a=%d\n",a );
    return 0;
}
In this code, the main function calls two sub-functions, fun and printf. fun has an implementation in the code, whereas printf does not, and fun itself calls no sub-functions. The called sub-functions in this source code file are therefore fun and printf only.
There are many ways to find the called sub-functions. This embodiment optionally uses a regular expression, for example:
C:\Users\Seczone\Desktop\TEST\3>type 2.c | grep -n -P -o "\b\w+\s*(?=\()"
2:fun
6:main
8:fun
9:printf
Here the number to the left of each colon is the line number at which the name appears in the code, and the name to the right is the function name. The output of ctags for the same file is:
C:\Users\Seczone\Desktop\TEST\3>ctags -f - --c-kinds=f --fields=neKSt "2.c"
fun 2.c /^int fun(void)$/;" function line:2 typeref:typename:int signature:(void) end:5
main 2.c /^int main(void)$/;" function line:6 typeref:typename:int signature:(void) end:11
The starting lines of fun and main are 2 and 6, respectively, so the entries on those two lines are function definitions and can be eliminated from the regular expression result. The called sub-functions are then:
8:fun
9:printf
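The elimination step can be reproduced with a small sketch that applies the same regular expression line by line and drops the hits on lines where ctags reports a definition. The names SOURCE, name_hits, and called_subfunctions are hypothetical. Note that this pattern would also match keywords followed by a parenthesis, such as if or while, which a fuller implementation would need to filter out.

```python
import re

# The example source file from the description (2.c).
SOURCE = r'''#include "stdio.h"
int fun( void )
{
    return 1;
}
int main( void )
{
    int a = fun();
    printf("a=%d\n",a );
    return 0;
}
'''

# Same pattern as the grep command: a word followed by an opening parenthesis.
CALL_RE = re.compile(r"\b\w+\s*(?=\()")


def name_hits(source):
    """Return (line number, name) for every word directly followed by '('."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((lineno, match.group().strip()))
    return hits


def called_subfunctions(source, definition_lines):
    """Drop hits on the lines where a function definition starts."""
    return [(n, name) for n, name in name_hits(source) if n not in definition_lines]


print(called_subfunctions(SOURCE, {2, 6}))  # prints [(8, 'fun'), (9, 'printf')]
```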
In this embodiment, the functions found in the project and the called sub-functions can be listed as shown in Table 1, where fun1, fun2, and fun3 are functions assumed to exist in the project, and the called sub-function column contains the results from all source code files in the project.
TABLE 1
Figure SMS_1
Next, the functions with a code implementation are compared with the called sub-functions. A function that appears in the "called sub-function" column but not in the "function with code implementation" column is a bottommost function of the project. As can be seen from Table 1, the bottommost functions of the source code files are printf, calloc, sqrt, compress, and uncompress. Because these bottommost functions are not implemented in the project, their implementations come from external function libraries.
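The comparison amounts to a set difference, which can be sketched as follows. Because Table 1 is only available as an image, the two input lists below are hypothetical values chosen to be consistent with the result stated in the text; the function name bottommost_functions is likewise an assumption.

```python
def bottommost_functions(implemented, called):
    """Called sub-functions that have no code implementation in the project."""
    implemented = set(implemented)
    return sorted({name for name in called if name not in implemented})


# Hypothetical contents of the two columns of Table 1.
implemented = ["main", "fun1", "fun2", "fun3"]
called = ["fun1", "fun2", "fun3", "printf", "calloc", "sqrt", "compress", "uncompress"]

print(bottommost_functions(implemented, called))
# prints ['calloc', 'compress', 'printf', 'sqrt', 'uncompress']
```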
Step 102, matching the first characteristic information with second header file information and export functions contained in a preset database corresponding to binary function libraries.
Specifically, in this embodiment, a database is preset that contains information about binary function libraries, and the characteristic information of the project to be detected only needs to be matched against the header file information and export functions in the database. This embodiment mainly targets the case in which a binary library is referenced. For example, a C language project that references the zlib library references the binary library file of zlib (for example, zlib.dll) and does not contain the zlib source code. This scheme and source code similarity detection are therefore complementary.
In some implementations of this embodiment, before the step of matching the first characteristic information with the second header file information and the export functions contained in the preset database corresponding to binary function libraries, the method further includes: acquiring a preset binary library file name, the function library name corresponding to the binary library file name, the export functions, and a second header file name, where an export function is a function interface that the binary library file provides externally; performing a hash operation on the header file corresponding to the second header file name to obtain a third hash value; performing a hash operation on all sub-header files nested in that header file to obtain a fourth hash value; generating the second header file information based on the second header file name, the third hash value, and the fourth hash value; and generating the database based on the function library name, the export functions, and the second header file information.
Specifically, in this embodiment, the third and fourth hash values are computed in the same way as the first and second hash values of the project to be detected. The information recorded for a binary function library comprises header file information such as header file names, library file information such as the name of the library file in binary form (for example, zlib.dll), the function library name corresponding to the library file name (for example, zlib; the function library name is not necessarily identical to the library file name), and the export functions of the function library. The export functions are the function interfaces that the binary library provides externally, that is, a C project is only allowed to use these functions. They can be obtained with suitable tools. For example, under Windows the export function names of zlib.dll can be obtained by running the command line "dumpbin /exports zlib.dll", where dumpbin is a tool that ships with MSVC. This embodiment optionally uses the dumpbin tool to obtain the export functions of a function library, for example:
C:\Users\Seczone\Desktop>dumpbin /exports zlib.dll
Microsoft (R) COFF/PE Dumper Version 14.30.30709.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file zlib.dll
File Type: DLL
Section contains the following exports for zlib.dll
00000000 characteristics
59CDB16A time date stamp Fri Sep 29 10:35:22 2017
0.00 version
1 ordinal base
78 number of functions
78 number of names
ordinal hint RVA name
1 0 00001000 adler32
2 1 00001280 adler32_combine
3 2 000012A0 adler32_combine64
4 3 00001380 compress
5 4 000013A0 compress2
6 5 00001440 compressBound
7 6 00001460 crc32
8 7 00001480 crc32_combine
As can be seen, the output contains a lot of redundant content. The following command (grep needs to be installed separately) can be used to extract the export functions from the output:
C:\Users\Seczone\Desktop>dumpbin /exports zlib.dll | grep -P -o "\d+\s+[0-9A-Z]+\s+[0-9A-Z]+\s+\w+" | grep -P -o "[_a-zA-Z]\w*$"
The export functions extracted from the output are:
adler32
adler32_combine
adler32_combine64
compress
compress2
compressBound
crc32
crc32_combine
This approach works by using regular expressions to extract strings that follow a specific pattern. The header files are provided by the function library and are downloaded together with the binary function library, because a function library is very difficult to use without its header files; they can therefore be obtained directly.
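The same extraction can be sketched without grep by matching each output row against a pattern for the ordinal, hint, RVA, and name columns. This is an illustrative sketch only: SAMPLE is an abridged copy of the output shown above, the names ROW and export_names are hypothetical, and rows without an RVA column (for example forwarded exports) are not handled.

```python
import re

# Abridged dumpbin output taken from the example above.
SAMPLE = """Dump of file zlib.dll
File Type: DLL
Section contains the following exports for zlib.dll
00000000 characteristics
59CDB16A time date stamp Fri Sep 29 10:35:22 2017
0.00 version
1 ordinal base
78 number of functions
78 number of names
ordinal hint RVA name
1 0 00001000 adler32
2 1 00001280 adler32_combine
3 2 000012A0 adler32_combine64
4 3 00001380 compress
5 4 000013A0 compress2
6 5 00001440 compressBound
7 6 00001460 crc32
8 7 00001480 crc32_combine
"""

# ordinal (decimal), hint (hex), RVA (8 hex digits), then the exported name.
ROW = re.compile(r"\s*\d+\s+[0-9A-Fa-f]+\s+[0-9A-Fa-f]{8}\s+(\w+)")


def export_names(dumpbin_output):
    """Return the names found in the export table rows, in table order."""
    names = []
    for line in dumpbin_output.splitlines():
        match = ROW.match(line)
        if match:
            names.append(match.group(1))
    return names


print(export_names(SAMPLE)[:4])
# prints ['adler32', 'adler32_combine', 'adler32_combine64', 'compress']
```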
Further, in some implementations of this embodiment, the step of matching the first characteristic information with the second header file information and the export functions contained in the preset database corresponding to binary function libraries includes: matching the first hash value in the first header file information with the third hash value in the second header file information; and, when the first hash value matches the third hash value, matching the bottommost functions with the export functions.
Specifically, in this embodiment, the first hash value in the first header file information is first matched with the third hash value in the second header file information, which roughly determines the referenced function library. The bottommost functions are then matched with the export functions, that is, the bottommost functions are used to check whether the function library is actually referenced. In addition, the database contains the header file names corresponding to each function library, so header files in the project whose hash values were not matched can be looked up in the database by name, which can match a further portion of the function libraries.
Step 103, determining, according to the matching result, the target function library in the database that the project to be detected references.
Specifically, in this embodiment, the target function library referenced by the project to be detected can be obtained from the database corresponding to the binary function libraries according to the matching result.
In some implementations of this embodiment, the step of determining, according to the matching result, the target function library in the database that the project to be detected references includes: when the first hash value matches the third hash value, determining that the project to be detected possibly references the target function library corresponding to the third hash value; and, when the bottommost functions match the export functions, determining that the project to be detected actually references the target function library corresponding to the third hash value.
Specifically, in this embodiment, the hash values are matched first. The hash value information can be listed in a table to facilitate matching, for example as shown in Table 2:
TABLE 2
Figure SMS_2
Matching the hash values shows that the zlib.h and zconf.h referenced in the project possibly come from the zlib library. Function matching is performed next to confirm that the library is actually called. The source code files associated with zlib.h and zconf.h are found first, then their bottommost functions are found and matched with the export functions of the zlib library. The function information can be listed in a table to facilitate matching, for example as shown in Table 3:
TABLE 3
Figure SMS_3
As can be seen from Table 3, the bottommost function calls of the source code files associated with zlib.h and zconf.h include compress and uncompress, which match export functions of the zlib library. This confirms that zlib is a function library referenced by the project and that its compress and uncompress functions are called, so the matching is accurate down to the specific functions referenced. From the above description, the technical solution of this embodiment is suitable for almost any C language project, because C language code that references a function library must include its header file, so detection can be performed through the header file. In addition, the amount of data to be prepared is small, and so is the space consumption.
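The two-stage matching of steps 102 and 103 can be sketched as follows. This is a simplified illustration under stated assumptions: the database layout (a list of dictionaries with library, header_hashes, and exports keys), the placeholder hash strings, and the name detect_libraries are all hypothetical, and the sketch matches the bottommost functions of the whole project rather than only those of the source code files associated with the matched header files.

```python
def detect_libraries(project_header_hashes, bottom_functions, database):
    """Stage 1: header hash match (possible reference).
    Stage 2: bottommost function vs. export function match (confirmed reference)."""
    results = {}
    for entry in database:
        if not set(project_header_hashes) & entry["header_hashes"]:
            continue  # no header hash in common, so the library is not a candidate
        used = sorted(set(bottom_functions) & entry["exports"])
        results[entry["library"]] = {
            "status": "confirmed" if used else "possible",
            "functions": used,
        }
    return results


# Hypothetical database entry; the hash strings are placeholders, not real hashes.
database = [
    {
        "library": "zlib",
        "header_hashes": {"hash-of-zlib.h", "hash-of-zconf.h"},
        "exports": {"adler32", "compress", "uncompress", "crc32"},
    }
]
bottom = ["printf", "calloc", "sqrt", "compress", "uncompress"]

print(detect_libraries({"hash-of-zlib.h"}, bottom, database))
# prints {'zlib': {'status': 'confirmed', 'functions': ['compress', 'uncompress']}}
```

Reporting the matched function names alongside the library name mirrors the observation in the text that the result can be accurate down to the specific functions referenced.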
Based on the technical scheme of the embodiment of the application, first characteristic information of the item to be detected is obtained; wherein the first characteristic information includes: first header file information and a bottommost function; matching the first characteristic information with second header file information and a derived function contained in a preset database corresponding to the binary function library; and determining an objective function library referenced by the item to be detected in the database according to the matching result. By implementing the scheme, the header file information and the bottommost function of the item to be detected are respectively matched with the header file information and the derived function in the preset database, and the objective function library referenced by the item to be detected can be determined according to the matching result, so that the accuracy of detection is effectively improved, the data volume required to be prepared is small, and the storage space is effectively saved.
Fig. 2 shows a refined function library reference detection method provided in the second embodiment of the present application. The method includes:
Step 201, obtaining, according to the header file list, the first header file names included by all source code files in the project to be detected.
Specifically, in this embodiment, the header file list includes header file names and header file paths. The first header file names included by the source code files (for example, .c files) are looked up against the header file list, so that after this step all header files actually used by the project's source code files are known. This avoids misjudgments in which a header file is contained in the project but no function of the corresponding function library is referenced in the source code.
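Step 201 can be illustrated with a minimal Python sketch. It assumes that "header files actually used" means headers named in #include directives of the source files, and that the header file list is available as a name-to-path mapping. The regular expression, the function names, and the sample paths are hypothetical and do not come from the patent.

```python
import re
from pathlib import PurePosixPath

# Matches both #include <zlib.h> and #include "zlib.h" (illustrative regex).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def included_headers(source_text):
    """Return the set of header file names that one C source file includes."""
    return {PurePosixPath(name).name for name in INCLUDE_RE.findall(source_text)}

def first_header_names(sources, header_list):
    """Keep only headers that are in the project's list AND actually included.

    sources:     mapping of source file name -> file contents
    header_list: mapping of header file name -> header file path
    """
    used = set()
    for text in sources.values():
        used |= included_headers(text)
    return {name: header_list[name] for name in sorted(used) if name in header_list}

# Hypothetical project: zlib.h is included, unused.h is present but never included.
sources = {"main.c": '#include <stdio.h>\n#include "zlib.h"\nint main(void) { return 0; }\n'}
header_list = {"zlib.h": "third_party/zlib/zlib.h", "unused.h": "include/unused.h"}
result = first_header_names(sources, header_list)
```

In this sketch, unused.h is dropped because no source file includes it, and stdio.h is dropped because it is not in the project's header file list.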
Step 202, generating the first header file information based on the first header file name and the first hash value and second hash value corresponding to the first header file name.
Specifically, in this embodiment, the header file information of the project to be detected includes the header file name, a first hash value, and a second hash value. The second hash value is obtained by hashing all sub-header files nested in the header file, after those sub-header files have been preprocessed, for example by deleting comments and spaces.
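The header file information of Step 202 might be assembled as in the following sketch. The patent does not name a hash algorithm and is not fully explicit about the hash input, so several things here are assumptions: SHA-256 is used, the first hash is taken over the normalized header contents, and the second hash is taken over the concatenated normalized contents of the nested sub-headers in name order. All function names are hypothetical.

```python
import hashlib
import re

def normalize(header_text):
    """Delete C comments and all whitespace so formatting does not change the hash.

    Note: this simple regex approach does not protect comment markers that
    appear inside string literals.
    """
    no_block = re.sub(r"/\*.*?\*/", "", header_text, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return re.sub(r"\s+", "", no_line)

def content_hash(header_text):
    """SHA-256 of the normalized text (the algorithm is an assumption)."""
    return hashlib.sha256(normalize(header_text).encode("utf-8")).hexdigest()

def header_info(name, text, sub_headers):
    """Build one header file information record.

    sub_headers: mapping of nested sub-header name -> contents
    """
    combined = "".join(normalize(sub_headers[n]) for n in sorted(sub_headers))
    return {
        "name": name,
        "first_hash": content_hash(text),
        "second_hash": hashlib.sha256(combined.encode("utf-8")).hexdigest(),
    }

info = header_info("zlib.h", "int compress(void); /* doc */\n",
                   {"zconf.h": "#define Z_OK 0  // status\n"})
```

Because comments and whitespace are removed first, two copies of a header that differ only in formatting produce the same hash values.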
Step 203, determining the called sub-functions that are not among the functions with a code implementation in the source code files as the bottommost functions.
Specifically, in this embodiment, the functions that have a code implementation are compared with the called sub-functions; any called sub-function that is not among the functions with a code implementation is a bottommost function of the project. Because these bottommost functions are not implemented in the project, their implementations must come from an external function library.
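The comparison in Step 203 amounts to a set difference, as the following rough sketch shows. The regular expressions are a deliberately crude stand-in for a real C parser and are an assumption of this example: among other limitations, they treat a bare prototype as a call and ignore macros and function pointers. The sample source text is hypothetical.

```python
import re

C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

# name(...) followed by "{" is treated as a definition; name( as a call.
DEFINITION_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

def bottommost_functions(source_text):
    """Called functions that have no code implementation in the given source."""
    defined = set(DEFINITION_RE.findall(source_text)) - C_KEYWORDS
    called = set(CALL_RE.findall(source_text)) - C_KEYWORDS
    return sorted(called - defined)

source = """
#include "zlib.h"
static int helper(int n) { return n + 1; }
int main(void) {
    if (helper(1)) {
        compress(dst, &dlen, src, slen);
    }
    return uncompress(src, &slen, dst, dlen);
}
"""
bottom = bottommost_functions(source)
```

Here helper and main are implemented in the project, so only compress and uncompress remain as bottommost functions.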
Step 204, generating the second header file information based on the second header file name in the preset binary library file and the third hash value and fourth hash value corresponding to the second header file name.
Step 205, generating a database corresponding to the binary function library based on the function library name corresponding to the binary library file name, the export functions, and the second header file information.
Specifically, in this embodiment, the third and fourth hash values are computed in the same way as the first and second hash values of the project to be detected. The database contains header file information such as header file names, library file information such as the names of library files in binary form, the function library name corresponding to each library file name, and the export functions of each function library. An export function is a function interface that the binary library exposes externally; that is, a C project is only allowed to use these functions. The export functions of a binary function library can be obtained with standard tools: for example, on Windows the export function names of zlib.dll can be obtained by running the command line 'dumpbin /exports zlib.dll', and on Linux they can be viewed with 'nm -D libzlib.so'.
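As a hedged illustration of Step 205, the sketch below builds one database entry from the text that 'nm -D' prints. The record layout, the function names, and the sample addresses are assumptions of this example; the patent does not specify a schema. Only symbols of type T (defined in the text section) or W (defined weak symbol) are kept, which is also a choice made here.

```python
def parse_nm_exports(nm_output):
    """Extract defined function symbols from `nm -D` output text."""
    exports = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined symbols (type U or w) have no address and are skipped.
        if len(parts) == 3 and parts[1] in ("T", "W"):
            exports.add(parts[2].split("@")[0])  # drop any @VERSION suffix
    return sorted(exports)

def build_library_entry(library_file, library_name, nm_output, headers):
    """One database entry; `headers` holds the second header file information."""
    return {
        "library_file": library_file,
        "library_name": library_name,
        "exports": parse_nm_exports(nm_output),
        "headers": headers,
    }

# Illustrative nm -D output; the addresses are made up for the example.
sample = """\
                 U malloc
0000000000002a10 T compress
0000000000002b40 T compress2
0000000000007c20 T uncompress
                 w __gmon_start__
"""
entry = build_library_entry(
    "libzlib.so", "zlib", sample,
    [{"name": "zlib.h", "third_hash": "<third hash>", "fourth_hash": "<fourth hash>"}],
)
```

The hash fields are left as placeholders here; in practice they would be computed by the same procedure used for the project's own header files.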
Step 206, judging whether the first hash value in the first header file information matches the third hash value in the second header file information; if yes, go to step 207; if no, go to step 210.
Step 207, determining that the project to be detected may reference the target function library corresponding to the third hash value.
Specifically, in this embodiment, matching the first hash value against the third hash value provides a preliminary screening of the function libraries referenced by the project to be detected, establishing that the project may reference the target function library.
Step 208, matching the bottommost functions with the export functions in the binary function library.
Specifically, in this embodiment, matching proceeds in two stages. The first hash value in the first header file information is first matched against the third hash value in the second header file information, which roughly identifies the referenced function library; the bottommost functions are then matched against the export functions, that is, the bottommost functions are used to check whether the function library is actually referenced. In addition, because the database also records the header file names corresponding to each function library, header files in the project whose hashes did not match can still be looked up in the database by name, which allows some further function libraries to be matched.
Step 209, when the bottommost functions match the export functions, determining that the project to be detected actually references the target function library corresponding to the third hash value.
Specifically, in this embodiment, when the source code files that include the header files, for example zlib.h and zconf.h, contain the bottommost function calls compress and uncompress, and these match export functions of the zlib library, the zlib library is confirmed as a function library referenced by the project, and its compress and uncompress functions are confirmed to be called. This matching approach can thus pinpoint the specific functions referenced.
Step 210, determining that the project to be detected does not reference the target function library corresponding to the third hash value.
Specifically, in this embodiment, when the first hash value in the first header file information does not match the third hash value in the second header file information, it can be determined that the project does not reference the target function library in the database that corresponds to the third hash value.
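Steps 206 to 210 can be summarized in a short sketch of the two-stage matching. The record layout (first_hash, third_hash, exports, and so on) and the placeholder hash strings are assumptions made for illustration; only the control flow follows the steps described above.

```python
def detect_references(project_headers, bottom_functions, database):
    """Two-stage matching: header hash screening, then export function check."""
    results = []
    for lib in database:
        lib_hashes = {h["third_hash"] for h in lib["headers"]}
        # Steps 206/210: no header hash matches, so the library is not referenced.
        if not any(h["first_hash"] in lib_hashes for h in project_headers):
            continue
        # Step 207: the library may be referenced. Step 208: check the exports.
        called = sorted(set(bottom_functions) & set(lib["exports"]))
        results.append({
            "library": lib["library_name"],
            "confirmed": bool(called),   # Step 209: actually referenced
            "functions": called,         # the specific functions referenced
        })
    return results

# Hypothetical data; the hash strings are placeholders, not real digests.
database = [
    {"library_name": "zlib", "exports": ["compress", "deflate", "uncompress"],
     "headers": [{"name": "zlib.h", "third_hash": "hash-of-zlib-h"}]},
    {"library_name": "libpng", "exports": ["png_create_read_struct"],
     "headers": [{"name": "png.h", "third_hash": "hash-of-png-h"}]},
]
project_headers = [{"name": "zlib.h", "first_hash": "hash-of-zlib-h"}]
matches = detect_references(project_headers, {"compress", "uncompress", "my_ext"}, database)
```

A library whose header hash matches but whose export functions are never called is still reported, with "confirmed" set to False, which mirrors the distinction between "may reference" in step 207 and "actually references" in step 209.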
It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and the numbering should not be construed as limiting the implementation of the embodiments of the present application.
According to the technical solution of this embodiment of the application, first feature information of the project to be detected is obtained, where the first feature information includes first header file information and bottommost functions; the first feature information is matched against the second header file information and export functions contained in a preset database corresponding to binary function libraries; and the target function library referenced by the project to be detected is determined in the database according to the matching result. By implementing this solution, the header file information and bottommost functions of the project to be detected are matched against the header file information and export functions in the preset database, respectively, and the target function library referenced by the project can be determined from the matching result. This effectively improves detection accuracy, requires only a small amount of prepared data, and saves storage space.
Fig. 3 is a schematic diagram of a function library reference detection device according to a third embodiment of the present application. The device can be used to carry out the foregoing function library reference detection method. As shown in Fig. 3, the function library reference detection device mainly includes:
an acquiring module 301, configured to acquire first feature information of the project to be detected, where the first feature information includes first header file information and bottommost functions;
a matching module 302, configured to match the first feature information with the second header file information and export functions contained in a preset database corresponding to binary function libraries;
and a determining module 303, configured to determine, according to the matching result, the target function library referenced by the project to be detected in the database.
In some implementations of this embodiment, the function library reference detection device further includes a first generation module, configured to: acquire the first header file names included by all source code files in the project to be detected; perform a hash operation on the first header file name to obtain a first hash value; perform a hash operation on all sub-header file names nested in the first header file name to obtain a second hash value; and generate the first header file information based on the first header file name, the first hash value, and the second hash value.
In some implementations of this embodiment, the function library reference detection device further includes a reading module, configured to: read the header file list in the project to be detected, where the header file list includes header file names and header file paths; and acquire, according to the header file list, the first header file names included by all source code files in the project to be detected.
In some implementations of this embodiment, the function library reference detection device further includes a comparison module, configured to: acquire, respectively, the functions that have a code implementation in the source code files of the project to be detected and the called sub-functions; compare the functions that have a code implementation with the sub-functions; and determine the sub-functions that are not among the functions with a code implementation as the bottommost functions.
In some implementations of this embodiment, the function library reference detection device further includes a second generation module, configured to: perform a hash operation on the second header file name to obtain a third hash value; perform a hash operation on all sub-header file names nested in the second header file name to obtain a fourth hash value; generate the second header file information based on the second header file name, the third hash value, and the fourth hash value; and generate the database based on the function library name, the export functions, and the second header file information.
In some implementations of this embodiment, the matching module is specifically configured to: match the first hash value in the first header file information with the third hash value in the second header file information; and, when the first hash value matches the third hash value, match the bottommost functions with the export functions.
In some implementations of this embodiment, the determining module is specifically configured to: when the first hash value matches the third hash value, determine that the project to be detected may reference the target function library corresponding to the third hash value; and, when the bottommost functions match the export functions, determine that the project to be detected actually references the target function library corresponding to the third hash value.
According to the function library reference detection device provided by this embodiment, first feature information of the project to be detected is obtained, where the first feature information includes first header file information and bottommost functions; the first feature information is matched against the second header file information and export functions contained in a preset database corresponding to binary function libraries; and the target function library referenced by the project to be detected is determined in the database according to the matching result. By implementing this solution, the header file information and bottommost functions of the project to be detected are matched against the header file information and export functions in the preset database, respectively, and the target function library referenced by the project can be determined from the matching result. This effectively improves detection accuracy, requires only a small amount of prepared data, and saves storage space.
In addition, to aid understanding of the solution of the present application, some supplementary points are described here. Most function libraries referenced by C language projects currently exist in the form of dynamic libraries. Even when a project appears to link a static library (such as a .a or .lib library), what is actually linked is an import library (import libraries and static libraries share the same suffix), and the import library exists to make linking against the dynamic library convenient.
The dynamic library suffix on Linux is .so, and the dynamic library suffix on Windows is .dll (DLL in lowercase). Not all functions in a dynamic library can be called; only the export functions can be called by a project.
Because an export function carries only a name, with no description of the number, types, or meaning of its parameters, using it without a header file is cumbersome: the function's documentation must be consulted separately, a function pointer of the matching type must be declared, the dynamic library must be loaded into memory through a system-provided function, the address of the function to be called must be looked up, and only then can the function exported from the dynamic library be called through the function pointer.
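The manual procedure described above (load the library, look up an export function by name, declare its parameter and return types by hand, then call it) can be sketched with Python's ctypes module. Python is used here only for brevity; the patent describes the procedure for C. The sketch assumes a POSIX system and uses the C library's strlen as a stand-in export function, so neither the library nor the function comes from the patent.

```python
import ctypes
import ctypes.util

def load_strlen():
    """Load the C library and bind its exported strlen without any header file."""
    # find_library may return None; CDLL(None) then opens the running process
    # itself, which on POSIX systems normally exposes the C library's symbols.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    strlen = libc.strlen                  # look up the export function by name
    strlen.argtypes = [ctypes.c_char_p]   # parameter types declared by hand
    strlen.restype = ctypes.c_size_t      # return type declared by hand
    return strlen

strlen = load_strlen()
length = strlen(b"zlib")
```

The hand-written argtypes and restype lines play the role of the function pointer type: without a header file, nothing in the library itself tells the caller what they should be.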
The header file avoids this trouble: the export functions of the dynamic library are declared in the header file, so they can be called directly once the header file is included in the source code. At compile time, the compiler finds that such a function call has no source code implementation in the project, recognizes it as a function of a referenced function library, and records the call in the import table of the file. When the linker finally builds the source code project into an executable program (such as an .exe), it locates the corresponding dynamic library through the import table (if the library is not one that is linked by default, the linker must be told explicitly where the function library is), checks the library's export functions, and associates them with the functions in the import table. When the program runs, the associated dynamic library is loaded into memory together with the program, and because the export functions of the dynamic library are already bound to the program's import table, calls to functions in the dynamic library execute normally.
Fig. 4 shows an electronic device according to a fourth embodiment of the present application. The electronic device can be used to implement the function library reference detection method of the foregoing embodiments, and mainly includes:
a memory 401, a processor 402, and a computer program 403 stored in the memory 401 and executable on the processor 402, the memory 401 and the processor 402 being communicatively connected. When executing the computer program 403, the processor 402 implements the method of the first or second embodiment. There may be one or more processors.
The memory 401 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 401 is used for storing executable program code, and the processor 402 is coupled with the memory 401.
Further, an embodiment of the application also provides a computer readable storage medium, which may be provided in the electronic device and may be the memory of the embodiment shown in Fig. 4.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the function library reference detection method of the foregoing embodiments. Further, the computer readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, an optical disk, or any other medium that can store program code.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be electrical, mechanical, or of another form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a readable storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing describes the function library reference detection method, apparatus, device, and storage medium provided in the present application. Those skilled in the art may, following the ideas of the embodiments of the present application, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for detecting a reference to a function library, comprising:
acquiring first feature information of a project to be detected; wherein the first feature information includes: first header file information and a bottommost function;
matching the first feature information with second header file information and an export function contained in a preset database corresponding to a binary function library;
and determining a target function library referenced by the project to be detected in the database according to the matching result.
2. The method for detecting reference to a function library according to claim 1, further comprising, before the step of acquiring the first feature information of the project to be detected:
acquiring first header file names contained in all source code files in the project to be detected;
performing hash operation on the first header file name to obtain a first hash value;
performing hash operation on all nested sub-header file names in the first header file name to obtain a second hash value;
the first header file information is generated based on the first header file name, the first hash value, and the second hash value.
3. The method for detecting function library reference according to claim 2, wherein the step of acquiring the first header file names contained in all source code files in the project to be detected comprises:
reading a header file list in the project to be detected; wherein the header file list includes: header file name and header file path;
and acquiring the first header file names contained in all source code files in the project to be detected according to the header file list.
4. The method for detecting reference to a function library according to claim 1, further comprising, before the step of acquiring the first feature information of the project to be detected:
respectively acquiring functions that have a code implementation in a source code file of the project to be detected and called sub-functions;
comparing the functions that have a code implementation with the sub-functions;
and determining a sub-function that is not among the functions that have a code implementation as the bottommost function.
5. The method for detecting function library reference according to claim 1, wherein before the step of matching the first feature information with second header file information and an export function contained in a preset database corresponding to a binary function library, the method further comprises:
respectively acquiring a preset binary library file name, a function library name corresponding to the binary library file name, an export function, and a second header file name; the export function is a function interface externally provided by the binary library file;
performing hash operation on the second header file name to obtain a third hash value;
performing hash operation on all nested sub-header file names in the second header file name to obtain a fourth hash value;
generating the second header file information based on the second header file name, the third hash value, and the fourth hash value;
and generating the database based on the function library name, the export function, and the second header file information.
6. The method for detecting function library reference according to any one of claims 1 to 5, wherein the step of matching the first feature information with second header file information and an export function contained in a preset database corresponding to a binary function library includes:
matching a first hash value in the first header file information with a third hash value in the second header file information;
and when the first hash value matches the third hash value, matching the bottommost function with the export function.
7. The method according to claim 6, wherein the step of determining the target function library referenced by the project to be detected in the database according to the matching result comprises:
determining that the project to be detected may reference the target function library corresponding to the third hash value when the first hash value matches the third hash value;
and when the bottommost function matches the export function, determining that the project to be detected actually references the target function library corresponding to the third hash value.
8. A function library reference detection apparatus, comprising:
an acquisition module, configured to acquire first feature information of a project to be detected; wherein the first feature information includes: first header file information and a bottommost function;
a matching module, configured to match the first feature information with second header file information and an export function contained in a preset database corresponding to a binary function library;
and a determining module, configured to determine a target function library referenced by the project to be detected in the database according to the matching result.
9. An electronic device comprising a memory and a processor, wherein:
the processor is used for executing the computer program stored on the memory;
the processor, when executing the computer program, implements the steps of the function library reference detection method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the function library reference detection method of any of claims 1 to 7.
CN202310211634.0A 2023-03-07 2023-03-07 Function library reference detection method, device, equipment and readable storage medium Active CN116069338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310211634.0A CN116069338B (en) 2023-03-07 2023-03-07 Function library reference detection method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116069338A true CN116069338A (en) 2023-05-05
CN116069338B CN116069338B (en) 2023-08-11

Family

ID=86173361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310211634.0A Active CN116069338B (en) 2023-03-07 2023-03-07 Function library reference detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116069338B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107643893A (en) * 2016-07-22 2018-01-30 腾讯科技(深圳)有限公司 A kind of program detecting method and device
CN110286934A (en) * 2019-06-30 2019-09-27 潍柴动力股份有限公司 A kind of inspection method and device of static code
US20200065074A1 (en) * 2018-08-27 2020-02-27 Georgia Tech Research Corporation Devices, systems, and methods of program identification, isolation, and profile attachment
CN112148392A (en) * 2019-06-27 2020-12-29 腾讯科技(深圳)有限公司 Function call chain acquisition method and device and storage medium
CN113553301A (en) * 2021-06-24 2021-10-26 网易(杭州)网络有限公司 Header file processing method and device, computer readable storage medium and processor
CN114968247A (en) * 2021-02-22 2022-08-30 阿里巴巴集团控股有限公司 Pre-compilation method, apparatus and computer program product

Also Published As

Publication number Publication date
CN116069338B (en) 2023-08-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant