CN106469049B - File scanning method and device - Google Patents

Info

Publication number
CN106469049B
Authority
CN
China
Prior art keywords
file
dependent
symbol table
main
dependent file
Legal status
Active
Application number
CN201510511052.XA
Other languages
Chinese (zh)
Other versions
CN106469049A (en)
Inventor
张蓓
梅维一
严明
魏学峰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510511052.XA
Publication of CN106469049A
Application granted
Publication of CN106469049B
Legal status: Active

Abstract

The invention discloses a file scanning method, comprising: symbolizing a main file to generate a main file symbol table; symbolizing, according to a global file dependency relationship, a dependent file on which the main file depends to generate a symbol table of the dependent file, wherein the global file dependency relationship is established according to the dependency statements between the main file and the dependent file; establishing a link relationship between the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table; and scanning the unified symbol table and outputting a scanning result. By following the file dependency relationship, the file scanning method provided by the embodiments of the invention scans the unified symbol table of all files that have dependency relationships, thereby effectively avoiding false positives and false negatives.

Description

File scanning method and device
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a device for scanning files.
Background
Static code analyzers that are not based on compilation have the advantages of high efficiency, independence from the build environment, and light weight. The main flow of existing non-compilation-based static analysis tools (such as cppcheck) is as follows: obtain the configured scan paths, preprocess each source file, symbolize the preprocessed character strings to generate a symbol table, store the symbol table, and check the generated symbol table.
In this mechanism, preprocessing, symbolization, and symbol table generation are performed linearly, one file at a time. Cross-file scenarios are therefore difficult to handle, which leads to more false positives and false negatives.
In addition, preprocessing is not controllable. Because static code analysis does not depend on compilation, the include paths configured in the compiler or in environment variables cannot be obtained, so the user has to configure the directories of dependent files manually. If the user does not configure them, or configures them incompletely, header inclusion is incomplete and macro definitions cannot be found. As a result, macros cannot be expanded correctly, variable definitions cannot be found, and so on, which again causes more false positives and false negatives.
Disclosure of Invention
The embodiments of the invention provide a file scanning method that can avoid false positives and false negatives. The embodiments of the invention also provide a corresponding device.
The first aspect of the present invention provides a method for scanning a file, including:
symbolizing a main file to generate a main file symbol table;
according to a global file dependency relationship, symbolizing a dependent file on which the main file depends to generate a symbol table of the dependent file, wherein the global file dependency relationship is a relationship established according to a dependency statement between the main file and the dependent file;
establishing a link relation for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table;
and scanning the unified symbol table and outputting a scanning result.
A second aspect of the present invention provides an apparatus for scanning a file, comprising:
the first processing unit is used for symbolizing the main file and generating a main file symbol table;
the second processing unit is used for symbolizing the dependent file on which the main file depends according to a global file dependency relationship, and generating a symbol table of the dependent file, wherein the global file dependency relationship is a relationship established according to a dependency statement between the main file and the dependent file;
the link establishing unit is used for establishing a link relation for the symbol table of the main file obtained by the processing of the first processing unit and the symbol table of the dependent file obtained by the processing of the second processing unit to obtain a unified symbol table;
a scanning unit, configured to scan the unified symbol table established by the link establishing unit;
and the output unit is used for outputting the result of the scanning unit scanning the unified symbol table.
In the embodiments of the invention, a main file is symbolized to generate a main file symbol table. According to a global file dependency relationship, a dependent file on which the main file depends is symbolized to generate a symbol table of the dependent file, wherein the global file dependency relationship is established according to the dependency statements between the main file and the dependent file. A link relationship is established between the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table, and the unified symbol table is scanned and a scanning result is output. In the prior art, scanning files in cross-file scenarios produces many false positives and false negatives. By contrast, the file scanning method provided by the embodiments of the invention follows the file dependency relationship to scan the unified symbol table of all files that have dependency relationships, thereby effectively avoiding false positives and false negatives.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the file scanning method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an example in an embodiment of the present invention;
FIG. 3 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 4 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 5 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 6 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 7 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 8 is another schematic diagram of an example in an embodiment of the present invention;
FIG. 9 is a schematic diagram of another embodiment of the file scanning method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an embodiment of the file scanning apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of another embodiment of the file scanning apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another embodiment of the file scanning apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of another embodiment of the file scanning apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of another embodiment of the file scanning apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of another embodiment of the file scanning apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a file scanning method that can avoid false positives and false negatives. The embodiments of the invention also provide a corresponding device. Details are given below.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of a method for scanning a file according to the present invention includes:
101. Symbolize the main file to generate a main file symbol table.
In the embodiments of the invention, files are source files, not other types of files. A source file is code written in an assembly language or a high-level language and saved as a file.
In the embodiments of the present invention, symbolizing the main file means processing the main file to generate its corresponding symbol table; the symbolization process itself belongs to the prior art.
The main file is defined relative to its dependent files. As shown in FIG. 2, file1 can be understood as the main file, and file2, file3, and the other files shown in FIG. 2 can all be understood as dependent files of file1.
102. According to the global file dependency relationship, symbolize the dependent files on which the main file depends to generate the symbol tables of the dependent files, wherein the global file dependency relationship is established according to the dependency statements between the main file and the dependent files.
Referring to FIG. 2, symbolizing the dependent files on which the main file depends to generate their symbol tables can be understood as symbolizing file2, file3, and the other files shown in FIG. 2.
In the embodiments of the present invention, the order of step 101 and step 102 is not limited: either step may be executed first, or both may be executed simultaneously.
103. Establish a link relationship between the symbol table of the main file and the symbol tables of the dependent files to obtain a unified symbol table.
Because the symbol table of the main file and the symbol tables of the dependent files are connected through the link relationship, the scan can follow references across files instead of being limited to the symbol table of a single file.
104. Scan the unified symbol table and output a scanning result.
As shown in FIG. 3, a.cpp calls the BFunc function defined in b.cpp and passes the constant value 5 as its parameter i.
BFunc uses the parameter i to access the array a, whose size is 3; when i exceeds the upper bound of the array, an overflow occurs.
If cppcheck scans b.cpp alone, it reports no error, because it cannot know that the incoming parameter i has the value 5; the defect is therefore missed (a false negative). In the embodiments of the invention, a link is established between a.cpp and b.cpp and the two files are scanned in sequence. The defect is therefore not missed, and the scanning result reports the error that occurs when i is 5.
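The FIG. 3 scenario can be reproduced with a toy model. The sketch below is neither the patented implementation nor cppcheck's data model. The `SymbolTable` fields and the `link` and `scan` functions are hypothetical, the symbol tables are written by hand rather than produced by a real symbolizer, and only one check is modeled: a constant call argument used as an array index. The model shows why the defect is invisible when b.cpp is scanned alone and becomes visible once both tables are linked.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolTable:
    """Toy per-file symbol table (hypothetical structure)."""
    file: str
    # Call sites: (callee name, list of constant argument values)
    calls: list = field(default_factory=list)
    # Function definitions: name -> list of parameter names
    functions: dict = field(default_factory=dict)
    # Array accesses: (function, array name, array size, parameter used as index)
    array_accesses: list = field(default_factory=list)

def link(tables):
    """Merge per-file tables into one unified view."""
    unified = {"calls": [], "functions": {}, "array_accesses": []}
    for t in tables:
        unified["calls"].extend((t.file, callee, args) for callee, args in t.calls)
        unified["functions"].update(t.functions)
        unified["array_accesses"].extend(t.array_accesses)
    return unified

def scan(unified):
    """Report out-of-bounds accesses whose index comes from a constant call argument."""
    findings = []
    for caller_file, callee, args in unified["calls"]:
        params = unified["functions"].get(callee)
        if params is None:
            continue  # callee not visible in the linked tables
        bound = dict(zip(params, args))  # parameter name -> constant argument
        for func, array, size, index_param in unified["array_accesses"]:
            if func == callee and index_param in bound:
                value = bound[index_param]
                if not 0 <= value < size:
                    findings.append(
                        f"{caller_file}: {callee}({value}) indexes {array}[{size}] out of bounds")
    return findings

a_cpp = SymbolTable("a.cpp", calls=[("BFunc", [5])])
b_cpp = SymbolTable("b.cpp", functions={"BFunc": ["i"]},
                    array_accesses=[("BFunc", "a", 3, "i")])

print(scan(link([b_cpp])))          # []  (b.cpp alone: the call site is unknown)
print(scan(link([a_cpp, b_cpp])))   # ['a.cpp: BFunc(5) indexes a[3] out of bounds']
```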
In the embodiments of the invention, a main file is symbolized to generate a main file symbol table. According to a global file dependency relationship, a dependent file on which the main file depends is symbolized to generate a symbol table of the dependent file, wherein the global file dependency relationship is established according to the dependency statements between the main file and the dependent file. A link relationship is established between the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table, and the unified symbol table is scanned and a scanning result is output. In the prior art, scanning files in cross-file scenarios produces many false positives and false negatives. By contrast, the file scanning method provided by the embodiments of the invention follows the file dependency relationship to scan the unified symbol table of all files that have dependency relationships, thereby effectively avoiding false positives and false negatives.
Optionally, on the basis of the embodiment corresponding to fig. 1, in a first optional embodiment of the method for scanning a file according to the embodiment of the present invention, before symbolizing the dependent file that the main file depends on according to the global file dependency relationship, the method may further include:
traversing the global files, parsing the statements that declare dependencies in the code content of the global files, and establishing the global file dependency relationship, wherein the global files include the main file and the dependent file.
In the embodiments of the present invention, where files depend on one another (for example, through #include statements), the global file dependency relationship is established by identifying the dependency statements, and a global dependency table is generated.
Optionally, on the basis of the first optional embodiment corresponding to fig. 1, in a second optional embodiment of the file scanning method provided in the embodiment of the present invention, traversing the global files, parsing the dependency statements in their code content, and establishing the global file dependency relationship, wherein the global files include the main file and the dependent file, may include:
acquiring, from the header (#include) information of the main file, the names of the dependent files on which the main file depends;
according to the dependent file name, searching dependent file information corresponding to the dependent file name from the established corresponding relation between the file name and the file information;
and when the dependent file information corresponding to the dependent file name is found, adding the corresponding dependent file to the dependency relationship of the main file, and establishing the global file dependency relationship from the dependency relationships of all main files.
In the embodiments of the invention, all code files under the project folder to be scanned are traversed exhaustively. As shown in FIG. 4, the .h, .c, .cpp, and other source code files are traversed.
The file information is recorded in a basic file data structure in which each node is a file or a directory. The nodes are linked by address to form a tree structure, as shown in FIG. 2, which records the global file directory structure.
In addition, a map is stored whose key is the unique file identifier and whose value is the file information data, so that file information can be looked up quickly by its unique identifier.
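The traversal and lookup map can be sketched minimally as follows. Two assumptions are made: the unique file identifier is the path relative to the project root, and the extension set below counts as source code (the source names only .h, .c, and .cpp). `index_project` and `FileInfo` are hypothetical names. The extra name-to-ids index supports the file-name lookup used when dependencies are resolved.

```python
import os
from dataclasses import dataclass

SOURCE_EXTS = {".h", ".hpp", ".c", ".cc", ".cpp"}  # assumed extension set

@dataclass
class FileInfo:
    file_id: str    # unique identifier: path relative to the project root (assumption)
    name: str       # bare file name, e.g. "player.h"
    directory: str  # directory relative to the project root ("" for the root)

def index_project(root):
    """Walk the project once and build two lookups:
    file_map: unique id -> FileInfo (the key/value map)
    by_name:  bare file name -> list of ids (for resolving #include names)
    """
    file_map, by_name = {}, {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for fname in sorted(filenames):
            if os.path.splitext(fname)[1] not in SOURCE_EXTS:
                continue
            rel = os.path.relpath(os.path.join(dirpath, fname), root).replace(os.sep, "/")
            info = FileInfo(rel, fname, os.path.dirname(rel))
            file_map[rel] = info
            by_name.setdefault(fname, []).append(rel)
    return file_map, by_name
```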
When the dependency relationship is established, each main file in the file tree structure is traversed. Its content is opened, its header information (#include lines) is read, and the names of the files it depends on are extracted. To reproduce the compilation process as faithfully as possible, the file names are collected strictly in #include order.
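Extracting the included names in source order could look like the following sketch. The regular expression is an assumption. It handles `#include "..."` and `#include <...>` lines, but it ignores comments, conditional compilation, and macro-based includes, all of which a real implementation would have to consider. `extract_includes` is a hypothetical name.

```python
import re

# Matches  #include "dir/name.h"  and  #include <name.h>, allowing spaces around '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def extract_includes(source):
    """Return the included paths in the order their #include lines appear."""
    return INCLUDE_RE.findall(source)

src = '#include <vector>\n#include "game/player.h"\n  # include "util.h"\nint x;\n'
print(extract_includes(src))  # ['vector', 'game/player.h', 'util.h']
```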
The file information corresponding to each file name is then looked up in the map. Because #include statements can be written in many ways, and a dependent file is not necessarily inside the project directory being scanned (as shown in FIG. 5), the following strategy is used to decide:
if exactly one file is found by the file name, the dependent file is added directly to the main file's dependency list dependenceFileList, and the process ends;
if several files are found by the file name and the #include statement contains a directory path, the path is matched upward against the candidates' parent directories; if exactly one file matches, it is added directly to dependenceFileList, and the process ends;
if the path matches several files, or several files are found by the file name and the #include statement contains no directory path, the directory-structure distance between the main file and each matched file is calculated, and the file at the shortest distance is added directly to dependenceFileList, and the process ends;
if several files are at the same shortest distance, the file that comes first in #include order is selected and added to dependenceFileList, and the process ends.
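The decision strategy above can be sketched as follows. The sketch assumes that file ids are `/`-separated paths relative to the project root and that `by_name` maps a bare file name to its candidate ids. Two behaviors are assumptions because the source does not specify them. First, when a directory path matches no candidate, all name matches are kept for the distance step. Second, on a distance tie, the first candidate in list order is chosen as a stand-in for "first in #include order". Relative components such as `../` are not handled. `resolve_include` and `tree_distance` are hypothetical names.

```python
import posixpath

def dir_parts(path):
    """Directory components of a relative path, e.g. 'game/ui/a.h' -> ['game', 'ui']."""
    d = posixpath.dirname(path)
    return d.split("/") if d else []

def tree_distance(dir_a, dir_b):
    """Number of directory hops between two directories via their common ancestor."""
    common = 0
    for x, y in zip(dir_a, dir_b):
        if x != y:
            break
        common += 1
    return len(dir_a) + len(dir_b) - 2 * common

def resolve_include(include_path, main_id, by_name):
    """Pick the file an #include most likely refers to, or None if it is not in the project."""
    candidates = by_name.get(posixpath.basename(include_path), [])
    if not candidates:
        return None                  # outside the scanned project: no dependency recorded
    if len(candidates) == 1:
        return candidates[0]         # unique file name
    inc_parts = include_path.split("/")
    if len(inc_parts) > 1:
        # Match the #include path upward against each candidate's trailing components.
        matched = [c for c in candidates if c.split("/")[-len(inc_parts):] == inc_parts]
        if len(matched) == 1:
            return matched[0]
        if matched:
            candidates = matched     # several matches: fall through to the distance rule
    main_dir = dir_parts(main_id)
    # Shortest directory distance wins; min() keeps the first candidate on ties.
    return min(candidates, key=lambda c: tree_distance(main_dir, dir_parts(c)))

by_name = {"player.h": ["engine/player.h", "game/player.h", "game/ui/player.h"],
           "util.h": ["lib/util.h"]}
print(resolve_include("player.h", "game/main.cpp", by_name))  # game/player.h
```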
In theory, this strategy can resolve a dependency to the wrong file. However, verification on data from several projects showed an accuracy of 90% in finding file dependencies, which supports the reliability of combining path matching with a shortest-path algorithm to locate dependent files.
Optionally, on the basis of the embodiment corresponding to fig. 1, the first optional embodiment, or the second optional embodiment, in a third optional embodiment of the method for scanning a file according to the embodiment of the present invention, symbolizing the dependent file on which the main file depends according to the global file dependency relationship to generate the symbol table of the dependent file may include:
confirming, according to the global file dependency relationship, whether the dependent file on which the main file depends has already been symbolized;
when the dependent file has not been symbolized, symbolizing the dependent file, generating a symbol table of the dependent file, and caching the symbol table of the dependent file;
and when the dependent file has already been symbolized, acquiring the symbol table of the dependent file from the cache.
In the embodiments of the present invention, the same file may be depended on by several other files. Scanning files according to the conventional flow therefore symbolizes such a file repeatedly, which wastes performance and time. The following symbolization caching strategy is adopted to improve efficiency. The number of times each dependent file is depended on is counted from the dependency table. Taking dependent file X as an example, as shown in FIG. 6:
112. When dependent file X needs to be symbolized, the dependency counter dCnt of X is checked.
When the scan begins, the dependency counters of all dependent files are initialized to -1.
113. dCnt == -1 means that X has not been symbolized yet, so X is symbolized into memory.
114. The dependency counter of X is set to the number of times X is depended on.
115. dCnt != -1 means that dependent file X has already been symbolized.
116. The symbol table of dependent file X is obtained from the cache.
After the dependent file X is symbolized, the symbol table of the dependent file X can be further managed, which can be understood with reference to fig. 7:
201. The scan of a main file that depends on X is complete.
202. The dependency counter of X is decremented by 1, i.e., dCnt = dCnt - 1.
203. The current value of the dependency counter dCnt of X is checked.
204. dCnt == 0 indicates that X is no longer used; step 205 is performed.
205. The cached symbolized data of X is deleted.
206. dCnt > 0 indicates that X may still be used; step 207 is performed.
207. The symbolized data of X is kept in the cache.
The scheme provided by the embodiments of the invention keeps each dependent symbol table available for as long as any main file still depends on it and releases it as soon as it is no longer needed. It thereby balances complete dependency resolution against efficient use of memory.
With this caching strategy, a header file that is included repeatedly is not symbolized many times. It is cached after its first symbolization, and later uses read it from the cache. This avoids repeated symbolization and improves scanning efficiency.
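Taken together, the FIG. 6 and FIG. 7 flows amount to a reference-counted cache. A minimal sketch follows. `SymbolTableCache`, `acquire`, `release`, and `fake_symbolize` are hypothetical names, and symbolization is replaced by a stub that only records which files it was asked to process.

```python
class SymbolTableCache:
    """Reference-counted cache of dependent-file symbol tables.

    d_cnt[X] == -1 : X has not been symbolized yet
    d_cnt[X] >  0  : X is cached and still needed by d_cnt[X] main files
    d_cnt[X] == 0  : X is no longer needed and its table has been released
    """

    def __init__(self, dependency_table, symbolize):
        # dependency_table: main file -> list of files it depends on
        self.symbolize = symbolize
        self.times_depended = {}
        for deps in dependency_table.values():
            for dep in set(deps):  # count each main file once per dependency
                self.times_depended[dep] = self.times_depended.get(dep, 0) + 1
        # Every counter starts at -1 when the scan begins.
        self.d_cnt = {dep: -1 for dep in self.times_depended}
        self.cache = {}

    def acquire(self, dep):
        """Return dep's symbol table, symbolizing it only on first use."""
        if self.d_cnt[dep] == -1:
            self.cache[dep] = self.symbolize(dep)       # not symbolized yet
            self.d_cnt[dep] = self.times_depended[dep]  # counter := times depended on
        return self.cache[dep]                          # otherwise served from the cache

    def release(self, dep):
        """Call once per dependency after a main file that uses it has been scanned."""
        self.d_cnt[dep] -= 1
        if self.d_cnt[dep] == 0:
            del self.cache[dep]  # no remaining users: free the memory
        # d_cnt > 0: keep the cached table for the remaining main files


symbolized = []

def fake_symbolize(path):  # hypothetical stand-in for real symbolization
    symbolized.append(path)
    return f"<symbol table of {path}>"

table = {"a.cpp": ["common.h"], "b.cpp": ["common.h", "b.h"]}
cache = SymbolTableCache(table, fake_symbolize)
for main, deps in table.items():
    dep_tables = [cache.acquire(d) for d in deps]
    # ... symbolize `main`, link it with dep_tables, and scan ...
    for d in deps:
        cache.release(d)

print(symbolized)   # ['common.h', 'b.h']  (common.h is symbolized only once)
print(cache.cache)  # {}  (every table is released after its last user)
```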
Optionally, on the basis of the embodiment corresponding to fig. 1 and the first or second optional embodiment, in a fourth optional embodiment of the method for scanning a file according to the embodiment of the present invention, the establishing a link relationship for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table may include:
and establishing a link relation of the characteristic parameters in the main file symbol table and the symbol table of the dependent file according to the unique variable identification in the main file symbol table and the symbol table of the dependent file to obtain a unified symbol table, wherein the unique variable identification is used for identifying the same characteristic parameter in the main file symbol table and the symbol table of the dependent file.
In the embodiments of the present invention, the link relationship between the characteristic parameters in the main file symbol table and the dependent file symbol table may be established through unique variable identifiers. As shown in FIG. 8, the dependency relationship between the dependent file player.h and the main file player.cpp is established through unique variable ids.
In the global variable TSCDdependency, varBase records the starting position of variable ids and keeps increasing as variables are added, so the id of every variable is unique. For example, suppose variables a and b are defined in A.cpp, variables c and d are defined in B.cpp, and varBase starts at 0. A.cpp is symbolized first: a gets id 1, b gets id 2, and varBase increases to 2. B.cpp is symbolized next: c gets id 3, d gets id 4, and varBase increases to 4. Later files continue in the same way.
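The varBase scheme is a single global counter shared by every file symbolized in one scan. The following minimal sketch reproduces the A.cpp/B.cpp example. `VariableIdAllocator` is a hypothetical name, and real symbolization would assign ids while parsing declarations rather than from a list of names.

```python
class VariableIdAllocator:
    """Global varBase counter: each newly declared variable gets the next id,
    so ids never collide across files symbolized in the same scan."""

    def __init__(self):
        self.var_base = 0

    def symbolize_declarations(self, names):
        """Assign consecutive ids to the variables declared in one file."""
        ids = {}
        for name in names:
            self.var_base += 1
            ids[name] = self.var_base
        return ids

alloc = VariableIdAllocator()
a_ids = alloc.symbolize_declarations(["a", "b"])  # A.cpp
print(a_ids, alloc.var_base)                      # {'a': 1, 'b': 2} 2
b_ids = alloc.symbolize_declarations(["c", "d"])  # B.cpp
print(b_ids, alloc.var_base)                      # {'c': 3, 'd': 4} 4
```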
Member variables and global variables need to be linked, but local variables do not. A local variable is declared and used in the same file, so both are symbolized together and no linking is needed afterwards. Member variables and global variables, however, may be declared and used in different files. After symbolization, the member variables and global variables in the main file must therefore be linked with the corresponding declarations in the dependent files. For this purpose, the classes and member variables declared in each file are additionally recorded during symbolization. When a member m is encountered in the main file a.cpp, the record of the dependent file A.h is checked first for a matching class and member. If they match, the variable id of m in a.cpp is set to the variable id assigned when m was declared in A.h.
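The member-linking step can be sketched as a lookup keyed by class name and member name. The sketch assumes that the dependent file's record is a dictionary from `(class name, member name)` to variable id, and that each use in the main file carries a provisional id. `MemberUse` and `link_members` are hypothetical names. Global variables could be handled the same way with a key such as `(None, name)`.

```python
from dataclasses import dataclass

@dataclass
class MemberUse:
    class_name: str
    member: str
    var_id: int  # provisional id assigned while symbolizing the main file

def link_members(main_uses, dependent_decls):
    """Point member uses in the main file at the declaration ids recorded for the
    dependent header; uses without a matching declaration keep their ids.

    dependent_decls: {(class name, member name): var_id}, recorded while symbolizing A.h
    Returns the number of uses that were linked.
    """
    linked = 0
    for use in main_uses:
        decl_id = dependent_decls.get((use.class_name, use.member))
        if decl_id is not None:
            use.var_id = decl_id
            linked += 1
    return linked

decls_a_h = {("A", "m"): 3}                  # recorded while symbolizing A.h
uses_a_cpp = [MemberUse("A", "m", 10), MemberUse("A", "k", 11)]
print(link_members(uses_a_cpp, decls_a_h))   # 1
print([u.var_id for u in uses_a_cpp])        # [3, 11]
```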
Optionally, on the basis of the embodiment corresponding to fig. 1 and the first or second optional embodiment, in a fifth optional embodiment of the method for scanning a file according to the embodiment of the present invention, when symbolizing the main file and the dependent file, the method may further include:
recording the function declarations in the main file and the dependent file in the same function declaration list;
the establishing a link relationship for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table may include:
and establishing, according to the same function declaration list, a mapping relationship between the function calls in the main file and the function declarations in the dependent file to obtain a unified symbol table.
In the embodiments of the present invention, function linking can be understood as follows. During symbolization, the function declarations in the main file and in all of its dependent files are recorded in the same list. After symbolization, the main file is traversed. When a function call fun(a) is found, the function declaration list of the main file is traversed, the corresponding function declaration is found by parameter matching, and a mapping between the function call and the function declaration is established.
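The following minimal sketch shows the shared declaration list and the call-to-declaration mapping. It assumes that "parameter matching" means an exact match of the function name and the parameter type list. Real C++ overload resolution, with implicit conversions, default arguments, and templates, is far more involved. `FunctionDecl` and `link_call` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionDecl:
    name: str
    param_types: tuple
    file: str  # file in which the declaration was recorded

def link_call(name, arg_types, decl_list):
    """Return the declaration whose name and parameter types match the call, or None."""
    for decl in decl_list:
        if decl.name == name and decl.param_types == tuple(arg_types):
            return decl
    return None

# One list shared by the main file and all of its dependent files.
decls = [
    FunctionDecl("fun", ("int",), "util.h"),
    FunctionDecl("fun", ("double",), "util.h"),
    FunctionDecl("draw", (), "player.h"),
]
# Map a call site (file, line) to the declaration it resolves to.
call_map = {("main.cpp", 12): link_call("fun", ["double"], decls)}
print(call_map[("main.cpp", 12)])
# FunctionDecl(name='fun', param_types=('double',), file='util.h')
```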
Referring to fig. 9, another embodiment of the method for scanning a file according to the embodiment of the present invention includes:
301. Automatically analyze the code to establish global dependencies.
The code of the whole project is traversed in advance, the statements that declare dependencies in the code content are parsed quickly, a global dependency relationship is established, and a global dependency table is generated.
302. Traverse and scan each main file, performing the following steps for each file.
303. According to the global dependency relationship, traverse the files on which the main file depends and symbolize each of them into an independent symbol table; if a file's symbol table is already cached, obtain it directly from the cache.
304. Symbolize the main file.
305. Analyze the reference relationships of variables, macros, and so on between the independent symbol tables, establish symbol links, and form a unified symbol table.
306. Cache the symbol tables.
According to the caching strategy, the independent symbol tables are cached for later use, and symbol tables that are no longer needed are released to reduce memory usage.
307. Scan the unified symbol table to check for defects.
308. Output the scanning result.
For the specific process of steps 301 to 308, refer to FIG. 1 to FIG. 8 and the related descriptions; details are not repeated here.
Referring to FIG. 10, an embodiment of the file scanning apparatus 40 according to the present invention includes:
a first processing unit 401, configured to symbolize a main file, and generate a main file symbol table;
a second processing unit 402, configured to perform symbolization on a dependent file on which the main file depends according to a global file dependency relationship, so as to generate a symbol table of the dependent file, where the global file dependency relationship is a relationship established according to a dependency statement between the main file and the dependent file;
a link establishing unit 403, configured to establish a link relationship for the symbol table of the main file obtained by processing by the first processing unit 401 and the symbol table of the dependent file obtained by processing by the second processing unit 402, so as to obtain a unified symbol table;
a scanning unit 404, configured to scan the unified symbol table established by the link establishing unit 403;
an output unit 405, configured to output the result of the scanning unit 404 scanning the unified symbol table.
In the embodiment of the present invention, the first processing unit 401 symbolizes a main file and generates a main file symbol table. The second processing unit 402 symbolizes the dependent file on which the main file depends according to a global file dependency relationship, established from the dependency statements between the main file and the dependent file, and generates a symbol table of the dependent file. The link establishing unit 403 establishes a link relationship between the symbol table of the main file obtained by the first processing unit 401 and the symbol table of the dependent file obtained by the second processing unit 402 to obtain a unified symbol table. The scanning unit 404 scans the unified symbol table established by the link establishing unit 403, and the output unit 405 outputs the result of the scanning unit 404 scanning the unified symbol table. In the prior art, scanning files in cross-file scenarios produces many false positives and false negatives. By contrast, the file scanning apparatus provided by the embodiments of the invention follows the file dependency relationship to scan the unified symbol table of all files that have dependency relationships, thereby effectively avoiding false positives and false negatives.
Optionally, on the basis of the embodiment corresponding to fig. 10 and referring to fig. 11, in a first optional embodiment of the apparatus 40 for scanning a file according to the embodiment of the present invention, the apparatus 40 further includes:
a dependency relationship establishing unit 406, configured to traverse a global file before the second processing unit 402 symbolizes the dependency file, analyze a statement identifying a dependency in code content of the global file, and establish a global file dependency relationship, where the global file includes the main file and the dependency file.
Optionally, on the basis of the embodiment corresponding to fig. 11, referring to fig. 12, in a second optional embodiment of the apparatus 40 for scanning a file according to the embodiment of the present invention, the dependency relationship establishing unit 406 includes:
an obtaining subunit 4061, configured to obtain, from the header (#include) information of the main file, the names of the dependent files on which the main file depends;
a searching subunit 4062, configured to search, according to the dependent file name obtained by the obtaining subunit 4061, for the dependent file information corresponding to the dependent file name in the established correspondence between file names and file information;
a creating sub-unit 4063, configured to, when the searching sub-unit 4062 finds the dependent file information corresponding to the dependent file name, add the dependent file corresponding to the dependent file name into the dependency relationship of the main file, and create the global file dependency relationship according to the dependency relationship of each main file.
Optionally, on the basis of the embodiment corresponding to fig. 10 and referring to fig. 13, in a third optional embodiment of the apparatus 40 for scanning a file according to the embodiment of the present invention, the second processing unit 402 includes:
a confirming subunit 4021, configured to confirm, according to the global file dependency relationship, whether the dependent file on which the main file depends has already been symbolized;
a generating subunit 4022, configured to symbolize the dependent file and generate a symbol table of the dependent file when the confirming subunit 4021 confirms that the dependent file is not symbolized;
a caching subunit 4023, configured to cache the symbol table of the dependent file generated by the generating subunit 4022;
an obtaining subunit 4024, configured to obtain the symbol table of the dependent file from the cache when the confirming subunit 4021 confirms that the dependent file has already been symbolized.
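The confirm/generate/cache/obtain behavior amounts to memoizing symbolization per dependent file. A minimal sketch, assuming a caller-supplied `symbolize` function and a symbol table modeled as a plain dictionary; `SymbolTableCache` is a hypothetical name.

```python
from typing import Callable, Dict

SymbolTable = Dict[str, str]  # assumption: symbol name -> kind/type

class SymbolTableCache:
    """Symbolize each dependent file at most once and reuse the cached table."""

    def __init__(self, symbolize: Callable[[str], SymbolTable]):
        self._symbolize = symbolize  # caller-supplied symbolizer (assumption)
        self._cache: Dict[str, SymbolTable] = {}

    def get(self, path: str) -> SymbolTable:
        # Confirming step: has this dependent file already been symbolized?
        if path in self._cache:
            return self._cache[path]   # obtaining step: reuse the cached table
        table = self._symbolize(path)  # generating step
        self._cache[path] = table      # caching step
        return table
```

A header included by many main files is therefore symbolized once instead of once per includer.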
Optionally, on the basis of the embodiment corresponding to fig. 10, or the first or second optional embodiment, in a fourth optional embodiment of the apparatus 40 for scanning a file according to the embodiment of the present invention,
the link establishing unit 403 is configured to establish, according to the unique variable identifiers in the symbol table of the main file and in the symbol table of the dependent file, a link relationship between the characteristic parameters in the two symbol tables, so as to obtain a unified symbol table, where a unique variable identifier identifies the same characteristic parameter in both the symbol table of the main file and the symbol table of the dependent file.
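One way to picture linking by unique variable identifier: entries from the main and dependent symbol tables that share an identifier are merged into one unified entry. This is a sketch under assumptions: `Symbol`, `uid`, and `link_symbol_tables` are hypothetical, the identifier is taken to be something like a linkage name, and when two tables give the same characteristic parameter the first one seen is kept.

```python
from dataclasses import dataclass, field

@dataclass
class Symbol:
    uid: str   # hypothetical unique variable identifier, e.g. a linkage name
    file: str  # file whose symbol table the entry came from
    attrs: dict = field(default_factory=dict)  # characteristic parameters (type, init, ...)

def link_symbol_tables(main_table, dep_tables):
    """Merge the main table and dependent tables into one table keyed by uid.
    Entries sharing a uid are linked and their characteristic parameters combined."""
    unified = {}
    for table in [main_table, *dep_tables]:
        for sym in table:
            entry = unified.setdefault(sym.uid, {"files": [], "attrs": {}})
            entry["files"].append(sym.file)
            for key, value in sym.attrs.items():
                entry["attrs"].setdefault(key, value)  # first value seen wins
    return unified
```

An `extern int g_count;` in the main file and its definition in a dependent file thus end up as a single linked entry.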
Optionally, on the basis of the embodiment corresponding to fig. 10, or the first or second optional embodiment, and referring to fig. 14, in a fifth optional embodiment of the apparatus 40 for scanning a file according to the embodiment of the present invention, the apparatus further includes a recording unit 407, where:
the recording unit 407 is configured to record the function declarations in the main file and the dependent file in the same function declaration list;
the link establishing unit 403 is configured to establish a mapping relationship between the function of the main file and the function declaration of the dependent file according to the same function declaration list recorded by the recording unit 407, so as to obtain a unified symbol table.
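A minimal sketch of the shared function declaration list, assuming declarations are recorded as `(name, signature, file)` tuples and that functions are matched by name only (C++ overloads would also need the signature). `record_declarations` and `map_functions_to_declarations` are hypothetical names.

```python
def record_declarations(decl_list, file, declarations):
    """Append (name, signature, file) for every declaration found in one file
    to the single list shared by the main file and its dependent files."""
    for name, signature in declarations:
        decl_list.append((name, signature, file))

def map_functions_to_declarations(main_functions, decl_list):
    """Map each function used or defined in the main file to the declarations
    with the same name in the shared list, whichever file they came from."""
    by_name = {}
    for name, signature, file in decl_list:
        by_name.setdefault(name, []).append((signature, file))
    return {fn: by_name.get(fn, []) for fn in main_functions}
```

An empty mapping for a function then signals a call with no visible declaration in any linked file.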
The apparatus 40 for scanning a file provided in the embodiments of fig. 10 and fig. 14 of the present invention can be understood with reference to the related descriptions of fig. 1 to fig. 9, and is not described in detail herein.
Fig. 15 is a schematic structural diagram of an apparatus 40 for scanning a file according to an embodiment of the present invention. The apparatus 40 may include an input device 410, an output device 420, a processor 430, and a memory 440.
Memory 440 may include both read-only memory and random-access memory, and provides instructions and data to processor 430. A portion of memory 440 may also include non-volatile random access memory (NVRAM).
Memory 440 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:
Operating instructions: various operating instructions for performing various operations.
Operating system: various system programs for implementing various basic services and handling hardware-based tasks.
In the embodiment of the present invention, the processor 430 performs the following operations by calling the operating instructions stored in the memory 440 (the operating instructions may be stored in the operating system):
symbolizing a main file to generate a main file symbol table;
according to a global file dependency relationship, symbolizing a dependent file on which the main file depends to generate a symbol table of the dependent file, wherein the global file dependency relationship is a relationship established according to a dependent statement between the main file and the dependent file;
establishing a link relation for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table;
scanning the unified symbol table and outputting the scanning result through the output device 420.
Compared with the prior art, in which false alarms and missed reports are frequent when files are scanned in cross-file scenarios, the file scanning apparatus provided by the embodiment of the present invention uses the file dependency relationships to scan the unified symbol table of all files that depend on one another, thereby effectively avoiding false alarms and missed reports.
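To make the four operations and the cross-file false-alarm problem concrete, here is a deliberately tiny model. It is a sketch under stated assumptions, not the patented implementation: a "source" is already reduced to the names it defines and uses, `symbolize`, `link`, `scan`, and `scan_file` are hypothetical names, and the only scanning rule is a toy "used but never defined" check.

```python
def symbolize(source):
    """Toy symbolizer (assumption): a source is a dict of defined and used names."""
    return {"defines": set(source["defines"]), "uses": set(source["uses"])}

def link(tables):
    """Build a unified symbol table from the main and dependent tables."""
    unified = {"defines": set(), "uses": set()}
    for table in tables:
        unified["defines"] |= table["defines"]
        unified["uses"] |= table["uses"]
    return unified

def scan(table):
    """Toy rule: report every used name that has no definition in the table."""
    return sorted(table["uses"] - table["defines"])

def scan_file(main, sources, global_deps):
    main_table = symbolize(sources[main])                                  # step 1
    dep_tables = [symbolize(sources[d]) for d in global_deps.get(main, [])]  # step 2
    return scan(link([main_table, *dep_tables]))                            # steps 3 and 4
```

Scanned alone, a `main.c` that calls `helper` defined in `util.c` reports `helper` as undefined (a false alarm); scanned through its unified symbol table with `util.c`, it reports nothing.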
Processor 430 controls the operation of the apparatus 40 for scanning files, and may also be referred to as a central processing unit (CPU). In a particular application, the components of the apparatus 40 are coupled together by a bus system 450, which may include a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all designated in the figure as the bus system 450.
The method disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 430. The processor 430 may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 430 or by instructions in the form of software. The processor 430 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 440, and the processor 430 reads the information in the memory 440 and performs the steps of the above method in combination with its hardware.
Optionally, the processor 430 is configured to traverse the global files, analyze the statements that identify dependencies in the code content of the global files, and establish a global file dependency relationship, where the global files include the main file and the dependent file.
Optionally, the processor 430 is configured to:
obtaining, from the header file of the main file, the name of the dependent file on which the main file depends;
searching, according to the dependent file name, the established correspondence between file names and file information for the dependent file information corresponding to that name;
and when the dependent file information corresponding to the dependent file name is found, adding the corresponding dependent file to the dependency relationship of the main file, and establishing the global file dependency relationship from the dependency relationships of all main files.
Optionally, the processor 430 is configured to:
confirming, according to the global file dependency relationship, whether the dependent file on which the main file depends has already been symbolized;
when the dependent file has not been symbolized, symbolizing the dependent file, generating a symbol table of the dependent file, and caching that symbol table;
and when the dependent file has already been symbolized, obtaining the symbol table of the dependent file from the cache.
Optionally, the processor 430 is configured to establish, according to the unique variable identifiers in the symbol table of the main file and in the symbol table of the dependent file, a link relationship between the characteristic parameters in the two symbol tables, so as to obtain a unified symbol table, where a unique variable identifier identifies the same characteristic parameter in both symbol tables.
Optionally, the processor 430 is configured to record the function declarations in the main file and the dependent file in the same function declaration list, and to establish, according to that function declaration list, a mapping relationship between the functions of the main file and the function declarations of the dependent file, so as to obtain a unified symbol table.
The apparatus 40 for scanning a file provided in the embodiment of the present invention can be understood with reference to the related descriptions of fig. 1 to fig. 9, and is not described in detail herein.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The method and apparatus for scanning files provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, a person skilled in the art may vary the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation on the present invention.

Claims (12)

1. A file scanning method, comprising:
symbolizing a main file to generate a main file symbol table;
obtaining, from a header file of the main file, the name of a dependent file on which the main file depends;
searching, according to the dependent file name, an established correspondence between file names and file information for dependent file information corresponding to the dependent file name;
when a plurality of matching files corresponding to the dependent file name are found and the header file does not contain a directory path for the dependent file name, calculating the path distance in the directory structure between the main file and each of the plurality of matching files, and taking the matching file with the shortest path distance as the dependent file corresponding to the dependent file name;
adding the dependent file corresponding to the dependent file name to the dependency relationship of the main file, and establishing a global file dependency relationship from the dependency relationships of all main files;
counting, according to the global file dependency relationship, the number of times each dependent file is depended on, including the number of times a target dependent file is depended on, where the target dependent file is any dependent file contained in the global file dependency relationship;
symbolizing, according to the global file dependency relationship, the dependent file on which the main file depends, generating a symbol table of the dependent file, and caching the symbol table of the dependent file, where the global file dependency relationship is a relationship established according to a dependency statement between the main file and the dependent file;
establishing a link relation for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table;
scanning the unified symbol table and outputting a scanning result;
after scanning of each main file that depends on the target dependent file is completed, decrementing by 1 the number of times the target dependent file is depended on, and deleting the cached symbolization data of the target dependent file when that number reaches zero.
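The counting and deletion steps of claim 1 behave like reference counting on cached symbol tables. A minimal sketch, assuming `global_deps` maps each main file to its dependent files and that `symbolize` and `scan_one` are supplied by the caller; `count_dependents` and `scan_all` are hypothetical names, and a main file that lists the same dependency twice is counted once.

```python
from collections import Counter

def count_dependents(global_deps: dict) -> Counter:
    """For each dependent file, the number of main files that depend on it."""
    counts = Counter()
    for deps in global_deps.values():
        for dep in dict.fromkeys(deps):  # de-duplicate while keeping order
            counts[dep] += 1
    return counts

def scan_all(global_deps: dict, symbolize, scan_one) -> list:
    """Scan every main file; cache dependent symbol tables and delete each one
    once the last main file depending on it has been scanned.
    Returns the order in which cache entries were deleted."""
    remaining = count_dependents(global_deps)
    cache = {}
    evicted = []
    for main_file, deps in global_deps.items():
        unique_deps = list(dict.fromkeys(deps))
        dep_tables = []
        for dep in unique_deps:
            if dep not in cache:  # not symbolized yet
                cache[dep] = symbolize(dep)
            dep_tables.append(cache[dep])
        scan_one(main_file, symbolize(main_file), dep_tables)
        for dep in unique_deps:  # this main file no longer needs dep
            remaining[dep] -= 1
            if remaining[dep] == 0:
                del cache[dep]
                evicted.append(dep)
    return evicted
```

The design keeps peak memory bounded by the dependents still awaiting a scan, rather than by every file ever symbolized.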
2. The method according to claim 1, wherein after the searching for the dependent file information corresponding to the dependent file name from the established correspondence between the file name and the file information according to the dependent file name, the method further comprises:
when a unique matching file corresponding to the dependent file name is found, taking that matching file as the dependent file corresponding to the dependent file name;
and when a plurality of matching files corresponding to the dependent file name are found and the header file contains a directory path for the dependent file name, matching the directory path level by level upward, and if the path matches a unique file, taking that matching file as the dependent file corresponding to the dependent file name.
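The three resolution cases of claims 1 and 2 can be sketched together. Two interpretations here are assumptions, not taken from the source: "matching the path to an upper layer" is modeled as comparing path components upward from the file name (a suffix match), and "path distance on the directory structure" is modeled as the number of directory-tree edges between the two files' directories. `dir_distance` and `resolve_dependency` are hypothetical names, and relative segments such as `..` are not normalized.

```python
from pathlib import PurePosixPath

def dir_distance(a: str, b: str) -> int:
    """Number of directory-tree edges between the directories of files a and b."""
    pa, pb = PurePosixPath(a).parent.parts, PurePosixPath(b).parent.parts
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

def resolve_dependency(main_file: str, include_name: str, candidates: list):
    """Choose the dependent file for include_name among same-named candidates,
    or return None when no single file can be chosen."""
    if not candidates:
        return None
    if len(candidates) == 1:  # unique match
        return candidates[0]
    wanted = PurePosixPath(include_name).parts
    if len(wanted) > 1:  # the include carries a directory path
        # Compare path components upward from the file name (suffix match).
        hits = [c for c in candidates
                if PurePosixPath(c).parts[-len(wanted):] == wanted]
        return hits[0] if len(hits) == 1 else None
    # No directory path: the candidate closest to the main file wins.
    return min(candidates, key=lambda c: dir_distance(main_file, c))
```

With candidates `src/net/util.h` and `lib/util.h`, a bare `util.h` included from `src/net/conn.c` resolves to `src/net/util.h` (distance 0 versus 3), and `net/util.h` included from anywhere resolves to the same file by suffix match.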
3. The method according to any one of claims 1-2, wherein the symbolizing, according to the global file dependency relationship, of the dependent file on which the main file depends to generate a symbol table of the dependent file comprises:
confirming, according to the global file dependency relationship, whether the dependent file on which the main file depends has already been symbolized;
when the dependent file has not been symbolized, symbolizing the dependent file, generating a symbol table of the dependent file, and caching the symbol table of the dependent file;
and when the dependent file has already been symbolized, obtaining the symbol table of the dependent file from the cache.
4. The method according to any one of claims 1-2, wherein the establishing a link relationship for the symbol table of the main file and the symbol table of the dependent file to obtain a unified symbol table comprises:
establishing, according to the unique variable identifiers in the main file symbol table and the symbol table of the dependent file, a link relationship between the characteristic parameters in the two symbol tables to obtain a unified symbol table, wherein a unique variable identifier identifies the same characteristic parameter in the main file symbol table and the symbol table of the dependent file.
5. The method according to any one of claims 1-2, wherein when the main file and the dependent file are symbolized, the method further comprises:
recording the function declarations in the main file and the dependent file in the same function declaration list;
the establishing a link relation for the main file symbol table and the symbol table of the dependent file to obtain a unified symbol table includes:
establishing, according to the same function declaration list, a mapping relationship between the functions of the main file and the function declarations of the dependent file to obtain a unified symbol table.
6. An apparatus for scanning a file, comprising:
the first processing unit is used for symbolizing the main file and generating a main file symbol table;
the second processing unit is used for symbolizing the dependent file which the main file depends on according to the global file dependency relationship, generating a symbol table of the dependent file and caching the symbol table of the dependent file; the global file dependency relationship is a relationship established according to a dependency statement between the main file and the dependency file;
the apparatus is further used for counting, according to the global file dependency relationship, the number of times each dependent file is depended on, including the number of times a target dependent file is depended on, wherein the target dependent file is any dependent file contained in the global file dependency relationship;
the link establishing unit is used for establishing a link relation for the symbol table of the main file obtained by the processing of the first processing unit and the symbol table of the dependent file obtained by the processing of the second processing unit to obtain a unified symbol table;
a scanning unit, configured to scan the unified symbol table established by the link establishing unit;
the output unit is used for outputting the result of the scanning unit's scan of the unified symbol table;
the apparatus is further used for decrementing by 1 the number of times the target dependent file is depended on after scanning of each main file that depends on the target dependent file is completed, and deleting the cached symbolization data of the target dependent file when that number reaches zero;
the dependency relationship establishing unit is used for traversing the global files before the second processing unit symbolizes the dependent file, analyzing the statements that identify dependencies in the code content of the global files, and establishing the global file dependency relationship, wherein the global files comprise the main file and the dependent file;
the dependency relationship establishing unit includes:
the acquiring subunit is used for acquiring, from a header file of the main file, the name of the dependent file on which the main file depends;
the searching subunit is used for searching, according to the dependent file name acquired by the acquiring subunit, the established correspondence between file names and file information for the dependent file information corresponding to the dependent file name;
and the establishing subunit is configured to, when a plurality of matching files corresponding to the dependent file name are found and the header file does not contain a directory path for the dependent file name, calculate the path distance in the directory structure between the main file and each of the plurality of matching files, take the matching file with the shortest path distance as the dependent file corresponding to the dependent file name, add that dependent file to the dependency relationship of the main file, and establish the global file dependency relationship from the dependency relationships of all main files.
7. The apparatus of claim 6, wherein the establishing subunit is further configured to:
when a unique matching file corresponding to the dependent file name is found, taking that matching file as the dependent file corresponding to the dependent file name;
and when a plurality of matching files corresponding to the dependent file name are found and the header file contains a directory path for the dependent file name, matching the directory path level by level upward, and if the path matches a unique file, taking that matching file as the dependent file corresponding to the dependent file name.
8. The apparatus according to any of claims 6-7, wherein the second processing unit comprises:
the confirming subunit is used for confirming whether the dependent file depended by the main file is already symbolized or not according to the global file dependency relationship;
the generating subunit is used for symbolizing the dependent file and generating a symbol table of the dependent file when the confirming subunit confirms that the dependent file is not symbolized;
a caching subunit, configured to cache the symbol table of the dependent file generated by the generating subunit;
and the obtaining subunit is configured to obtain the symbol table of the dependent file from the cache when the confirming subunit confirms that the dependent file has already been symbolized.
9. The apparatus according to any one of claims 6 to 7,
the link establishing unit is used for establishing, according to the unique variable identifiers in the symbol tables of the main file and the dependent file, a link relationship between the characteristic parameters in those symbol tables to obtain a unified symbol table, wherein a unique variable identifier identifies the same characteristic parameter in the symbol tables of the main file and the dependent file.
10. The apparatus according to any one of claims 6-7, further comprising a recording unit, wherein:
the recording unit is used for recording the function declarations in the main file and the dependent file in the same function declaration list;
and the link establishing unit is used for establishing, according to the same function declaration list recorded by the recording unit, a mapping relationship between the functions of the main file and the function declarations of the dependent file to obtain a unified symbol table.
11. A device for scanning a file, comprising a processor and a memory;
wherein the memory is used for storing operation instructions;
the processor is used for calling the operation instructions stored in the memory to execute the file scanning method according to any one of claims 1-5.
12. A computer-readable storage medium storing a program which, when executed, implements the file scanning method according to any one of claims 1-5.
CN201510511052.XA 2015-08-19 2015-08-19 File scanning method and device Active CN106469049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510511052.XA CN106469049B (en) 2015-08-19 2015-08-19 File scanning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510511052.XA CN106469049B (en) 2015-08-19 2015-08-19 File scanning method and device

Publications (2)

Publication Number Publication Date
CN106469049A CN106469049A (en) 2017-03-01
CN106469049B true CN106469049B (en) 2020-09-29

Family

ID=58214551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510511052.XA Active CN106469049B (en) 2015-08-19 2015-08-19 File scanning method and device

Country Status (1)

Country Link
CN (1) CN106469049B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795069A (en) * 2018-08-02 2020-02-14 Tcl集团股份有限公司 Code analysis method, intelligent terminal and computer readable storage medium
CN110262803B (en) * 2019-06-30 2023-04-18 潍柴动力股份有限公司 Method and device for generating dependency relationship
CN110286934B (en) * 2019-06-30 2023-04-18 潍柴动力股份有限公司 Static code checking method and device
CN111158663B (en) * 2019-12-26 2021-07-02 深圳逻辑汇科技有限公司 Method and system for handling references to variables in program code
CN113296777B (en) * 2020-04-10 2022-05-27 阿里巴巴集团控股有限公司 Dependency analysis and program compilation method, apparatus, and storage medium
CN112230980A (en) * 2020-09-28 2021-01-15 北京五八信息技术有限公司 Dependency relationship detection method and device, electronic equipment and storage medium
CN112230979A (en) * 2020-09-28 2021-01-15 北京五八信息技术有限公司 Dependency relationship detection method and device, electronic equipment and storage medium
CN113448553B (en) * 2021-06-23 2023-11-03 南京大学 Method and system for managing and visualizing C language project dependent information

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101739339A (en) * 2009-12-29 2010-06-16 北京航空航天大学 Program dynamic dependency relation-based software fault positioning method
US20140059525A1 (en) * 2012-08-24 2014-02-27 Vmware, Inc. Method and system for facilitating replacement of system calls
CN104182519A (en) * 2014-08-25 2014-12-03 百度在线网络技术(北京)有限公司 File scanning method and device

Non-Patent Citations (1)

Title
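Liu Chunyan. Research on Rule-Based Static Detection Methods for C/C++ Code. China Master's Theses Full-text Database, Information Science and Technology Series, 2011, No. 05, main text pp. 8, 14-15, 24-27, 34-35, 41-42. *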

Also Published As

Publication number Publication date
CN106469049A (en) 2017-03-01

Similar Documents

Publication Publication Date Title
CN106469049B (en) File scanning method and device
US9122540B2 (en) Transformation of computer programs and eliminating errors
US10394694B2 (en) Unexplored branch search in hybrid fuzz testing of software binaries
CN111104335B (en) C language defect detection method and device based on multi-level analysis
CN106295346B (en) Application vulnerability detection method and device and computing equipment
US11262988B2 (en) Method and system for using subroutine graphs for formal language processing
CN104320312A (en) Network application safety test tool and fuzz test case generation method and system
CN111124480A (en) Application package generation method and device, electronic equipment and storage medium
CN111427578B (en) Data conversion method, device and equipment
CN106776266B (en) Configuration method of test tool and terminal equipment
CN115658128A (en) Method, device and storage medium for generating software bill of material
CN113568604B (en) Method and device for updating wind control strategy and computer readable storage medium
JP2008299723A (en) Program verification method and device
CN108897678B (en) Static code detection method, static code detection system and storage device
JP2007233432A (en) Inspection method and apparatus for fragileness of application
CN111240987B (en) Method and device for detecting migration program, electronic equipment and computer readable storage medium
US20040010780A1 (en) Method and apparatus for approximate generation of source code cross-reference information
US9396239B2 (en) Compiling method, storage medium and compiling apparatus
CN111309301A (en) Program language conversion method, device and conversion equipment
CN116578282A (en) Code generation method, device, electronic equipment and medium
CN113190235B (en) Code analysis method and device, electronic terminal and storage medium
CN112799673B (en) Network protocol data checking method and device
Kumar et al. Code clone detection and analysis using software metrics and neural network-a literature review
CN111580821B (en) Script binding method and device, electronic equipment and computer readable storage medium
CN115113858A (en) Method and system for detecting class cycle dependence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant