CN114115908A - Instruction set generation method, program analysis method, device and storage medium - Google Patents

Instruction set generation method, program analysis method, device and storage medium

Info

Publication number
CN114115908A
Authority
CN
China
Prior art keywords
character
instruction
target
class
character instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111465884.4A
Other languages
Chinese (zh)
Inventor
李俊蓉
高飞
罗宵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111465884.4A
Publication of CN114115908A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/42: Syntactic analysis
    • G06F8/427: Parsing

Abstract

The present disclosure provides instruction set generation and program parsing methods, apparatuses, and storage media, and relates to the field of computer technologies, in particular to the field of computer software technologies. The specific implementation scheme is as follows: obtaining a first code file of an Application Program Interface (API) provided by an operating system; determining declared classes in the first code file and relationship information between the declared classes; generating character instructions corresponding to the component elements according to the component elements contained in the determined classes; generating a relation instruction representing the relation between the generated character instructions according to the relation information; a character instruction set is obtained that includes the generated character instructions and the relationship instructions. By applying the scheme provided by the embodiment of the disclosure, the character instruction set corresponding to the API can be generated.

Description

Instruction set generation method, program analysis method, device and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of computer software technology.
Background
The business code of an application may reference an Application Programming Interface (API) of the operating system and use the functions that the operating system provides through that API. In this process, the parsing engine of the high-level programming language used by the application needs to parse the API and interact with the operating system using the corresponding system instructions, for example by specifying the program that the operating system must execute to provide the function. However, the parsing rule corresponding to the API may differ from the parsing rule of the parsing engine, in which case the parsing engine cannot parse the API.
Therefore, an instruction set corresponding to the API needs to be generated, so that the parsing engine obtains the system instruction corresponding to the API through the generated instruction set and completes the process of interacting with the operating system.
Disclosure of Invention
The present disclosure provides an instruction set generation method, a program parsing method, corresponding apparatuses, a device, and a storage medium.
According to an aspect of the present disclosure, there is provided an instruction set generating method including:
obtaining a first code file of an Application Program Interface (API) provided by an operating system;
determining declared classes in the first code file and relationship information between the declared classes;
generating character instructions corresponding to the component elements according to the component elements contained in the determined classes;
generating a relation instruction representing the relation between the generated character instructions according to the relation information;
obtaining a character instruction set that includes the generated character instructions and the relationship instructions.
According to another aspect of the present disclosure, there is provided a program parsing method including:
obtaining syntax structure data generated by compiling a second code file of an application;
when parsing each syntax element in the syntax structure data, if the target syntax element being parsed represents a call to an API of the operating system, selecting from a character instruction set a character instruction for parsing the target syntax element, and parsing the target syntax element using the selected character instruction, wherein the character instruction set is an instruction set generated according to the instruction set generation method described above.
According to another aspect of the present disclosure, there is provided an instruction set generating apparatus including:
a code file obtaining module, configured to obtain a first code file of an Application Program Interface (API) provided by an operating system;
a relationship information determination module, configured to determine the classes declared in the first code file and relationship information between the declared classes;
a character instruction generating module, configured to generate, according to the constituent elements contained in the determined classes, character instructions corresponding to the constituent elements;
a relationship instruction generating module, configured to generate, according to the relationship information, a relationship instruction representing the relationship between the generated character instructions;
a character instruction set obtaining module, configured to obtain a character instruction set containing the generated character instructions and the relationship instructions.
According to another aspect of the present disclosure, there is provided a program analysis apparatus including:
a syntax structure obtaining module, configured to obtain syntax structure data generated by compiling a second code file of an application;
a syntax element parsing module, configured to, when parsing each syntax element in the syntax structure data and if the target syntax element being parsed represents a call to an API of the operating system, select from a character instruction set a character instruction for parsing the target syntax element, and parse the target syntax element using the selected character instruction, wherein the character instruction set is an instruction set generated by the instruction set generating apparatus described above.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform any of the instruction set generation methods or program parsing methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the instruction set generation methods or program parsing methods described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the instruction set generation methods or program parsing methods described above.
As can be seen from the above, in the solution provided in the embodiment of the present disclosure, according to the component elements included in the declared classes in the first code file of the API, the character instruction of each component element is generated, and according to the relationship information between the declared classes in the first code file, the relationship instruction representing the relationship between the character instructions is generated, so as to obtain the character instruction set including the corresponding character instruction and the relationship instruction. It can be seen that the scheme provided by the embodiment of the present disclosure can generate an instruction set of a corresponding API.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flowchart of a first instruction set generation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a second instruction set generation method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a third instruction set generation method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a fourth instruction set generation method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a program parsing method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a first character instruction selection method according to an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of a second character instruction selection method according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an instruction set generating apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a first program analysis apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a second program analysis apparatus according to an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device for implementing the instruction set generation method of an embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device for implementing the program parsing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An application scenario of the embodiment of the present disclosure is described below by taking an application developed based on the Flutter technology as an example.
Flutter is a development framework for a cross-platform mobile UI (User Interface). Based on the development framework, mobile UI development can be realized by using a high-level programming language, wherein the high-level programming language can be Dart language and the like.
While an application developed based on the Flutter technology is running, the electronic device uses the compiler and the parsing engine corresponding to the high-level programming language. Specifically, the compiler generates syntax structure data for the code files of the application; the parsing engine parses the syntax structure data to obtain execution information that represents how the application is to be executed, and uses this execution information to control the execution of the application.
The syntax structure data may record not only information about calls to the application's own APIs but also information about calls to APIs of the operating system. After parsing the syntax structure data, the parsing engine can therefore obtain call information for an API provided by the operating system, for example the class name and method name in the API. From this call information the parsing engine can determine the specific content of the called operating system API and then generate execution information representing the function of the API. However, if the preset parsing rule corresponding to the operating system API differs from the parsing rule of the parsing engine, the parsing engine may be unable to determine that content accurately from the call information, and the application may be unable to continue running.
Therefore, a character instruction set corresponding to the APIs provided by the operating system needs to be generated, so that when parsing the syntax structure data the parsing engine can obtain from this set the character instruction corresponding to the called API. Because a character instruction obtained from the character instruction set conforms to the parsing rule of the parsing engine, the parsing engine can use it to parse the specific content of the called API and thereby complete the call to the API provided by the operating system.
The syntax structure data may be tree-structured data, which may be referred to as an abstract syntax tree. Different high-level programming languages describe the syntax structure data using different data types. For example, where the high-level programming language is Dart, the syntax structure data is described using data of the JSON (JavaScript Object Notation) type.
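As an illustration of tree-structured syntax data described in JSON, the following minimal Python sketch loads a small tree and visits each syntax element. The field names "type", "name" and "children" and the node types are hypothetical and chosen only for this example; the disclosure does not specify the JSON layout.

```python
import json

# Hypothetical syntax structure data. The field names "type", "name" and
# "children" are illustrative assumptions, not the format of any real compiler.
AST_JSON = """
{
  "type": "Program",
  "children": [
    {"type": "MethodInvocation", "name": "Container", "children": []},
    {"type": "VariableDeclaration", "name": "width", "children": []}
  ]
}
"""


def walk(node):
    """Yield every syntax element of the tree in depth-first, pre-order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


syntax_tree = json.loads(AST_JSON)
element_types = [element["type"] for element in walk(syntax_tree)]
```

A parsing engine would inspect each visited element in turn; here the traversal only collects the element types.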
The following describes the instruction set generation method provided by the embodiments of the present disclosure in detail by using specific embodiments.
In one embodiment of the present disclosure, referring to fig. 1, a flowchart of a first instruction set generation method is provided, which includes the following steps S101-S105.
Step S101: a first code file of an API provided by an operating system is obtained.
The first code file is a system file that records the code implementing an API provided by the operating system; it may be written in Dart, C, or another language.
In one implementation, a system file path may be preset, so that when a first code file needs to be obtained, the system file path may be directly accessed, and a file directory under the file path is read, so as to obtain a file in the file directory as the first code file.
Generally, when an application calls an API of an operating system, only an API for implementing the application function is selected, and not all APIs of the operating system are called. In view of the above, in one implementation, the system file path may be set according to the function of the application, so that the range of the path for acquiring the first code file may be narrowed, the number of the acquired first code files may be reduced, and the workload for subsequently generating the character instruction set may be reduced accordingly.
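The file collection described above might be sketched in Python as follows. The function name, the suffix filter and the recursive directory scan are assumptions made for illustration; the disclosure only states that the files under a preset system file path are read.

```python
from pathlib import Path


def collect_first_code_files(system_file_path, suffixes=(".dart", ".c")):
    """Return the files under a preset path whose suffix is in `suffixes`.

    Scanning sub-directories recursively is an assumption of this sketch.
    """
    root = Path(system_file_path)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )
```

Pointing `system_file_path` at a narrower directory that matches the functions the application actually uses reduces the number of files returned, and with it the work of the later steps.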
Step S102: classes declared in the first code file and relationship information between the declared classes are determined.
In a high-level programming language, a class is a template for creating objects. A class is declared in a code file with a keyword, and the keyword differs between high-level programming languages: in Java, for example, the keyword is class, while in JavaScript the keyword may be function. In view of this, in one implementation, a character string matching the keyword may be detected in the first code file, and the class declared in the first code file may then be determined from the character string that follows it. For example, if the first code file contains "class Cars", then after the character string class is detected in the first code file, the character string Cars that follows it is determined to be a class declared in the first code file.
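A minimal Python sketch of this keyword-based detection is shown below. The function name and the regular expression are illustrative assumptions; a plain text scan of this kind would also match the keyword inside comments or string literals, which a real implementation would need to exclude.

```python
import re


def find_declared_classes(source, keyword="class"):
    """Return the identifiers that directly follow the declaration keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\s+([A-Za-z_]\w*)")
    return pattern.findall(source)


declared = find_declared_classes("class Cars { }\nclass Wheels { }")
```

Passing a different `keyword` adapts the same scan to a language that declares classes with another keyword.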
The relationship represented by the relationship information between the classes is determined by the high-level programming language, and is at least one of the relationships preset for that language. For example, if the high-level programming language is Dart, the preset relationships may include an extends relationship, an instance relationship, a mixin relationship, and so on, and the relationship information is at least one of these.
Specifically, the relationship information between the classes may be recorded in a relationship table, where each record holds the type of one relationship and the class names of the two or more classes involved in it. The relationship information may also be recorded in a class diagram, in which each class is recorded as a component and the relationships between classes are recorded as pointers between components.
In view of the above, in one implementation, a character string that is a keyword representing relationship information may be detected in the first code file; the relationship represented by the detected character string is then determined as the relationship information between the classes declared in the first code file, and the declared classes are determined from the character strings before and after the detected character string. For example, if the first code file contains "class A extends B", then after the character string extends is detected, the relationship information between the declared classes is determined to be the extends relationship, the character string before extends is A and the character string after extends is B, so the classes declared in the first code file include classes A and B.
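The relationship detection and the relationship table can be sketched together in Python. The `Relation` record, the function name and the regular expression are hypothetical; only the Dart keyword extends is handled by default, and further keywords can be passed in.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """One record of the relationship table (layout is an assumption)."""
    kind: str    # relationship keyword, e.g. "extends"
    source: str  # class named before the keyword
    target: str  # class named after the keyword


def find_relations(source_code, keywords=("extends",)):
    """Detect `class X <keyword> Y` and return one table record per match."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\bclass\s+(\w+)\s+(" + alternatives + r")\s+(\w+)")
    return [
        Relation(kind, before, after)
        for before, kind, after in pattern.findall(source_code)
    ]


relation_table = find_relations("class A extends B { }")
```

Each record names the relationship type and the two classes involved, which matches the relationship table form described above.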
Step S103: character instructions corresponding to the component elements are generated according to the component elements contained in the determined classes.
The above-mentioned constituent elements may be variables and methods included in the class, such as static variables, local variables, static methods, instance methods, and the like.
A character instruction is an instruction that records a constituent element in the form of characters. For example, if a method in a class is the Container construction method, the character instruction corresponding to the method may be "Container()", which records the name "Container" of that construction method.
The specific way of generating the character instruction corresponding to a component element is described in step S203 of the embodiment shown in FIG. 2 and is not detailed here.
Step S104: a relation instruction indicating a relation between the generated character instructions is generated based on the relation information.
Wherein the relation instruction is used for representing the reference relation between the character instructions.
Reference relationships exist between the constituent elements of classes that themselves have a reference relationship, and these element-level relationships are constrained by the class-level ones. The relationship information can therefore record not only the reference relationships between classes but also those between the constituent elements of such classes. Since character instructions correspond to constituent elements, the reference relationship between two character instructions is the same as that between their constituent elements. Hence, once the reference relationships between the constituent elements are known, a relationship instruction representing the relationship between the corresponding character instructions can be generated from them.
For example, assume that a reference relationship exists between subclass A and parent class B, and between method a1 in subclass A and method b1 in parent class B. A relationship instruction representing the relationship between the character instruction corresponding to a1 and the character instruction corresponding to b1 can then be generated from the reference relationship between a1 and b1.
Here the parent class is the class being referenced, and the subclass is the class that makes the reference.
The specific manner of generating the relationship instruction can be seen in steps S304-S307 in the embodiment shown in fig. 3, which will not be described in detail here.
Step S105: a character instruction set is obtained that includes the generated character instructions and the relationship instructions.
Specifically, in the character instruction set, all the relationship instructions may be stored in a subset of the character instruction set, or a character instruction corresponding to the reference relationship represented by each relationship instruction may be determined, and the relationship instruction and the determined character instruction are stored in a subset of the character instruction set.
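The two storage options can be pictured with the following Python sketch. Modelling a relationship instruction as a pair (referencing instruction, referenced instruction), the dictionary layout, and the instruction strings are all assumptions made for illustration only.

```python
# Hypothetical character instructions and one relationship instruction,
# modelled as a (referencing, referenced) pair.
character_instructions = ["a1()", "b1()", "c1()"]
relation_instructions = [("a1()", "b1()")]


def store_relations_together(characters, relations):
    """Option 1: all relationship instructions share one subset of the set."""
    return {
        "character_subset": list(characters),
        "relation_subset": list(relations),
    }


def store_relations_with_characters(relations):
    """Option 2: one subset per relationship instruction, holding it together
    with the character instructions whose reference relationship it represents."""
    return [
        {"relation": relation, "characters": list(relation)}
        for relation in relations
    ]
```

The first layout keeps the relationship instructions in one place; the second makes each relationship instruction directly available next to the character instructions it concerns.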
The manner of obtaining the character instruction set can be seen in steps S405-S406 in the embodiment shown in fig. 4, which will not be described in detail here.
In an embodiment of the present disclosure, after the character instruction set is generated, search information indicating the correspondence between character instructions and APIs may be generated in the instruction set file according to a preset calling manner and an instruction generation manner of the API recorded in the first code file. For example, the search information may record the correspondence between the identifier of an API and the storage address of the character instruction corresponding to that API.
The search information can be defined as a class associated with the API, so that when the API is called, the class corresponding to the search information can be obtained by calling the class preset in the API, and the storage address of the character instruction corresponding to the API can be obtained from that class.
On this basis, when the parsing engine calls an API of the operating system, it may query, according to the identifier of the called API, the search information that records that identifier, and obtain the character instruction using the storage address recorded in the search information found.
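A minimal Python sketch of such a lookup is given below. The `SearchInfo` class, the API identifiers, the "Text()" instruction and the use of list indices as storage addresses are hypothetical choices for the example.

```python
class SearchInfo:
    """Maps an API identifier to the storage address of its character instruction."""

    def __init__(self):
        self._address_by_api = {}

    def register(self, api_id, address):
        self._address_by_api[api_id] = address

    def address_of(self, api_id):
        return self._address_by_api[api_id]


# Toy storage: a "storage address" is simply an index into this list.
instruction_store = ["Container()", "Text()"]

search_info = SearchInfo()
search_info.register("widgets.Container", 0)
search_info.register("widgets.Text", 1)


def fetch_character_instruction(api_id):
    """Resolve the called API's identifier to its character instruction."""
    return instruction_store[search_info.address_of(api_id)]
```

The parsing engine would call `fetch_character_instruction` with the identifier of the API being called and then parse the call using the returned character instruction.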
As can be seen from the above, in the solution provided in the embodiment of the present disclosure, according to the component elements included in the declared classes in the first code file of the API, the character instruction of each component element is generated, and according to the relationship information between the declared classes in the first code file, the relationship instruction representing the relationship between the character instructions is generated, so as to obtain the character instruction set including the corresponding character instruction and the relationship instruction. It can be seen that the scheme provided by the embodiment of the present disclosure can generate an instruction set of a corresponding API.
A specific embodiment of generating the character instructions in step S103 is described below.
In one embodiment of the present disclosure, referring to fig. 2, a flowchart of a second instruction set generation method is provided, which includes steps S201-S205.
Step S201: a first code file of an API provided by an operating system is obtained.
Step S202: classes declared in the first code file and relationship information between the declared classes are determined.
The above steps S201 to S202 are the same as the above steps S101 to S102, respectively, and are not described in detail here.
Step S203: for each component element contained in the determined classes, an instruction generation manner corresponding to the component element is determined according to the type of the component element, and a character instruction corresponding to the component element is generated according to that instruction generation manner.
The types of the above-described constituent elements may be methods or attributes.
Instruction generation manners can be preset for the different types of component elements, so that for each component element the corresponding instruction generation manner can be selected from the preset manners according to the type of the component element; the character instruction of the component element is then generated according to the instruction generation rule corresponding to the selected manner.
In an embodiment of the present disclosure, the constituent elements may be classified by type, so that constituent elements of the same type are grouped together, and the preset instruction generation manner corresponding to each type is then used as the instruction generation manner for every constituent element of that type.
For example, when the type of the component element is attribute, the instruction generation manner may be to generate an instruction representing the value of the attribute, and when the type of the component element is method, the instruction generation manner may be to generate an instruction representing the steps of the method.
The instruction generation rule may be a mapping between the meaning represented by a component element and a preset character, so that the character corresponding to each component element can be obtained according to the mapping and a character instruction can then be generated from that character.
For example, if the meaning of component element 1 is "transfer" and the preset character corresponding to "transfer" is "MOV", the instruction generation rule may be the mapping between "transfer" and "MOV".
Likewise, if the meaning of component element 2 is "addition" and the preset character corresponding to "addition" is "ADD", the instruction generation rule may be the mapping between "addition" and "ADD".
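The type-dependent generation can be sketched in Python as a dispatch table. Only the two mappings "transfer" to "MOV" and "addition" to "ADD" come from the text; the dictionary shape of a component element and the textual form of the generated instructions are assumptions of this sketch.

```python
# Instruction generation rule from the example: meaning -> preset character.
PRESET_CHARACTERS = {"transfer": "MOV", "addition": "ADD"}


def instruction_for_attribute(element):
    """Attribute: an instruction representing the attribute's value."""
    return f"{element['name']}={element['value']}"


def instruction_for_method(element):
    """Method: an instruction representing the method's steps, with each
    step's meaning mapped to its preset character."""
    return " ".join(PRESET_CHARACTERS[step] for step in element["steps"])


# One preset instruction generation manner per component element type.
GENERATION_MANNERS = {
    "attribute": instruction_for_attribute,
    "method": instruction_for_method,
}


def generate_character_instruction(element):
    """Select the generation manner by element type and apply it."""
    return GENERATION_MANNERS[element["type"]](element)
```

For instance, a hypothetical method element `{"type": "method", "name": "sum", "steps": ["transfer", "addition"]}` would be handled by `instruction_for_method`, while an attribute element would be handled by `instruction_for_attribute`.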
Step S204: a relation instruction indicating a relation between the generated character instructions is generated based on the relation information.
Step S205: a character instruction set is obtained that includes the generated character instructions and the relationship instructions.
The above steps S204-S205 are the same as the above steps S104-S105, respectively, and are not described in detail here.
As can be seen from the above, in the scheme provided in this embodiment, according to the types of the constituent elements, the instruction generation manner corresponding to the constituent elements is determined, and the character instructions corresponding to the constituent elements are generated according to the instruction generation manner, so that the generated character instructions can conform to the characteristics of different types of character instructions, and the accuracy of generating the character instructions is improved.
A specific embodiment of the generation of the relation instruction in step S104 will be described below.
In one embodiment of the present disclosure, referring to fig. 3, a flow chart of a third instruction set generation method is provided, which includes the following steps S301-S308.
Step S301: a first code file of an API provided by an operating system is obtained.
Step S302: classes declared in the first code file and relationship information between the declared classes are determined.
Step S303: character instructions corresponding to the component elements are generated according to the component elements contained in the determined classes.
The above steps S301-S303 are the same as the above steps S101-S103, respectively, and are not described in detail.
Step S304: a class group consisting of classes declared in the first code file that have a reference relationship is determined according to the relationship information.
The above-described class group refers to a combination of classes having a reference relationship.
Because the reference relationship between the classes is recorded in the relationship information, the classes with the reference relationship can be obtained from the relationship information, and the classes with the reference relationship are combined to obtain the class group.
Specifically, the reference relationship may be determined from a single relationship between classes in the relationship information. For example, if the relationship information records an inheritance relationship between class A and class B, class A and class B can be determined to have a reference relationship. The reference relationship may also be determined from several relationships between classes in the relationship information. For example, if the relationship information records an inheritance relationship between class A and class B and an inheritance relationship between class B and class C, class A and class C can be determined to have a reference relationship.
In one embodiment of the present disclosure, the class group may be determined by:
A target class is preset; the classes having a reference relationship with the target class are determined according to the relationship information, and the determined classes and the target class are added to the same combination. Then any class that has a reference relationship with a class in the combination but is not yet in the combination is obtained according to the relationship information and added to the combination. This process is repeated until no further class outside the combination can be obtained from the relationship information, which yields the class group corresponding to the target class; the classes in the class group have reference relationships with one another.
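The iterative expansion described above can be sketched in Python as a breadth-first search. Treating a reference relationship as usable in both directions when growing the combination is an assumption of this sketch, as are the function and parameter names.

```python
from collections import deque


def build_class_group(target_class, reference_pairs):
    """Grow the combination around `target_class` until no class outside it
    has a reference relationship with a class inside it."""
    # Index the relationship information; direction is ignored (assumption).
    neighbours = {}
    for first, second in reference_pairs:
        neighbours.setdefault(first, set()).add(second)
        neighbours.setdefault(second, set()).add(first)

    group = {target_class}
    queue = deque([target_class])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in group:
                group.add(other)
                queue.append(other)
    return group
```

With the relationships A-B, B-C and D-E and target class A, the loop first adds B, then C, and stops because D and E have no relationship with any class already in the combination.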
Step S305: Determine the target constituent elements that have a reference relationship among the classes in each class group.
The reference relationship in this step is a reference relationship between constituent elements of classes; it indicates the manner in which a constituent element references another constituent element in its own class or in a different class.
For example, the __init__() method in class A has a reference relationship with the __init__() method of its parent class: the __init__() method of class A calls the parent class's __init__() method through super().
In an embodiment of the present disclosure, because the relationship information may record the reference relationships between constituent elements of classes in addition to the reference relationships between classes, once a class group is determined, the constituent elements that have a reference relationship within a class and the constituent elements that have a reference relationship across different classes of the class group may be determined from the relationship information and taken as the target constituent elements.
Step S306: Determine a second reference relationship between the character instructions corresponding to the target constituent elements according to the first reference relationship between the target constituent elements.
Because the target constituent elements correspond to character instructions, the reference relationships between constituent elements tend to be consistent with the reference relationships between the corresponding character instructions, so the second reference relationship can be determined from the first reference relationship.
Specifically, the character instruction corresponding to the constituent element that is the referring party in the first reference relationship may be determined as the referring party in the second reference relationship, and the character instruction corresponding to the constituent element that is the referenced party in the first reference relationship may be determined as the referenced party in the second reference relationship.
For example, if the first reference relationship indicates that the constituent element method X references the constituent element method Y, then the second reference relationship indicates that the character instruction of method X references the character instruction of method Y when executed.
Step S307: Generate a relation instruction representing each second reference relationship.
The above relation instruction is used for specifying a specific reference mode of the character instruction. For example, the second reference relationship may be that the character instruction of method X references the character instruction of method B when executed, the reference manner is rewriting, and the generated relationship instruction may be descriptive information recording the rewriting behavior.
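Steps S306 and S307 can be illustrated by the following minimal Python sketch. The dictionary layout of a relation instruction, the function name derive_relation_instructions, the instruction identifiers, and the "super_call" manner label are hypothetical; the disclosure does not specify a concrete format for relation instructions.

```python
def derive_relation_instructions(first_relations, instruction_of):
    """Map element-level references onto instruction-level relation records.

    `first_relations` holds (referring_element, referenced_element, manner)
    tuples; `instruction_of` maps each constituent element to its character
    instruction. Both structures are assumptions made for illustration.
    """
    relation_instructions = []
    for referrer, referenced, manner in first_relations:
        # The referring and referenced roles carry over unchanged from the
        # first reference relationship to the second reference relationship.
        relation_instructions.append({
            "referrer": instruction_of[referrer],
            "referenced": instruction_of[referenced],
            "manner": manner,
        })
    return relation_instructions


instruction_of = {"A.__init__": "inst_A_init", "B.__init__": "inst_B_init"}
first_relations = [("A.__init__", "B.__init__", "super_call")]
relation_instructions = derive_relation_instructions(first_relations, instruction_of)
```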
Step S308: a character instruction set is obtained that includes the generated character instructions and the relationship instructions.
Step S308 is the same as step S105, and will not be described in detail.
As can be seen from the above, in the scheme provided in this embodiment, the reference relationship of the character instruction is determined according to the reference relationship of the constituent elements, and the relationship instruction is generated, so that the character instruction having the reference relationship with the character instruction can be determined when the character instruction is obtained, the character instruction is conveniently searched and obtained, and the efficiency of using the character instruction set is improved.
The following describes a specific manner of obtaining the character instruction set in the aforementioned step S105.
In one embodiment of the present disclosure, referring to fig. 4, a flowchart of a fourth instruction set generation method is provided, which includes steps S401-S406.
Step S401: a first code file of an API provided by an operating system is obtained.
Step S402: classes declared in the first code file and relationship information between the declared classes are determined.
Step S403: Generate the character instruction corresponding to each constituent element according to the constituent elements contained in the determined classes.
Step S404: a relation instruction indicating a relation between the generated character instructions is generated based on the relation information.
The above steps S401-S404 are the same as the above steps S101-S104, respectively, and are not described in detail.
Step S405: For each character instruction subset to be generated, determine the target class corresponding to the character instruction subset, and generate a character instruction subset containing the character instructions of the constituent elements of that target class.
For each character instruction subset to be generated, the corresponding target class may be one of a plurality of preset classes.
Specifically, the constituent elements of the target class contained in the first code file may be determined according to the declaration information of the target class in the first code file, all the character instructions corresponding to the determined constituent elements are placed in the same set, and the resulting set is determined as the character instruction subset corresponding to the target class.
Step S406: Obtain the character instruction set according to the generated character instruction subsets and the generated relation instructions.
Two implementations of obtaining the character instruction set are described below.
In one implementation, all the relational instructions may be placed in one relational instruction subset, and then the relational instruction subset and all the character instruction subsets are placed in the same set, and the set is used as the character instruction set.
In another implementation, for each character instruction subset, the relation instructions between the character instructions in that subset are determined from among the relation instructions and added to the character instruction subset; the character instruction subsets adjusted in this way are then placed in the same set, and that set is used as the character instruction set.
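The two implementations can be contrasted with the short Python sketch below. The function names, the dictionary layout, and in particular the rule that a relation instruction is attached to the subset containing its referring character instruction are assumptions made for illustration only.

```python
def assemble_with_shared_relations(subsets, relation_instructions):
    """First implementation: one relation subset stored beside all character subsets."""
    return {
        "relation_subset": list(relation_instructions),
        "character_subsets": {cls: list(insts) for cls, insts in subsets.items()},
    }


def assemble_with_embedded_relations(subsets, relation_instructions):
    """Second implementation: relation instructions are added to character subsets."""
    result = {}
    for cls, insts in subsets.items():
        members = set(insts)
        # Assumption: a relation instruction belongs to the subset that
        # contains its referring character instruction.
        related = [r for r in relation_instructions if r["referrer"] in members]
        result[cls] = {"instructions": list(insts), "relations": related}
    return result


subsets = {"A": ["inst_A_init", "inst_A_run"], "B": ["inst_B_init"]}
relation_instructions = [
    {"referrer": "inst_A_init", "referenced": "inst_B_init", "manner": "super_call"}
]
shared = assemble_with_shared_relations(subsets, relation_instructions)
embedded = assemble_with_embedded_relations(subsets, relation_instructions)
```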
As can be seen from the above, in the scheme provided in this embodiment, the character instruction subsets are generated with the target class as the unit, so that the character instructions corresponding to the constituent elements of the same class are placed in the same character instruction subset. Since each character instruction subset corresponds to a class, and the APIs provided by the operating system are usually called by class, generating the character instruction subsets in this way makes it easier to obtain the character instructions of the constituent elements of the same class.
In an embodiment of the present disclosure, after the character instruction set is obtained in step S105 according to the generated character instruction subsets and the generated relation instructions, the code library to which the implementation code of the target class corresponding to each character instruction subset belongs may be determined, and the character instruction subset is merged into the instruction set corresponding to that code library.
A code library may contain a plurality of classes of the API. When a code file is written, a code library may be imported into the code file using a keyword. For example, with import as the keyword, a code library named os can be imported into a code file by writing the character string "import os" in the code file.
Because the code base can contain a plurality of classes, a corresponding relation exists between the classes and the code base; because the character instruction subset and the class in the first code file have a corresponding relation, the target class corresponding to the character instruction subset can be determined according to the corresponding relation between the character instruction subset and the class in the first code file; and determining the code base to which the implementation code of the target class belongs according to the corresponding relation between the class and the code base.
The following describes how the character instruction subsets are merged.
In one implementation, a new empty instruction set may be generated for the code base, and used as an instruction set corresponding to the code base, and each character instruction in the character instruction subset to be merged is added to the instruction set corresponding to the code base. After the above process is completed, each character instruction in the character instruction subset to be merged exists in the instruction set corresponding to the code base, which is equivalent to merging the character instruction subset into the instruction set corresponding to the code base.
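A minimal Python sketch of this merging is given below. The mapping library_of_class from class names to code library names, the class names, and the instruction identifiers are hypothetical examples; skipping duplicate instructions is likewise an assumption.

```python
def merge_into_library_sets(subsets, library_of_class):
    """Merge each character instruction subset into its code library's instruction set."""
    library_sets = {}
    for cls, instructions in subsets.items():
        library = library_of_class[cls]
        # Create a new empty instruction set the first time a library is seen.
        target = library_sets.setdefault(library, [])
        for instruction in instructions:
            if instruction not in target:
                target.append(instruction)
    return library_sets


subsets = {
    "Path": ["inst_path_join"],
    "Stat": ["inst_stat_size"],
    "Socket": ["inst_sock_open"],
}
library_of_class = {"Path": "os", "Stat": "os", "Socket": "net"}
library_sets = merge_into_library_sets(subsets, library_of_class)
```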
As can be seen from the above, in the scheme provided in this embodiment, the character instruction subset is merged to the instruction set corresponding to the code library, so that other application programs can obtain the character instructions according to the instruction set corresponding to the code library when multiplexing the code library, and can call the corresponding API according to the obtained character instructions, thereby improving the convenience of using the code library.
In an embodiment of the present disclosure, the character instructions in the instruction set corresponding to the code library may be stored in a preset instruction set file. In the instruction set file, the high-level programming language used for recording the character instructions may be different according to the applicability of the parsing engine. For example, the Flutter parsing engine is adapted to parse the Dart language, and if the parsing engine corresponding to the application program is the Flutter engine, the high-level programming language for recording the character instruction in the instruction set file may be the Dart language; the V8 parsing engine is adapted to the JavaScript language, and if the parsing engine corresponding to the application program is the V8 parsing engine, the high-level programming language for recording the character instruction in the instruction set file may be the JavaScript language.
In an embodiment of the present disclosure, after the character instruction set is obtained according to the generated character instruction subset and the generated relation instruction in step S105, a search index of each character instruction subset may also be set according to a target class corresponding to each character instruction subset; and establishing a chain relation among the search indexes of the character instruction subsets corresponding to the target classes according to the inter-class reference relation represented by the relation information.
The target class is a class to which constituent elements corresponding to character instructions in the character instruction subset belong.
The lookup index may be an ID (identification), and information such as a name of a class and a path of a code file where a declaration of the class is located may be recorded in the ID. The search index can be used for establishing a corresponding relation with the class, and the character instruction subset and the class also have a corresponding relation, so that the character instruction subset can be found through the search index.
The chain relationship between the lookup indexes may be established with pointers according to the reference relationships between the classes. After a target lookup index is obtained, the other associated lookup indexes can be obtained in sequence along the extending direction of the chain relationship indicated by the pointers; the character instruction subsets corresponding to those lookup indexes have a reference relationship with the character instruction subset corresponding to the target lookup index.
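By way of example only, a lookup index and its chain relationship might be modelled in Python as follows. The class LookupIndex, its field names, the child-to-parent dictionary used to express inheritance, and the file paths are hypothetical and serve only to illustrate the pointer-based chain.

```python
class LookupIndex:
    """An ID recording a class name and the path of the file declaring the class."""

    def __init__(self, class_name, declaration_path):
        self.class_name = class_name
        self.declaration_path = declaration_path
        self.next_index = None  # pointer to the next lookup index in the chain


def build_chain(inheritance, paths):
    """Create one lookup index per class and link each child to its parent."""
    indexes = {name: LookupIndex(name, path) for name, path in paths.items()}
    for child, parent in inheritance.items():
        indexes[child].next_index = indexes[parent]
    return indexes


def follow_chain(start):
    """Return the class names of the associated indexes after `start`, in order."""
    names = []
    seen = {id(start)}
    node = start.next_index
    # Walk the pointers; the `seen` set guards against an accidental cycle.
    while node is not None and id(node) not in seen:
        names.append(node.class_name)
        seen.add(id(node))
        node = node.next_index
    return names


indexes = build_chain(
    {"A": "B", "B": "C"},
    {"A": "api/a.h", "B": "api/b.h", "C": "api/c.h"},
)
associated = follow_chain(indexes["A"])
```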
As can be seen from the above, in the scheme provided by the embodiment of the present disclosure, the lookup indexes are set for each character instruction subset, and the chain relationship between the lookup indexes is established, so that the classes having the inter-class reference relationship can be obtained by looking up the chain relationship between the indexes, which further facilitates the use of the character instruction subsets and improves the convenience.
In an embodiment of the present disclosure, determining the classes declared in the first code file may further be performed as follows:
Determining the classes declared in the first code file that are not generic.
Obtaining the generic types recorded in the first code file, and establishing a correspondence between the obtained generic types and preset actual data types.
A generic type is a parameterized type: a constituent element or class declared with a generic type is assigned an actual data type only when the application program passes one in at runtime. For example, with T denoting a generic type, the definition Array<T> in the first code file declares a variable that is an array whose element type is generic; if the actual data type passed in while the application program runs is Array<String>, the generic type T is assigned the actual type String.
When the application program is not running, the character instruction for a class that is not generic can be generated according to the actual data types recorded in the code file to which the class belongs. A generic class, on the other hand, may correspond to classes of several actual data types, so the character instruction of the actual class corresponding to the generic type cannot be determined in advance.
The process of establishing the correspondence between the generic types and the actual data types may be to set a generic relationship recording file, record the generic types therein, and record all the actual data types that may correspond to the generic types during the operation of the application program.
As can be seen from the above, in the scheme provided in this embodiment, the corresponding relationship between the obtained generic type and the preset actual data type is established, so that when the subsequent analysis engine analyzes the information of the application program, the generic type can be analyzed according to the actual data type, and the accuracy is further improved.
In one embodiment of the present disclosure, referring to fig. 5, a flowchart of a program parsing method is provided, which includes the following steps S501-S502.
Step S501: Obtain the syntax structure data generated by compiling the second code file of the application program.
In an embodiment of the disclosure, the second code file is a code file for recording application information, and the second code file may be a Dart file, a JavaScript file, or the like.
During the running process of the application program, the compiler of the application program can perform syntax analysis on the codes in the second code file according to a preset syntax rule to obtain an abstract syntax structure of the codes in the second code file, and generate syntax structure data corresponding to the codes in the second code file according to the obtained abstract syntax structure.
Step S502: When each syntax element in the syntax structure data is parsed, if the target syntax element being parsed represents a call to an API of the operating system, select the character instruction for parsing the target syntax element from the character instruction set, and parse the target syntax element using the selected character instruction.
Wherein, the character instruction set is as follows: a set of instructions generated according to the method of any one of the above embodiments.
In an embodiment of the present disclosure, the syntax structure data may be an abstract syntax tree, and the corresponding syntax element may be a node in the abstract syntax tree; the above syntax element indicates that the API of the operating system is called, and reference may be made to step S601 described below, which is not described in detail here.
The character instruction may be selected by choosing, from the character instruction set, the character instruction subset corresponding to the operating system API being called and obtaining the character instruction from that subset, as described in detail in the embodiments of fig. 6 and fig. 7 below.
When the target syntax element is parsed using the selected character instruction, the parsing result may be execution information corresponding to the specific content of the call to the API provided by the operating system. The execution information may be expressed as bytecode.
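The traversal in step S502 can be illustrated with a small Python sketch. Python's standard ast module is used here only as a stand-in for the syntax structure data of a Dart or JavaScript file, the os module stands in for an operating system API, and the INSTRUCTION_SET mapping with its instruction identifier is hypothetical.

```python
import ast

# Hypothetical character instruction set keyed by (library, member) pairs.
INSTRUCTION_SET = {("os", "getcwd"): "inst_os_getcwd"}


def select_for_source(source, instruction_set):
    """Walk the syntax tree and select an instruction for each recognised API call."""
    selected = []
    for node in ast.walk(ast.parse(source)):
        # Only calls of the form <name>.<member>(...) are treated as API calls here.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            key = (node.func.value.id, node.func.attr)
            if key in instruction_set:
                selected.append(instruction_set[key])
    return selected


selected = select_for_source("import os\ncwd = os.getcwd()\nprint(cwd)", INSTRUCTION_SET)
```

Syntax elements that do not represent a call to a known API, such as the print call in the sample source, are left to ordinary parsing in this sketch.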
As can be seen from the above, in the solution provided in this embodiment, when parsing a syntax element in syntax structure data, if an API that needs to be called exists in the syntax element, a corresponding character instruction is selected from a character instruction set for parsing, so that an application program can execute a function of the API according to the character instruction, and usability of the application program is improved.
The specific manner of selecting, in step S502, the character instruction for parsing the target syntax element from the character instruction set is shown in fig. 6 and fig. 7.
In an embodiment of the present disclosure, referring to fig. 6, a flowchart of a first character instruction selection method is provided, where the method includes the following steps S601-S603:
Step S601: Determine, according to the target syntax element, the target API of the operating system to be called and the target class of the target API to be called.
As in step S502, the target syntax element represents a call to an API of the operating system. The API being called can therefore be determined as the target API according to the call information in the target syntax element, and the class being called can be determined as the target class according to the class information contained in that call information. For example, if the call information identifies the API by the file name of the first code file, the API code file having that file name may be searched for, and the API corresponding to the code file found is determined as the target API; if the class information in the call information is the name of a class, the code file corresponding to the API may be searched for a class declaration containing that name, and the class corresponding to the class declaration is determined as the target class of the target API.
Step S602: Search the character instruction set corresponding to the target API for the target character instruction subset corresponding to the target class.
In an embodiment of the present disclosure, with reference to the process of establishing the chain relationship in the above embodiments, the corresponding target search index in the chain relationship may be obtained according to the target class, and the corresponding target character instruction subset may be obtained according to the obtained target search index.
In one embodiment of the present disclosure, the lookup index may be an ID, and a name of a class may be recorded in the ID. Therefore, the search index corresponding to the target class in the chain relation can be obtained according to the matching between the name of the class recorded in the ID and the class name of the target class.
Step S603: Determine the character instruction for parsing the target syntax element according to the target character instruction subset.
Specifically, the constituent elements of the class in the API that are used by the call may be determined according to the call information in the target syntax element, the character instructions corresponding to the determined constituent elements are obtained from the target character instruction subset, and the obtained character instructions are determined as the character instructions for parsing the target syntax element.
The constituent elements of the class used by the call may be obtained from the attribute names or method names recorded in the call information of the target syntax element. For example, the call information may record referenced attributes, such as "params: [a, b]"; from this reference information it can be determined that the constituent elements of the class used by the API call are constituent element a and constituent element b.
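Steps S601 to S603 can be sketched in Python as follows. The layout of the call information (keys "api", "class" and "params"), the nested dictionary used for the character instruction set, and the names file_api and File are hypothetical and are chosen only to mirror the "params: [a, b]" example.

```python
def select_instructions(call_info, instruction_sets):
    """Select the character instructions needed to parse one API call."""
    # S601: the call information names the target API and the target class.
    api_set = instruction_sets[call_info["api"]]
    # S602: find the target character instruction subset for the target class.
    subset = api_set[call_info["class"]]
    # S603: pick the instructions of the constituent elements used by the call.
    return [subset[name] for name in call_info["params"] if name in subset]


instruction_sets = {
    "file_api": {
        "File": {"a": "inst_File_a", "b": "inst_File_b", "c": "inst_File_c"},
    },
}
call_info = {"api": "file_api", "class": "File", "params": ["a", "b"]}
chosen = select_instructions(call_info, instruction_sets)
```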
Therefore, when the program is analyzed by applying the scheme provided by the embodiment of the disclosure, the target character instruction subset is determined according to the target class of the target API, and the character instruction is determined according to the target character instruction subset, so that the determined character instruction can be used for analyzing the syntax element representing the target class in the calling API, and the accuracy of program analysis is improved.
In an embodiment of the present disclosure, referring to fig. 7, a flowchart of a second character instruction selection method is provided, where the method includes the following steps S701 to S705:
Step S701: Determine, according to the target syntax element, the target API of the operating system to be called and the target class of the target API to be called.
Step S702: Search the character instruction set corresponding to the target API for the target character instruction subset corresponding to the target class.
The above steps S701 to S702 are the same as the above steps S601 to S602, respectively, and are not described in detail here.
Step S703: in the chain relationship, a target lookup index representing a subset of target character instructions is looked up.
According to the description of the chain relation in the foregoing embodiment, the search index of each character instruction subset is set for the target class corresponding to each character instruction subset, and therefore, the target search index set for the target class can be obtained according to the target class corresponding to the target character instruction subset.
Step S704: Determine the associated lookup indexes that are chained in sequence after the target lookup index along the extending direction of the chain relationship.
In an embodiment of the present disclosure, the extending direction may be consistent with the relationship between the classes corresponding to the character instruction subsets of the target lookup index and of the associated lookup indexes. For example, if class A inherits from class B and class B inherits from class C, the extending direction may run from the lookup index of class A to that of class B, and from the lookup index of class B to that of class C. Thus, taking the lookup index corresponding to class A as the target lookup index, the lookup indexes corresponding to class B and class C can be determined along the extending direction of the chain relationship.
Step S705: Determine the character instruction for parsing the target syntax element according to the target character instruction subset and the character instruction subsets corresponding to the determined associated lookup indexes.
In the chain relationship, the correspondence between the association lookup index and the character instruction subset is consistent with the correspondence between the target lookup index and the character instruction subset, so that the character instruction subset corresponding to the association lookup index can be obtained according to the association lookup index, similar to the step S602, and detailed description is omitted here.
The manner of determining the character instruction for parsing the target syntax element according to the target character instruction subset and the character instruction subsets corresponding to the associated lookup indexes is consistent with step S603 above, and is not described in detail here.
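One possible reading of steps S703 to S705 is sketched below in Python. The disclosure does not state how the target subset and the associated subsets are combined; taking the first subset along the chain that contains the requested constituent element is an assumption, as are the function name select_along_chain and the sample data.

```python
def select_along_chain(element, target_class, parent_of, subsets):
    """Look for `element` in the target subset, then in subsets along the chain."""
    cls = target_class
    visited = set()
    # Follow the extending direction (child to parent) until the element is found.
    while cls is not None and cls not in visited:
        visited.add(cls)
        subset = subsets.get(cls, {})
        if element in subset:
            return subset[element]
        cls = parent_of.get(cls)
    return None


parent_of = {"A": "B", "B": "C"}
subsets = {
    "A": {"run": "inst_A_run"},
    "B": {"stop": "inst_B_stop"},
    "C": {"close": "inst_C_close"},
}
inherited = select_along_chain("close", "A", parent_of, subsets)
```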
As can be seen from the above, in the process of program parsing by applying the scheme provided by the embodiment of the present disclosure, the target lookup index and the associated lookup index may be determined by the above-mentioned chain relationship, and the character instruction for parsing the target syntax element is determined according to the determined lookup index. In the process, the chain relation enables chain connection to exist between the related search indexes, the related search indexes can be obtained according to the extending direction of the chain connection, and the efficiency of obtaining the character instruction is improved.
In an embodiment of the present disclosure, the step S602 may further be performed as follows:
if the type of the target class is a generic type, acquiring an actual class given by the target class in the application program, and searching a target character instruction subset corresponding to the actual class in a character instruction set corresponding to a target API;
With reference to the step of establishing the correspondence between the obtained generic types and the preset actual data types in the foregoing embodiment, if the type of the target class is generic, the call information in the syntax element that represents the call to the operating system API may be obtained, and the class information indicating the class assigned to the generic type may be extracted from it. That class information is then matched against the correspondence between generic types and actual data types, and the matched class of the actual data type is taken as the actual class of the target class.
For example, in the second code file of the application program, the call information in the syntax element calls a class A<T> having the generic type T, and in the call information the generic type is assigned an object b belonging to the actual class B. The structure of the actual class to which object b belongs may then be obtained by analyzing the structure of object b and compared with the structures of the classes of the actual data types recorded in the correspondence between generic types and preset actual data types; the matching class of the actual data type is taken as the actual class of the target class.
The step of searching the target character instruction subset corresponding to the actual class is consistent with the step S602, and is not further described here.
The step S603 may be performed as follows:
If the target class contains a target constituent element whose type is generic, the actual constituent element assigned to the target constituent element in the application program is obtained, and the character instruction for parsing the target syntax element is determined according to the character instruction corresponding to the actual constituent element in the target character instruction subset.
In an embodiment of the present disclosure, incoming parameters corresponding to the target class in the syntax elements may be obtained, matching is performed in a corresponding relationship between the generic type and a preset actual data type according to the type of the incoming parameters, and the actual data type obtained by matching is used as the actual data type of the target constituent element. For example, for a target constituent element "T name" having a generic type, the actual data type of the generic type T may be determined as the String type recorded in the generic correspondence recording file by the incoming parameter "Bob" in accordance with the generic correspondence recording file in which the correspondence is recorded.
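The matching of an incoming parameter against the recorded correspondence can be sketched in Python as follows. The in-memory dictionary GENERIC_RECORD stands in for the generic relationship recording file, and the table PYTHON_TO_ACTUAL that maps Python value types to recorded type names is a hypothetical device for this illustration.

```python
# Hypothetical content of the generic relationship record: each generic type
# lists the actual data types it may be assigned at runtime.
GENERIC_RECORD = {"T": ["String", "Int"]}

# Hypothetical mapping from the type of an incoming value to a recorded type name.
PYTHON_TO_ACTUAL = {str: "String", int: "Int"}


def resolve_generic(generic_name, incoming_value, record):
    """Return the recorded actual data type matching the incoming parameter, if any."""
    actual = PYTHON_TO_ACTUAL.get(type(incoming_value))
    if actual in record.get(generic_name, ()):
        return actual
    return None


name_type = resolve_generic("T", "Bob", GENERIC_RECORD)
```

In this sketch, an incoming value whose type is not recorded for the generic type yields None, leaving the caller to decide how an unmatched generic type is handled.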
Therefore, in the scheme provided by the embodiment of the disclosure, the actual class can be determined according to the target class of which the type is the generic type, and the actual constituent element can be determined according to the target constituent element of which the type is the generic type, so that the character instruction determined according to the above steps can correspond to the actual operation content in the operation of the application program, and the accuracy of program analysis is further improved.
In an embodiment of the present disclosure, referring to fig. 8, a schematic structural diagram of an instruction set generating apparatus is provided, including:
a code file obtaining module 801, configured to obtain a first code file of an application program interface API provided by an operating system;
a relationship information determining module 802 for determining the declared classes in the first code file and relationship information between the declared classes;
a character instruction generating module 803, configured to generate a character instruction corresponding to each constituent element according to the constituent element included in the determined class;
a relation instruction generating module 804, configured to generate a relation instruction representing a relation between the generated character instructions according to the relation information;
a character instruction set obtaining module 805, configured to obtain a character instruction set including the generated character instruction and the relationship instruction.
As can be seen from the above, in the solution provided in the embodiment of the present disclosure, according to the component elements included in the declared classes in the first code file of the API, the character instruction of each component element is generated, and according to the relationship information between the declared classes in the first code file, the relationship instruction representing the relationship between the character instructions is generated, so as to obtain the character instruction set including the corresponding character instruction and the relationship instruction. It can be seen that the scheme provided by the embodiment of the present disclosure can generate an instruction set of a corresponding API.
In one embodiment of the present disclosure,
the character instruction generating module 803 is specifically configured to determine, for each component element included in the determined class, an instruction generating manner corresponding to the component element according to the type of the component element, and generate a character instruction corresponding to the component element according to the instruction generating manner.
As can be seen from the above, in the scheme provided in this embodiment, according to the types of the constituent elements, the instruction generation manner corresponding to the constituent elements is determined, and the character instructions corresponding to the constituent elements are generated according to the instruction generation manner, so that the generated character instructions can conform to the characteristics of different types of character instructions, and the accuracy of generating the character instructions is improved.
In one embodiment of the present disclosure,
the relationship instruction generating module 804 is specifically configured to determine, according to the relationship information, class groups formed by classes declared in the first code file that have a reference relationship; determine, among the classes in each class group, target component elements that have a reference relationship; determine a second reference relationship between the character instructions corresponding to the target component elements according to a first reference relationship between the target component elements; and generate a relationship instruction representing each second reference relationship.
As can be seen from the above, in the scheme provided in this embodiment, the reference relationship of the character instruction is determined according to the reference relationship of the constituent elements, and the relationship instruction is generated, so that the character instruction having the reference relationship with the character instruction can be determined when the character instruction is obtained, the character instruction is conveniently searched and obtained, and the efficiency of using the character instruction set is improved.
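The steps of this embodiment (class groups, target component elements, first and second reference relationships) can be sketched as follows. The data shapes and the `REF a -> b` output format are assumptions chosen for illustration; the disclosure does not prescribe a concrete encoding.

```python
from typing import Dict, List, Tuple


def generate_relationship_instructions(
    class_refs: List[Tuple[str, str]],    # (class, referenced class) pairs
    element_refs: List[Tuple[str, str]],  # first reference relationships
    element_owner: Dict[str, str],        # element -> declaring class
    instruction_of: Dict[str, str],       # element -> character instruction
) -> List[str]:
    # Step 1: a class group is a pair of classes with a reference relationship.
    groups = {frozenset(pair) for pair in class_refs}
    relations = []
    for src, dst in element_refs:
        # Step 2: keep element references that cross the two classes of a group.
        owners = frozenset((element_owner[src], element_owner[dst]))
        if len(owners) == 2 and owners in groups:
            # Step 3: carry the element-level (first) relationship over to the
            # corresponding character instructions (second relationship).
            relations.append(f"REF {instruction_of[src]} -> {instruction_of[dst]}")
    return relations
```

In this sketch, element references inside a single class, or between classes that are not in any group, produce no relationship instruction.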
In one embodiment of the present disclosure,
the character instruction set obtaining module 805 is specifically configured to determine, for each character instruction subset to be generated, a target class corresponding to the character instruction subset, and generate a character instruction subset including character instructions corresponding to constituent elements of the target class; and obtaining a character instruction set according to the generated character instruction subset and the generated relation instruction.
As can be seen from the above, in the scheme provided in this embodiment, character instruction subsets are generated with the target class as the unit, so that the character instructions corresponding to the constituent elements of the same class are placed in the same character instruction subset. Since each character instruction subset corresponds to a class, and the API provided by the operating system is usually called by class, generating the character instruction subsets in this way makes it easy to obtain the character instructions of the constituent elements of a given class.
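A minimal sketch of grouping character instructions into per-class subsets is shown below. The dictionary layout with `"subsets"` and `"relations"` keys is an assumption for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_character_instruction_set(
    instructions: Iterable[Tuple[str, str]],  # (target class, character instruction)
    relationship_instructions: Iterable[str],
) -> Dict[str, object]:
    """Group character instructions by target class and attach the relationship instructions."""
    subsets: Dict[str, List[str]] = defaultdict(list)
    for target_class, instruction in instructions:
        subsets[target_class].append(instruction)
    return {
        "subsets": dict(subsets),
        "relations": list(relationship_instructions),
    }
```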
In one embodiment of the present disclosure, the apparatus further comprises:
an instruction set merging module, configured to, after the character instruction set obtaining module 805 obtains a character instruction set including the generated character instruction and the relationship instruction, determine, for each character instruction subset, a code base to which an implementation code of a target class corresponding to the character instruction subset belongs, and merge the character instruction subset into an instruction set corresponding to the code base.
As can be seen from the above, in the scheme provided in this embodiment, the character instruction subset is merged to the instruction set corresponding to the code library, so that other application programs can obtain the character instructions according to the instruction set corresponding to the code library when multiplexing the code library, and can call the corresponding API according to the obtained character instructions, thereby improving the convenience of using the code library.
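The merging step can be sketched as below, assuming a hypothetical mapping `code_base_of` from each target class to the code base that holds its implementation code. The names and the nested-dictionary layout are illustrative assumptions.

```python
from typing import Dict, List, Optional


def merge_into_code_bases(
    subsets: Dict[str, List[str]],   # target class -> character instruction subset
    code_base_of: Dict[str, str],    # target class -> code base name (assumed mapping)
    existing: Optional[Dict[str, Dict[str, List[str]]]] = None,
) -> Dict[str, Dict[str, List[str]]]:
    """Merge each subset into the instruction set of the code base that implements its class."""
    # Copy the existing per-code-base instruction sets so the input is not mutated.
    merged = {base: dict(classes) for base, classes in (existing or {}).items()}
    for target_class, instructions in subsets.items():
        base = code_base_of[target_class]
        merged.setdefault(base, {})[target_class] = list(instructions)
    return merged
```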
In one embodiment of the present disclosure, the apparatus further comprises:
a search index setting module, configured to set a search index of each character instruction subset according to a target class corresponding to each character instruction subset after the character instruction set obtaining module 805 obtains a character instruction set including the generated character instructions and the relationship instructions;
and the chain relation establishing module is used for establishing the chain relation among the search indexes of the character instruction subsets corresponding to the target classes according to the inter-class reference relation represented by the relation information.
As can be seen from the above, in the scheme provided in this embodiment, a search index is set for each character instruction subset and a chain relationship is established among the search indexes, so that classes having an inter-class reference relationship can be found by following the chain relationship among the search indexes, which makes the character instruction subsets more convenient to use.
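One way to picture the search indexes and their chain relationship is an adjacency map keyed by index. This is a sketch under assumptions: the `idx:<class>` index format and the representation of the chain as a dictionary of successor lists are not taken from the disclosure.

```python
from typing import Dict, List, Tuple


def build_search_index_chain(
    subset_classes: List[str],           # target class of each character instruction subset
    class_refs: List[Tuple[str, str]],   # inter-class reference relationships
) -> Dict[str, List[str]]:
    """Map each subset's search index to the search indexes chained directly after it."""
    # Assumption: the search index is derived directly from the target class name.
    index_of = {cls: f"idx:{cls}" for cls in subset_classes}
    chain: Dict[str, List[str]] = {idx: [] for idx in index_of.values()}
    for src, dst in class_refs:
        # Only chain indexes of classes that actually have a subset.
        if src in index_of and dst in index_of:
            chain[index_of[src]].append(index_of[dst])
    return chain
```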
In one embodiment of the present disclosure,
the relationship information determining module 802 is specifically configured to determine the classes declared in the first code file that are not generic classes; obtain the generic types recorded in the first code file, and establish a correspondence between the obtained generic types and preset actual data types; and determine relationship information between the declared classes.
As can be seen from the above, in the scheme provided in this embodiment, the corresponding relationship between the obtained generic type and the preset actual data type is established, so that when the subsequent analysis engine analyzes the information of the application program, the generic type can be analyzed according to the actual data type, and the accuracy is further improved.
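The correspondence between generic types and preset actual data types can be sketched as a simple mapping. The preset table, the fallback to `"object"`, and the input shape (class name mapped to its list of generic type parameters) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Assumed preset table of actual data types for generic type parameters.
PRESET_ACTUAL_TYPES = {"T": "object", "K": "str", "V": "object"}


def map_generic_types(
    declared_classes: Dict[str, List[str]],  # class -> generic parameters (empty if none)
) -> Tuple[List[str], Dict[str, str]]:
    """Return the non-generic classes and a generic-type -> actual-type correspondence."""
    non_generic = [name for name, params in declared_classes.items() if not params]
    correspondence: Dict[str, str] = {}
    for params in declared_classes.values():
        for param in params:
            # Fall back to "object" when no preset type exists (assumption).
            correspondence[param] = PRESET_ACTUAL_TYPES.get(param, "object")
    return non_generic, correspondence
```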
In an embodiment of the present disclosure, referring to fig. 9, a schematic structural diagram of a first program analysis device is provided, including:
a syntax structure obtaining module 901, configured to obtain syntax structure data generated by compiling the second code file of the application;
a syntax element parsing module 902, configured to, when parsing each syntax element in the syntax structure data, if a parsed target syntax element indicates an API that invokes an operating system, select a character instruction for parsing the target syntax element from a character instruction set, and parse the target syntax element using the selected character instruction, where the character instruction set is: a set of instructions generated by any one of the apparatus above.
As can be seen from the above, in the solution provided in this embodiment, when parsing a syntax element in syntax structure data, if an API that needs to be called exists in the syntax element, a corresponding character instruction is selected from a character instruction set for parsing, so that an application program can execute a function of the API according to the character instruction, and usability of the application program is improved.
In an embodiment of the present disclosure, referring to fig. 10, a schematic structural diagram of a second program parsing apparatus is provided, and the syntax element parsing module 902 further includes:
a call target determining unit 9021, configured to determine, according to the target syntax element, a target API of the operating system to be called and a target class of the target API to be called;
a character instruction subset search unit 9022, configured to search, in the character instruction set corresponding to the target API, a target character instruction subset corresponding to the target class;
a character instruction determining unit 9023, configured to determine, according to the target character instruction subset, a character instruction for parsing the target syntax element.
Therefore, when the program is analyzed by applying the scheme provided by the embodiment of the disclosure, the target character instruction subset is determined according to the target class of the target API, and the character instruction is determined according to the target character instruction subset, so that the determined character instruction can be used for analyzing the syntax element representing the target class in the calling API, and the accuracy of program analysis is improved.
In one embodiment of the present disclosure,
the character instruction determining unit 9023 is specifically configured to search, in a chain relationship, for a target search index representing the target character instruction subset, where the chain relationship is a chain relationship established, according to the inter-class reference relationships represented by the relationship information, among the search indexes of the character instruction subsets corresponding to the target classes, and the search index of each character instruction subset is set according to the target class corresponding to that subset; determine the associated search indexes that are chained in sequence after the target search index along the extending direction of the chain relationship; and determine a character instruction for parsing the target syntax element according to the target character instruction subset and the character instruction subsets corresponding to the determined associated search indexes.
As can be seen from the above, in the process of program parsing by applying the scheme provided in this embodiment, the target search index and the associated search indexes are determined through the chain relationship, and the character instruction for parsing the target syntax element is determined according to the determined search indexes. Because the chain relationship links related search indexes together, the associated search indexes can be obtained by following the extending direction of the chain, which improves the efficiency of obtaining the character instruction.
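Following the chain from the target search index can be sketched as a traversal that collects the target subset and the subsets of all associated indexes. The adjacency-map representation of the chain and the `idx:` naming are assumptions carried over for illustration; a visited list guards against cyclic class references.

```python
from typing import Dict, List


def collect_candidate_instructions(
    target_index: str,
    chain: Dict[str, List[str]],     # search index -> indexes chained after it
    subsets: Dict[str, List[str]],   # search index -> character instruction subset
) -> List[str]:
    """Gather instructions of the target subset and of every subset chained after it."""
    visited: List[str] = []
    stack = [target_index]
    while stack:
        index = stack.pop()
        if index in visited:
            continue  # avoid looping on mutually referencing classes
        visited.append(index)
        # Reverse so that successors are visited in their stored order.
        stack.extend(reversed(chain.get(index, [])))
    return [instr for index in visited for instr in subsets.get(index, [])]
```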
In one embodiment of the present disclosure,
the character instruction subset search unit 9022 is specifically configured to, if the type of the target class is generic, obtain an actual class assigned to the target class in the application program, and search, in a character instruction set corresponding to the target API, for a target character instruction subset corresponding to the actual class;
or
The character instruction determining unit 9023 is specifically configured to, if there is a target component element of a generic type in the target class, obtain an actual component element assigned to the target component element in the application program, and determine, according to a character instruction corresponding to the actual component element in the target character instruction subset, a character instruction for parsing the target syntax element.
Therefore, in the scheme provided by the embodiment of the disclosure, the actual class can be determined according to the target class of which the type is the generic type, and the actual constituent element can be determined according to the target constituent element of which the type is the generic type, so that the character instruction determined according to the above steps can correspond to the actual operation content in the operation of the application program, and the accuracy of program analysis is further improved.
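Resolving a generic target class or a generic target component element before the lookup can be sketched as follows. The binding dictionaries (generic name to the actual name assigned in the application program) and the nested instruction set layout are assumptions for this example.

```python
from typing import Dict


def resolve(name: str, bindings: Dict[str, str]) -> str:
    """Return the actual name bound to a generic placeholder, or the name itself."""
    return bindings.get(name, name)


def select_character_instruction(
    target_class: str,
    target_element: str,
    class_bindings: Dict[str, str],      # generic class -> actual class (assumed shape)
    element_bindings: Dict[str, str],    # generic element -> actual element (assumed shape)
    instruction_set: Dict[str, Dict[str, str]],  # class -> element -> character instruction
) -> str:
    # If the target class is generic, look up the subset of the actual class instead.
    subset = instruction_set[resolve(target_class, class_bindings)]
    # If the target element is generic, use the instruction of the actual element.
    return subset[resolve(target_element, element_bindings)]
```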
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
In one embodiment of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the instruction set generation method or the program parsing method described in the method embodiments above.
In one embodiment of the present disclosure, a non-transitory computer readable storage medium is provided having stored thereon computer instructions for causing a computer to perform the instruction set generation method or the program parsing method described in the preceding method embodiments.
In one embodiment of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the instruction set generation method or the program parsing method described in the aforementioned method embodiments.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 11, the device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 may also store various programs and data necessary for the operation of the device 1100. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be any of a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the various methods and processes described above, such as the instruction set generation method. For example, in some embodiments, the instruction set generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the instruction set generation method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the instruction set generation method by any other suitable means (e.g., by way of firmware).
FIG. 12 shows a schematic block diagram of an example electronic device 1200 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 12, the device 1200 includes a computing unit 1201, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a random access memory (RAM) 1203. The RAM 1203 may also store various programs and data required for the operation of the device 1200. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be any of a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 performs the various methods and processes described above, such as the program parsing method. For example, in some embodiments, the program parsing method can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the program parsing method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the program parsing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. An instruction set generation method comprising:
obtaining a first code file of an Application Program Interface (API) provided by an operating system;
determining declared classes in the first code file and relationship information between the declared classes;
generating character instructions corresponding to the component elements according to the component elements contained in the determined classes;
generating a relationship instruction representing the relationship between the generated character instructions according to the relationship information;
and obtaining a character instruction set that includes the generated character instructions and the relationship instruction.
2. The method according to claim 1, wherein the generating of the character instruction corresponding to each component element according to the component elements included in the determined class comprises:
and for each component element contained in the determined class, determining an instruction generation mode corresponding to the component element according to the type of the component element, and generating a character instruction corresponding to the component element according to the instruction generation mode.
3. The method of claim 1, wherein generating a relationship instruction representing a relationship between the generated character instructions according to the relationship information comprises:
determining, according to the relationship information, class groups formed by classes declared in the first code file that have a reference relationship;
determining, among the classes in each class group, target component elements that have a reference relationship;
determining a second reference relationship between the character instructions corresponding to the target component elements according to a first reference relationship between the target component elements;
and generating a relationship instruction representing each second reference relationship.
4. The method of any of claims 1-3, wherein the obtaining a character instruction set containing the generated character instruction and the relationship instruction comprises:
for each character instruction subset to be generated, determining a target class corresponding to the character instruction subset, and generating a character instruction subset containing the character instructions corresponding to the constituent elements of the target class;
and obtaining a character instruction set according to the generated character instruction subsets and the generated relationship instruction.
5. The method of claim 4, wherein after obtaining the character instruction set according to the generated character instruction subset and the generated relationship instruction, further comprising:
for each character instruction subset, determining a code base to which the implementation code of the target class corresponding to the character instruction subset belongs, and merging the character instruction subset into an instruction set corresponding to the code base.
6. The method of claim 4, wherein after obtaining the character instruction set according to the generated character instruction subset and the generated relationship instruction, further comprising:
setting a search index of each character instruction subset according to the target class corresponding to each character instruction subset;
and establishing a chain relation among the search indexes of the character instruction subsets corresponding to the target classes according to the inter-class reference relation represented by the relation information.
7. A method as claimed in any one of claims 1 to 3, wherein said determining a class declared in said first code file comprises:
determining the classes declared in the first code file that are not generic classes;
and obtaining the generic types recorded in the first code file, and establishing a correspondence between the obtained generic types and preset actual data types.
8. A program parsing method, comprising:
obtaining grammar structure data generated by compiling a second code file of the application program;
when each syntactic element in the syntactic structure data is analyzed, if the analyzed target syntactic element represents an API for calling an operating system, selecting a character instruction for analyzing the target syntactic element from a character instruction set, and analyzing the target syntactic element by using the selected character instruction, wherein the character instruction set is as follows: the set of instructions generated according to the method of any one of claims 1-7.
9. The method of claim 8, wherein said selecting a character instruction from a character instruction set for parsing the target syntax element comprises:
determining a target API of an operating system to be called and a target class of the target API to be called according to the target syntax element;
searching a target character instruction subset corresponding to the target class in a character instruction set corresponding to the target API;
and determining a character instruction for analyzing the target syntax element according to the target character instruction subset.
10. The method of claim 9, wherein determining the character instruction for parsing the target syntax element from the target character instruction subset comprises:
searching, in a chain relationship, for a target search index representing the target character instruction subset, wherein the chain relationship is a chain relationship established, according to the inter-class reference relationships represented by the relationship information, among the search indexes of the character instruction subsets corresponding to the target classes, and the search index of each character instruction subset is set according to the target class corresponding to that subset;
determining the associated search indexes that are chained in sequence after the target search index along the extending direction of the chain relationship;
and determining a character instruction for parsing the target syntax element according to the target character instruction subset and the character instruction subsets corresponding to the determined associated search indexes.
11. The method of claim 9, wherein the finding a target character instruction subset corresponding to the target class in the character instruction set corresponding to the target API comprises:
if the type of the target class is generic, obtaining the actual class assigned to the target class in the application program, and searching for a target character instruction subset corresponding to the actual class in the character instruction set corresponding to the target API;
or
The determining, according to the target character instruction subset, a character instruction for parsing the target syntax element includes:
if a target component element of a generic type exists in the target class, obtaining the actual component element assigned to the target component element in the application program, and determining the character instruction for parsing the target syntax element according to the character instruction corresponding to the actual component element in the target character instruction subset.
12. An instruction set generation apparatus comprising:
a code file obtaining module, configured to obtain a first code file of an application program interface (API) provided by an operating system;
a relationship information determining module, configured to determine the classes declared in the first code file and relationship information between the declared classes;
a character instruction generating module, configured to generate, according to the constituent elements contained in the determined classes, a character instruction corresponding to each constituent element;
a relationship instruction generating module, configured to generate, according to the relationship information, relationship instructions representing the relationships between the generated character instructions;
and a character instruction set obtaining module, configured to obtain a character instruction set containing the generated character instructions and relationship instructions.
13. The apparatus of claim 12, wherein,
the character instruction generating module is specifically configured to determine, for each constituent element included in the determined class, an instruction generating manner corresponding to the constituent element according to the type of the constituent element, and generate a character instruction corresponding to the constituent element according to the instruction generating manner.
14. The apparatus of claim 12, wherein,
the relationship instruction generating module is specifically configured to determine, according to the relationship information, class groups in which the classes declared in the first code file have a reference relationship; determine target constituent elements that have a reference relationship between the classes in each class group; determine a second reference relationship between the character instructions corresponding to the target constituent elements according to the first reference relationship between the target constituent elements; and generate a relationship instruction representing each second reference relationship.
15. The apparatus of any one of claims 12-14,
the character instruction set obtaining module is specifically configured to determine, for each character instruction subset to be generated, a target class corresponding to the character instruction subset, and generate a character instruction subset including character instructions corresponding to constituent elements of the target class; and obtaining a character instruction set according to the generated character instruction subset and the generated relation instruction.
16. The apparatus of claim 15, wherein the apparatus further comprises:
an instruction set merging module, configured to, after the character instruction set obtaining module obtains the character instruction set containing the generated character instructions and relationship instructions, determine, for each character instruction subset, the code base to which the implementation code of the target class corresponding to the character instruction subset belongs, and merge the character instruction subset into the instruction set corresponding to that code base.
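The module chain of claims 12 to 16 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: classes are given as in-memory descriptions instead of being extracted from a real code file, and `Element`, `ClassDecl`, `GENERATORS`, the `CALL`/`LOAD` instruction formats, and the `code_base` field are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Element:
    name: str
    kind: str             # hypothetical element types: "method" or "field"
    refers_to: str = ""   # "Class.element" referenced by this element, if any


@dataclass
class ClassDecl:
    name: str
    elements: List[Element]
    code_base: str = "default"   # code base holding the class implementation


# Claim 13: the generation manner is chosen by element type (formats are invented).
GENERATORS: Dict[str, Callable[[str, Element], str]] = {
    "method": lambda cls, e: f"CALL {cls}.{e.name}",
    "field": lambda cls, e: f"LOAD {cls}.{e.name}",
}


def generate_instruction_set(
    classes: List[ClassDecl],
) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[str, str]]]:
    """Build one character instruction subset per class plus relation instructions."""
    subsets: Dict[str, Dict[str, str]] = {}
    for cls in classes:
        subsets[cls.name] = {e.name: GENERATORS[e.kind](cls.name, e) for e in cls.elements}

    # Claim 14: a reference between elements becomes a relation between
    # the character instructions generated for those elements.
    relations: List[Tuple[str, str]] = []
    for cls in classes:
        for e in cls.elements:
            if not e.refers_to:
                continue
            target_cls, target_elem = e.refers_to.split(".")
            if target_elem in subsets.get(target_cls, {}):
                relations.append((subsets[cls.name][e.name], subsets[target_cls][target_elem]))
    return subsets, relations


def merge_by_code_base(
    classes: List[ClassDecl], subsets: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Claim 16: group each subset under the code base of its target class."""
    merged: Dict[str, Dict[str, Dict[str, str]]] = {}
    for cls in classes:
        merged.setdefault(cls.code_base, {})[cls.name] = subsets[cls.name]
    return merged
```

Keying the subsets by class name mirrors claim 15, where each character instruction subset corresponds to one target class.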
17. A program analysis device comprising:
a syntax structure obtaining module, configured to obtain syntax structure data generated by compiling a second code file of an application program;
a syntax element parsing module, configured to, when parsing each syntax element in the syntax structure data, select, if a parsed target syntax element indicates a call to an API of an operating system, a character instruction for parsing the target syntax element from a character instruction set, and parse the target syntax element using the selected character instruction, where the character instruction set is a character instruction set generated by the apparatus of any one of claims 12-16.
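The behaviour of the syntax element parsing module in claim 17 can be sketched in Python as below. The dictionary layout of a syntax element (`kind`, `text`, `cls`, `member`) and the `"api_call"` marker are assumptions made for this example; the claim does not prescribe a data format.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical shape: class name -> {constituent element name -> character instruction}
InstructionSet = Dict[str, Dict[str, str]]


def analyze(
    syntax_elements: Iterable[dict], instruction_set: InstructionSet
) -> List[Tuple[str, Optional[str]]]:
    """Pair each syntax element with the character instruction selected for it.

    Elements that do not call an operating system API get None, meaning they
    would be handled by the analyzer's ordinary rules (not modelled here).
    """
    results: List[Tuple[str, Optional[str]]] = []
    for element in syntax_elements:
        if element.get("kind") == "api_call":
            subset = instruction_set.get(element["cls"], {})
            results.append((element["text"], subset.get(element["member"])))
        else:
            results.append((element["text"], None))
    return results
```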
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7 or 8-11.
19. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7 or 8-11.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7 or 8-11.
CN202111465884.4A 2021-12-03 2021-12-03 Instruction set generation method, program analysis method, device and storage medium Pending CN114115908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111465884.4A CN114115908A (en) 2021-12-03 2021-12-03 Instruction set generation method, program analysis method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111465884.4A CN114115908A (en) 2021-12-03 2021-12-03 Instruction set generation method, program analysis method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114115908A true CN114115908A (en) 2022-03-01

Family

ID=80366133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111465884.4A Pending CN114115908A (en) 2021-12-03 2021-12-03 Instruction set generation method, program analysis method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114115908A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086441A (en) * 2022-06-09 2022-09-20 北京百度网讯科技有限公司 Information transmission method and device, electronic equipment and storage medium
CN115086441B (en) * 2022-06-09 2024-03-19 北京百度网讯科技有限公司 Information transmission method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10169471B2 (en) Generating and executing query language statements from natural language
CN108139891B (en) Method and system for generating suggestions to correct undefined token errors
JP2018533126A (en) Method, system, and computer program product for a natural language interface to a database
CN109492053B (en) Method and device for accessing data
US10303689B2 (en) Answering natural language table queries through semantic table representation
US20130055291A1 (en) Describing native application programming interfaces of an operating system with metadata
CN110764748B (en) Code calling method, device, terminal and storage medium
CN104320312A (en) Network application safety test tool and fuzz test case generation method and system
US8635233B2 (en) Techniques to automatically build a language dependency graph for localizable resources
CN108959294B (en) Method and device for accessing search engine
CN114115908A (en) Instruction set generation method, program analysis method, device and storage medium
CN113220710A (en) Data query method and device, electronic equipment and storage medium
EP3301590A1 (en) Algebraic data types for database query languages
US9201937B2 (en) Rapid provisioning of information for business analytics
CN115421831A (en) Method, device, equipment and storage medium for generating calling relation of activity component
CN110471708B (en) Method and device for acquiring configuration items based on reusable components
CN113760969A (en) Data query method and device based on elastic search
CN113419740A (en) Program data stream analysis method and device, electronic device and readable storage medium
CN112988778A (en) Method and device for processing database query script
CN112068814A (en) Method, device, system and medium for generating executable file
CN115563183B (en) Query method, query device and program product
US10474750B1 (en) Multiple information classes parsing and execution
CN117406971A (en) Database configuration method, device and equipment based on persistent layer framework
CN113849170A (en) Code sharing method and device, electronic equipment and storage medium
CN113849409A (en) Script detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination