CN114327469A - Code analysis method, device, equipment and medium - Google Patents


Info

Publication number
CN114327469A
Authority
CN
China
Prior art keywords
code
programming language
expression
variable
language
Legal status
Pending
Application number
CN202011066291.6A
Other languages
Chinese (zh)
Inventor
徐珊珊
乐永年
鲍鹏
吕志宏
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202011066291.6A
Publication of CN114327469A
Legal status: Pending (current)

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

Existing approaches to code parsing have low accuracy and have difficulty meeting service requirements. To address this, the application provides a code parsing method with the following steps. First, a first code is obtained; the first code includes a variable of a first programming language and code based on a second programming language. Next, a first expression in the first code that is associated with the variable of the first programming language is replaced, which yields a second code that conforms to the grammar rules of the second programming language; the second code includes a second expression. Finally, syntax parsing is performed on the second code to obtain its syntax structure. Because the method retains the grammatical information of the original programming language, it parses with higher accuracy and can meet service requirements.

Description

Code analysis method, device, equipment and medium
Technical Field
The present application relates to the field of application development technologies, and in particular, to a code parsing method, apparatus, device, and computer-readable storage medium.
Background
As new programming languages emerge, more and more developers use two or more programming languages to develop an application. This lets them take full advantage of each language and improves development efficiency. This style of programming is known as mixed-language (hybrid) programming. In hybrid programming, the developer writes code in different programming languages, and the result is a code file that contains all of it.
In many scenarios, such as similarity detection or code compilation, the code file must be parsed. However, when a compiler parses a single code file that contains code written in different programming languages, it cannot meet the parsing requirements of all the code in that file.
The industry has therefore proposed some parsing methods. For example, a developer may convert the code written in the different programming languages of a code file into an intermediate language, and then parse the intermediate-language code with a compiler. This approach loses the grammatical information of the original programming languages. As a result, its parsing accuracy is low and it has difficulty meeting service requirements. A code parsing method with higher accuracy is needed.
Disclosure of Invention
The application provides a code parsing method. In this method, a first expression in a first code is replaced, where the first expression is associated with a variable of a first programming language. After the replacement, the code conforms to the grammar rules of a second programming language, and the replaced code is then parsed. The method retains as much of the grammatical information of the original programming language as possible, which improves parsing accuracy. The application also provides an apparatus, a device, a computer-readable storage medium, and a computer program product corresponding to the method.
In a first aspect, the present application provides a code parsing method. The method may be performed by a parser. The parser may be a software module that provides a code parsing service when it runs on a hardware device such as a computer. In some possible implementations, the parser may instead be a hardware module with code parsing functionality.
Specifically, the parser obtains a first code. The first code includes a variable of a first programming language and code based on a second programming language, and the code based on the second programming language includes an expression associated with the variable of the first programming language. Because the first code is embedded in code of the first programming language and refers to that language's variables, it does not conform to the grammar rules of the second programming language. A parser has difficulty directly parsing code that resembles the second programming language but violates its grammar. When parsing the first code, the parser therefore replaces the first expression associated with the variable of the first programming language with a second expression, which yields a second code that conforms to the grammar rules of the second programming language. The parser then parses the second code with its existing parsing capability for the second programming language to obtain the syntax structure of the second code.
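The replacement step can be sketched in C as a small template-substitution routine. This is a minimal sketch under assumptions the application does not state. It assumes the first expressions are GCC-style %N operand references and that the caller already knows which second expression (a concrete assembly operand) should stand in for each one. The function name substitute_operands is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the inline-assembly template `tmpl` into `out`, replacing every
   operand reference %N (a first expression) with ops[N] (its second
   expression) and turning the escape %% into a single '%'.
   Returns 0 on success, or -1 if a reference has no operand or `out` is
   too small (the contents of `out` are then unspecified). */
static int substitute_operands(const char *tmpl, const char *const ops[],
                               size_t nops, char *out, size_t outsz) {
    size_t used = 0;
    assert(tmpl != NULL && out != NULL && outsz > 0);
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p; /* text appended in this step */
        size_t piece_len = 1;
        if (p[0] == '%' && p[1] == '%') {
            p++; /* "%%" -> "%": append the first '%', skip the second */
        } else if (p[0] == '%' && isdigit((unsigned char)p[1])) {
            size_t n = 0;
            while (isdigit((unsigned char)p[1])) {
                n = n * 10 + (size_t)(p[1] - '0');
                p++;
            }
            if (n >= nops)
                return -1;
            piece = ops[n];
            piece_len = strlen(piece);
        }
        if (used + piece_len >= outsz) /* keep room for the terminator */
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return 0;
}
```

For example, with ops = {"%eax", "8(%rsp)"}, the template "crc32b %1, %0" becomes "crc32b 8(%rsp), %eax". An ordinary AT&T-syntax assembly parser can accept that form.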
The method replaces the first expression in the first code that does not conform to the grammar rules of the second programming language, and only then parses the replaced code. The grammatical information of the original programming language is therefore retained, parsing accuracy improves, and service requirements can be met. In addition, the method reuses existing syntax parsing capability. It is therefore simple and inexpensive to implement, can be extended to any current programming language, and is highly usable.
In some possible implementations, the parser may be provided to the user as a software package. Specifically, the owner of the parser may publish a software package of the parser, and the user obtains and runs the package to parse the first code. In this way, code parsing can be performed locally (for example, on the user's computing device) with high accuracy.
In some possible implementations, the parser may be provided to the user as a cloud service. The user uploads the first code to the cloud, the cloud parsing service parses it, and the parsing result is returned to the user. Parsing is performed mainly in the cloud, and the local computing device only assists. The method therefore places low performance requirements on the local device, so even a lightweight computing device is sufficient and the method is highly available.
In some possible implementations, after performing syntax parsing on the second code, the parser may also replace the second expression in the syntax structure of the second code with the first expression to obtain the syntax structure of the first code. This replacement follows a mapping relationship between the second expression in the second code and the first expression in the first code. Syntax parsing of the first code is thereby achieved with the original grammatical information retained, which gives higher accuracy.
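Suppose the parser records, for each replacement, which second expression stood in for which first expression. Restoring the syntax structure of the first code is then a lookup over that table. The sketch below is illustrative only. The one-instruction "syntax structure", the type names, and restore_first_expressions are hypothetical. It also presumes each placeholder operand is unique within the second code, so that the reverse lookup is unambiguous.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping entry recorded during replacement: `second` is the
   operand written into the second code, `first` is the first expression
   (e.g. "%1") that it replaced. */
struct operand_mapping {
    const char *second;
    const char *first;
};

/* Hypothetical, deliberately tiny syntax structure: one instruction. */
struct instruction {
    char mnemonic[16];
    char operands[3][32];
    size_t noperands;
};

/* Replaces each operand of insn that matches a mapping's second expression
   with the corresponding first expression. Returns the number restored. */
static size_t restore_first_expressions(struct instruction *insn,
                                        const struct operand_mapping *map,
                                        size_t nmap) {
    size_t restored = 0;
    assert(insn != NULL && insn->noperands <= 3);
    for (size_t i = 0; i < insn->noperands; i++) {
        for (size_t j = 0; j < nmap; j++) {
            if (strcmp(insn->operands[i], map[j].second) == 0) {
                /* snprintf truncates safely and always NUL-terminates. */
                snprintf(insn->operands[i], sizeof insn->operands[i], "%s",
                         map[j].first);
                restored++;
                break;
            }
        }
    }
    return restored;
}
```

Keeping the mapping outside the syntax structure means the existing second-language parser needs no changes. Only this post-processing pass knows about the first programming language.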
In some possible implementations, the parser may also generate a syntax tree from the syntax structure of the first code. Specifically, the parser may parse the code of the first programming language to obtain its syntax structure. It then generates a complete syntax tree from two inputs: the syntax structure of the first-programming-language code and the syntax structure of the first code.
A syntax tree is a graphical representation of the syntax structure of the statements in a code fragment, typically drawn as a tree diagram. It helps the user understand the hierarchy of the code's syntax structure, which supports subsequent code compilation or code conversion.
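The following is a minimal C sketch of such a tree. It uses a hypothetical node type with a fixed number of children, and it renders the tree as indented text instead of drawing it graphically. The labels mirror the inline-assembly example in this description: an asm statement whose instruction has operands %1 and %0, bound to result and hello[0]. The application does not specify how its tree is actually structured, so node, add_child, and render are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical syntax-tree node: a label plus up to MAX_CHILDREN children. */
struct node {
    const char *label;
    struct node *children[MAX_CHILDREN];
    size_t nchildren;
};

static void add_child(struct node *parent, struct node *child) {
    assert(parent->nchildren < MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
}

/* Appends a pre-order, two-space-indented rendering of the subtree rooted
   at n to the NUL-terminated string in buf; output is truncated if buf
   fills up. */
static void render(const struct node *n, int depth, char *buf, size_t bufsz) {
    size_t len = strlen(buf);
    snprintf(buf + len, bufsz - len, "%*s%s\n", depth * 2, "", n->label);
    for (size_t i = 0; i < n->nchildren; i++)
        render(n->children[i], depth + 1, buf, bufsz);
}
```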
In some possible implementations, the first code includes characteristic information of the first expression. The characteristic information describes the association between the first expression and the second programming language. For example, when the second programming language is assembly language, the characteristic information indicates the type of assembly operand into which the first expression is to be converted: a register operand, a memory operand, or an immediate operand.
The register operand refers to an operand stored in a register, the memory operand refers to an operand stored in a memory, and the immediate operand refers to an operand which is not stored in the register or the memory but is directly present in an instruction.
Using the characteristic information, the parser can convert the first expression into a second expression that conforms to the grammar rules of the second programming language, which yields the second code. Because the second code conforms to those grammar rules, the parser can parse it to obtain its syntax structure.
The method retains the grammatical information of the original programming language and therefore parses with higher accuracy.
In some possible implementations, the first programming language and the second programming language may be chosen according to actual needs. In some embodiments, the first programming language may be a high-level programming language such as Java, C++, C#, Pascal, or Python. The second programming language may be a low-level programming language such as assembly language or machine language.
In some possible implementations, the parser uses the characteristic information of the first expression (which is associated with the variable of the first programming language) to replace the first expression with a memory operand, a register operand, or an immediate operand that conforms to the grammar rules of the second programming language. The resulting second code therefore conforms to those grammar rules, and the parser can parse it to obtain the corresponding syntax structure with higher accuracy.
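The following sketch shows how the replacement could pick, for each operand number, a placeholder operand of an allowed kind. It assumes operand kinds are encoded as a bitmask, which is redefined here so the block stands alone. All of the following choices are illustrative assumptions, not taken from the application:

- Memory is preferred when allowed. In AT&T syntax a memory operand carries no size of its own, so an instruction such as crc32b still takes its operand size from the suffix.
- Registers come from a fixed pool of 32-bit names.
- Immediates are offset by 4096 to keep them distinctive.
- Operand sizes are otherwise ignored.

The function make_placeholder is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { KIND_REG = 1 << 0, KIND_MEM = 1 << 1, KIND_IMM = 1 << 2 };

/* Illustrative pool of 32-bit x86 register names (AT&T spelling). */
static const char *const kRegisterPool[] = {"%eax", "%ecx", "%edx",
                                            "%ebx", "%esi", "%edi"};

/* Writes a second expression for operand number `index` whose allowed
   kinds are `kinds`. Distinct indices give distinct placeholders, so the
   replacement can later be mapped back. Returns 0, or -1 if no kind fits
   or `out` is too small. */
static int make_placeholder(unsigned kinds, size_t index, char *out, size_t outsz) {
    int n;
    assert(out != NULL && outsz > 0);
    if (kinds & KIND_MEM) {
        n = snprintf(out, outsz, "%zu(%%rsp)", 8 * index); /* e.g. 8(%rsp) */
    } else if ((kinds & KIND_REG) &&
               index < sizeof kRegisterPool / sizeof kRegisterPool[0]) {
        n = snprintf(out, outsz, "%s", kRegisterPool[index]);
    } else if (kinds & KIND_IMM) {
        n = snprintf(out, outsz, "$%zu", 4096 + index);
    } else {
        return -1;
    }
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```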
In some possible implementations, the parser may provide a user interface, such as a graphical user interface (GUI) or a command-line interface. For ease of description, a GUI is used as the example. In some embodiments, the parser may present the code in a code file to the user through the GUI and then receive the first code that the user selects through the GUI. The code specified by the user can thus be parsed, which meets personalized requirements.
In some other embodiments, the parser may receive a first code that the user enters through the graphical user interface. In this way, syntax parsing of the entire code file can be achieved.
In some possible implementations, the parser may also obtain the first code by using a keyword of the second programming language. For example, the parser may take, as the first code, the code in a code file that contains a keyword of the second programming language. When the second programming language is assembly language, the keyword is asm, and the parser can automatically recognize code statements in the code file that contain asm to obtain the first code. The first code in a code file can thus be parsed automatically, which simplifies user operations and improves the user experience.
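The following is a minimal C sketch of keyword-based detection. It assumes C source text and GCC's spellings of the keyword (asm, __asm, and __asm__). It matches whole identifiers only, so a variable such as plasma is not mistaken for the keyword. For brevity it does not skip comments or string literals. The function name find_asm_keyword is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns a pointer to the first whole identifier asm, __asm or __asm__ in
   src, or NULL if there is none. Comments and string literals are not
   skipped in this sketch. */
static const char *find_asm_keyword(const char *src) {
    assert(src != NULL);
    for (const char *p = strstr(src, "asm"); p != NULL; p = strstr(p + 1, "asm")) {
        const char *start = p;
        const char *end = p + 3;
        while (start > src && is_ident_char(start[-1]))
            start--; /* extend left to the identifier's first character */
        while (is_ident_char(*end))
            end++;   /* extend right to just past its last character */
        size_t len = (size_t)(end - start);
        if ((len == 3 && strncmp(start, "asm", 3) == 0) ||
            (len == 5 && strncmp(start, "__asm", 5) == 0) ||
            (len == 7 && strncmp(start, "__asm__", 7) == 0))
            return start;
    }
    return NULL;
}
```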
In a second aspect, the present application provides a code parsing apparatus. The apparatus includes:
a communication module, configured to obtain a first code, where the first code includes a variable of a first programming language and code based on a second programming language;
a replacement module, configured to replace a first expression in the first code that is associated with the variable of the first programming language, to obtain a second code that conforms to the grammar rules of the second programming language, where the second code includes a second expression; and
a parsing module, configured to perform syntax parsing on the second code to obtain a syntax structure of the second code.
In some possible implementations, the replacement module is further configured to:
after syntax parsing is performed on the second code, replace the second expression in the syntax structure of the second code with the first expression, according to the mapping relationship between the second expression in the second code and the first expression in the first code, to obtain the syntax structure of the first code.
In some possible implementations, the apparatus further includes:
a generating module, configured to generate a syntax tree according to the syntax structure of the first code.
In some possible implementations, the first code includes characteristic information of the first expression, and the characteristic information describes the association between the first expression and the second programming language;
the replacement module is specifically configured to:
replace, according to the characteristic information, the first expression in the first code that is associated with the variable of the first programming language with a second expression that conforms to the grammar rules of the second programming language.
In some possible implementations, the first programming language is a high-level programming language and the second programming language is a low-level programming language.
In some possible implementations, the replacement module is specifically configured to:
replace the first expression, according to its characteristic information (the first expression being associated with the variable of the first programming language), with a memory operand, a register operand, or an immediate operand that conforms to the grammar rules of the second programming language.
In some possible implementations, the apparatus further includes:
a display module, configured to present the code in a code file to the user through a graphical user interface;
the communication module is specifically configured to:
receive a first code selected by the user through the graphical user interface.
In a third aspect, the present application provides a computing device comprising a processor and a memory. The processor and the memory are in communication with each other. The processor is configured to execute the instructions stored in the memory to cause the computing device to perform the code parsing method as in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having instructions stored therein, the instructions instructing a computing device to execute the code parsing method according to the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a computing device, cause the computing device to perform the code parsing method according to the first aspect or any implementation of the first aspect.
The implementations provided in the foregoing aspects may be further combined to provide more implementations.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly described below.
FIG. 1 is a system architecture diagram of a code parsing method according to an embodiment of the present application;
FIG. 2 is a system architecture diagram of a code parsing method according to an embodiment of the present application;
FIG. 3 is a flowchart of a code parsing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a code segment according to an embodiment of the present application;
FIG. 5 is a schematic diagram of receiving a first code from a user through a GUI according to an embodiment of the present application;
FIG. 6 is a schematic diagram of parsing the second code obtained after replacement according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a syntax structure of a code segment according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a syntax tree structure according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a code parsing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
Some technical terms referred to in the embodiments of the present application will be first described.
A programming language is a formal language used to define computer programs; it is also called a computer language. A developer implements an application by writing code files in a programming language.
Programming languages can be classified as low-level or high-level according to whether they are machine-oriented. A machine-oriented programming language (for example, one oriented to a particular computer) is a low-level programming language. Low-level programming languages include machine language, assembly language, and the like.
Machine language represents instructions as binary code. It is the only language a computer can directly recognize and execute. Assembly language represents the operation codes of machine instructions with names and symbols that are easy to understand and remember, which overcomes the difficulty of reading and remembering machine language. Because assembly language replaces the binary code of machine language with symbols, it is essentially a symbolic language.
It should be noted that code files written in assembly language generally cannot be recognized directly by a machine (e.g., a computer). Developers therefore use an assembler to translate assembly language into machine language, which the machine can recognize and execute.
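The sketch below makes the symbolic nature of assembly concrete. It acts as a toy assembler for a handful of x86 instructions whose machine encoding is a single opcode byte: nop is 0x90, ret is 0xC3, hlt is 0xF4, int3 is 0xCC, and cld is 0xFC. It is illustrative only. Real assemblers also handle operands, prefixes, and multi-byte encodings. The function name assemble_one_byte is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A few x86 instructions whose machine encoding is one opcode byte. */
struct opcode_entry {
    const char *mnemonic;
    unsigned char opcode;
};

static const struct opcode_entry kOneByteOpcodes[] = {
    {"nop", 0x90}, {"ret", 0xC3}, {"hlt", 0xF4}, {"int3", 0xCC}, {"cld", 0xFC},
};

/* Translates a mnemonic (symbolic form) into its machine-code byte
   (binary form); returns -1 for mnemonics not in the table. */
static int assemble_one_byte(const char *mnemonic) {
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof kOneByteOpcodes / sizeof kOneByteOpcodes[0]; i++)
        if (strcmp(kOneByteOpcodes[i].mnemonic, mnemonic) == 0)
            return kOneByteOpcodes[i].opcode;
    return -1;
}
```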
Programming languages that are not machine-oriented are high-level programming languages. A high-level programming language is machine independent. That is, it can generally be used on different types of machines, such as those based on the x86 architecture or the Advanced RISC Machine (ARM) architecture, and it depends little on the underlying machine.
A high-level programming language is usually close to natural language and can use mathematical expressions. It is therefore more expressive: it can conveniently represent data operations and program control structures, and it can better describe various algorithms. High-level programming languages include Java, C, C++, C#, Pascal, Python, and others. Like assembly language, high-level programming languages cannot be directly recognized and executed by a machine. A developer compiles a code file written in a high-level programming language with a compiler so that the machine can recognize and execute it.
Different programming languages have different advantages. Machine language and assembly language offer high execution efficiency and small code files, while high-level programming languages are more expressive and portable across platforms. Developers may therefore use several programming languages when developing an application. This exploits the advantages of each language and improves both development efficiency and application performance.
Development using two or more programming languages is called hybrid programming. One language serves as the main language and is used to write the main framework of the application. The other languages serve as auxiliary languages for improving the application's performance, for example its execution efficiency. Take hybrid programming in C and assembly language as an example: a developer may write the framework of an application in C and write the parts that require high execution efficiency in assembly language.
Consider code of one programming language (the second programming language) embedded in code of another programming language (the first programming language), for example assembly code embedded in code of a high-level programming language such as C. The embedded code must link with the code of the first programming language. As a result, it does not fully conform to the grammar rules of the second programming language (for example, assembly language) and instead follows a new grammar rule. This new grammar rule differs from the grammar rules of both the first and the second programming language. It is used to express variables of the first programming language in code of the second programming language, for example to express C variables in assembly code.
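The extended inline assembly of GCC and Clang is one concrete instance of such a new grammar rule. The assembly template refers to C variables through %N placeholders, and the operand lists after the colons bind each placeholder to a C expression with a constraint. The sketch below assumes a GCC-compatible compiler targeting x86 or x86-64, and falls back to plain C on other targets. The function name add_with_asm is hypothetical.

```c
#include <assert.h>

/* Adds b to a. On x86/x86-64 with a GCC-compatible compiler the addition is
   performed by an embedded assembly instruction; elsewhere a plain C
   fallback is used. */
static int add_with_asm(int a, int b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Template (second language) containing the placeholders %0 and %1:
       %0 is bound to the C variable a ("+r": read-write register operand),
       %1 to the C variable b ("r": input register operand).
       AT&T syntax: "addl src, dst" computes dst += src; "cc" declares that
       the condition flags are modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;
#endif
    return a;
}
```

The string "addl %1, %0" is neither valid C nor valid standalone assembly, because %0 and %1 are not assembly operands. This is exactly the kind of embedded code that the method targets.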
In many scenarios, for example similarity matching or compiling a code file, the code file must be syntax parsed. Syntax parsing, also called syntax analysis, analyzes an input text consisting of a sequence of words (e.g., a sequence of English words) according to a given grammar to determine its grammatical structure.
A grammar is a set of formal rules that describes the grammatical structure of a language. A grammar can be characterized by a quadruple (V, T, P, S), where V denotes a finite set of nonterminal symbols (variables), T denotes a finite set of terminal symbols, P denotes a finite set of productions, and S denotes the start symbol.
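The quadruple below is a worked illustration using a toy grammar, not one defined in the application. It describes a single AT&T-style instruction with operands. After tokenization, crc32b 8(%rsp), %eax becomes mnemonic mem , reg and has the derivation shown in the last line. In contrast, crc32b %1, %0 has no derivation, because %1 and %0 match no terminal. This is why the replacement step is needed before an assembly parser can be reused.

```latex
\begin{aligned}
G &= (V, T, P, S), \qquad S \in V, \quad V \cap T = \varnothing \\
V &= \{\mathit{Instr},\ \mathit{Ops},\ \mathit{Op}\} \\
T &= \{\mathtt{mnemonic},\ \mathtt{reg},\ \mathtt{mem},\ \mathtt{imm},\ \mathtt{,}\} \\
P &= \{\ \mathit{Instr} \to \mathtt{mnemonic}\ \mathit{Ops},\quad
        \mathit{Ops} \to \mathit{Op} \mid \mathit{Op}\ \mathtt{,}\ \mathit{Ops},\quad
        \mathit{Op} \to \mathtt{reg} \mid \mathtt{mem} \mid \mathtt{imm}\ \} \\
S &= \mathit{Instr} \\[4pt]
\mathit{Instr} &\Rightarrow \mathtt{mnemonic}\ \mathit{Ops}
  \Rightarrow \mathtt{mnemonic}\ \mathit{Op}\ \mathtt{,}\ \mathit{Ops}
  \Rightarrow \mathtt{mnemonic}\ \mathtt{mem}\ \mathtt{,}\ \mathit{Ops}
  \Rightarrow \mathtt{mnemonic}\ \mathtt{mem}\ \mathtt{,}\ \mathit{Op}
  \Rightarrow \mathtt{mnemonic}\ \mathtt{mem}\ \mathtt{,}\ \mathtt{reg}
\end{aligned}
```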
At present, the industry mainly parses code files produced by hybrid programming by converting them into an intermediate language. For example, when a code file includes a code fragment written in both a high-level programming language and assembly language, the parser may convert the fragment into an intermediate language and parse that instead. However, this approach loses the grammatical information of the original programming languages. As a result, its parsing accuracy is low and it has difficulty meeting service requirements.
In view of this, the present application provides a code parsing method. The method may be performed by a parser. The parser may be a software module that provides a code parsing service when it runs on a hardware device such as a computer. In some possible implementations, the parser may instead be a hardware module with code parsing functionality.
Specifically, the parser obtains a first code. The first code includes a variable of a first programming language and code based on a second programming language, and the code based on the second programming language includes an expression associated with the variable of the first programming language. Because the first code is embedded in code of the first programming language and refers to that language's variables, it does not conform to the grammar rules of the second programming language. A parser has difficulty directly parsing code that resembles the second programming language but violates its grammar. When parsing the first code, the parser therefore replaces the first expression associated with the variable of the first programming language with a second expression, which yields a second code that conforms to the grammar rules of the second programming language. The parser then parses the second code with its existing parsing capability for the second programming language to obtain the syntax structure of the second code.
The method replaces the first expression in the first code that does not conform to the grammar rules of the second programming language, and only then parses the replaced code. The grammatical information of the original programming language is therefore retained, parsing accuracy improves, and service requirements can be met. In addition, the method reuses existing syntax parsing capability. It is therefore simple and inexpensive to implement, can be extended to any current programming language, and is highly usable.
It should be noted that the parser provided in the embodiments of the present application may be provided to the user as a software package. Specifically, the owner of the parser may publish a software package of the parser, and the user obtains and runs the package to parse the first code.
In some possible implementations, the parser provided in the embodiments of the present application may be provided to the user as a cloud service. The user uploads the first code to the cloud, the cloud parsing service parses it, and the parsing result is returned to the user.
To make the technical solutions of the present application clearer and easier to understand, the deployment of the parser is described in detail below with reference to the accompanying drawings.
In some possible implementations, as shown in the system architecture diagram of the code parsing method in FIG. 1, the editor 102 and the parser 104 are deployed on the same computing device. Computing devices include, but are not limited to, desktop computers and laptop computers. The editor 102 is used for code editing, and the parser 104 is used for code parsing. The editor 102 may be a standalone editor, such as desktop++, or a development tool with integrated editing functions, such as an integrated development environment (IDE).
The editor 102 receives a code file edited by the user, and the code file includes a first code. In response to a parsing operation triggered by the user, the editor 102 invokes the parser 104 to parse the first code. Specifically, the parser 104 replaces the first expression in the first code that is associated with the variable of the first programming language, which yields a second code that conforms to the grammar rules of the second programming language. It then parses the second code. The first code is thus parsed locally (e.g., on the computing device).
In other possible implementations, as shown in the system architecture diagram of the code parsing method in FIG. 2, the editor 102 and the parser 104 are deployed in a cloud environment. The user accesses the editor 102 and the parser 104 in the cloud environment through a computing device to edit and parse code. The cloud environment refers to a cloud computing cluster owned by a cloud service provider that provides computing, storage, and communication resources. Cloud computing clusters may be divided into central clouds and edge clouds according to their location in the network topology. A cloud computing cluster includes at least one cloud computing device, such as at least one central server or at least one edge server.
It should be noted that the editor 102 and the parser 104 may be provided by the same cloud service provider, in which case they may be deployed on the same cloud computing cluster. In some possible implementations, the editor 102 and the parser 104 may be provided by different cloud service providers and deployed on different cloud computing clusters. FIG. 2 illustrates the case in which the editor 102 and the parser 104 are deployed on the same cloud computing cluster.
FIG. 1 and FIG. 2 show only some example deployments of the editor 102 and the parser 104. In other possible implementations of the embodiments of the present application, they may be deployed in other ways. For example, the editor 102 may be deployed on a computing device and the parser 104 in a cloud environment, or the editor 102 may be deployed in a cloud environment and the parser 104 on a computing device. The embodiments of the present application impose no limitation on this.
Next, a code parsing method provided by the embodiment of the present application is described in detail from the perspective of the parser 104 with reference to the drawings.
Referring to the flowchart of the code parsing method shown in FIG. 3, the method includes the following steps:
S302: The parser 104 obtains the first code.
When the application development is performed by using the hybrid programming, for example, when the application development is performed by using the first programming language and the second programming language, the developed code file includes not only the code segment of the first programming language but also the code embedded in the code segment of the first programming language.
For ease of description, the present application refers to code embedded in a code fragment of a first programming language as first code. The first code includes code based on a second programming language. The first code also includes variables of the first programming language for linking with code fragments of the first programming language. Wherein the code based on the second programming language comprises expressions associated with variables of the first programming language. An expression specifically refers to a string of symbols formed of at least one of characters and operators, the string of symbols having a physical meaning, such as addresses that may be used to express variables, and the like.
The first programming language and the second programming language are different programming languages. Each may be either a high-level or a low-level programming language, and the user can choose them according to business requirements.
In some embodiments, the first programming language may be a high-level programming language, such as C, Java, or Python, and the second programming language may be a low-level programming language, such as assembly language or machine language. For ease of description, the following takes C as the first programming language and assembly language as the second programming language.
For the convenience of understanding, the embodiment of the present application also provides a specific example to illustrate the first code.
Referring to the code fragment shown in fig. 4, which is taken from a code file developed through hybrid programming, the code in the labeling box 402 is the first code:
__asm__ __volatile__("crc32b %1, %0"
    : "+r"(result)
    : "rm"(hello[0]));
Here, asm in __asm__ __volatile__ is the keyword that introduces embedded assembly. "crc32b %1, %0" is code based on the second programming language (assembly language), and result and hello[0], which follow the colons, are variables of the first programming language (C). The placeholders %1 and %0 in "crc32b %1, %0" are first expressions associated with the variables hello[0] and result, respectively. Because "crc32b %1, %0" does not conform to the syntax rules of assembly language, an ordinary syntax parser cannot determine the syntactic structure of the first code. The parser 104 of the embodiments of the present application, however, can parse the first code and obtain its syntactic structure.
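GCC extended asm numbers operands in the order they are listed, outputs first and then inputs, which is why %0 refers to result and %1 to hello[0]. The following Python sketch recovers that numbering from the statement text. It is illustrative only: number_operands and OPERAND_RE are hypothetical names, and the regular expression assumes that the assembly template contains no colon and that the C operand expressions contain no nested parentheses.

```python
import re

# One extended-asm operand: a quoted constraint followed by a parenthesized C expression,
# e.g. "+r"(result). Clobbers such as "memory" have no parentheses and are not matched.
OPERAND_RE = re.compile(r'"([^"]*)"\s*\(([^()]*)\)')


def number_operands(asm_stmt: str) -> dict[str, tuple[str, str]]:
    """Map each %N placeholder to its (constraint, C expression) pair."""
    # Assumption: the first ':' ends the assembly template; the operand lists follow it.
    _template, _, operand_lists = asm_stmt.partition(":")
    operands = OPERAND_RE.findall(operand_lists)
    # Outputs precede inputs in the text, so enumeration order matches GCC's numbering.
    return {f"%{i}": (constraint, expr.strip()) for i, (constraint, expr) in enumerate(operands)}


stmt = '''__asm__ __volatile__("crc32b %1, %0"
    : "+r"(result)
    : "rm"(hello[0]));'''
print(number_operands(stmt))  # {'%0': ('+r', 'result'), '%1': ('rm', 'hello[0]')}
```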
In some implementations, the parser 104 may present the code in the code file to the user through a user interface, such as a graphical user interface (GUI) or a command-line user interface (CUI), and then receive the first code that the user selects through that interface.
FIG. 5 is a schematic diagram of receiving a user-selected first code through the GUI. The parser 104 presents the code 502 in the code file through the GUI. In the presented code, the user may select the first code 504, which includes variables of the first programming language and code based on the second programming language, and then trigger a parsing operation on the selected first code 504, for example through the parsing control 508 in the right-click menu 506.
In some possible implementations, the parser 104 may automatically obtain the first code from the code file according to a keyword of the second programming language. For example, when the second programming language is assembly language, the keyword may be asm (short for assembly), and the parser 104 may automatically collect the code in the code file that contains the asm keyword, thereby obtaining the first code.
Further, when the user selects a long stretch of code, the parser 104 may identify the statements containing the keyword (e.g., asm) within the selection to obtain the first code.
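A minimal Python sketch of this keyword-based extraction is shown below. It assumes C source text and the GCC spellings asm, __asm and __asm__. extract_asm_statements is a hypothetical helper, not part of the application. The sketch scans from the keyword to the matching closing parenthesis while skipping string literals, and it does not distinguish keywords that appear inside comments or macros.

```python
import re

# GCC accepts asm, __asm and __asm__; \b prevents matches inside longer identifiers.
ASM_KEYWORD_RE = re.compile(r"\b(?:__asm__|__asm|asm)\b")


def extract_asm_statements(source: str) -> list[str]:
    """Return every inline-asm statement, from its keyword to the terminating ';'."""
    statements = []
    pos = 0
    while (match := ASM_KEYWORD_RE.search(source, pos)) is not None:
        i = source.find("(", match.end())  # opening parenthesis of the asm statement
        if i == -1:
            break
        depth, in_string = 0, False
        while i < len(source):
            ch = source[i]
            if in_string:
                if ch == "\\":
                    i += 1  # skip the escaped character inside a string literal
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break  # matching close parenthesis found
            i += 1
        end = source.find(";", i)
        end = len(source) if end == -1 else end + 1
        statements.append(source[match.start():end])
        pos = end
    return statements


code = '''char *hello = "hello";
unsigned result = 0;
__asm__ __volatile__("crc32b %1, %0"
    : "+r"(result)
    : "rm"(hello[0]));
return result;'''
for stmt in extract_asm_statements(code):
    print(stmt)
```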
S304: The parser 104 replaces the first expression associated with the variable of the first programming language in the first code to obtain a second code that conforms to the grammar rules of the second programming language.
Specifically, the first code includes characteristic information of the first expression associated with a variable of the first programming language. The characteristic information describes the association between the first expression and the second programming language. For example, when the second programming language is assembly language, the characteristic information indicates which type of assembly operand the first expression is to be converted into: a register operand, a memory operand, or an immediate. A register operand is stored in a register, a memory operand is stored in memory, and an immediate is stored in neither but is encoded directly in the instruction.
The code fragment of fig. 4 is again used as an example. As shown in fig. 4, the first code includes "+r"(result) and "rm"(hello[0]), where the expression %1 corresponds to the variable hello[0] and the expression %0 corresponds to the variable result. "+r" is the characteristic information of %0, and "rm" is the characteristic information of %1; the parser 104 can extract both from the code fragment. Here, r denotes a register, m denotes memory, and rm means that the operand may be placed either in a register or in memory. The + in "+r" marks result as an operand that the instruction both reads and writes.
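The characteristic information in this example is a GCC constraint string. A small Python sketch of reading it is given below. It assumes only the common letters r, m, i and g and the modifiers =, + and &. describe_constraint is a hypothetical helper; a real implementation would need GCC's full, target-specific constraint tables.

```python
# Subset of GCC constraint letters; the application itself only discusses r and m.
CONSTRAINT_KINDS = {
    "r": ("register",),
    "m": ("memory",),
    "i": ("immediate",),
    "g": ("register", "memory", "immediate"),  # any general operand
}
MODIFIERS = {"=": "write-only", "+": "read-write", "&": "early-clobber"}


def describe_constraint(constraint: str) -> tuple[list[str], list[str]]:
    """Split a constraint string into allowed operand kinds and access modifiers."""
    kinds: list[str] = []
    modifiers: list[str] = []
    for ch in constraint:
        if ch in MODIFIERS:
            modifiers.append(MODIFIERS[ch])
        elif ch in CONSTRAINT_KINDS:
            for kind in CONSTRAINT_KINDS[ch]:
                if kind not in kinds:
                    kinds.append(kind)
        else:
            raise ValueError(f"constraint letter {ch!r} is not handled by this sketch")
    return kinds, modifiers


print(describe_constraint("+r"))  # (['register'], ['read-write'])
print(describe_constraint("rm"))  # (['register', 'memory'], [])
```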
Based on the characteristic information of the first expression, the parser 104 replaces the first expression associated with the variable of the first programming language with a second expression that conforms to the grammar rules of the second programming language, obtaining the second code, which contains the second expression. Taking fig. 4 as an example, the parser 104 may replace the first expressions %0 and %1 in the first code with register operands, memory operands, or immediates that an assembly instruction can recognize under the syntax rules of assembly language. For example, %0 may be replaced with any general-purpose register operand, such as %eax, and %1 may be replaced with a memory operand, such as foo. The second expressions in the resulting second code are therefore %eax and foo.
In some possible implementations, the parser 104 may also store a mapping relationship of the second expression in the second code and the first expression in the first code. For example, parser 104 may store the following mapping:
%0 → %eax;
%1 → foo.
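The replacement in S304, together with the mapping just described, can be sketched in Python as follows. The sketch assumes x86 AT&T syntax and a constraints dictionary such as the one extracted from the first code. The register pool, the stand-in memory symbols (foo, bar, baz), the $0 immediate, and the helper name replace_placeholders are illustrative choices rather than details from the application. When a constraint allows memory, the sketch prefers a memory operand, which matches the choice of foo for %1 above and sidesteps register-width questions (an 8-bit operand in a register would need a name such as %al). It does not check that the chosen operand combination is valid for the particular instruction.

```python
import re

PLACEHOLDER_RE = re.compile(r"%\d+")           # %0, %1, ... (modifiers such as %k0 are not handled)
REGISTERS = ["%eax", "%ecx", "%edx", "%ebx"]   # illustrative pool of 32-bit registers
MEMORY_SYMBOLS = ["foo", "bar", "baz"]         # illustrative stand-in memory operands


def replace_placeholders(template: str, constraints: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return the second code and the first-to-second expression mapping."""
    mapping: dict[str, str] = {}
    regs, mems = iter(REGISTERS), iter(MEMORY_SYMBOLS)  # StopIteration if a pool runs out
    for placeholder, constraint in constraints.items():
        if "m" in constraint:
            mapping[placeholder] = next(mems)   # memory allowed: avoids register-width issues
        elif "r" in constraint:
            mapping[placeholder] = next(regs)
        elif "i" in constraint:
            mapping[placeholder] = "$0"         # AT&T-syntax immediate
        else:
            raise ValueError(f"constraint {constraint!r} is not handled by this sketch")
    second_code = PLACEHOLDER_RE.sub(lambda m: mapping[m.group(0)], template)
    return second_code, mapping


second_code, mapping = replace_placeholders("crc32b %1, %0", {"%0": "+r", "%1": "rm"})
print(second_code)  # crc32b foo, %eax
print(mapping)      # {'%0': '%eax', '%1': 'foo'}
```

Keeping the mapping as a plain dictionary makes the later inverse replacement a simple reverse lookup.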
S306: The parser 104 parses the second code to obtain the syntactic structure of the second code.
The second code conforms to the grammar rules of the second programming language, so the parser 104 can parse it directly with an existing syntax parsing module for that language. For example, when the second programming language is assembly language, the parser 104 may parse the assembly instruction directly with a parsing module of a compiler framework, such as the llvm-mc (LLVM machine code) module of LLVM, to obtain the corresponding syntactic structure.
The syntactic structure obtained by the parser 104 may take the form: instruction + operand types + operand attribute information. Take "crc32b %1, %0" in the code fragment of fig. 4 as an example. After replacement it becomes "crc32b foo, %eax", and the parser 104 parses this replaced instruction to obtain the corresponding syntactic structure. The syntactic structure records that the instruction is crc32b and that its operands are a memory operand and a register operand. It also records attribute information of the operands, for example ModeSize=64, Scale=1, and Disp=foo for the memory operand, where Disp denotes the memory displacement.
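The shape of such a syntactic structure can be illustrated with a toy Python stand-in for the parsing module. This is not llvm-mc and does not reproduce its output format. parse_att_instruction and classify_operand are hypothetical helpers that handle only register (%reg), immediate ($imm), and symbol-only memory operands; the sketch records a scale of 1 for memory operands as an assumption, since a symbol-only operand has no index register.

```python
def classify_operand(text: str) -> dict[str, object]:
    """Classify one AT&T-syntax operand by its prefix."""
    if text.startswith("%"):
        return {"type": "Reg", "Reg": text[1:]}          # register name without '%'
    if text.startswith("$"):
        return {"type": "Imm", "Imm": text[1:]}          # immediate value
    if "(" in text:
        raise ValueError("base/index memory operands are not handled by this sketch")
    return {"type": "Memory", "Scale": 1, "Disp": text}  # symbol-only memory operand


def parse_att_instruction(line: str) -> dict[str, object]:
    """Return the instruction mnemonic plus the type and attributes of each operand."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [classify_operand(op.strip()) for op in rest.split(",") if op.strip()]
    return {"instruction": mnemonic, "operands": operands}


print(parse_att_instruction("crc32b foo, %eax"))
```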
In some possible implementations, the parser 104 may further replace the second expression in the syntactic structure of the second code with the first expression according to a mapping relationship between the second expression in the second code and the first expression in the first code, so as to obtain the syntactic structure of the first code.
Referring specifically to fig. 6, the parser 104 parses "crc32b foo, %eax" to obtain the following syntactic structure:
crc32b,Memory:ModeSize=64,Scale=1,Disp=foo,Reg=eax
Using the mapping %0 → %eax, %1 → foo, the parser 104 performs inverse replacement on the syntactic structure of the replaced code, that is, it replaces each second expression in the syntactic structure of the second code with the corresponding first expression:
crc32b,Memory:ModeSize=64,Scale=1,Disp=%1,Reg=%0
The parser 104 then performs structure association to obtain the syntactic structure of the first code:
crc32b,Mem:%1,Reg:%0
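The inverse replacement and structure association can be sketched in Python as follows. The flat dictionary of fields, the helper names inverse_replace and associate, and the rule that only Disp and Reg fields are kept are assumptions made for illustration. The sketch also assumes that register names appear without the AT&T % prefix in the parsed structure, as in Reg=eax above.

```python
def inverse_replace(fields: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Replace each second expression in a parsed structure with its first expression."""
    # The parsed structure writes registers without the AT&T '%' prefix (Reg=eax).
    reverse = {second.lstrip("%"): first for first, second in mapping.items()}
    return {key: reverse.get(value, value) for key, value in fields.items()}


def associate(instruction: str, fields: dict[str, str]) -> str:
    """Keep only the operands that stem from the first code, e.g. crc32b,Mem:%1,Reg:%0."""
    labels = {"Disp": "Mem", "Reg": "Reg"}
    parts = [f"{labels[key]}:{value}" for key, value in fields.items()
             if key in labels and value.startswith("%")]
    return ",".join([instruction, *parts])


mapping = {"%0": "%eax", "%1": "foo"}
parsed = {"ModeSize": "64", "Scale": "1", "Disp": "foo", "Reg": "eax"}
restored = inverse_replace(parsed, mapping)
print(restored)                       # {'ModeSize': '64', 'Scale': '1', 'Disp': '%1', 'Reg': '%0'}
print(associate("crc32b", restored))  # crc32b,Mem:%1,Reg:%0
```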
Considering that the code file also includes code fragments written in the first programming language, the parser 104 may, during syntax parsing, parse those fragments together with the replaced code (that is, the second code) that conforms to the grammar rules of the second programming language. For example, it may use an XY syntax parsing module, where X denotes the first programming language (e.g., C) and Y denotes the second programming language (e.g., assembly language), to obtain the corresponding syntactic structure. The parser 104 may then replace the second expression in the syntactic structure of the second code with the first expression to obtain the syntactic structure of the first code.
Fig. 7 shows the syntactic structure obtained by parsing the code fragment shown in fig. 4. As shown in fig. 7, the syntactic structure in the labeling box 702 corresponds to the assembly instruction "crc32b %1, %0", and the syntactic structure preceding the labeling box 702 corresponds to the following statements:
char *hello = "hello";
unsigned result = 0;
After generating the syntactic structure of the first code, the parser 104 may output it. In some embodiments, the parser 104 may output the syntactic structure as a file. In other embodiments, the parser 104 may present the syntactic structure of the first code to the user through a user interface, such as a GUI or CUI.
Further, the parser 104 may generate a syntax tree based on the syntactic structure of the first code. In some embodiments, the parser 104 may generate the syntax tree based on both the syntactic structure of the code fragment in the first programming language and the syntactic structure of the first code. A syntax tree is a graphical representation of the syntactic structure of a statement in a code fragment, typically drawn as a tree diagram. For example, the syntax tree of the assignment statement i = a + b * c is shown in fig. 8. As shown in fig. 8, each node of the syntax tree is a word (also called a token) of the assignment statement. Such a syntax tree helps the user understand the hierarchy of the code's syntactic structure.
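How such a tree arises for i = a + b * c can be sketched with a small precedence-climbing parser in Python. This only illustrates the idea and is not the parser 104's implementation. Node, parse and show are hypothetical names, and the sketch handles only identifiers, numbers, and the operators =, +, -, * and / (no parentheses).

```python
import re
from dataclasses import dataclass, field

PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}
RIGHT_ASSOCIATIVE = {"="}


@dataclass
class Node:
    token: str
    children: list["Node"] = field(default_factory=list)


def parse(tokens: list[str], min_prec: int = 1) -> Node:
    """Precedence climbing: consume tokens and return the subtree they form."""
    node = Node(tokens.pop(0))  # an operand: identifier or number
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        prec = PRECEDENCE[op]
        # Right-associative '=' allows an operator of equal precedence on its right.
        rhs = parse(tokens, prec if op in RIGHT_ASSOCIATIVE else prec + 1)
        node = Node(op, [node, rhs])
    return node


def show(node: Node, depth: int = 0) -> list[str]:
    """Render the tree one token per line, indented by depth."""
    lines = ["  " * depth + node.token]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


tree = parse(re.findall(r"\w+|[=+\-*/]", "i = a + b * c"))
print("\n".join(show(tree)))
```

Running it prints = at the root with i and + as its children, and * (over b and c) nested under +, so the multiplication sits deeper in the tree than the addition, reflecting its higher precedence.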
Based on the above description, the embodiments of the present application provide a code parsing method. Specifically, for a first code including a variable of a first programming language and a code based on a second programming language, the parser 104 may replace a first expression associated with the variable of the first programming language in the first code when parsing the first code, so as to obtain a second code that conforms to a syntax rule of the second programming language. Parser 104 then parses the second code.
By replacing the first expressions in the first code that do not conform to the grammar rules of the second programming language, the method preserves the syntactic information of the original programming language, which improves parsing accuracy and meets service requirements. In addition, the method reuses existing syntax parsing capabilities, so it is easy and inexpensive to implement and can be extended to other existing programming languages, which makes it broadly applicable.
Several mainstream open-source projects, such as CloudStack, OpenStack, Moby, and Kubernetes, use multiple programming languages. Mixed programming lets developers exploit the strengths of each language and thereby improve application efficiency. The code parsing method of the embodiments of the present application offers another way to parse mixed-language code, improving the accuracy of syntax parsing and meeting service requirements.
The code parsing method provided by the embodiment of the present application is described in detail above with reference to fig. 1 to 8, and the apparatus provided by the embodiment of the present application is described below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the code parsing apparatus shown in fig. 9, the code parsing apparatus 900 is configured to implement the functions of the parser 104 and includes:
a communication module 902 for obtaining a first code, the first code comprising a variable of a first programming language and a code based on a second programming language;
a replacing module 904, configured to replace a first expression associated with a variable of a first programming language in the first code, to obtain a second code that conforms to a syntax rule of a second programming language, where the second code includes the second expression;
and the parsing module 906 is configured to perform syntax parsing on the second code to obtain a syntax structure of the second code.
In some possible implementations, the replacement module 904 is further configured to:
after syntax parsing is performed on the second code, replace the second expression in the syntactic structure of the second code with the first expression according to the mapping between the second expression in the second code and the first expression in the first code, to obtain the syntactic structure of the first code.
In some possible implementations, the apparatus 900 further includes:
a generating module, configured to generate a syntax tree according to the syntactic structure of the first code.
In some possible implementation manners, the first code includes characteristic information of the first expression, and the characteristic information is used for describing an association relationship between the first expression and the second programming language;
the replacement module 904 is specifically configured to:
replace, according to the characteristic information, the first expression in the first code that is associated with the variable of the first programming language with a second expression that conforms to the grammar rules of the second programming language.
In some possible implementations, the first programming language is a high-level programming language and the second programming language is a low-level programming language.
In some possible implementations, the replacing module 904 is specifically configured to:
replace, according to the characteristic information of the first expression associated with the variable of the first programming language, the first expression with a memory operand, register operand, or immediate that conforms to the grammar rules of the second programming language.
In some possible implementations, the apparatus 900 further includes:
a display module, configured to present the code in the code file to a user through a graphical user interface;
the communication module is specifically configured to:
receive the first code selected by the user through the graphical user interface.
The code analysis apparatus 900 according to the embodiment of the present application may correspond to performing the method described in the embodiment of the present application, and the above and other operations and/or functions of each module/unit of the code analysis apparatus 900 are respectively for implementing the corresponding flow of each method in the embodiment shown in fig. 3, and are not described herein again for brevity.
The embodiments of the present application further provide a computing device 1000, which may be a laptop computer, a desktop computer, or another end-side device. The computing device 1000 is configured to implement the functions of the code parsing apparatus 900 in the embodiment shown in fig. 9.
Fig. 10 provides a schematic diagram of a computing device 1000, and as shown in fig. 10, the computing device 1000 includes a bus 1001, a processor 1002, a communication interface 1003, and a memory 1004. The processor 1002, the memory 1004, and the communication interface 1003 communicate with each other via a bus 1001.
The bus 1001 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The processor 1002 may be any one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Micro Processor (MP), a Digital Signal Processor (DSP), and the like.
The communication interface 1003 is used for external communication, for example, to obtain the first code, which includes variables of the first programming language and code based on the second programming language, or to output the syntactic structure of the first code.
The memory 1004 may include volatile memory (volatile memory), such as Random Access Memory (RAM). The memory 1004 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD).
The memory 1004 stores executable code that the processor 1002 executes to perform the code parsing method described above.
Specifically, when the modules of the code parsing apparatus 900 described in the embodiment of fig. 9 are implemented in software, the software or program code required to perform the functions of those modules, such as the replacing module 904 and the parsing module 906, is stored in the memory 1004, and the functions of the communication module 902 are implemented by the communication interface 1003.
The communication interface 1003 obtains the first code, which includes variables of the first programming language and code based on the second programming language, and transmits it to the processor 1002 through the bus 1001. The processor 1002 executes the program code of the modules stored in the memory 1004, such as the replacing module 904 and the parsing module 906, to replace the first expression associated with the variable of the first programming language in the first code, thereby obtaining the second code that conforms to the syntax rules of the second programming language, and to perform syntax parsing on the second code to obtain its syntactic structure.
In some possible implementations, the processor 1002 is further configured to execute the executable code to perform the steps of:
and replacing the second expression in the syntactic structure of the second code with the first expression according to the mapping relation between the second expression in the second code and the first expression in the first code to obtain the syntactic structure of the first code.
In some possible implementations, the processor 1002 is further configured to execute the executable code to perform the steps of:
a syntax tree is generated based on the syntax structure of the first code.
Optionally, the processor 1002 may also be configured to execute method steps corresponding to other possible implementations in the embodiment shown in fig. 3.
The computing device 1000 is illustrated as a device located at an end side, and the embodiment of the present application further provides a cloud computing device in the cloud environment 300, such as a central server. The cloud computing device has a similar structure to the computing device 1000 on the end side, and has the same function as the computing device 1000, that is, a function of parsing the first code.
Coupling in the embodiments of the present application refers to an indirect coupling or communication connection between apparatuses, units, or modules, which may be electrical, mechanical, or in another form and is used for information exchange between them. The embodiments of the present application do not limit the specific connection medium among the communication interface 1003, the processor 1002, and the memory 1004. For example, the memory, processor, and communication interface may be connected through a bus, which may be divided into an address bus, a data bus, a control bus, and so on.
Based on the above embodiments, the present application further provides a computer storage medium storing a software program which, when read and executed by one or more processors, implements the method performed by the end-side device or the cloud computing device in any one or more of the above embodiments. The computer storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
Based on the above embodiments, the present application further provides a chip, where the chip includes a processor, and is configured to implement the functions of the end-side device or the cloud computing device according to the foregoing embodiments, for example, to implement the methods executed by the computing apparatus in fig. 1 to fig. 2 and the cloud computing device in the cloud environment.
Optionally, the chip further includes a memory that stores the program instructions and data necessary for the processor. The chip may consist of a chip alone, or may include a chip and other discrete devices.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (15)

1. A code parsing method, the method comprising:
acquiring a first code, wherein the first code comprises a variable of a first programming language and a code based on a second programming language;
replacing a first expression in the first code, which is associated with a variable of the first programming language, to obtain a second code which conforms to a grammatical rule of the second programming language, wherein the second code comprises a second expression;
and carrying out syntax analysis on the second code to obtain a syntax structure of the second code.
2. The method of claim 1, wherein after parsing the second code, the method further comprises:
and replacing the second expression in the syntactic structure of the second code with the first expression according to the mapping relation between the second expression in the second code and the first expression in the first code to obtain the syntactic structure of the first code.
3. The method of claim 2, further comprising:
and generating a syntax tree according to the syntax structure of the first code.
4. The method according to any one of claims 1 to 3, wherein the first code includes characteristic information of the first expression, and the characteristic information is used for describing an association relationship between the first expression and the second programming language;
the replacing of the first expression in the first code associated with the variable of the first programming language, comprising:
and replacing a first expression in the first code, which is associated with a variable of the first programming language, with a second expression which conforms to the grammar rule of the second programming language according to the characteristic information.
5. The method of any of claims 1 to 4, wherein the first programming language is a high level programming language and the second programming language is a low level programming language.
6. The method of claim 5, wherein replacing the first expression in the first code associated with the variable of the first programming language comprises:
and replacing the first expression by a memory operand, a register operand or an immediate which accords with the grammar rule of the second programming language according to the characteristic information of the first expression which is related to the variable of the first programming language.
7. The method according to any one of claims 1 to 6, further comprising:
presenting the code in the code file to a user through a graphical user interface;
the acquiring the first code includes:
receiving a first code selected by the user through the graphical user interface.
8. A code parsing apparatus, the apparatus comprising:
a communication module for obtaining a first code, the first code comprising a variable of a first programming language and a code based on a second programming language;
a replacement module, configured to replace a first expression associated with a variable of the first programming language in the first code, to obtain a second code that conforms to a syntax rule of the second programming language, where the second code includes a second expression;
and the parsing module is used for carrying out syntax parsing on the second code to obtain a syntax structure of the second code.
9. The apparatus of claim 8, wherein the replacement module is further configured to:
and after the second code is subjected to syntactic analysis, replacing the second expression in the syntactic structure of the second code with the first expression according to the mapping relation between the second expression in the second code and the first expression in the first code to obtain the syntactic structure of the first code.
10. The apparatus of claim 9, further comprising:
and the generating module is used for generating a syntax tree according to the syntax structure of the first code.
11. The apparatus according to any one of claims 8 to 10, wherein the first code includes characteristic information of the first expression, the characteristic information being used to describe an association relationship between the first expression and the second programming language;
the replacement module is specifically configured to:
and replacing a first expression in the first code, which is associated with a variable of the first programming language, with a second expression which conforms to the grammar rule of the second programming language according to the characteristic information.
12. The apparatus of any of claims 8 to 11, wherein the first programming language is a high level programming language and the second programming language is a low level programming language.
13. The apparatus of claim 12, wherein the replacement module is specifically configured to:
and replacing the first expression by a memory operand, a register operand or an immediate which accords with the grammar rule of the second programming language according to the characteristic information of the first expression which is related to the variable of the first programming language.
14. The apparatus of any one of claims 8 to 13, further comprising:
the display module is used for presenting the codes in the code file to a user through a graphical user interface;
the communication module is specifically configured to:
receiving a first code selected by the user through the graphical user interface.
15. A computing device, wherein the computing device comprises a processor and a memory;
the processor is to execute instructions stored in the memory to cause the computing device to perform the method of any of claims 1 to 7.
CN202011066291.6A 2020-09-30 2020-09-30 Code analysis method, device, equipment and medium Pending CN114327469A (en)

Publications (1)

Publication Number Publication Date
CN114327469A 2022-04-12

Family

ID=81032501

Country Status (1)

Country Link
CN (1) CN114327469A (en)

Similar Documents

Publication Title
CN106919434B Code generation method and device
US11151018B2 Method and apparatus for testing a code file
CN110704063B Method and device for compiling and executing intelligent contract
EP3365772B1 Missing include suggestions for external files
CN110688122B Method and device for compiling and executing intelligent contract
CN103218294B A kind of adjustment method of embedded system, debugging conversion equipment and system
US10613844B2 Using comments of a program to provide optimizations
US7917899B2 Program development apparatus, method for developing a program, and a computer program product for executing an application for a program development apparatus
US20230114540A1 Checking source code validity at time of code update
US8990789B2 Optimizing intermediate representation of script code by eliminating redundant reference count operations
US10599447B2 System construction assistance system and method, and storage medium
CN111736840A Compiling method and running method of applet, storage medium and electronic equipment
JP2004295398A Compiler, method for compiling and program developing tool
JP4638484B2 Data integrity in data processing equipment
CN111124479B Method and system for analyzing configuration file and electronic equipment
US20090320007A1 Local metadata for external components
JP2004341671A Information processing system, control method, control program and recording medium
CN110352400B Method and device for processing message
WO2022068556A1 Code translation method and apparatus, and device
CN116107524B Low-code application log processing method, medium, device and computing equipment
CN114327469A Code analysis method, device, equipment and medium
CN115629762A JSON data processing method and device, electronic equipment and storage medium
CN115640279A Method and device for constructing data blood relationship
CN114174983B Method and system for optimized automatic verification of advanced constructs
CN113760291A Log output method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination