CN114237607A - Unreachable statement identification method, C language and Java conversion method and device

Publication number
CN114237607A
CN114237607A CN202111363168.5A CN202111363168A CN114237607A CN 114237607 A CN114237607 A CN 114237607A CN 202111363168 A CN202111363168 A CN 202111363168A CN 114237607 A CN114237607 A CN 114237607A
Authority
CN
China
Prior art keywords
unreachable
language
syntax tree
processed
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111363168.5A
Other languages
Chinese (zh)
Inventor
杨辉
陈晗
吴德柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Stored Programmes (AREA)

Abstract

The application relates to an unreachable statement identification method, a C language to Java conversion method, and corresponding devices. The unreachable statement identification method comprises the following steps: acquiring a C language program to be processed; performing lexical analysis on the C language program to be processed to obtain words to be processed; performing syntactic analysis on the words to be processed to build a syntax tree; and traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree. The method for converting a C language program into a Java program comprises the following steps: applying the unreachable statement identification method to the C language program to obtain its unreachable statements; and converting the C language program into a Java program according to the unreachable statements. With this method, unreachable statements in a C language program can be identified.

Description

Unreachable statement identification method, C language and Java conversion method and device
Technical Field
The present application relates to the field of code conversion technologies, and in particular to an unreachable statement identification method, a C language to Java conversion method, and corresponding apparatus.
Background
Automatic conversion between programming languages can greatly shorten development cycles in fields such as software porting and maintenance, and reduce software development costs. At present, however, there is little research at home or abroad on conversion from C language to Java language (C2J), and even less on syntax checking of C code before C2J conversion. In COBOL-to-Java translation systems, elimination of goto and PERFORM compound statements has been reliably achieved by algorithm.
Currently, syntax checking for C language is mainly implemented by tools that apply certain algorithms or rules, such as the C2J++ tool or the C2J tool from Novosoft.
However, existing schemes for syntax checking of C code before C2J conversion have many shortcomings. The C2J++ tool can convert C/C++ source code into a Java program that is basically semantically equivalent, but it cannot process goto statements, and it cannot distinguish whether the symbol * is used as a multiplication operator or as a pointer indirection operator. The C2J tool from Novosoft solves many of the problems of the C2J++ tool, such as the handling of goto statements, but its code expansion rate is too large and the logic of the converted code is difficult to understand. This is mainly because it adds many complex conversion patterns to simulate the behavior of the C source program, and it stores all data structures in one huge array, which bypasses Java's type checking and runtime safety checks and is difficult to combine with mainstream Java.
Disclosure of Invention
Accordingly, it is desirable to provide an unreachable statement identification method, a C language to Java conversion method, and corresponding apparatus capable of identifying unreachable statements in C language programs.
In a first aspect, the present application provides a method for identifying unreachable statements in C language, the method including:
acquiring a C language program to be processed;
performing lexical analysis on the C language program to be processed to obtain words to be processed;
performing syntactic analysis on the words to be processed to build a syntax tree;
and traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree.
In one embodiment, performing lexical analysis on the C language program to be processed to obtain words to be processed includes:
preprocessing the C language program to be processed to delete useless symbols;
and recognizing the preprocessed C language program according to keyword recognition rules to obtain the words to be processed.
In one embodiment, performing syntactic analysis on the words to be processed to build a syntax tree includes:
recombining the words to be processed to obtain grammar units;
and connecting the grammar units to obtain the syntax tree.
In one embodiment, before connecting the grammar units to obtain the syntax tree, the method further includes:
performing a format check on the grammar units to correct them.
In one embodiment, traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree includes:
traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree based on keywords in the syntax tree.
In one embodiment, traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree based on keywords in the syntax tree includes at least one of:
traversing the syntax tree by depth-first traversal to determine whether a jump keyword exists in the syntax tree and, if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements;
traversing the syntax tree by depth-first traversal to determine whether a loop keyword exists in the syntax tree and, when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword, taking the statements after the subtree corresponding to the loop keyword as unreachable statements;
and traversing the syntax tree by depth-first traversal to determine whether a branch keyword exists in the syntax tree and, when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword, taking the statements after the subtree corresponding to the branch keyword as unreachable statements.
In one embodiment, after traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree, the method further includes:
annotating the unreachable statements.
In a second aspect, the present application provides a method for converting a C language program into a Java program, including:
identifying unreachable statements in the C language program according to the above method for identifying unreachable statements in C language;
and converting the C language program into a Java program according to the unreachable statements.
In a third aspect, the present application provides a device for identifying unreachable statements in C language, the device including:
a source program acquisition module, configured to acquire a C language program to be processed;
a lexical analysis module, configured to perform lexical analysis on the C language program to be processed to obtain words to be processed;
a syntax analysis module, configured to perform syntactic analysis on the words to be processed to build a syntax tree;
and a first identification module, configured to traverse the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree.
In a fourth aspect, the present application provides an apparatus for converting a C language program into a Java program, including:
a second identification module, configured to identify unreachable statements in the C language program by means of the above device for identifying unreachable statements in C language;
and a conversion module, configured to convert the C language program into a Java program according to the unreachable statements.
In a fifth aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the method described in any of the above embodiments when executing the computer program.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described in any of the above embodiments.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method described in any of the above embodiments.
With the above unreachable statement identification method, C language to Java conversion method, and apparatus, lexical analysis and syntactic analysis are performed and the syntax tree is traversed depth-first to find the unreachable statements. The C language source program can thus be parsed well to obtain its word symbols and semantic structure, which avoids the situation in which the converted Java code cannot be executed.
Drawings
FIG. 1 is a diagram of an application environment for a method for unreachable statement identification in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for unreachable statement identification in one embodiment;
FIG. 3 is a diagram illustrating the classification of unreachable statements, in one embodiment;
FIG. 4 is a screenshot of a syntax tree visualization in one embodiment;
FIG. 5 is a diagram illustrating annotation results, in one embodiment;
FIG. 6 is a flow diagram of an implementation of unreachable statement identification in one embodiment;
FIG. 7 is a flow diagram of the C language to Java conversion method in one embodiment;
FIG. 8 is a block diagram of the structure of the unreachable statement identification apparatus in one embodiment;
FIG. 9 is a block diagram of the structure of the C language to Java conversion apparatus in one embodiment;
FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The unreachable statement identification method and the C language to Java conversion method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or on another network server. The terminal 102 may send the C language program to be processed to the server 104, so that the server 104 performs lexical analysis on the program to obtain words to be processed, performs syntactic analysis on the words to be processed to build a syntax tree, and traverses the syntax tree by depth-first traversal to identify the unreachable statements in it. Because lexical analysis and syntactic analysis are used and the syntax tree is traversed depth-first to find its unreachable statements, the C language source program can be parsed well to obtain its word symbols and semantic structure, which avoids the situation in which the converted Java code cannot be executed.
The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smart phones, tablet computers, Internet of Things devices, and portable wearable devices. The Internet of Things devices may be smart speakers, smart televisions, smart air conditioners, smart in-vehicle devices, and the like. The portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers. In other embodiments, the unreachable statement identification method may be applied on the server alone, which is not specifically limited herein.
In one embodiment, as shown in FIG. 2, an unreachable statement identification method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
S202: Acquire the C language program to be processed.
Specifically, the C language program to be processed is a source program written in C language. C language code may contain a large number of unreachable statements (for example, statements following a goto or return statement). These unreachable statements do not affect the compilation or execution of the C language program, but after translation into a Java program they cause compilation errors, which prevents the program from being executed.
In this embodiment, the causes of unreachable statements in a C language program fall into three main categories: unreachability caused by jump statements, unreachability caused by infinite while loops, and unreachability caused by branch selection. Unreachability caused by jump statements arises mainly from the continue, break, goto, and return keywords: used in combination, they can end the current program segment early in some scenarios, so that the subsequent code can never be executed and is unreachable. Unreachability caused by infinite loops arises mainly from while statements: if the condition of a while is always true and the program cannot jump out of the loop, the subsequent statements are unreachable. Unreachability caused by branch selection arises when every branch contains a return-type statement: the statement block following the branches can then never be executed and is unreachable.
S204: Perform lexical analysis on the C language program to be processed to obtain words to be processed.
Specifically, lexical analysis extracts the different categories of words in the C language program to be processed. The words may be classified into five categories, namely identifiers, reserved words, constants, operators, and delimiters; in other embodiments, other categories may also be included.
The server may configure corresponding rules in the program. These rules mainly recognize the corresponding keywords and apply different analysis to different keywords. Specifically, the server splits the C language program to be processed into individual words and symbols, which fall mainly into the five categories of identifiers, reserved words, constants, operators, and delimiters.
S206: Perform syntactic analysis on the words to be processed to build a syntax tree.
Specifically, syntactic analysis recombines the words to be processed into statements and program segments according to the rules defined in C.g4, and builds the syntax tree according to the relationships between the statements and the program segments.
S208: Traverse the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree.
Specifically, depth-first traversal is a graph algorithm used to traverse graphs and trees. Put simply, it goes as deep as possible along each branch path until it can go no further, and each node is visited only once.
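The traversal order described above can be illustrated with a minimal sketch. The `Node` class, the `depth_first` function, and the tiny example tree are hypothetical names chosen for illustration; they are not taken from the application, and the node labels only loosely imitate the grammar units shown in FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A syntax-tree node: a label plus an ordered list of child nodes (illustrative only)."""
    label: str
    children: list = field(default_factory=list)


def depth_first(node):
    """Yield labels in pre-order: the node itself, then each child subtree from left to right.

    Because the input is a tree, every node is yielded exactly once.
    """
    yield node.label
    for child in node.children:
        yield from depth_first(child)


# Hypothetical tree for a block containing "int a;" and "return 0;"
tree = Node("compoundStatement", [
    Node("blockItem", [Node("int"), Node("a"), Node(";")]),
    Node("blockItem", [Node("return"), Node("0"), Node(";")]),
])
order = list(depth_first(tree))
```

The first `blockItem` subtree is exhausted before the second one is entered, which is what allows a traversal of this kind to treat everything visited after a jump keyword within the same subtree as "later" statements.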
Drawing on compiler principles, the server first parses the program into a syntax tree and then traverses that tree depth-first, finding the unreachable statements in the syntax tree and annotating them. The C language source program can thus be parsed well to obtain its word symbols and semantic structure, unreachable statements are annotated automatically, and the situation in which the converted Java code cannot be executed is avoided.
With the above unreachable statement identification method, lexical analysis and syntactic analysis are performed and the syntax tree is traversed depth-first to find the unreachable statements in it. The C language source program can thus be parsed well to obtain its word symbols and semantic structure, which avoids the situation in which the converted Java code cannot be executed.
In one embodiment, performing lexical analysis on the C language program to be processed to obtain the words to be processed includes: preprocessing the C language program to be processed to delete useless symbols; and recognizing the preprocessed C language program according to keyword recognition rules to obtain the words to be processed.
Specifically, extracting word symbols is the first step of the syntax check, also called lexical analysis. In this stage, the source program is read in character by character from left to right; the content read in is then preprocessed to remove useless symbols such as line feeds and tab characters; finally, the source program is scanned from beginning to end and decomposed into five types of words (tokens): identifiers, reserved words, constants, operators, and delimiters. Lexical analysis operates on individual characters and is the basis of all subsequent work; its aim is to compose the characters into valid words. A mistake in this step, for example splitting the left-shift operator '<<' into '<' and '<', would irreparably affect everything that follows.
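A minimal sketch of such a lexical analysis stage is given below, assuming a small subset of C. The reserved-word set, the regular expression, and the `tokenize` function are illustrative assumptions rather than the rules actually configured on the server. Longer operators are listed before shorter ones so that '<<' is matched as one word.

```python
import re

# Assumed subset of C reserved words, for illustration only.
RESERVED = {"int", "return", "if", "else", "while", "break", "continue", "goto", "switch"}

# Alternatives are tried in order, so multi-character operators come first
# and "<<" is never split into "<" and "<".
TOKEN_RE = re.compile(r"""
    (?P<constant>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<operator><<=|>>=|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%<>=!&|^~])
  | (?P<delimiter>[(){}\[\];,])
""", re.VERBOSE)


def tokenize(source):
    """Split C source text into (category, text) pairs."""
    # Preprocessing: line feeds, carriage returns and tabs become plain spaces.
    cleaned = re.sub(r"[\n\t\r]+", " ", source)
    tokens = []
    pos = 0
    while pos < len(cleaned):
        if cleaned[pos] == " ":
            pos += 1
            continue
        match = TOKEN_RE.match(cleaned, pos)
        if match is None:
            raise ValueError(f"unexpected character {cleaned[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            # A word is either a reserved word or an identifier.
            kind = "reserved" if text in RESERVED else "identifier"
        tokens.append((kind, text))
        pos = match.end()
    return tokens


tokens = tokenize("int a = 1 << 2;\n")
```

This sketch ignores string literals, comments, and floating-point constants; it is meant only to show the five categories and the longest-operator-first ordering.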
In one embodiment, performing syntactic analysis on the words to be processed to build a syntax tree includes: recombining the words to be processed to obtain grammar units; and connecting the grammar units to obtain the syntax tree.
In one embodiment, before the grammar units are connected to obtain the syntax tree, the method further includes: performing a format check on the grammar units to correct them.
Building the syntax tree is the second step of the syntax check, also called syntactic analysis (parsing). This step assembles the word symbols from the lexical analysis stage into grammar units such as phrases, statements, and program segments, performs a preliminary check for errors in the grammar units such as missing semicolons, mismatched brackets, and mismatched begin/end, and expresses the resulting language structure in the form of a tree. Taking the simplest C code as an example, the syntax tree obtained after syntactic analysis is shown in FIG. 4. The units in FIG. 4 have the following meanings:
compilationUnit: the compilation unit;
translationUnit: the translation unit;
externalDeclaration: an external declaration;
functionDefinition: a function definition;
declarationSpecifiers: the declaration specifiers;
typeSpecifier: a type specifier (the suffix 1 denotes category 1);
int "int": the return type of the function is int, corresponding to the int before main() in the code on the left;
function declarator: declares the function;
Identifier: identifiers, that is, main and the ( and ) following it;
compoundStatement: a compound statement, here the function body;
LeftBrace, RightBrace: the { and } in the code;
blockItem: a code block;
int "int": the int in "int a;" in the code;
initDeclarator: an initializing declarator;
Identifier a: the variable a, declared as int in the code;
Semi ";": marks the end of a statement.
FIG. 4 is a screenshot of a visualization of the syntax tree; the grammar units listed above are all defined in the grammar rules.
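The grouping and checking described in this step can be sketched in a much simplified form. The function below is not the C.g4-based parser of the embodiment; `build_tree` and its dictionary layout are hypothetical, and the sketch only groups words into statements (ended by `;`) and nested blocks (delimited by `{` and `}`) while reporting missing semicolons and mismatched braces.

```python
def build_tree(tokens):
    """Group a flat list of words into nested statements and blocks.

    Returns a list of items, each either {"statement": [...]} or
    {"header": [...], "block": [...nested items...]}.
    """
    def parse_block(pos, depth):
        items, current = [], []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "{":
                # Words collected so far form the header of the nested block.
                child, pos = parse_block(pos + 1, depth + 1)
                items.append({"header": current, "block": child})
                current = []
            elif tok == "}":
                if depth == 0:
                    raise SyntaxError("unmatched '}'")
                if current:
                    raise SyntaxError("missing ';' before '}'")
                return items, pos + 1
            elif tok == ";":
                items.append({"statement": current})
                current = []
                pos += 1
            else:
                current.append(tok)
                pos += 1
        if depth > 0:
            raise SyntaxError("missing '}'")
        if current:
            raise SyntaxError("missing ';' at end of input")
        return items, pos

    tree, _ = parse_block(0, 0)
    return tree


# The simplest program: int main() { int a; }
tree = build_tree(["int", "main", "(", ")", "{", "int", "a", ";", "}"])
```

A real parser would also handle constructs such as `for (;;)`, where semicolons appear inside parentheses; the sketch deliberately leaves these out.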
In one embodiment, traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree includes: traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree based on keywords in the syntax tree.
In one embodiment, traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree based on keywords in the syntax tree includes at least one of: traversing the syntax tree by depth-first traversal to determine whether a jump keyword exists in the syntax tree and, if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements; traversing the syntax tree by depth-first traversal to determine whether a loop keyword exists in the syntax tree and, when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword, taking the statements after the subtree corresponding to the loop keyword as unreachable statements; and traversing the syntax tree by depth-first traversal to determine whether a branch keyword exists in the syntax tree and, when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword, taking the statements after the subtree corresponding to the branch keyword as unreachable statements.
Specifically, in this embodiment, the first stage focuses on whether each word is legal, and the second stage combines the words obtained from lexical analysis according to the grammar rules. The third stage uses the results of the previous two steps to truly understand the function that the source program is meant to express. In this stage, depth-first traversal (depth-first search) is performed on the branches of the syntax tree.
The jump keywords may include continue, break, goto, return, and so on. The loop keywords may include while. The branch keywords may include if, else, and so on.
The content inside each pair of curly braces in the C language program is a statement block, which is expanded from left to right as a subtree of the syntax tree. While each subtree is being traversed, if a jump keyword such as continue, break, goto, or return is found, the subsequent nodes in that subtree are not traversed further and are regarded as unreachable statements. If the leftmost child of a subtree is while, the subtree forms a loop with the while statement; when the conditional expression of the loop is traversed, if the expression is always true and neither the jump keyword break nor a return is found in the subtree, all subsequent nodes of the syntax tree are considered unreachable. Similarly, when the traversal reaches a statement block selected by an if/else or switch branch, a flag is set to record whether each subtree under that statement block encounters the keyword return; if every subtree has a return, all subsequent nodes of the syntax tree are unreachable. In the C2J process, reachable statements are automatically translated from C language into Java language, and all unreachable statements are kept in the translated Java program in commented-out form, until the traversal is finished. This completes the syntax checking method.
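The three rules can be sketched as a recursive depth-first walk over a simplified statement tree. Everything in the block below is an illustrative assumption: statements are modeled as tuples (`("simple", text)`, `("jump", text)`, `("while", cond, body)`, `("if", cond, then_body, else_body)`), an always-true condition is recognized only when it is spelled `1` or `true`, and the names `mark`, `contains`, and `keyword_of` are hypothetical. Switch statements are omitted, and the branch rule is applied to any jump ending every branch, as in the embodiment above, rather than to return alone.

```python
ALWAYS_TRUE = {"1", "true"}  # assumed spellings of an always-true loop condition


def keyword_of(jump):
    """First word of a jump statement, e.g. 'return 0;' -> 'return'."""
    return jump[1].split()[0].rstrip(";")


def contains(block, keywords):
    """Depth-first search of a block for a jump whose keyword is in `keywords`."""
    for stmt in block:
        kind = stmt[0]
        if kind == "jump" and keyword_of(stmt) in keywords:
            return True
        if kind == "while" and contains(stmt[2], keywords):
            return True
        if kind == "if" and (contains(stmt[2], keywords)
                             or contains(stmt[3] or [], keywords)):
            return True
    return False


def mark(block, out, dead=False):
    """Append (text, reachable) pairs to `out` in traversal order.

    Returns True if control cannot flow past the end of `block`.
    """
    for stmt in block:
        kind = stmt[0]
        if kind in ("simple", "jump"):
            out.append((stmt[1], not dead))
            if kind == "jump":
                dead = True  # rule 1: nothing after a jump in this block is reachable
        elif kind == "while":
            cond, body = stmt[1], stmt[2]
            out.append((f"while ({cond})", not dead))
            mark(body, out, dead)
            if cond in ALWAYS_TRUE and not contains(body, {"break", "return"}):
                dead = True  # rule 2: infinite loop with no way out
        elif kind == "if":
            cond, then_body, else_body = stmt[1], stmt[2], stmt[3]
            out.append((f"if ({cond})", not dead))
            then_ends = mark(then_body, out, dead)
            else_ends = mark(else_body, out, dead) if else_body is not None else False
            if then_ends and else_ends:
                dead = True  # rule 3: every branch ends in a jump
    return dead


program = [
    ("simple", "int a = 0;"),
    ("if", "a > 0", [("jump", "return 1;")], [("jump", "return 2;")]),
    ("simple", "a = 3;"),
]
result = []
mark(program, result)
```

In this example only `a = 3;` is marked unreachable, because both branches of the if statement return. An if without an else never triggers the third rule in this sketch, since control can fall through when the condition is false.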
In one embodiment, after traversing the syntax tree by depth-first traversal to identify unreachable statements in the syntax tree, the method further includes: annotating the unreachable statements.
Specifically, as shown in FIG. 5, unreachable statements are commented out by means of double slashes or the like. The comment forms commonly used in Java are // and /* */.
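As a minimal sketch of this annotation step, assume the identification stage has produced a list of (statement text, reachable) pairs; that representation and the function name `comment_out_unreachable` are hypothetical and chosen only for illustration.

```python
def comment_out_unreachable(marked, prefix="// "):
    """Return the statements as text, with unreachable ones turned into line comments."""
    lines = []
    for text, reachable in marked:
        lines.append(text if reachable else prefix + text)
    return "\n".join(lines)


marked = [("int a = 0;", True), ("return a;", True), ("a = 3;", False)]
annotated = comment_out_unreachable(marked)
```

Keeping the unreachable statement as a comment, rather than deleting it, preserves the original logic for later review while letting the Java compiler accept the translated code.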
Specifically, referring to FIG. 6, which is a flow diagram of an implementation of unreachable statement identification in one embodiment: word symbols are extracted first, a syntax tree is then built, and finally the semantic structure is obtained and the unreachable statements are found and annotated.
In one embodiment, as shown in FIG. 7, a C language to Java conversion method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
S702: Identify the unreachable statements in the C language program according to the method for identifying unreachable statements in C language in any of the above embodiments.
S704: Convert the C language program into a Java program according to the unreachable statements.
This embodiment implements syntax checking of C code before C2J conversion: unreachable statements are found and annotated automatically, without manually tracing the program's execution flow. At the same time, C language statements such as goto, return, and break are converted properly, which avoids errors when the converted Java program is compiled and executed.
With the above C language to Java conversion method, lexical analysis and syntactic analysis are performed and the syntax tree is traversed depth-first to find the unreachable statements in it. The C language source program can thus be parsed well to obtain its word symbols and semantic structure, which avoids the situation in which the converted Java code cannot be executed.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application further provide an unreachable statement identification apparatus and a C language to Java conversion apparatus for implementing the above unreachable statement identification method and C language to Java conversion method. The solution provided by these apparatuses is similar to that described for the methods above, so for the specific limitations in the one or more apparatus embodiments provided below, reference may be made to the limitations on the corresponding methods above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided an unreachable sentence recognition apparatus including: a source program obtaining module 801, a lexical analysis module 802, a syntax analysis module 803, and a first recognition module 804, wherein:
a source program obtaining module 801, configured to obtain a C language program to be processed;
a lexical analysis module 802, configured to perform lexical analysis on the to-be-processed C language program to obtain a to-be-processed word;
a syntax analysis module 803, configured to perform syntax analysis on the word to be processed to establish a syntax tree;
a first identifying module 804 configured to traverse the syntax tree by a depth-first traversal to identify unreachable statements in the syntax tree.
In one embodiment, the lexical analysis module 802 may include:
the first preprocessing unit is used for preprocessing the C language program to be processed so as to delete useless symbols;
and the lexical analysis unit is used for identifying the preprocessed C language program to be processed according to the keyword identification rule to obtain a word to be processed.
In one embodiment, the parsing module 803 may include:
the recombination unit is used for recombining the words to be processed to obtain a grammar unit;
and the syntax analysis unit is used for connecting the syntax units to obtain a syntax tree.
In one embodiment, the parsing module 803 may further include:
and the second preprocessing unit is used for performing form check on the grammar unit so as to correct the grammar unit.
In one embodiment, the first identifying module 804 is further configured to traverse the syntax tree through a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree.
In one embodiment, the first identifying module 804 is further configured to perform at least one of: traversing the syntax tree by a depth-first traversal method to judge whether a skip keyword exists in the syntax tree, and if so, taking a statement after a subtree corresponding to the skip keyword as an unreachable statement; traversing the syntax tree by a depth-first traversal method to judge whether a loop keyword exists in the syntax tree, and taking a statement after a sub-tree corresponding to the loop keyword as an unreachable statement when the loop keyword exists, a loop condition corresponding to the loop keyword is constantly established, and a skip keyword does not exist in a loop body corresponding to the loop keyword; and traversing the syntax tree by a depth-first traversal method to judge whether branch keywords exist in the syntax tree, and taking statements behind subtrees corresponding to the branch keywords as unreachable statements when the branch keywords exist and the branches corresponding to the branch keywords all have jump keywords.
In one embodiment, the apparatus further includes:
and the annotation module is used for annotating the unreachable statements.
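The embodiment does not state what form the annotation takes. One plausible reading, sketched below as an assumption, is that each unreachable statement is wrapped in a comment carrying a marker. The function `annotate`, its parameters and the marker text are hypothetical.

```python
def annotate(statements, unreachable_indexes, marker="UNREACHABLE"):
    """Return the statements with the unreachable ones commented out.

    statements          -- list of statement strings in source order
    unreachable_indexes -- set of positions identified as unreachable
    """
    annotated = []
    for index, stmt in enumerate(statements):
        if index in unreachable_indexes:
            # Break up any "*/" so the wrapped text cannot close the
            # enclosing comment early.
            safe = stmt.replace("*/", "* /")
            annotated.append(f"/* {marker}: {safe} */")
        else:
            annotated.append(stmt)
    return annotated
```

Commenting the statement out keeps its text available for review while removing it from compilation, which matters because the Java compiler reports unreachable statements as errors.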
In one embodiment, as shown in fig. 9, there is provided an apparatus for converting a C language program into a Java program, including: a second identification module 901 and a conversion module 902, wherein:
a second identifying module 901, configured to identify unreachable statements in a C language program by means of the apparatus for identifying unreachable statements in C language of any of the embodiments;
and the conversion module 902 is used for converting the C language program into a Java program according to the unreachable statements.
The modules in the unreachable statement identification apparatus and in the C-language-to-Java conversion apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an unreachable statement identification method and a C-language-to-Java conversion method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply, as a particular computer device may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring a C language program to be processed; performing lexical analysis on the C language program to be processed to obtain words to be processed; performing syntactic analysis on the words to be processed to build a syntax tree; and traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree.
In one embodiment, the lexical analysis of the C language program to be processed to obtain the words to be processed, implemented when the processor executes the computer program, includes: preprocessing the C language program to be processed to delete useless symbols; and identifying the preprocessed C language program to be processed according to a keyword identification rule to obtain the words to be processed.
In one embodiment, the syntactic analysis of the words to be processed to build the syntax tree, implemented when the processor executes the computer program, includes: recombining the words to be processed to obtain syntax units; and connecting the syntax units to obtain the syntax tree.
In one embodiment, before connecting the syntax units to obtain the syntax tree, the processor, when executing the computer program, further implements: performing a format check on the syntax units to correct the syntax units.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, implemented when the processor executes the computer program, includes: traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree, implemented when the processor executes the computer program, includes at least one of: traversing the syntax tree by a depth-first traversal method to judge whether a jump keyword exists in the syntax tree, and if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements; traversing the syntax tree by a depth-first traversal method to judge whether a loop keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the loop keyword as unreachable statements when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword; and traversing the syntax tree by a depth-first traversal method to judge whether a branch keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the branch keyword as unreachable statements when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword.
In one embodiment, after traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, the processor, when executing the computer program, further implements: annotating the unreachable statements.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: identifying unreachable statements in a C language program according to the method for identifying unreachable statements in C language of any of the embodiments; and converting the C language program into a Java program according to the unreachable statements.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the steps of: acquiring a C language program to be processed; performing lexical analysis on the C language program to be processed to obtain words to be processed; performing syntactic analysis on the words to be processed to build a syntax tree; and traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree.
In one embodiment, the lexical analysis of the C language program to be processed to obtain the words to be processed, implemented when the computer program is executed by the processor, includes: preprocessing the C language program to be processed to delete useless symbols; and identifying the preprocessed C language program to be processed according to a keyword identification rule to obtain the words to be processed.
In one embodiment, the syntactic analysis of the words to be processed to build the syntax tree, implemented when the computer program is executed by the processor, includes: recombining the words to be processed to obtain syntax units; and connecting the syntax units to obtain the syntax tree.
In one embodiment, before connecting the syntax units to obtain the syntax tree, the computer program, when executed by the processor, further implements: performing a format check on the syntax units to correct the syntax units.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, implemented when the computer program is executed by the processor, includes: traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree, implemented when the computer program is executed by the processor, includes at least one of: traversing the syntax tree by a depth-first traversal method to judge whether a jump keyword exists in the syntax tree, and if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements; traversing the syntax tree by a depth-first traversal method to judge whether a loop keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the loop keyword as unreachable statements when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword; and traversing the syntax tree by a depth-first traversal method to judge whether a branch keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the branch keyword as unreachable statements when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword.
In one embodiment, after traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, the computer program, when executed by the processor, further implements: annotating the unreachable statements.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the steps of: identifying unreachable statements in a C language program according to the method for identifying unreachable statements in C language of any of the embodiments; and converting the C language program into a Java program according to the unreachable statements.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of: acquiring a C language program to be processed; performing lexical analysis on the C language program to be processed to obtain words to be processed; performing syntactic analysis on the words to be processed to build a syntax tree; and traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree.
In one embodiment, the lexical analysis of the C language program to be processed to obtain the words to be processed, implemented when the computer program is executed by the processor, includes: preprocessing the C language program to be processed to delete useless symbols; and identifying the preprocessed C language program to be processed according to a keyword identification rule to obtain the words to be processed.
In one embodiment, the syntactic analysis of the words to be processed to build the syntax tree, implemented when the computer program is executed by the processor, includes: recombining the words to be processed to obtain syntax units; and connecting the syntax units to obtain the syntax tree.
In one embodiment, before connecting the syntax units to obtain the syntax tree, the computer program, when executed by the processor, further implements: performing a format check on the syntax units to correct the syntax units.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, implemented when the computer program is executed by the processor, includes: traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree.
In one embodiment, traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree, implemented when the computer program is executed by the processor, includes at least one of: traversing the syntax tree by a depth-first traversal method to judge whether a jump keyword exists in the syntax tree, and if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements; traversing the syntax tree by a depth-first traversal method to judge whether a loop keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the loop keyword as unreachable statements when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword; and traversing the syntax tree by a depth-first traversal method to judge whether a branch keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the branch keyword as unreachable statements when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword.
In one embodiment, after traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, the computer program, when executed by the processor, further implements: annotating the unreachable statements.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of: identifying unreachable statements in a C language program according to the method for identifying unreachable statements in C language of any of the embodiments; and converting the C language program into a Java program according to the unreachable statements.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of the present specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (13)

1. A method for identifying unreachable statements in C language, characterized in that the method comprises:
acquiring a C language program to be processed;
performing lexical analysis on the C language program to be processed to obtain words to be processed;
performing syntactic analysis on the words to be processed to build a syntax tree;
and traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree.
2. The method for identifying unreachable statements in C language according to claim 1, wherein performing lexical analysis on the C language program to be processed to obtain words to be processed comprises:
preprocessing the C language program to be processed to delete useless symbols;
and identifying the preprocessed C language program to be processed according to a keyword identification rule to obtain the words to be processed.
3. The method for identifying unreachable statements in C language according to claim 1, wherein performing syntactic analysis on the words to be processed to build a syntax tree comprises:
recombining the words to be processed to obtain syntax units;
and connecting the syntax units to obtain the syntax tree.
4. The method for identifying unreachable statements in C language according to claim 3, wherein before connecting the syntax units to obtain the syntax tree, the method further comprises:
performing a format check on the syntax units to correct the syntax units.
5. The method for identifying unreachable statements in C language according to any one of claims 1 to 4, wherein traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree comprises:
traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree.
6. The method for identifying unreachable statements in C language according to claim 5, wherein traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree according to the keywords in the syntax tree comprises at least one of:
traversing the syntax tree by a depth-first traversal method to judge whether a jump keyword exists in the syntax tree, and if so, taking the statements after the subtree corresponding to the jump keyword as unreachable statements;
traversing the syntax tree by a depth-first traversal method to judge whether a loop keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the loop keyword as unreachable statements when the loop keyword exists, the loop condition corresponding to the loop keyword is always true, and no jump keyword exists in the loop body corresponding to the loop keyword;
and traversing the syntax tree by a depth-first traversal method to judge whether a branch keyword exists in the syntax tree, and taking the statements after the subtree corresponding to the branch keyword as unreachable statements when the branch keyword exists and every branch corresponding to the branch keyword contains a jump keyword.
7. The method for identifying unreachable statements in C language according to any one of claims 1 to 4, wherein after traversing the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree, the method further comprises:
annotating the unreachable statements.
8. A method for converting a C language program into a Java program, characterized by comprising:
identifying unreachable statements in a C language program according to the method for identifying unreachable statements in C language of any one of claims 1 to 7;
and converting the C language program into a Java program according to the unreachable statements.
9. An apparatus for identifying unreachable statements in C language, the apparatus comprising:
a source program acquisition module, configured to acquire a C language program to be processed;
a lexical analysis module, configured to perform lexical analysis on the C language program to be processed to obtain words to be processed;
a syntax analysis module, configured to perform syntactic analysis on the words to be processed to build a syntax tree;
and a first identification module, configured to traverse the syntax tree by a depth-first traversal method to identify unreachable statements in the syntax tree.
10. An apparatus for converting a C language program into a Java program, the apparatus comprising:
a second identification module, configured to identify unreachable statements in a C language program by means of the apparatus for identifying unreachable statements in C language of claim 9;
and a conversion module, configured to convert the C language program into a Java program according to the unreachable statements.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7 or 8.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7 or 8.
13. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7 or 8.
CN202111363168.5A 2021-11-17 2021-11-17 Unreachable statement identification method, C language and Java conversion method and device Pending CN114237607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111363168.5A CN114237607A (en) 2021-11-17 2021-11-17 Unreachable statement identification method, C language and Java conversion method and device


Publications (1)

Publication Number Publication Date
CN114237607A true CN114237607A (en) 2022-03-25

Family

ID=80749823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111363168.5A Pending CN114237607A (en) 2021-11-17 2021-11-17 Unreachable statement identification method, C language and Java conversion method and device

Country Status (1)

Country Link
CN (1) CN114237607A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination