CN116484388A - Code entrainment identification method and device, electronic equipment and medium - Google Patents

Info

Publication number
CN116484388A
Authority
CN
China
Prior art keywords
code
file
function
entrainment
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310456727.XA
Other languages
Chinese (zh)
Inventor
杨飞雪
施阳
成汉平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310456727.XA
Publication of CN116484388A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Abstract

The disclosure provides a code entrainment identification method, a device, electronic equipment and a medium, which can be used in the financial field or other fields. The method comprises the following steps: acquiring a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software; calibrating code attributes of different file levels for each code file, determining the weight corresponding to each file level, and establishing a topological relation function of a code database according to the code attributes and the weights of the different file levels; carrying out semantic analysis on the topological relation function, and determining whether letters with semantic confusion exist in a code database; under the condition that letters with semantic confusion do not exist in a code database and variables in functions of each code file are not wrong, establishing a logic matching function by using a graph network model; and determining the code entrainment state of the code database according to the calculation results of the topological relation function and the logic matching function.

Description

Code entrainment identification method and device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technology, and may be used in the financial field or other fields, and in particular, to a code entrainment identification method, apparatus, electronic device, medium, and program product.
Background
Code entrainment refers to a piece of extra code that is carried along, typically during the normal propagation of a computer program. Code entrainment can easily compromise the network security of a computer.
At present, one method for detecting code entrainment comprises: obtaining service demand data, and performing word segmentation on the historical demand data and the new-version demand data to obtain a plurality of keywords; constructing, from the keywords, keyword vectors corresponding to the historical demand data and to the new-version demand data respectively, and calculating a similarity score between the historical demand data and the new-version demand data; and determining the requirement type of the new-version demand data and the code entrainment risk level associated with that requirement type.
However, in the course of realizing the concept of the present disclosure, the applicant found that: (1) existing methods rely on multiple branches: each feature to be handled (a new function, an optimization, etc.) is developed on its own branch, and merging the branches back into the trunk once development is complete consumes a great deal of time and labor; (2) existing methods rely on feature switches: development still takes place on the trunk or on a single branch, and the switches are used for debugging and release; when a release is required but some features are unfinished, their switches are turned off and turned on again after completion, which results in poor compatibility and complicated operation.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a code entrainment identification method, apparatus, electronic device, medium, and program product.
According to a first aspect of the present disclosure, there is provided a code entrainment identification method, comprising: acquiring a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software; calibrating code attributes of different file levels for each code file, determining the weight corresponding to each file level, and establishing a topological relation function of a code database according to the code attributes and the weights of the different file levels; carrying out semantic analysis on the topological relation function, and determining whether letters with semantic confusion exist in a code database; under the condition that letters with semantic confusion do not exist in a code database and variables in functions of each code file are not wrong, establishing a logic matching function by using a graph network model; and determining the code entrainment state of the code database according to the calculation results of the topological relation function and the logic matching function.
According to an embodiment of the present disclosure, calibrating code attributes of different file levels for each code file includes: marking a plurality of code attributes for each code file, wherein the code attributes include the file name, the number of files, the number of code lines in each file, the number of functions and the reference relations; and calibrating the file hierarchy to which each code attribute belongs according to the business logic relations of the different development versions, wherein the file hierarchy comprises, in order, folders, files, project programs, function definitions and function reference relations.
According to an embodiment of the present disclosure, determining the weight corresponding to each file hierarchy includes: counting, according to the function reference relations, the number of references to the code attributes of each file level in the code database; and determining the weight corresponding to the file hierarchy according to the number of references, wherein the larger the number of references, the larger the weight corresponding to the file hierarchy.
According to an embodiment of the present disclosure, before calibrating the file hierarchy to which each code attribute belongs, the method further includes: and updating the code database according to the business logic relation and the time sequence.
According to an embodiment of the present disclosure, letters for which there is semantic confusion in a code database are determined as follows: extracting function categories and function names from the topological relation function as a plurality of keywords; carrying out semantic analysis on any two keywords in the plurality of keywords to obtain semantic similarity of the two keywords; in the event that the semantic similarity is above a similarity threshold, it is determined that there are semantically confusing letters in the code database.
According to an embodiment of the present disclosure, a graph network model includes a directed acyclic graph model including a plurality of nodes and paths between nodes with unidirectional arrows; the logical matching function includes a probability density function; building a logical match function using a graph network model includes: constructing a directed acyclic graph model based on the function definitions and the function reference relationships in each code file; for the directed acyclic graph model, carrying out semantic analysis on two nodes related to each path, and under the condition that the semantics of the two nodes are determined to be not feasible, calculating posterior probability according to the position information of the two nodes and the joint prior probability; based on the posterior probability, a probability density function is derived.
According to an embodiment of the present disclosure, constructing a directed acyclic graph model includes: representing each function definition as a node, and representing the paths between nodes by the weight corresponding to each file level; and determining a prior probability for each of the plurality of nodes using a Bayesian network algorithm.
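The graph construction described above can be illustrated with a minimal sketch. It assumes the function reference relations are already available as (caller, callee) pairs; the names `build_call_dag`, `is_acyclic` and the sample edges are hypothetical and not taken from the disclosure. The sketch covers only the node and path structure and the acyclicity check; the path weights and the Bayesian prior probabilities are not modeled here.

```python
from graphlib import CycleError, TopologicalSorter


def build_call_dag(references):
    """Map each function to the set of functions it references.

    `references` is a hypothetical list of (caller, callee) pairs taken
    from the function reference relations of the code files.
    """
    graph = {}
    for caller, callee in references:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    return graph


def is_acyclic(graph):
    """Return True if the directed graph contains no cycle."""
    try:
        # For a pure cycle check the edge direction does not matter.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Illustrative data only: three functions and their reference relations.
edges = [("main", "load_config"), ("main", "run"), ("run", "load_config")]
dag = build_call_dag(edges)
print(is_acyclic(dag))
```

A reference graph that fails the acyclicity check would not qualify as a directed acyclic graph model in the sense used above, so a check of this kind may be useful before any probabilities are attached to the nodes.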
According to an embodiment of the present disclosure, the method further comprises: establishing an index mechanism of a code database according to the code attribute; in the case that the variable in the function of the code file is determined to be wrong, the position of the problem code is determined according to an index mechanism, and the problem code is changed.
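One possible shape for such an index mechanism is sketched below. The disclosure does not specify the index layout, so the `IndexEntry` fields, the `build_index` and `locate` helpers, and the sample file names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: where a function lives in a code file."""
    file_name: str
    function_name: str
    start_line: int
    end_line: int


def build_index(entries):
    """Group index entries by function name for direct lookup."""
    index = {}
    for entry in entries:
        index.setdefault(entry.function_name, []).append(entry)
    return index


def locate(index, function_name):
    """Return (file, start line, end line) for every definition found."""
    return [(e.file_name, e.start_line, e.end_line)
            for e in index.get(function_name, [])]


# Illustrative data only: two development versions of the same software.
entries = [
    IndexEntry("v1/pay.py", "transfer", 10, 42),
    IndexEntry("v2/pay.py", "transfer", 12, 57),
    IndexEntry("v2/pay.py", "audit", 60, 88),
]
index = build_index(entries)
print(locate(index, "transfer"))
```

Keying the index by function name is only one choice; an index keyed by file name or by file level would serve the same purpose of mapping a detected problem back to a file and a line range.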
According to an embodiment of the present disclosure, the code entrainment state includes the existence of code entrainment; the code database is determined to have code entrainment as follows: in the case that the calculation results of the topological relation function and the logic matching function are different, determining that the code database has code entrainment, and determining the entrained code file and the corresponding line number area according to the index mechanism.
According to an embodiment of the present disclosure, after determining the entrained code file and the corresponding line number area, the method further includes: changing the code in the line number area, and updating the code database after the change; and repeating the operations from calibrating the code attributes of different file levels for each code file through determining the code entrainment state of the code database, until the calculation results of the topological relation function and the logic matching function are the same.
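The repeat-until-equal control flow can be sketched as follows. This is a toy illustration only: the two evaluation callables stand in for the topological relation function and the logic matching function, whose actual computation the text does not spell out, and `fix`, `max_rounds` and the sample data are hypothetical. The round limit is an added safeguard, not part of the described method.

```python
def eliminate_entrainment(database, topo_result, match_result, fix,
                          max_rounds=10):
    """Re-evaluate and fix the database until both results agree.

    topo_result / match_result: placeholder callables returning the
    calculation result of each function for the current database.
    fix: placeholder callable returning an updated database.
    """
    for round_no in range(max_rounds):
        if topo_result(database) == match_result(database):
            return database, round_no
        database = fix(database)
    raise RuntimeError("results still differ after max_rounds")


# Toy stand-ins: the database is a list of lines, and any line marked
# "# EXTRA" plays the role of entrained code.
db = ["def pay(): ...", "# EXTRA", "def audit(): ...", "# EXTRA"]


def count_all(d):
    return len(d)


def count_expected(d):
    return sum(1 for line in d if line != "# EXTRA")


def drop_one_extra(d):
    i = d.index("# EXTRA")
    return d[:i] + d[i + 1:]


clean, rounds = eliminate_entrainment(db, count_all, count_expected,
                                      drop_one_extra)
print(clean, rounds)
```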
A second aspect of the present disclosure provides a code entrainment identification device, comprising: the code database acquisition module is used for acquiring a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software; the topological relation function establishing module is used for calibrating the code attributes of different file levels for each code file, determining the weight corresponding to each file level and establishing the topological relation function of the code database according to the code attributes and the weights of different file levels; the semantic analysis module is used for carrying out semantic analysis on the topological relation function and determining whether letters with semantic confusion exist in the code database; the logic matching function building module is used for building a logic matching function by using a graph network model under the condition that letters with semantic confusion do not exist in the code database and variables in functions of each code file are not wrong; and the code entrainment determining module is used for determining the code entrainment state of the code database according to the calculation result of the topological relation function and the logic matching function.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the code entrainment identification method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described code entrainment identification method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the code entrainment identification method described above.
By the code entrainment identification method, the code entrainment identification device, the electronic equipment, the medium and the program product, the code files of each development version of a certain software product are summarized, and a topological relation function is established by step-by-step calibration and weight assignment; after semantic analysis, code entrainment is effectively identified based on a topological relation function and a logic matching function. The method provided by the disclosure is used as an autonomous iterative process of computer business logic, can effectively improve the safety of software products and reduce the complexity of code entrainment identification.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
Fig. 1 schematically illustrates a system architecture of a code entrainment identification method and apparatus according to an embodiment of the disclosure.
Fig. 2 schematically illustrates a flow chart of a code entrainment identification method according to an embodiment of the disclosure.
Fig. 3 schematically illustrates a flow chart of semantic analysis of topological relation functions according to an embodiment of the disclosure.
Fig. 4 schematically illustrates a flow chart for establishing a logical match function according to an embodiment of the disclosure.
Fig. 5 schematically illustrates a flow chart for modifying problem code according to an embodiment of the present disclosure.
Fig. 6 schematically illustrates a flow chart for eliminating code entrainment in accordance with an embodiment of the disclosure.
Fig. 7 schematically illustrates a block diagram of a code entrainment identification device according to an embodiment of the disclosure.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a code entrainment identification method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning as commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon, the computer program product being for use by or in connection with an instruction execution system.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Topology theory builds on the causal relations among the various factors in a system, and can effectively identify the key factors that cause problems.
In view of this, embodiments of the present disclosure provide a code entrainment identification method, apparatus, electronic device, storage medium, and program product, relating to the field of computer technology, which may be used in the financial field or other fields. The method comprises the following steps: acquiring a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software; calibrating code attributes of different file levels for each code file, determining the weight corresponding to each file level, and establishing a topological relation function of a code database according to the code attributes and the weights of the different file levels; carrying out semantic analysis on the topological relation function, and determining whether letters with semantic confusion exist in a code database; under the condition that letters with semantic confusion do not exist in a code database and variables in functions of each code file are not wrong, establishing a logic matching function by using a graph network model; and determining the code entrainment state of the code database according to the calculation results of the topological relation function and the logic matching function.
Fig. 1 schematically illustrates a system architecture suitable for code entrainment identification methods and apparatus according to embodiments of the disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the code entrainment identification method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the code entrainment identification devices provided by embodiments of the present disclosure may be generally disposed in the server 105. The code entrainment identification methods provided by embodiments of the present disclosure may also be performed by a server or cluster of servers other than server 105 and capable of communicating with terminal devices 101, 102, 103 and/or server 105. Accordingly, the code entrainment identification apparatus provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The code entrainment identification method according to the embodiments of the present disclosure will be described in detail below with reference to fig. 2 to 6 based on the system architecture described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a code entrainment identification method according to an embodiment of the disclosure.
As shown in fig. 2, the code entrainment identification method of this embodiment may include operations S210 to S250, which may be performed by the server 105 described above.
In operation S210, a code database is acquired, the code database including a plurality of code files, the plurality of code files being different development versions of the same software.
For example, a code database is the collection of code files of the different development versions produced for a certain software product from the project planning stage up to the point at which it is tested and published (or delivered).
In embodiments of the present disclosure, prior to acquiring the code database, consent or authorization of the user to whom the code database pertains may be obtained. For example, before operation S210, a request to acquire a code database may be issued to the user. In case the user agrees or authorizes that the code database can be acquired, the operation S210 is performed.
In operation S220, the code attributes of different file levels are calibrated for each code file, the weight corresponding to each file level is determined, and the topology relation function of the code database is established according to the code attributes and weights of different file levels.
This operation marks each code file with its code attributes and with the file hierarchy to which each code attribute belongs.
In operation S230, semantic analysis is performed on the topological relation function to determine whether there are semantically confusing letters in the code database.
In operation S240, in the case where it is determined that there are no letters of semantic confusion in the code database and the variables within the function of each code file are not erroneous, a logical matching function is established using the graph network model.
In operation S250, a code entrainment state of the code database is determined according to the calculation results of the topology relation function and the logic matching function.
By the embodiment of the disclosure, the code files of each development version of a certain software product are summarized, and a topological relation function is established by step-by-step calibration and weight assignment; after semantic analysis, code entrainment is effectively identified based on a topological relation function and a logic matching function. The method provided by the disclosure is used as an autonomous iterative process of computer business logic, can effectively improve the safety of software products and reduce the complexity of code entrainment identification.
In the embodiment of the present disclosure, calibrating the code attributes of different file levels for each code file in operation S220 includes: marking a plurality of code attributes for each code file, wherein the code attributes include the file name, the number of files, the number of code lines in each file, the number of functions and the reference relations; and calibrating the file hierarchy to which each code attribute belongs according to the business logic relations of the different development versions, wherein the file hierarchy comprises, in order, folders, files, project programs, function definitions and function reference relations.
It can be seen that the code attributes of each code file are calibrated, then a plurality of file levels are defined based on the code attributes, and a corresponding file level is determined for each code attribute. The file hierarchy may include, for example: folder- > file- > project program- > function definition- > function reference.
For example, when the code attribute is the number of functions, the corresponding file hierarchy is the function definition; when the code attribute is a reference relation, the corresponding file hierarchy is the function reference relation. These code attributes and file levels may be set, for example, according to the development needs of the actual software product; the above is given by way of example only, and the disclosure is not limited thereto.
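The attributes and hierarchy described above might be represented as follows. This is a minimal sketch: the class and field names are assumptions, and only the two attribute-to-level mappings stated in the text (number of functions, reference relation) are encoded; how the remaining attributes map to levels is left to the project.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class FileLevel(IntEnum):
    """File hierarchy in the order given in the text."""
    FOLDER = 1
    FILE = 2
    PROJECT_PROGRAM = 3
    FUNCTION_DEFINITION = 4
    FUNCTION_REFERENCE = 5


@dataclass
class CodeFileAttributes:
    """Hypothetical record of the code attributes marked on one file."""
    file_name: str
    line_count: int
    function_count: int
    references: list = field(default_factory=list)


# Only the two mappings named in the text; others are project-specific.
ATTRIBUTE_LEVEL = {
    "function_count": FileLevel.FUNCTION_DEFINITION,
    "references": FileLevel.FUNCTION_REFERENCE,
}


def level_of(attribute_name):
    """Return the file level calibrated for a given code attribute."""
    return ATTRIBUTE_LEVEL[attribute_name]


sample = CodeFileAttributes("v2/pay.py", line_count=120, function_count=4,
                            references=[("transfer", "audit")])
print(level_of("function_count").name)
```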
According to the embodiment of the disclosure, in the stage of planning the software project, code databases of different development versions are formulated, and code attributes of different file levels are calibrated according to the code versions delivered and tested each time, so that the management is convenient, and errors of artificial logic judgment are reduced.
In the embodiment of the present disclosure, before calibrating the file hierarchy to which each code attribute belongs, the method further includes: updating the code database according to the business logic relations and the time sequence. To keep the code database logically consistent with its current state, the relation chains, the cross-type marks and the updating of the code database must not be interrupted while the code files are being calibrated.
In the embodiment of the present disclosure, determining the weight corresponding to each file hierarchy in operation S220 includes: counting, according to the function reference relations, the number of references to the code attributes of each file level in the code database; and determining the weight corresponding to the file hierarchy according to the number of references, wherein the larger the number of references, the larger the weight corresponding to the file hierarchy.
For example, the topological relation function can be expressed as P (a, x, b, y, c, z, d, T, e, i), wherein x, y, z, T, i represent folders, files, project programs, function definitions, and function reference relations, respectively; a, b, c, d, e represent the determined weights, respectively, which are obtained by scaling the number of references of the code attribute of each file level in the entire code database after calibrating the code attribute of the different file levels for each code file in operation S220 described above.
Through the embodiment of the disclosure, the more often a certain class of function is referenced in the main program, the larger the weight of the corresponding file level; that is, code at that file level is more prone to entrainment problems, which pose a hidden danger to the whole program.
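The weighting rule can be illustrated with a short sketch. The text states only that a larger reference count yields a larger weight; normalizing the counts so that the weights sum to 1, and combining weights and attribute values as a weighted sum, are both assumptions made here for illustration. The names `LEVELS`, `level_weights`, and `topological_relation` are hypothetical.

```python
# File levels in the order x, y, z, T, i used by P(a, x, b, y, c, z, d, T, e, i).
LEVELS = (
    "folder",
    "file",
    "project_program",
    "function_definition",
    "function_reference",
)


def level_weights(reference_counts):
    """Turn per-level reference counts into weights a, b, c, d, e.

    Assumption: weights are the counts normalized to sum to 1, which keeps
    the property that more references give a larger weight.
    """
    total = sum(reference_counts.get(level, 0) for level in LEVELS)
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: reference_counts.get(level, 0) / total for level in LEVELS}


def topological_relation(weights, attributes):
    """Hypothetical form of P: a weighted sum of per-level attribute values."""
    return sum(weights[level] * attributes[level] for level in LEVELS)


counts = {
    "folder": 1,
    "file": 2,
    "project_program": 2,
    "function_definition": 5,
    "function_reference": 10,
}
weights = level_weights(counts)
```

With these counts, the function reference level receives the largest weight, matching the statement that heavily referenced levels are weighted more.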
Fig. 3 schematically illustrates a flow chart of semantic analysis of topological relation functions according to an embodiment of the disclosure.
As shown in fig. 3, in the embodiment of the present disclosure, whether semantically confusing letters exist in the code database is determined according to the following operations S331 to S333.
In operation S331, function categories and function names are extracted from the topological relation function as a plurality of keywords.
For example, the function category and the function name are extracted from one file level (function definition) in the topological relation function. Since each keyword contains one or more letters, this operation must strictly distinguish the case of every letter in every keyword.
In operation S332, semantic analysis is performed on any two keywords in the plurality of keywords, so as to obtain semantic similarity of the two keywords.
In operation S333, in the case where the semantic similarity is higher than the similarity threshold, it is determined that there are semantically confused letters in the code database.
For example, given one function definition static void TestDemol (intl, int 2) and another function definition static void Testdemol (intl, int 2), semantic analysis can identify the semantically confusing letters D and d, because the extracted function names TestDemol and Testdemol differ only in the case of that letter.
Through the embodiment of the disclosure, visually confusable letters can be identified in the code database. When semantic confusion is confirmed, the corresponding letters are modified so that the names can be told apart, which prevents visually confusable functions from being overlooked and ensures that each function remains unique in the business logic.
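Operations S331 to S333 can be sketched as a pairwise comparison of keywords. The text does not specify how semantic similarity is computed, so this sketch substitutes a plain string similarity from Python's `difflib`, computed on lower-cased names; that substitution, the threshold value of 0.9, and the names `similarity` and `confusable_pairs` are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for semantic similarity)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confusable_pairs(keywords, threshold=0.9):
    """Return (name_a, name_b, differing_letters) for every confusable pair.

    Names are compared case-sensitively when collecting the differing
    letters, as the text requires strict case discrimination.
    """
    pairs = []
    for a, b in combinations(sorted(set(keywords)), 2):
        if similarity(a, b) > threshold:
            if len(a) == len(b):
                diff = [(x, y) for x, y in zip(a, b) if x != y]
            else:
                diff = []
            pairs.append((a, b, diff))
    return pairs


# Hypothetical function names modelled on the example in the text.
found = confusable_pairs(["TestDemo", "Testdemo", "ParseInput"])
```

Here `found` contains the single pair `TestDemo` / `Testdemo` together with the differing letters `D` and `d`, which are the letters that would then be modified.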
Then, in the event that it is determined that there are no semantically confusing letters in the code database and that the variables within the function of each code file are not in error, a logical matching function is established using a graph network model.
Specifically, after it is determined that there are no semantically confusing letters in the code database, operation S240 descends from each function to the variables defined inside it and checks each variable for problems; if no problem is found, a logic matching function is established using a preset graph network model.
In an embodiment of the present disclosure, the graph network model in operation S240 includes a directed acyclic graph model, the directed acyclic graph model including a plurality of nodes and paths with unidirectional arrows between the nodes; the logical match function includes a probability density function.
Fig. 4 schematically illustrates a flow chart for establishing a logical match function according to an embodiment of the disclosure.
As shown in fig. 4, in the embodiment of the present disclosure, the above-described operation S240 of creating a logical matching function using a graph network model may include operations S441 to S443.
In operation S441, a directed acyclic graph model is constructed based on the function definitions and function reference relationships in each code file.
For example, based on the above-mentioned divided file hierarchy, the function definition and function reference relation in each code file are determined, and this is used as the basis of the underlying logic to construct the directed acyclic graph model.
In an embodiment of the present disclosure, constructing the directed acyclic graph model in operation S441 may specifically include: representing one node by each function definition, and representing the paths between nodes by the weights corresponding to the respective file levels; and determining a prior probability for each of the plurality of nodes using a Bayesian network algorithm.
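The graph construction in operation S441 can be sketched with the standard-library `graphlib` module (Python 3.9 or later). This is an illustration under assumptions rather than the disclosed implementation: the function name `build_call_dag`, the triple format `(caller, callee, weight)`, and the choice to reject cyclic references with a `ValueError` are all hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def build_call_dag(definitions, references):
    """Build a weighted directed graph and verify that it is acyclic.

    definitions: iterable of function names (one node each).
    references:  iterable of (caller, callee, weight) triples (one path each).
    Returns (graph, order), where graph maps caller -> {callee: weight} and
    order lists the nodes so that every callee precedes its callers.
    """
    graph = {name: {} for name in definitions}
    for caller, callee, weight in references:
        if caller not in graph or callee not in graph:
            raise KeyError(f"reference {caller} -> {callee} uses an undefined function")
        graph[caller][callee] = weight

    # TopologicalSorter expects node -> predecessors; passing the callees
    # here means callees are emitted before the functions that call them.
    sorter = TopologicalSorter({node: set(edges) for node, edges in graph.items()})
    try:
        order = list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("function references form a cycle") from exc
    return graph, order


graph, order = build_call_dag(
    ["main", "pay", "log"],
    [("main", "pay", 0.5), ("pay", "log", 0.25), ("main", "log", 0.25)],
)
```

Checking acyclicity at construction time means later probability calculations can process nodes in `order` without revisiting any node.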
Specifically, the prior probability refers to a probability that can be obtained before an experiment or sampling, based on past experience and analysis. The posterior probability refers to the probability, given that an event has already occurred, that the event was caused by a particular factor.
For example, in a Bayesian network algorithm, the probability of event X under the condition that event Y occurs generally differs from the probability of event Y under the condition that event X occurs, but there is a definite relationship between the two, given by the following formula (Bayes' theorem):
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
where P(X) and P(Y) represent the probabilities that X and Y occur, respectively, and P(Y) may be referred to as the prior probability; P(X|Y) is a conditional probability, representing the probability that event X occurs under the condition that event Y occurs; P(Y|X) is also a conditional probability, representing the probability that event Y occurs under the condition that event X occurs, and this probability is also referred to as the posterior probability.
Thus, using a Bayesian network algorithm, the prior probability for each of a plurality of nodes can be determined.
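A small numeric example may make the relation between prior and posterior concrete. The sketch below applies Bayes' theorem, P(Y|X) = P(X|Y) P(Y) / P(X), with P(X) obtained from the law of total probability. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
def evidence(prior_y, p_x_given_y, p_x_given_not_y):
    """P(X) by the law of total probability over Y and not-Y."""
    return p_x_given_y * prior_y + p_x_given_not_y * (1.0 - prior_y)


def posterior(prior_y, p_x_given_y, p_x):
    """P(Y | X) by Bayes' theorem."""
    if not 0.0 < p_x <= 1.0:
        raise ValueError("P(X) must lie in (0, 1]")
    return p_x_given_y * prior_y / p_x


# Illustrative numbers: Y = "node is faulty" with prior 0.2,
# X = "check fails", which happens with probability 0.9 if Y and 0.1 otherwise.
p_x = evidence(0.2, 0.9, 0.1)      # 0.9 * 0.2 + 0.1 * 0.8
p_y_given_x = posterior(0.2, 0.9, p_x)
```

In this example the observation raises the probability of Y from the prior of 0.2 to a posterior of 0.18 / 0.26, roughly 0.69.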
In operation S442, for the directed acyclic graph model, semantic analysis is performed on the two nodes involved in each path, and in the case that the semantics of the two nodes are determined not to be coherent, a posterior probability is calculated from the position information of the two nodes in combination with the prior probability.
For example, the directed acyclic graph model contains node A, node B, and a path from A to B (denoted A -> B). The nodes A and B involved in path A -> B are semantically analyzed in sequence to determine whether their semantics are coherent. In the case that the semantics of nodes A and B are determined not to be coherent, the posterior probability is calculated from the position information of nodes A and B in combination with the prior probability.
In operation S443, a probability density function is derived based on the posterior probability.
This operation derives the probability density function from the posterior probability; the resulting function is the logic matching function, which takes the business logic of the code into account.
FIG. 5 schematically illustrates a flow chart for modifying problem code according to an embodiment of the present disclosure.
As shown in fig. 5, in an embodiment of the present disclosure, the code entrainment identification method may further include operations S501 to S502.
In operation S501, an indexing mechanism of a code database is established according to the code attributes.
For example, operation S501 is performed after the above-described operation of marking a plurality of code attributes for each code file. The indexing mechanism of the code database is established according to the code attributes marked for each code file, including the file name, the number of files, the number of code lines of each file, the number of functions, and the reference relationship.
In operation S502, in case that it is determined that a variable within a function of a code file is in error, a location of a problem code is determined according to an indexing mechanism, and the problem code is changed.
According to the embodiment of the disclosure, based on the established indexing mechanism, when a problem with a variable inside a function is detected, the problem code can be located and changed promptly, which improves the efficiency of code changes.
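One way to picture such an indexing mechanism is a lookup table from function names to file locations. The sketch below is a minimal illustration under assumptions: the text does not describe the index layout, so the `Location` record, the input format, and the functions `build_index` and `locate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a function lives: file name and inclusive line range."""
    file_name: str
    start_line: int
    end_line: int


def build_index(files):
    """Build function name -> list of Location.

    files: {file_name: [(function_name, start_line, end_line), ...]}
    A function name may appear in several files (for example, in
    different development versions), so each entry is a list.
    """
    index = {}
    for file_name, functions in files.items():
        for function_name, start_line, end_line in functions:
            index.setdefault(function_name, []).append(
                Location(file_name, start_line, end_line)
            )
    return index


def locate(index, function_name):
    """Return every known location of a function, or an empty list."""
    return index.get(function_name, [])


index = build_index({
    "v1/pay.c": [("init", 1, 20), ("pay", 22, 80)],
    "v2/pay.c": [("init", 1, 24), ("pay", 26, 95)],
})
```

With this layout, locating the line number area of a faulty function is a single dictionary lookup, for example `locate(index, "pay")`.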
In an embodiment of the present disclosure, the code entrainment state in operation S250 includes the presence of code entrainment; the presence of code entrainment in the code database is determined as follows: in the case that the calculation results of the topological relation function and the logic matching function are different, it is determined that code entrainment exists in the code database, and the entrained code file and the corresponding line number area are determined according to the indexing mechanism.
In an embodiment of the present disclosure, the code entrainment state further includes the absence of code entrainment; in the case that the calculation results of the topological relation function and the logic matching function are the same, it is determined that no code entrainment exists in the code database.
Through the embodiment of the disclosure, once a code entrainment problem has been detected, the line number area of the problem code can be indexed through the logical relationship, so that code changes are more precise and the method can be widely applied in business scenarios.
Fig. 6 schematically illustrates a flow chart for eliminating code entrainment in accordance with an embodiment of the disclosure.
As shown in fig. 6, in the embodiment of the present disclosure, after determining the entrained code file and the corresponding line number area, operations S601 to S602 are further included.
In operation S601, codes in the line number area are changed, and the code database is updated after the change.
In operation S602, the operations from calibrating the code attributes of different file levels for each code file to determining the code entrainment state of the code database are repeatedly performed on the updated code database until the calculation results of the topological relation function and the logic matching function are the same.
For example, operation S602 repeats operations S220 to S250 on the updated code database until the calculation results of the topological relation function and the logic matching function are the same, so as to eliminate code entrainment in the business logic and ensure the security of the software product.
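The detect, change, and re-check cycle of operations S601 and S602 can be sketched as a loop. This is an illustration under assumptions: the callables `evaluate` and `fix`, the round limit `max_rounds`, and the use of `math.isclose` to decide that two results are "the same" are hypothetical choices, and the toy functions below merely stand in for the real topological relation and logic matching calculations.

```python
import math


def eliminate_entrainment(database, evaluate, fix, max_rounds=10):
    """Repeat evaluation and fixing until both results agree.

    evaluate(database) -> (topological_result, logic_result, region)
    fix(database, region) -> updated database
    Returns (database, number_of_fixes_applied).
    """
    fixes = 0
    while True:
        topo, logic, region = evaluate(database)
        if math.isclose(topo, logic, rel_tol=1e-9):
            return database, fixes          # no code entrainment remains
        if fixes == max_rounds:
            raise RuntimeError("results still differ after max_rounds fixes")
        database = fix(database, region)    # change the code, update the database
        fixes += 1


# Toy stand-ins: the "database" is a list of function names and the
# expected business logic contains exactly two of them.
EXPECTED = {"init", "pay"}


def toy_evaluate(db):
    extra = sorted(set(db) - EXPECTED)
    return float(len(EXPECTED)), float(len(db)), extra[:1]


def toy_fix(db, region):
    return [name for name in db if name not in region]


cleaned, fixes = eliminate_entrainment(
    ["init", "pay", "backdoor", "debug_dump"], toy_evaluate, toy_fix
)
```

In the toy run, two entrained names are removed one per round, after which the two results agree and the loop stops. The round limit is a safeguard added here so that a fix that never converges cannot loop forever.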
According to the embodiment of the disclosure, the latest code database is updated based on the business logic relationship of each development version, and code attributes of different file levels are calibrated; two types of functions are established, which respectively represent the logical relationship and the business relationship of the code files of each version; when the program deviates from normal operation, the causal relationship and the logical relationship of the code business do not match, and the potentially risky code needs to be changed.
The method provided by the embodiment of the disclosure does not require excessive manpower and time for modifying code and checking it line by line, and thereby effectively improves the security of software development during delivery; once a code entrainment problem has been detected, the line number area of the problem code can be indexed through the logical relationship, so that code changes are more precise and the method can be widely applied in business scenarios.
Based on the code entrainment identification method, the disclosure also provides a code entrainment identification device. The device will be described in detail below in connection with fig. 7.
Fig. 7 schematically illustrates a block diagram of a code entrainment identification device according to an embodiment of the disclosure.
As shown in fig. 7, the code entrainment identifying device 700 of this embodiment includes a code database acquisition module 710, a topological relation function creation module 720, a semantic analysis module 730, a logic matching function creation module 740, and a code entrainment determination module 750.
The code database acquisition module 710 is configured to acquire a code database, where the code database includes a plurality of code files, and the plurality of code files are different development versions of the same software. In an embodiment, the code database acquisition module 710 may be configured to perform the operation S210 described above, which is not described herein.
The topological relation function establishing module 720 is used for calibrating the code attributes of different file levels for each code file, determining the weight corresponding to each file level, and establishing the topological relation function of the code database according to the code attributes and the weights of different file levels. In an embodiment, the topological relation function establishing module 720 may be configured to perform the operation S220 described above, which is not described herein.
The semantic analysis module 730 is configured to perform semantic analysis on the topological relation function, and determine whether there are letters with semantic confusion in the code database. In an embodiment, the semantic analysis module 730 may be configured to perform the operation S230 described above, which is not described herein.
The logic matching function building module 740 is configured to build a logic matching function using the graph network model in a case where it is determined that there are no letters with semantic confusion in the code database and the variables in the function of each code file are not in error. In an embodiment, the logic matching function establishing module 740 may be configured to perform the operation S240 described above, which is not described herein.
The code entrainment determining module 750 is configured to determine a code entrainment state of the code database according to the calculation results of the topological relation function and the logic matching function. In an embodiment, the code entrainment determining module 750 may be configured to perform the operation S250 described above, which is not described herein.
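The division of the device into five modules can be pictured as a pipeline of five callables. The sketch below is a hypothetical arrangement for illustration only: the class name `EntrainmentDevice`, the field names, and the convention of returning `None` when confusable letters block further analysis are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EntrainmentDevice:
    """Five pluggable stages mirroring modules 710 to 750."""
    acquire: Callable[[], Any]                        # module 710, operation S210
    build_topology: Callable[[Any], Any]              # module 720, operation S220
    has_confusable_letters: Callable[[Any], bool]     # module 730, operation S230
    build_logic_match: Callable[[Any], Any]           # module 740, operation S240
    is_entrained: Callable[[Any, Any], bool]          # module 750, operation S250

    def run(self) -> Optional[bool]:
        """Return True/False for the entrainment state, or None if the
        confusable letters must be resolved before analysis can continue."""
        database = self.acquire()
        topology = self.build_topology(database)
        if self.has_confusable_letters(topology):
            return None
        logic = self.build_logic_match(database)
        return self.is_entrained(topology, logic)


# Toy wiring: the results are plain numbers and differ, so entrainment is reported.
device = EntrainmentDevice(
    acquire=lambda: ["init", "pay", "backdoor"],
    build_topology=lambda db: len(db),
    has_confusable_letters=lambda topology: False,
    build_logic_match=lambda db: 2,
    is_entrained=lambda topology, logic: topology != logic,
)
state = device.run()
```

Keeping each stage as a separate callable reflects the statement that the modules may be combined, split, or implemented independently.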
According to embodiments of the present disclosure, any number of the code database acquisition module 710, the topological relation function creation module 720, the semantic analysis module 730, the logic matching function creation module 740, and the code entrainment determination module 750 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the code database acquisition module 710, the topological relation function creation module 720, the semantic analysis module 730, the logic matching function creation module 740, and the code entrainment determination module 750 may be implemented, at least in part, as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging circuitry, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the code database acquisition module 710, the topological relation function creation module 720, the semantic analysis module 730, the logic matching function creation module 740, and the code entrainment determination module 750 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a code entrainment identification method according to an embodiment of the disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 810 as needed, so that a computer program read from it is installed into the storage section 808 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement a code entrainment identification method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code is for causing a computer system to implement the code entrainment identification methods provided by embodiments of the present disclosure when the computer program product is run on the computer system.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 809, and/or installed from the removable medium 811. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811.
According to embodiments of the present disclosure, program code for performing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or incorporated in a variety of ways, even if such combinations or incorporations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or incorporated without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (14)

1. A code entrainment identification method, comprising:
acquiring a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software;
calibrating code attributes of different file levels for each code file, determining the weight corresponding to each file level, and establishing a topological relation function of the code database according to the code attributes of the different file levels and the weights;
carrying out semantic analysis on the topological relation function, and determining whether letters with semantic confusion exist in the code database;
under the condition that letters with semantic confusion do not exist in the code database and variables in functions of each code file are not wrong, establishing a logic matching function by using a graph network model;
and determining the code entrainment state of the code database according to the calculation result of the topological relation function and the logic matching function.
2. The method of claim 1, wherein said calibrating code attributes of different file levels for each code file comprises:
marking a plurality of code attributes for each code file, wherein the code attributes comprise file names, file numbers, code line numbers of each file, function numbers and reference relations;
and calibrating a file hierarchy to which each code attribute belongs according to the business logic relation of different development versions, wherein the file hierarchy sequentially comprises folders, files, project programs, function definitions and function reference relations.
3. The method of claim 2, wherein the determining the weight corresponding to each file hierarchy comprises:
counting, according to the function reference relation, the number of references to the code attributes of each file level in the code database;
and determining the weight corresponding to the file hierarchy according to the reference number, wherein the weight corresponding to the file hierarchy is larger as the reference number is larger.
4. The method of claim 2, wherein before said calibrating the file hierarchy to which each code attribute belongs, the method further comprises:
and updating the code database according to the business logic relation and the time sequence.
5. The method of claim 1, wherein the presence of semantically confusing letters in the code database is determined as follows:
extracting function categories and function names from the topological relation function as a plurality of keywords;
carrying out semantic analysis on any two keywords in the plurality of keywords to obtain semantic similarity of the two keywords;
and determining that letters with semantic confusion exist in the code database under the condition that the semantic similarity is higher than a similarity threshold value.
6. The method of claim 2, wherein the graph network model comprises a directed acyclic graph model including a plurality of nodes and paths between nodes with unidirectional arrows; the logic matching function comprises a probability density function;
the establishing the logical matching function by using the graph network model comprises the following steps:
constructing the directed acyclic graph model based on the function definitions and function reference relationships in each code file;
carrying out semantic analysis on the two nodes involved in each path of the directed acyclic graph model, and calculating a posterior probability according to the position information of the two nodes in combination with the prior probability under the condition that the semantics of the two nodes are determined not to be coherent;
the probability density function is derived based on the posterior probability.
7. The method of claim 6, wherein the constructing the directed acyclic graph model comprises:
representing a node by each function definition, and representing paths among the nodes by weights corresponding to each file hierarchy respectively;
a prior probability for each of the plurality of nodes is determined using a bayesian network algorithm.
8. The method of claim 1, wherein the method further comprises:
establishing an indexing mechanism of the code database according to the code attribute;
in the case that the variable in the function of the code file is determined to be wrong, the position of the problem code is determined according to the indexing mechanism, and the problem code is changed.
9. The method of claim 8, wherein the code entrainment state includes a presence of code entrainment; the presence of code entrainment in the code database is determined as follows:
and under the condition that the calculation results of the topological relation function and the logic matching function are different, determining that code entrainment exists in the code database, and determining entrained code files and corresponding line number areas according to the indexing mechanism.
10. The method of claim 9, wherein after determining the entrained code file and the corresponding line number area, further comprising:
changing codes in the line number area, and updating the code database after changing the codes;
and repeatedly executing, on the updated code database, the operations from calibrating the code attributes of different file levels for each code file to determining the code entrainment state of the code database, until the calculation results of the topological relation function and the logic matching function are the same.
11. A code entrainment identification device, comprising:
a code database acquisition module, configured to acquire a code database, wherein the code database comprises a plurality of code files, and the code files are different development versions of the same software;
a topological relation function establishing module, configured to calibrate code attributes of different file levels for each code file, determine the weight corresponding to each file level, and establish a topological relation function of the code database according to the code attributes of the different file levels and the weights;
a semantic analysis module, configured to perform semantic analysis on the topological relation function and determine whether letters with semantic confusion exist in the code database;
a logic matching function building module, configured to build a logic matching function by using a graph network model when no letters with semantic confusion exist in the code database and the variables in the functions of each code file contain no errors; and
a code entrainment determining module, configured to determine the code entrainment state of the code database according to the calculation results of the topological relation function and the logic matching function.
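The way the five modules of claim 11 hand data to one another can be sketched as follows. Every callable is a placeholder supplied by the caller, the class name `EntrainmentDevice` and its field names are hypothetical, and reducing each function to a single comparable value is an assumption of this sketch rather than something the claim states.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class EntrainmentDevice:
    """Wires the five claimed modules together; all callables are placeholders."""
    acquire: Callable[[], Sequence[str]]              # code database acquisition
    build_topology: Callable[[Sequence[str]], float]  # topological relation function
    has_confusion: Callable[[float], bool]            # semantic analysis
    variables_ok: Callable[[Sequence[str]], bool]     # variable check per code file
    build_logic: Callable[[Sequence[str]], float]     # logic matching function

    def run(self) -> Optional[bool]:
        """Return True if entrainment is detected and False if it is not.

        Returns None when the preconditions for building the logic
        matching function are not met (semantic confusion was found or
        a variable check failed).
        """
        files = self.acquire()
        topo = self.build_topology(files)
        if self.has_confusion(topo) or not self.variables_ok(files):
            return None
        logic = self.build_logic(files)
        # Differing results indicate the presence of code entrainment.
        return topo != logic
```

Keeping each module as an injected callable mirrors the module split in the claim and lets each stage be replaced or tested in isolation.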
12. An electronic device, comprising:
one or more processors; and
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-10.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 10.
14. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
CN202310456727.XA 2023-04-25 2023-04-25 Code entrainment identification method and device, electronic equipment and medium Pending CN116484388A (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310456727.XA | CN116484388A (en) | 2023-04-25 | 2023-04-25 | Code entrainment identification method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310456727.XA | CN116484388A (en) | 2023-04-25 | 2023-04-25 | Code entrainment identification method and device, electronic equipment and medium

Publications (1)

Publication Number | Publication Date
CN116484388A (en) | 2023-07-25

Family

ID=87213418

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202310456727.XA | Pending | CN116484388A (en) | 2023-04-25 | 2023-04-25 | Code entrainment identification method and device, electronic equipment and medium

Country Status (1)

Country | Link
CN (1) | CN116484388A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination