CN110427330B

CN110427330B - Code analysis method and related device

Info

Publication number: CN110427330B
Application number: CN201910747791.7A
Authority: CN
Inventors: 赵旸; 刘思凡; 邱旻峰
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2023-09-26
Anticipated expiration: 2039-08-13
Also published as: CN110427330A

Abstract

Embodiments of the application provide a code analysis method and a related device. After the word vectors to be analyzed of a code text to be analyzed are obtained, the word vectors to be analyzed are compared with error word vectors to calculate, for each output vector, its initial probability and its termination probability in the code text to be analyzed. The initial position and the termination position of a target code segment in the code text to be analyzed are then determined according to the initial probability and the termination probability, the target code segment being the error code segment given by the analysis result. Because the target code segment is determined by calculating vector probabilities, complex error types can be analyzed, which solves the technical problem that complex errors in code cannot currently be checked.

Description

Code analysis method and related device

Technical Field

The application relates to the technical field of Internet, in particular to a code analysis method and a related device.

Background

With the development of the Internet, a wide variety of computer and mobile phone software has appeared on the market. Software developers build this software by writing code, which is among the most widespread and common kinds of work in the computer industry. How to perform code analysis tasks such as error detection, code generation, and code completion on the code written by software developers has become a focus of the industry.

Currently, a server can use rules set by staff to check for specific problems that may occur in code, such as violations of code specifications, code security issues, and code repetition rate. For example, the server can detect that brackets or the like are not used in the code as the specification requires.

However, this method can only check for simple code-specification errors and cannot check for more complex errors in the code.

Disclosure of Invention

The embodiment of the application provides a code analysis method and a related device, which are used for solving the technical problem that complex errors in codes cannot be checked at present.

In view of this, a first aspect of the present application provides a method for code analysis, including:

acquiring N word vectors to be analyzed corresponding to code texts to be analyzed and error word vectors corresponding to error code texts, wherein the error code texts represent code texts matched with the code texts to be analyzed, and N is an integer greater than 1;

obtaining N output vectors corresponding to the combined word vectors through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors;

calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed according to the N output vectors and the error word vector;

determining a target code segment according to the initial probability of each output vector in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

and generating a code analysis result of the code text to be analyzed according to the target code segment.

A second aspect of an embodiment of the present application provides an apparatus for code analysis, including:

the acquisition unit is used for acquiring N word vectors to be analyzed corresponding to the code text to be analyzed and error word vectors corresponding to the error code text, wherein the error code text represents the code text matched with the code text to be analyzed, and N is an integer greater than 1;

the processing unit is used for obtaining N output vectors corresponding to the combined word vectors through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors;

the processing unit is further used for calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed according to the N output vectors and the error word vector;

the processing unit is further used for determining a target code segment according to the initial probability of each output vector in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

and the generating unit is used for generating a code analysis result of the code text to be analyzed according to the target code segment.

In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:

calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed according to the N output vectors and the error word vector;

calculating the termination probability of each output vector in the N output vectors in the code text to be analyzed according to the N output vectors and the error word vector;

the calculating, according to the N output vectors and the error word vector, the initial probability of each output vector in the N output vectors in the code text to be analyzed includes:

determining an initial weight score corresponding to the i-th output vector according to the i-th output vector, the set initial weight, and the error word vector, wherein i is an integer greater than or equal to 1 and less than or equal to N;

determining an initial weight total score according to the N output vectors, the set initial weight, and the error word vector;

determining the initial probability according to the initial weight score corresponding to the i-th output vector and the initial weight total score;

the calculating, according to the N output vectors and the error word vector, a termination probability of each of the N output vectors in a code text to be analyzed includes:

determining a termination weight score corresponding to the j-th output vector according to the j-th output vector, the set termination weight, and the error word vector;

determining a termination weight total score according to the N output vectors, the set termination weight, and the error word vector;

and determining the termination probability according to the termination weight score corresponding to the j-th output vector and the termination weight total score, wherein j is an integer greater than or equal to 1 and less than or equal to N.
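For illustration only, the score-over-total-score calculation above can be sketched as follows. The source does not give the score function in this passage, so the sketch assumes a bilinear score exp(h · W · q), where h is an output vector, W is the set initial (or termination) weight, and q is a single vector summarizing the error code text. The names `bilinear_score` and `position_probabilities` and all numeric values are hypothetical.

```python
import math

def bilinear_score(h, W, q):
    """Return h^T W q for vectors h, q and matrix W (plain lists of floats)."""
    Wq = [sum(w * x for w, x in zip(row, q)) for row in W]
    return sum(a * b for a, b in zip(h, Wq))

def position_probabilities(outputs, W, q):
    """Divide each exponentiated weight score by the total over all N output vectors."""
    scores = [bilinear_score(h, W, q) for h in outputs]
    m = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)                 # the "weight total score"
    return [w / total for w in weights]

# Toy data (assumed): N = 3 output vectors of size 2, one error vector of size 2.
outputs = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
error_vec = [1.0, 0.5]
W_start = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical "set initial weight"
W_end = [[0.0, 1.0], [1.0, 0.0]]         # hypothetical "set termination weight"

p_start = position_probabilities(outputs, W_start, error_vec)
p_end = position_probabilities(outputs, W_end, error_vec)
```

Under these assumptions each list sums to 1, so the i-th entry can be read directly as the initial (or termination) probability of the i-th output vector.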

determining the starting position of the target code segment according to the initial probability of each output vector in the code text to be analyzed;

determining the termination position of the target code segment according to the termination probability of each output vector in the code text to be analyzed;

determining the target code segment according to the starting position of the target code segment and the ending position of the target code segment;

wherein the determining the starting position of the target code segment according to the initial probability of each output vector in the code text to be analyzed comprises:

acquiring the output vector with the highest initial probability;

determining the starting position of the target code segment according to the output vector with the highest initial probability and the mapping relation between the output vector and the code text to be analyzed;

wherein the determining the termination position of the target code segment according to the termination probability of each output vector in the code text to be analyzed comprises:

acquiring the output vector with the highest termination probability;

determining the end position of the target code segment according to the output vector with the highest termination probability and the mapping relation between the output vector and the code text to be analyzed;

and determining the target code segment according to the starting position and the ending position.
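As a minimal sketch of the selection steps above, the following assumes that the mapping relation between output vectors and the code text is stored as one (start, end) character span per output vector. The function name `locate_segment`, the span representation, and the sample probabilities are hypothetical.

```python
def locate_segment(p_start, p_end, token_spans, text):
    """Pick the most probable start and end output vectors and map them to text."""
    # Index of the output vector with the highest initial probability.
    start_idx = max(range(len(p_start)), key=p_start.__getitem__)
    # Index of the output vector with the highest termination probability.
    end_idx = max(range(len(p_end)), key=p_end.__getitem__)
    # Mapping relation (assumed): token_spans[k] is the character span of the k-th mark.
    start_char = token_spans[start_idx][0]
    end_char = token_spans[end_idx][1]
    return start_idx, end_idx, text[start_char:end_char]

text = "x = a / 0 ;"
token_spans = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]
p_start = [0.05, 0.05, 0.60, 0.10, 0.10, 0.10]
p_end = [0.05, 0.05, 0.10, 0.10, 0.60, 0.10]

segment = locate_segment(p_start, p_end, token_spans, text)
```

This passage does not say how a termination position that precedes the starting position should be handled; the sketch leaves that case to the caller.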

determining a combined word vector corresponding to the word vector to be analyzed according to the word vector to be analyzed and the error word vector, wherein the combined word vector comprises an attention mechanism vector and the word vector to be analyzed, the attention mechanism vector is obtained by weighting according to attention scores of the error word vector and the word vector to be analyzed, and the attention scores are used for representing the degree of correlation between the error word vector and the word vector to be analyzed;

and acquiring the N output vectors corresponding to the combined word vector through the neural network model.

acquiring the code text to be analyzed;

converting the code text to be analyzed into a mark sequence to be analyzed, wherein the mark sequence to be analyzed is formed by converting each word or symbol in the code text to be analyzed;

generating the N word vectors to be analyzed through a word vector tool according to the mark sequence to be analyzed;

acquiring the set error code text;

converting the error code text into an error mark sequence, wherein the error mark sequence is formed by converting each word or symbol in the error code text;

and generating the error word vector through the word vector tool according to the error mark sequence.
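By way of example, the conversion of code text into a mark sequence and then into word vectors might look like the sketch below. The regular expression is an assumed tokenization rule, and `toy_word_vector` is a hypothetical stand-in for a trained word vector tool: it only derives repeatable numbers from a hash and carries no learned meaning.

```python
import hashlib
import re

# Assumed rule: identifiers/keywords, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def to_mark_sequence(code_text):
    """Convert each word or symbol in the code text into one mark."""
    return TOKEN_RE.findall(code_text)

def toy_word_vector(mark, dim=4):
    """Hypothetical placeholder for a word vector tool (deterministic, untrained)."""
    digest = hashlib.sha256(mark.encode("utf-8")).digest()
    return [digest[k] / 255.0 for k in range(dim)]

def to_word_vectors(code_text, dim=4):
    """One word vector per mark, in the same order as the mark sequence."""
    return [toy_word_vector(m, dim) for m in to_mark_sequence(code_text)]

marks = to_mark_sequence("if (x == 0) return y;")
vectors = to_word_vectors("if (x == 0) return y;")
```

Because the k-th vector is produced from the k-th mark, the list order itself serves as the mapping between marks and word vectors. The same two functions would be applied to the error code text to obtain the error mark sequence and the error word vectors.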

In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the combined word vector further includes a matching identifier, where the matching identifier includes a first matching identifier and a second matching identifier, the first matching identifier is used to indicate that the mark to be analyzed corresponding to the combined word vector matches an error mark in the error mark sequence, and the second matching identifier is used to indicate that the mark to be analyzed corresponding to the combined word vector does not match any error mark in the error mark sequence.

In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the combined word vector further includes a duty ratio analysis mark, and the duty ratio analysis mark identifies the proportion of the mark sequence to be analyzed that is occupied by the mark to be analyzed corresponding to the combined word vector.
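The two per-mark features described in the designs above can be sketched as follows. The encoding is an assumption: the first matching identifier is represented as 1, the second as 0, and the duty ratio is taken to be the number of occurrences of the mark divided by the length of the mark sequence to be analyzed. The function name `match_and_proportion` is hypothetical.

```python
from collections import Counter

def match_and_proportion(marks, error_marks):
    """Return (matching identifier, duty ratio) for every mark to be analyzed."""
    error_set = set(error_marks)
    counts = Counter(marks)
    n = len(marks)
    features = []
    for m in marks:
        match_flag = 1 if m in error_set else 0   # 1: matches an error mark, 0: does not
        proportion = counts[m] / n                # share of the sequence taken by this mark
        features.append((match_flag, proportion))
    return features

marks = ["a", "=", "b", "/", "a"]
error_marks = ["/", "0"]
features = match_and_proportion(marks, error_marks)
```

In this toy input only the "/" mark also appears in the error mark sequence, so it is the only mark that receives the first matching identifier.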

acquiring a first word vector sequence formed by arranging the combined word vectors in forward order;

acquiring a second word vector sequence formed by arranging the combined word vectors in reverse order;

obtaining an output vector sequence corresponding to the first word vector sequence and the second word vector sequence through a bidirectional long short-term memory (LSTM) network model, wherein the bidirectional LSTM network model comprises a forward LSTM network model and a reverse LSTM network model, the forward LSTM network model is used for generating a first output sequence corresponding to the first word vector sequence, the reverse LSTM network model is used for generating a second output sequence corresponding to the second word vector sequence, the output vector sequence is formed by splicing the first output sequence and the second output sequence, and the output vector sequence is formed by arranging the output vectors.
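The data flow of the bidirectional model can be illustrated with the sketch below. To stay self-contained it swaps in a toy tanh recurrent step with fixed weights in place of a real LSTM cell, so it shows only the forward pass, the reverse pass, and the splicing, not LSTM gating. Re-aligning the reverse outputs so that index i refers to the same mark in both directions is an assumption; all function names are hypothetical.

```python
import math

def simple_cell(x, h_prev):
    """Toy recurrent step standing in for an LSTM cell (no gates, fixed weights)."""
    return [math.tanh(0.5 * xi + 0.5 * hi) for xi, hi in zip(x, h_prev)]

def run_direction(sequence, cell):
    """Feed the sequence through the cell and collect one output per position."""
    h = [0.0] * len(sequence[0])
    outputs = []
    for x in sequence:
        h = cell(x, h)
        outputs.append(h)
    return outputs

def bidirectional_outputs(combined_vectors, cell=simple_cell):
    first_seq = list(combined_vectors)             # forward order
    second_seq = list(reversed(combined_vectors))  # reverse order
    first_out = run_direction(first_seq, cell)
    second_out = run_direction(second_seq, cell)
    second_out.reverse()                           # assumed re-alignment by position
    # Splice the two outputs for each position into one output vector.
    return [f + b for f, b in zip(first_out, second_out)]

outs = bidirectional_outputs([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each of the N output vectors is twice the size of a single-direction output, since it carries context from both the preceding and the following marks.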

A third aspect of an embodiment of the present application provides a server, including: memory, transceiver, processor, and bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory, and comprises the following steps:

generating a code analysis result of the code text to be analyzed according to the target code segment;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

acquiring the output vector with the highest initial probability;

acquiring the output vector with the highest termination probability;

obtaining N output vectors corresponding to the combined word vectors through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors.

acquiring the code text to be analyzed;

acquiring the set error code text;

The combined word vector further comprises a matching identifier, the matching identifier comprises a first matching identifier and a second matching identifier, the first matching identifier is used for indicating that a mark to be analyzed corresponding to the combined word vector is matched with an error mark in the error mark sequence, and the second matching identifier is used for indicating that the mark to be analyzed corresponding to the combined word vector is not matched with the error mark in the error mark sequence.

The combined word vector further comprises a duty ratio analysis mark, and the duty ratio analysis mark identifies the proportion of the mark sequence to be analyzed that is occupied by the mark to be analyzed corresponding to the combined word vector.

A fourth aspect of the embodiments of the application provides a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect.

A fifth aspect of the embodiments of the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect.

From the above technical solutions, the embodiment of the present application has the following advantages:

After the word vectors to be analyzed of the code text to be analyzed are obtained, the word vectors to be analyzed are compared with the error word vectors to calculate the initial probability and the termination probability of each output vector in the code text to be analyzed. The initial position and the termination position of a target code segment in the code text to be analyzed are then determined according to the initial probability and the termination probability, the target code segment being the error code segment given by the analysis result. Because the target code segment is determined by calculating vector probabilities, complex error types can be analyzed, which solves the technical problem that complex errors in code cannot currently be checked.

Drawings

FIG. 1 is a block diagram of a developer platform according to an embodiment of the present application;

FIG. 2 is a representation of highlighting error codes in an embodiment of the present application;

FIG. 3 is a program display interface on a terminal device of a software developer;

FIG. 4 is an interface diagram for a manager to log into a developer platform to view;

FIG. 5 is a schematic diagram of an embodiment of a code analysis method according to an embodiment of the present application;

FIG. 6 is a diagram illustrating the generation of a combined word vector in accordance with an embodiment of the present application;

FIG. 7 is a diagram of error code block vectors according to an embodiment of the present application;

FIG. 8 is another diagram of error code block vectors in accordance with an embodiment of the present application;

FIG. 9 is a flowchart of an embodiment of the present application applied to a terminal device;

FIG. 10 is a schematic diagram of calculating an initial probability and a termination probability by using error code block vectors and respective output vectors according to an embodiment of the present application;

FIG. 11 is a schematic diagram of calculating an attention score in an embodiment of the application;

FIG. 12 is a schematic diagram of calculating an attention score in an embodiment of the application;

FIG. 13 is a schematic diagram of converting a code text to be analyzed into a tag sequence according to an embodiment of the present application;

FIG. 14 is a schematic diagram of a server or terminal device inputting a first word vector sequence into a forward LSTM network model;

FIG. 15 is a schematic diagram of a server or terminal device inputting a second word vector sequence into a reverse LSTM network model;

FIG. 16 is a schematic diagram of an application of a code analysis method according to an embodiment of the present application;

FIG. 17 is a schematic diagram of an application example of a code analysis method according to an embodiment of the present application;

FIG. 18 is a diagram of a display interface during server training;

FIG. 19 is a schematic diagram of an apparatus for code analysis according to an embodiment of the present application;

FIG. 20 is a schematic diagram of a server structure according to an embodiment of the present application.

Detailed Description

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

It should be appreciated that after a software developer writes program code, the program code may be run and tested. If an error (bug) occurs in the code, the application program cannot run correctly and the product cannot go online; the software developer then needs to analyze and inspect the code in depth. Conventional code inspection relies on manual work, which has the drawbacks of high labor cost, long inspection time, and no guarantee that the erroneous code will be found.

In view of this, an embodiment of the present application provides a developer platform for inspecting and analyzing code. Fig. 1 is a schematic diagram of a developer platform according to an embodiment of the present application. After a software developer writes code, the code can be sent to the server of the developer platform; the developer platform analyzes and checks the code, and the code can be released online once it has been confirmed. Specifically, after the software developer writes the program code on a terminal device, the program code can be uploaded to the server. The server detects and analyzes the program code and then returns the analysis result to the terminal device for display. For example, if the server identifies a certain code segment as a suspected error code segment, it sends the position of that code segment in the program code to the terminal device, so that the terminal device displays the code segment or highlights the suspected error code segment in the program code.

Fig. 2 is a display diagram of highlighted error code in an embodiment of the present application. In the display interface on which the terminal device displays the program code, the boxed portion is the highlighted (marked) portion of the code. After observing the highlighted portion, the software developer can modify the code in that portion without attending to the non-highlighted portions, which helps the software developer work more efficiently.

In the embodiment of the present application, the program code sent by the software developer to the developer platform may be the code of a complete program, such as a complete mobile phone Application (APP), a complete computer program software or a complete service framework; the program code may also be applet code, functional module code embedded within an application program or an operating system kernel or the like; the program code may be a front end code of a web page, a back end code of a web page, or a code segment to be analyzed, etc., and in practical application, may be other codes, which is not limited herein.

In the embodiment of the application, the program code sent by the software developer to the developer platform may be sent in the form of a data packet or in the form of a file; in practical application, the program code can also be sent in encrypted form, which is not limited herein.

In the embodiment of the application, a software developer can send the program code to the developer platform after the program code has been written. Alternatively, the program code can be sent to the developer platform at preset intervals while it is being written, so that real-time feedback is obtained from the developer platform and erroneous program code is highlighted in real time. In practical application, a "check" virtual button can also be provided in the interface of the software in which the software developer writes code; when the software developer clicks the "check" virtual button, the terminal device sends the current program code to the server for checking.

Fig. 3 shows a program display interface on a terminal device of a software developer. The interface for writing the program has a title bar, a function panel, and a main interface, and the software developer writes the program in the main interface. When the software developer wants to check the program code for errors, the software developer can click the "check" virtual button in the function panel to trigger a check instruction. According to the check instruction, the terminal device sends the program code to the server for checking and then highlights the error code segment in the main interface according to the check result returned by the server.

It will be appreciated that terminal devices include, but are not limited to, cell phones, desktop computers, tablet computers, notebook computers, and palm top computers.

In the embodiment of the present application, code analysis may be code retrieval, code classification, code marking, code error correction, and so on. The foregoing description takes code error correction as an example; in practical application, other kinds of code analysis and other ways of displaying the code analysis result may also be used. For example, after the developer platform receives program code sent by a software developer, it can perform code analysis to obtain the categories of different code segments and then send the code segments to the terminal device marked with different colors, so that the terminal device displays different code segments in different colors, for example with the main program in red and embedded functions in blue. This is not limited herein.

A manager of the developer platform can log in to the developer platform to view the program code uploaded by the terminal devices and the error code segments found in that program code. Fig. 4 is an interface diagram viewed by a manager logged in to the developer platform. The interface displayed on the developer platform may have a title bar, a function panel, and a main interface, and the main interface may display a terminal device identifier, a code language type, and the program code. It can be understood that the developer platform on the server can accept program code uploaded by a plurality of terminal devices, analyze the program code from each of them, and send the corresponding analysis result to each terminal device.

In the embodiment of the present application, the developer platform may perform code analysis on multiple programming languages. For example, the programming language used on terminal device 1 is PHP, the programming language used on terminal device 2 is C, and the programming language used on terminal device 3 is Java. In practical application, the developer platform may also process other programming languages such as C++, which is not limited herein.

It can be understood that the developer platform analyzes the program code after receiving it from the terminal device. If traditional manual code inspection were adopted, the scale of the platform and its applications would be greatly limited; if only inspection rules for code specifications were set, only simple program errors could be analyzed and complex errors could not. The embodiment of the application therefore provides a code analysis method and a related device, which are used for solving the technical problem that more complex errors in code cannot currently be checked.

Fig. 5 is a schematic diagram of an embodiment of a code analysis method according to an embodiment of the present application, and it can be seen that the code analysis method according to the embodiment of the present application includes:

501. Acquiring N word vectors to be analyzed corresponding to code texts to be analyzed and error word vectors corresponding to error code texts, wherein the error code texts represent code texts matched with the code texts to be analyzed, and N is an integer greater than 1;

In the embodiment of the application, after the server acquires the code text to be analyzed, the server converts it into word vectors to be analyzed. The code text to be analyzed may be code text written by a software developer on a terminal device. When code analysis is needed, the software developer sends the written code text to the server through the terminal device; after receiving the code text, the server determines, according to the instruction of the terminal device, that the code text is to be analyzed and treats it as the code text to be analyzed. The server converts the code text to be analyzed into N word vectors to be analyzed. The conversion may be performed directly by a word vector tool, or the code text to be analyzed may first be converted into a mark sequence; in practical application there are many ways of converting text into word vectors, which are not limited herein. The server may establish a mapping relationship between the code text to be analyzed and the N word vectors to be analyzed according to the conversion process: for example, the 1st word in the code text to be analyzed corresponds to the 1st word vector to be analyzed, the 2nd word corresponds to the 2nd word vector to be analyzed, …, and the Nth word corresponds to the Nth word vector to be analyzed. Thus, the word vectors to be analyzed may constitute a word vector sequence to be analyzed:

Word vector sequence to be analyzed = [1st word vector to be analyzed, 2nd word vector to be analyzed, …, Nth word vector to be analyzed];

Similarly, after the server acquires the error code text, the server converts it into error word vectors. The error code text is preset by an administrator of the developer platform and is used for checking whether the code text to be analyzed contains an error code segment similar to the error code text. The administrator can store a plurality of error code texts on the server as required, so that the server can select a suitable error code text for code analysis according to the kind of error to be detected. After receiving the code text to be analyzed and the instruction sent by the terminal device, the server can select an appropriate error code text according to the instruction and then convert it into error word vectors. The conversion is similar to that of the code text to be analyzed into word vectors to be analyzed and is not described in detail here. It will be appreciated that an error code text generally corresponds to more than one error word vector. For example, the 1st word in the error code text corresponds to the 1st error word vector, the 2nd word corresponds to the 2nd error word vector, …, and the M-th word corresponds to the M-th error word vector. Thus, the error word vectors may constitute an error word vector sequence:

Error word vector sequence = [1st error word vector, 2nd error word vector, …, M-th error word vector];

502. Obtaining N output vectors corresponding to the combined word vectors through the neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors;

In the embodiment of the application, the server can generate the combined word vectors according to the word vectors to be analyzed and the error word vectors, then input the combined word vectors into the neural network model, and acquire the N output vectors corresponding to the combined word vectors through the neural network model.

In the embodiment of the application, the combined word vector can be formed by splicing several vectors and marks. It includes at least an attention word vector and can also include a word vector to be analyzed, a matching identifier, and a duty ratio analysis mark.

Fig. 6 is a schematic diagram of generating a combined word vector according to an embodiment of the present application. The i-th combined word vector may be composed of an i-th attention word vector 601, an i-th word vector to be analyzed 602, an i-th matching identifier 603, and an i-th duty ratio analysis mark 604, where the attention word vector is generated by an attention mechanism. The server calculates the attention word vector 601 through the attention mechanism according to the 1st error word vector 605, the 2nd error word vector 606, …, the M-th error word vector, and the i-th word vector to be analyzed 602.

The attention word vector and the attention mechanism are described in detail below.

The server can generate corresponding weights according to the tightness degree of the word vector to be analyzed and the error word vector, and then weight the word vector to be analyzed according to the weights to obtain the attention word vector. The degree of correlation of the word vector to be analyzed and the error word vector can be calculated by using the score a _i,j The table 1 is a score table of the word vector to be analyzed and the error word vector, and it can be seen that the server can calculate n×m scores according to N word vectors to be analyzed and M error word vectors, and if the server needs to weight the i th word vector to be analyzed to obtain the i th attention word vector, the server can calculate by the following formula:

where $\mathrm{vector}_{attention}$ is the i-th attention word vector, $a_{i,j}$ is the score representing the degree of correlation between the i-th word vector to be analyzed and the j-th error word vector, and $\mathrm{negvector}_j$ is the j-th error word vector. In connection with Table 1, the i-th attention word vector actually calculated by the server is:

the i-th attention word vector = [$a_{i,1}$ × 1st error word vector, $a_{i,2}$ × 2nd error word vector, …, $a_{i,M}$ × M-th error word vector];

As can be seen from the above expression, the server calculates the i-th attention word vector 601 through the attention mechanism by using the 1st error word vector 605, the 2nd error word vector 606, …, the M-th error word vector and the i-th word vector to be analyzed 602.

TABLE 1
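The weighting described above can be sketched in Python as follows. This is only a minimal sketch that reads the bracketed expression literally, as a splice of the M scaled error word vectors; the function name `attention_word_vector` and the toy values are assumptions made for illustration, and an implementation could equally sum the scaled vectors, which is the more common attention formulation.

```python
from typing import List

Vector = List[float]


def attention_word_vector(scores_i: Vector, error_vectors: List[Vector]) -> Vector:
    """Scale the j-th error word vector by a_{i,j} and splice the results.

    scores_i holds the M scores a_{i,1} .. a_{i,M} for one word vector to be
    analyzed; error_vectors holds the M error word vectors.
    """
    if len(scores_i) != len(error_vectors):
        raise ValueError("need exactly one score per error word vector")
    spliced: Vector = []
    for a_ij, neg_vector_j in zip(scores_i, error_vectors):
        spliced.extend(a_ij * x for x in neg_vector_j)
    return spliced


# Hypothetical toy data: M = 2 error word vectors of dimension 2.
error_vectors = [[1.0, 2.0], [4.0, 0.0]]
scores_i = [0.25, 0.75]
attention_i = attention_word_vector(scores_i, error_vectors)
```

With these toy values the result has M × 2 = 4 components, one scaled copy of each error word vector.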

In the embodiment of the present application, the combined word vector generally further includes a word vector to be analyzed, that is, the combined word vector is formed by splicing an attention word vector and the word vector to be analyzed, that is:

the i-th combined word vector= [ i-th attention word vector, i-th word vector to be analyzed ];

The server may form a mapping relationship among the combined word vector, the attention word vector, and the word vector to be analyzed according to the above procedure, that is, the i-th combined word vector corresponds to the i-th attention word vector and to the i-th word vector to be analyzed.
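As an illustration of the splicing, the following Python sketch builds one combined word vector from its parts. The function name `combined_word_vector` and the way the optional matching identifier and duty ratio analysis tag are appended as single numbers are assumptions; the source only states that these parts are spliced together.

```python
from typing import List, Optional

Vector = List[float]


def combined_word_vector(
    attention_vec: Vector,
    word_vec: Vector,
    match_flag: Optional[int] = None,
    proportion: Optional[float] = None,
) -> Vector:
    """Splice [attention word vector, word vector to be analyzed, optional parts]."""
    combined = list(attention_vec) + list(word_vec)
    if match_flag is not None:
        combined.append(float(match_flag))  # assumed encoding: 1.0 = match, 0.0 = no match
    if proportion is not None:
        combined.append(float(proportion))  # assumed encoding: share of the tag in the sequence
    return combined


# Hypothetical toy vectors.
basic = combined_word_vector([0.1, 0.2], [0.3])
full = combined_word_vector([0.1, 0.2], [0.3], match_flag=1, proportion=0.6)
```

Because the i-th combined word vector is built from the i-th attention word vector and the i-th word vector to be analyzed, keeping the vectors in lists of equal length preserves the mapping relationship by index.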

In the embodiment of the application, after the server calculates the N combined word vectors, it may input them into the neural network model to obtain N output vectors corresponding to the N combined word vectors, that is, the 1st combined word vector corresponds to the 1st output vector, the 2nd combined word vector corresponds to the 2nd output vector, …, and the N-th combined word vector corresponds to the N-th output vector.

The neural network model used by the server may be a recurrent neural network (RNN) model, a long short-term memory (LSTM) model, a bidirectional LSTM model, a gated recurrent unit (GRU) model, etc., which is not limited herein.

503. Calculating, according to the N output vectors and the error word vectors, the start probability of each of the N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

In the embodiment of the application, the server can calculate the start probability of each of the N output vectors in the code text to be analyzed according to the N output vectors and the M error word vectors. The start probability represents the probability that the position in the code text to be analyzed corresponding to the output vector is the start position of the target code segment. The target code segment is the code segment in the code text to be analyzed that is closest to the error code text; to determine it, the server needs to find its start position and termination position in the text to be analyzed. According to the mapping relationship between the words or symbols in the text to be analyzed and the N output vectors, the server calculates the start probabilities of the N output vectors, selects the output vector with the highest start probability, and then obtains the start position of the target code segment in the text to be analyzed through that output vector and the mapping relationship. Similarly, the server can obtain the termination position of the target code segment in the text to be analyzed through the output vector with the highest termination probability and the mapping relationship.

The server may first obtain one output vector, for example the i-th output vector, then determine the start probability of the i-th output vector according to the similarity between the i-th output vector and the start portion of an error code block vector, and determine the termination probability of the i-th output vector according to the similarity between the i-th output vector and the termination portion of the error code block vector, where the error code block vector is determined by the M error word vectors. The server may splice the M error word vectors into the error code block vector, or splice them after weighting, or input the error word vectors into a bidirectional LSTM neural network and then splice the weighted outputs of the network to obtain the error code block vector; the specific manner is not limited here. The start portion of the error code block vector may be obtained from the error code block vector and the start weight, and the termination portion may be obtained from the error code block vector and the termination weight.

FIG. 7 is a schematic diagram of an error code block vector according to an embodiment of the present application, in which the error code text is converted into error word vectors and then into an error code block vector. Generally, one error code text corresponds to one error code block vector; the left side of FIG. 7 shows the error code text and the right side shows the error code block.

FIG. 8 is another schematic diagram of an error code block vector in an embodiment of the present application. It can be seen that the method of the embodiment of the present application can convert program code in any language into vectors; the left side of FIG. 8 is the error code text and the right side is the error code block. FIG. 7 shows the conversion of PHP code into a vector, and FIG. 8 shows the conversion of C code into a vector.

504. Determining a target code segment according to the start probability of each output vector in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

In the embodiment of the application, the server can obtain the start position of the target code segment in the text to be analyzed through the output vector with the highest start probability and the mapping relationship, then obtain the termination position of the target code segment in the text to be analyzed through the output vector with the highest termination probability and the mapping relationship, and finally determine the target code segment in the text to be analyzed according to the start position and the termination position of the target code segment.

As shown in FIG. 2, the boxed portion is the target code segment determined by the server. It can be seen that the start position of the target code segment is "switch" and the termination position is ";", and the server may highlight this code segment. It can be understood that, according to the start probability of each output vector in the code text to be analyzed, the server determines that the output vector with the highest start probability is the i-th output vector, which corresponds to "switch"; "switch" can then be found according to the mapping relationship between the output vectors and the text to be analyzed, which determines the start position of the target code segment. Similarly, the server can determine the termination position of the target code segment, and thereby determine the target code segment.

505. And generating a code analysis result of the code text to be analyzed according to the target code segment.

In the embodiment of the application, the server may generate a code analysis result of the code text to be analyzed according to the target code segment. The code analysis result may include a start position identifier and a termination position identifier of the target code segment for locating it, or may be the text of the target code segment itself, which is not limited herein.

After the server generates the code analysis result, it may send the result to the terminal device of the software developer, so that the terminal device highlights the target code segment indicated by the code analysis result. As shown in fig. 3, the boxed portion of the main interface in which the software developer writes code is highlighted, reminding the software developer that the highlighted portion is the target code segment detected by the server.

The method of the embodiment of the application can also be applied to the terminal equipment, and fig. 9 is a flowchart of an embodiment of the application applied to the terminal equipment. The embodiment of the application provides a code analysis method, which is applied to terminal equipment and comprises the following steps:

901. The terminal device obtains N word vectors to be analyzed corresponding to the code text to be analyzed and error word vectors corresponding to the error code text, where the error code text represents the code text to be matched with the code text to be analyzed, and N is an integer greater than 1;

In the embodiment of the application, the terminal device is the device that the software developer uses to write software. The terminal device can receive, through the client used by the software developer, the code text input by the software developer and take it as the code text to be analyzed. The error code text may be stored in advance in a database of the terminal device or may be obtained from a server.

Other contents of step 901 in the embodiment of the present application are similar to those of step 501 in the respective embodiments corresponding to fig. 5, and are not repeated here.

902. The terminal equipment acquires N output vectors corresponding to the combined word vectors through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors;

step 902 in the embodiment of the present application is similar to step 502 in the corresponding embodiments of fig. 5, and will not be described herein.

903. The terminal equipment calculates the initial probability of each output vector in N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed according to the N output vectors and the error word vector;

step 903 in the embodiment of the present application is similar to step 503 in the corresponding embodiments of fig. 5, and will not be described herein.

904. The terminal device determines a target code segment according to the start probability of each output vector in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

step 904 in the embodiment of the present application is similar to step 504 in the corresponding embodiments of fig. 5, and will not be described here again.

905. And the terminal equipment generates a code analysis result of the code text to be analyzed according to the target code segment.

Step 905 in the embodiment of the present application is similar to step 505 in the corresponding embodiments of fig. 5, and will not be described herein.

After the terminal equipment generates the code analysis result of the code text to be analyzed, the corresponding target code segment can be directly highlighted in the client used by the software developer for writing the software.

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, further including:

calculating the initial probability of each output vector in N output vectors in the code text to be analyzed according to the N output vectors and the error word vector;

According to the N output vectors and the error word vector, calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed comprises the following steps:

determining a starting weight score corresponding to the ith output vector according to the ith output vector, the set starting weight and the error word vector, wherein i is an integer which is greater than or equal to 1 and less than or equal to N;

determining a starting probability according to a starting weight score and a starting weight total score corresponding to the ith output vector;

according to the N output vectors and the error word vector, calculating the termination probability of each output vector in the N output vectors in the code text to be analyzed, wherein the method comprises the following steps:

determining termination weight scores corresponding to the jth output vector according to the jth output vector, the set termination weight and the error word vector;

and determining the termination probability according to the termination weight score corresponding to the j-th output vector and the total termination weight score, wherein j is an integer which is greater than or equal to 1 and less than or equal to N.

In the embodiment of the application, the server or the terminal equipment can calculate the initial probability through the output vector, the set initial weight and the error word vector, and the calculation formula is as follows:

where $P^{(start)}(i)$ is the start probability of the i-th output vector, $h_i$ is the i-th output vector, $W^{(start)}$ is the start weight, and negvector is the error code block vector; the two remaining terms of the formula are the start weight score corresponding to the i-th output vector and the total start weight score, respectively.

The initial weight is a weight preset by a server or terminal equipment through training. The server or the terminal device may directly use the trained initial weight to calculate, and the specific training process may refer to the subsequent embodiment.

In the embodiment of the application, the server or the terminal equipment can calculate the termination probability through the output vector, the set termination weight and the error word vector, and the calculation formula is as follows:

where $P^{(end)}(i)$ is the termination probability of the i-th output vector, $h_i$ is the i-th output vector, $W^{(end)}$ is the termination weight, and negvector is the error code block vector; the two remaining terms of the formula are the termination weight score corresponding to the i-th output vector and the total termination weight score, respectively.

The termination weight is a weight preset by the server or the terminal equipment through training. The server or the terminal device may directly use the trained termination weight to calculate, and the specific training process may refer to the subsequent embodiment.
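The start and termination probabilities can be illustrated with the following Python sketch. The exact formulas are not reproduced in this text, so the sketch rests on an assumption: the weight score of the i-th output vector is taken to be exp(h_i · W · negvector), and the probability is that score divided by the total score over all N output vectors (a softmax). The helper names and the toy weights are likewise hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def bilinear(h: Vector, w: Matrix, neg: Vector) -> float:
    """Compute h^T W neg for a weight matrix W of shape len(h) x len(neg)."""
    return sum(
        h[r] * sum(w[r][c] * neg[c] for c in range(len(neg)))
        for r in range(len(h))
    )


def position_probabilities(outputs: List[Vector], w: Matrix, neg: Vector) -> Vector:
    """Assumed form: P(i) = score_i / total score, with score_i = exp(h_i^T W neg)."""
    logits = [bilinear(h, w, neg) for h in outputs]
    peak = max(logits)  # shifting by the maximum leaves the ratios unchanged
    scores = [math.exp(z - peak) for z in logits]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical toy data: N = 3 output vectors, 2-dimensional error code block vector.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neg_vector = [2.0, 0.0]
w_start = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a trained start weight
w_end = [[0.0, 1.0], [1.0, 0.0]]    # stands in for a trained termination weight

start_probs = position_probabilities(outputs, w_start, neg_vector)
end_probs = position_probabilities(outputs, w_end, neg_vector)
```

The same function serves both cases; only the weight differs, which mirrors the parallel structure of the two formulas in the text.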

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where determining, according to a start probability of each output vector in a code text to be analyzed and a stop probability of each output vector in the code text to be analyzed, the target code segment includes:

wherein determining the starting position of the target code segment according to the starting probability of each output vector in the code text to be analyzed comprises:

obtaining an output vector with highest initial probability;

determining the initial position of the target code segment according to the output vector with the highest initial probability and the mapping relation between the output vector and the code text to be analyzed;

wherein determining the termination position of the target code segment according to the termination probability of each output vector in the code text to be analyzed comprises:

Obtaining an output vector with highest termination probability;

the target code segment is determined based on the start position and the end position.

In the embodiment of the present application, the server or the terminal device may calculate the start probability of each output vector, for example, the start probability of the 1st output vector is 0.05, the start probability of the 2nd output vector is 0.1, and so on, and then select the output vector with the highest start probability. For example, if the i-th output vector has the highest start probability, the server or the terminal device determines the start position of the target code segment according to the mapping relationship of the i-th output vector.

In the embodiment of the present application, the server or the terminal device may calculate the termination probability of each output vector, for example, the termination probability of the 1st output vector is 0.08, the termination probability of the 2nd output vector is 0.01, and so on, and then select the output vector with the highest termination probability. For example, if the i-th output vector has the highest termination probability, the server or the terminal device determines the termination position of the target code segment according to the mapping relationship of the i-th output vector.

Table 2 is a data table obtained by calculating the start probability and the stop probability according to the embodiment of the present application, and it can be seen that each output vector can be calculated to obtain the corresponding start probability and stop probability.

TABLE 2

Output vector identification	1	2	…	N
Start probability	0.05	0.10	…	0.06
Termination probability	0.08	0.01	…	0.05
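The selection of the start and termination positions can be sketched in Python as below. The token list and probability values are hypothetical, and the function name `locate_target_segment` is an assumption. The text does not say what happens when the most probable termination position precedes the most probable start position; this sketch simply returns an empty segment in that case.

```python
from typing import List, Tuple


def locate_target_segment(
    tokens: List[str], start_probs: List[float], end_probs: List[float]
) -> Tuple[int, int, List[str]]:
    """Pick the positions with the highest start and termination probability.

    The index of an output vector doubles as the mapping back to the token
    at the same position in the code text to be analyzed.
    """
    if not (len(tokens) == len(start_probs) == len(end_probs)):
        raise ValueError("tokens and probabilities must have the same length")
    start = max(range(len(tokens)), key=start_probs.__getitem__)
    end = max(range(len(tokens)), key=end_probs.__getitem__)
    # If end < start, the slice is empty (behaviour chosen for this sketch only).
    return start, end, tokens[start:end + 1]


# Hypothetical toy data for N = 5 positions.
tokens = ["if", "(", "switch", "x", ";"]
start_probs = [0.05, 0.10, 0.60, 0.20, 0.05]
end_probs = [0.08, 0.01, 0.05, 0.06, 0.80]
start, end, segment = locate_target_segment(tokens, start_probs, end_probs)
```

Here the segment runs from "switch" to ";", analogous to the boxed portion described for FIG. 2.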

Fig. 10 is a schematic diagram of calculating a start probability and a stop probability by using an error code block vector and each output vector according to an embodiment of the present application, it can be seen that a server or a terminal device first converts the error word vector into the error code block vector, and then calculates the start probability and the stop probability corresponding to each output vector by using an attention mechanism, that is, a start probability calculation formula and a stop probability calculation formula.

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where N output vectors corresponding to a combined word vector are obtained through a neural network model, where the combined word vector is generated according to a word vector to be analyzed and an error word vector, and includes:

N output vectors corresponding to the combined word vectors are obtained through the neural network model.

In the embodiment of the present application, the server or the terminal device first generates the combined word vector according to the word vector to be analyzed and the error word vector; for the generation manner, refer to the embodiments corresponding to fig. 5, which are not repeated here. The server or the terminal device can calculate the attention score $a_{i,j}$ according to the error word vector and the word vector to be analyzed, as shown in Table 1. There are a number of ways to calculate the attention score $a_{i,j}$, and the embodiment of the present application provides one calculation formula for the attention score $a_{i,j}$ as follows:

where $a_{i,j}$ is the attention score between the i-th word vector to be analyzed and the j-th error word vector, $\mathrm{original\_vector}_i$ is the i-th word vector to be analyzed, and $\mathrm{negvector}_j$ is the j-th error word vector. The multi-layer perceptron (MLP) function is calculated as follows:

$\mathrm{MLP}(x) = \max(0, Wx + b), \quad W \in \mathbb{R}^{d \times d}, \; b \in \mathbb{R}^{d}$;

In the MLP function, W and b are a weight matrix and a bias vector learned by the network itself; they may be obtained through training, and the specific training method is not described here. In the embodiment of the application, a single layer of the multi-layer neural network is generally used, with the maximum function as its activation.
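The MLP function above, and one possible way of turning it into attention scores, can be sketched in Python as follows. The `mlp` function follows the stated definition max(0, Wx + b). The scoring step is an assumption, since the score formula itself is not reproduced in this text: the sketch takes the dot product of the two MLP outputs and normalises over the M error word vectors with a softmax. The weights shown are toy values, not trained ones.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def mlp(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Single layer MLP(x) = max(0, Wx + b), applied element-wise."""
    return [
        max(0.0, sum(w_rc * x_c for w_rc, x_c in zip(row, x)) + b_r)
        for row, b_r in zip(w, b)
    ]


def attention_scores(original_vec: Vector, error_vecs: List[Vector],
                     w: Matrix, b: Vector) -> Vector:
    """Assumed scoring: softmax over j of MLP(original_vector_i) . MLP(negvector_j)."""
    q = mlp(original_vec, w, b)
    logits = [sum(qk * ek for qk, ek in zip(q, mlp(e, w, b))) for e in error_vecs]
    peak = max(logits)  # shift for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


# Hypothetical toy weights (identity matrix, zero bias) and vectors with d = 2.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
hidden = mlp([1.0, -2.0], w, b)
scores = attention_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], w, b)
```

With these toy values the first error word vector, which is identical to the word vector to be analyzed, receives the highest score.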

Fig. 11 is a schematic diagram of calculating attention scores according to an embodiment of the present application. It can be seen that the server or the terminal device calculates the 1st attention score 1106 from the 1st error word vector 1101 and the i-th word vector to be analyzed, the 2nd attention score 1107 from the 2nd error word vector 1102 and the i-th word vector to be analyzed, the 3rd attention score 1108 from the 3rd error word vector 1103 and the i-th word vector to be analyzed, and the 4th attention score 1109 from the 4th error word vector 1104 and the i-th word vector to be analyzed. The 1st error word vector 1101 and the i-th word vector to be analyzed have relatively similar logical and structural relationships, so the 1st attention score 1106 is relatively high; the 2nd error word vector 1102 and the i-th word vector to be analyzed have very similar logical and structural relationships, so the 2nd attention score 1107 is the highest.

Fig. 12 is another schematic diagram of calculating attention scores according to an embodiment of the present application. It can be seen that the server or the terminal device calculates the 1st attention score 1206 from the 1st error word vector 1201 and the j-th word vector to be analyzed, the 2nd attention score 1207 from the 2nd error word vector 1202 and the j-th word vector to be analyzed, the 3rd attention score 1208 from the 3rd error word vector 1203 and the j-th word vector to be analyzed, and the 4th attention score 1209 from the 4th error word vector 1204 and the j-th word vector to be analyzed. The 3rd error word vector 1203 and the j-th word vector to be analyzed have very similar logical and structural relationships, so the 3rd attention score 1208 is the highest.

Therefore, the server can calculate the attention score through the word vector to be analyzed and the error word vector, and the attention score is used for representing the correlation degree of the word vector to be analyzed and the error word vector. And then the server or the terminal equipment calculates the attention word vector according to the attention score and the error word vector, and finally the attention word vector is spliced with other parts to obtain a combined word vector.

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where obtaining N word vectors to be analyzed corresponding to a code text to be analyzed and an error word vector corresponding to an error code text includes:

acquiring a code text to be analyzed;

generating N word vectors to be analyzed through a word vector tool according to the tag sequence to be analyzed;

acquiring a set error code text;

converting the error code text into an error mark sequence, wherein the error mark sequence is formed by converting each word or symbol in the error code text;

generating an error word vector by a word vector tool according to the error marking sequence.

In the embodiment of the application, after acquiring the code text to be analyzed, the server or the terminal device converts it into the tag sequence to be analyzed, which is formed by converting each word or symbol in the code text to be analyzed. A tag sequence is also referred to as a token sequence; a token is a code fragment with a type that determines the semantic representation of the text (for example, a keyword, a string, or a comment). The token sequence can be obtained using the conventional lexical analyzer Pygments or a modified version of it, which is not limited here.

FIG. 13 is a schematic diagram of converting a code text to be analyzed into a tag sequence according to an embodiment of the present application. It can be seen that the left side of FIG. 13 is the code text to be analyzed and the right side is the tag sequence to be analyzed; after the server or the terminal device obtains the code text to be analyzed, it converts the code text into the tag sequence to be analyzed.

It can be understood that the code text to be analyzed has a mapping relationship with the tag sequence to be analyzed. As in FIG. 13, if the first word or symbol of the code text to be analyzed is "$type", it may be converted into the 1st tag to be analyzed, "Variable assignment"; that is, there is a mapping relationship between "$type" and "Variable assignment", through which the server or the terminal device may look up "Variable assignment" from "$type" or "$type" from "Variable assignment".

It will be appreciated that a tag to be analyzed may be repeated in the tag sequence to be analyzed; for example, the tag "Variable assignment" in FIG. 13 may appear a plurality of times. Specifically, the conversion of the code text into the tag sequence may be performed in a manner similar to Table 3. Table 3 is a data table for converting between code text and tags in the embodiment of the present application. It can be seen that when the server or the terminal device detects a text similar to $a in the code text during the conversion, the text is converted into the tag "Variable assignment"; therefore, if multiple texts similar to $a appear in the code text, there may be multiple "Variable assignment" tags.

TABLE 3

Token type	Code text
Variable assign	$a,x
Operator	->,!=,+,-
Keyword	for,in,while,return,continue
String	“this is an example”
Comment	//must be negative
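The conversion in Table 3 can be illustrated with a small regular-expression lexer in Python. This is a toy stand-in written for illustration, not the Pygments lexer mentioned in the text: the patterns cover only the token types and examples listed in Table 3, and the names `TOKEN_SPEC` and `to_tag_sequence` are assumptions.

```python
import re
from typing import List, Tuple

# (token type, pattern) pairs; earlier entries win when several could match.
TOKEN_SPEC = [
    ("Comment", r"//[^\n]*"),
    ("String", r'"[^"\n]*"'),
    ("Keyword", r"\b(?:for|in|while|return|continue)\b"),
    ("Variable assign", r"\$?[A-Za-z_]\w*"),
    ("Operator", r"->|!=|[+\-]"),
]
# Group names may not contain spaces, so each pattern gets an index-based name.
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pattern})" for i, (_, pattern) in enumerate(TOKEN_SPEC))
)


def to_tag_sequence(code: str) -> List[Tuple[str, str]]:
    """Return (token type, code text) pairs in source order.

    Keeping the code text next to its tag preserves the mapping relationship
    between the code text and the tag sequence.
    """
    tags = []
    for match in MASTER.finditer(code):
        token_type = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        tags.append((token_type, match.group()))
    return tags


tag_sequence = to_tag_sequence('while $a != x return "ok" //must be negative')
```

Characters that match none of the patterns, such as whitespace, are skipped by this sketch; a production lexer would report them explicitly.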

It can be appreciated that after the server or the terminal device converts the code text to be analyzed into the tag sequence to be analyzed, the tag sequence to be analyzed can be converted into the word vectors to be analyzed by a word vector tool (the word2vec tool). The tags to be analyzed and the word vectors to be analyzed have a mapping relationship, namely, the 1st tag to be analyzed corresponds to the 1st word vector to be analyzed, the 2nd tag to be analyzed corresponds to the 2nd word vector to be analyzed, …, and the N-th tag to be analyzed corresponds to the N-th word vector to be analyzed.

In the embodiment of the application, after acquiring the error code text, the server or the terminal device converts it into the error tag sequence, which is formed by converting each word or symbol in the error code text. A tag sequence is also referred to as a token sequence; a token is a code fragment with a type that determines the semantic representation of the text (for example, a keyword, a string, or a comment). The token sequence can be obtained using the conventional lexical analyzer Pygments or a modified version of it, which is not limited here. The error code text may be a code text preset and stored in the server.

It will be appreciated that, similar to the code text to be analyzed, the error code text has a mapping relationship with the error marker sequence.

It will be appreciated that similar to the code text to be analyzed, the false marks in the sequence of false marks may be repeated.

It will be appreciated that after the server or the terminal device converts the error code text into the error tag sequence, the error tag sequence may be converted into the error word vectors by a word vector tool (the word2vec tool). There is a mapping relationship between the error tags and the error word vectors, i.e. the 1st error tag corresponds to the 1st error word vector, the 2nd error tag corresponds to the 2nd error word vector, …, and the M-th error tag corresponds to the M-th error word vector.

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where the combined word vector further includes a matching identifier, and the matching identifier includes a first matching identifier and a second matching identifier, where the first matching identifier is used to indicate that a to-be-analyzed tag corresponding to the combined word vector matches an error tag in the error tag sequence, and the second matching identifier is used to indicate that the to-be-analyzed tag corresponding to the combined word vector does not match the error tag in the error tag sequence.

In the embodiment of the application, when the server or the terminal device generates the combined word vector, the matching identifier can be spliced into the combined word vector. The matching identifier is either a first matching identifier or a second matching identifier. The first matching identifier, which may be the value 1, indicates that the tag to be analyzed corresponding to the combined word vector matches an error tag in the error tag sequence; the second matching identifier, which may be the value 0, indicates that the tag to be analyzed corresponding to the combined word vector does not match the error tags in the error tag sequence. Using the values 1 and 0 allows the server and the terminal device to identify the matching identifier quickly.

Table 4 is an example of a tag sequence to be analyzed and an error tag sequence used when matching identifiers are spliced onto combined word vectors in the embodiment of the present application. It can be seen that, if the tag to be analyzed corresponding to the combined word vector differs from every error tag in the error tag sequence, the server or the terminal device splices the second matching identifier onto the combined word vector. For example, when the server or the terminal device calculates the combined word vector corresponding to the tag to be analyzed "case", the tag "case" differs from the error tags "type", "Barek" and "+", so the server or the terminal device splices the second matching identifier onto the combined word vector corresponding to "case".

For another example, if the tag to be analyzed corresponding to the combined word vector matches an error tag in the error tag sequence, the server or the terminal device splices the first matching identifier onto the combined word vector. That is, when the server or the terminal device calculates the combined word vector corresponding to the tag to be analyzed "+", the tag "+" matches the error tag "+", so the server or the terminal device splices the first matching identifier onto the combined word vector corresponding to "+".

TABLE 4

Tag sequence to be analyzed	Error tag sequence
case	type
if	Barek
+	+
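The matching rule can be sketched in Python using the tags from Table 4. The function name `matching_identifiers` is an assumption; the values 1 and 0 follow the first and second matching identifiers described above.

```python
from typing import List


def matching_identifiers(tags_to_analyze: List[str], error_tags: List[str]) -> List[int]:
    """Return 1 if a tag to be analyzed matches any error tag, otherwise 0."""
    error_set = set(error_tags)
    return [1 if tag in error_set else 0 for tag in tags_to_analyze]


# Tag sequences taken from Table 4.
flags = matching_identifiers(["case", "if", "+"], ["type", "Barek", "+"])
```

Only "+" appears in the error tag sequence, so it is the only tag that receives the first matching identifier.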

Optionally, on the basis of the respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where the combined word vector further includes a duty ratio analysis tag, and the duty ratio analysis tag identifies the proportion of the corresponding tag to be analyzed in the tag sequence to be analyzed.

In the embodiment of the application, when the server or the terminal device generates the combined word vector, the duty ratio analysis tag can be spliced into the combined word vector. Table 5 is a table of a tag sequence to be analyzed in the embodiment of the present application. It can be seen that when the server or the terminal device generates the combined word vector, it splices the duty ratio analysis tag into the combined word vector according to the proportion of the corresponding tag to be analyzed in the tag sequence to be analyzed that is being processed. For example, when the server or the terminal device generates the combined word vector corresponding to the tag to be analyzed "if" and detects that "if" accounts for 3/5 of all tags in the currently processed tag sequence to be analyzed, it splices 3/5 into the combined word vector as the duty ratio analysis tag.

TABLE 5
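The duty ratio computation described above can be sketched as follows. The helper names `duty_ratio` and `splice_duty_ratio`, the sample tag sequence, and the choice to append the ratio as a single floating-point component are assumptions made for illustration only.

```python
from fractions import Fraction


def duty_ratio(tag, tag_sequence):
    """Proportion of positions in tag_sequence occupied by tag."""
    return Fraction(tag_sequence.count(tag), len(tag_sequence))


def splice_duty_ratio(combined_vector, tag, tag_sequence):
    """Return a copy of combined_vector with the duty ratio appended."""
    return list(combined_vector) + [float(duty_ratio(tag, tag_sequence))]


# Hypothetical tag sequence in which "if" occupies 3 of 5 positions.
tags = ["if", "a", "if", "b", "if"]
ratio = duty_ratio("if", tags)
vec = splice_duty_ratio([0.7, 0.3], "if", tags)
```

With this sample sequence, the ratio for "if" is 3/5, which agrees with the example in the text.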

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 5 or fig. 9, an embodiment of the present application further provides an optional embodiment of a method for generating a code vector, where determining, by a neural network model, an output vector of the neural network model according to the combined word vector includes:

acquiring a first word vector sequence formed by arranging the combined word vectors in forward order;

acquiring a second word vector sequence formed by arranging the combined word vectors in a reverse order;

the method comprises the steps of obtaining an output vector sequence corresponding to a first word vector sequence and a second word vector sequence through a bidirectional long-short-term memory LSTM network model, wherein the bidirectional LSTM network model comprises a forward LSTM network model and a reverse LSTM network model, the forward LSTM network model is used for generating the first output sequence corresponding to the first word vector sequence, the reverse LSTM network model is used for generating the second output sequence corresponding to the second word vector sequence, the output vector sequence is formed by splicing the first output sequence and the second output sequence, and the output vector sequence is formed by arranging output vectors.

In the embodiment of the present application, a server or a terminal device determines an output vector of a neural network model through the neural network model according to the combined word vector. The server or the terminal device first obtains a first word vector sequence formed by arranging the combined word vectors in forward order, where the first word vector sequence may be:

first word vector sequence = [1st combined word vector, 2nd combined word vector, …, nth combined word vector];

the server or the terminal device then obtains a second word vector sequence formed by arranging the combined word vectors in reverse order, where the second word vector sequence may be:

second word vector sequence = [nth combined word vector, (n-1)th combined word vector, …, 1st combined word vector];

Fig. 14 is a schematic diagram of a server or a terminal device inputting a first word vector sequence into a forward LSTM network model. After the server inputs the first word vector sequence into the forward LSTM network model, a first output sequence may be obtained, where the first output sequence is:

first output sequence = [1st forward output vector, 2nd forward output vector, …, nth forward output vector];

the 1st forward output vector is calculated by the server according to the 1st word vector through the forward LSTM network model, the 2nd forward output vector is calculated by the server according to the 2nd word vector through the forward LSTM network model, …, and the nth forward output vector is calculated by the server according to the nth word vector through the forward LSTM network model.

Fig. 15 is a schematic diagram of a server or a terminal device inputting a second word vector sequence into a reverse LSTM network model. After the server inputs the second word vector sequence into the reverse LSTM network model, a second output sequence may be obtained, where the second output sequence is:

second output sequence = [1st reverse output vector, 2nd reverse output vector, …, nth reverse output vector];

the 1st reverse output vector is calculated by the server according to the 1st word vector through the reverse LSTM network model, the 2nd reverse output vector is calculated by the server according to the 2nd word vector through the reverse LSTM network model, …, and the nth reverse output vector is calculated by the server according to the nth word vector through the reverse LSTM network model.

In the embodiment of the application, the neuron structure of the reverse LSTM network model is the same as that of the forward LSTM network model, but the direction of state transfer and of data input is opposite to that of the forward LSTM network model.

After the server inputs the first word vector sequence and the second word vector sequence into the bidirectional LSTM network model, a first output sequence and a second output sequence are obtained, and the server can then splice the first output sequence and the second output sequence together to obtain an output vector sequence. The output vector sequence is formed by arranging output vectors, and each output vector is formed by splicing a forward output vector and a reverse output vector. The 1st output vector is formed by splicing the 1st forward output vector and the 1st reverse output vector, the 2nd output vector is formed by splicing the 2nd forward output vector and the 2nd reverse output vector, …, and the nth output vector is formed by splicing the nth forward output vector and the nth reverse output vector. Finally, the server obtains the output vector sequence by splicing:

output vector sequence = [1st output vector, 2nd output vector, …, nth output vector].
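The data flow of the bidirectional pass and the splicing step can be sketched as follows. This is only an illustration of the sequence handling: `toy_cell` is a hypothetical stand-in recurrence and not an actual LSTM cell (it has no gates or trained weights), and all function names and numbers are assumptions made for the example.

```python
import math


def toy_cell(state, x):
    """Stand-in for an LSTM cell: mixes the previous state with the input."""
    return [math.tanh(0.5 * s + xi) for s, xi in zip(state, x)]


def run_direction(vectors):
    """Feed vectors in the given order and collect one output per step."""
    state = [0.0] * len(vectors[0])
    outputs = []
    for x in vectors:
        state = toy_cell(state, x)
        outputs.append(state)
    return outputs


def bidirectional_outputs(combined):
    """Splice forward and reverse outputs position by position."""
    first_seq = list(combined)             # forward order
    second_seq = list(reversed(combined))  # reverse order
    forward = run_direction(first_seq)
    # Re-reverse so the j-th reverse output lines up with the j-th word vector.
    backward = list(reversed(run_direction(second_seq)))
    return [f + b for f, b in zip(forward, backward)]


combined = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
out = bidirectional_outputs(combined)
```

Each output vector has twice the dimension of a single-direction output, because the forward and reverse halves are concatenated.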

Based on the embodiments corresponding to fig. 5 or fig. 9, fig. 16 is a schematic diagram of an application of the code analysis method according to the embodiment of the present application, where the method is applied to a server. The server first obtains an error code block, also called an error code text, and determines whether the error code block is a tag sequence. If so, the server generates an error code block word vector according to the error tag sequence; if not, the server first converts the error code block into a tag sequence and then generates the error code block word vector.

The server can also acquire a source code text, which can also be called a code text to be analyzed. The server then determines whether the source code text is a tag sequence. If so, the server generates a source code word vector, which can also be called a word vector to be analyzed, according to the source code text; if not, the server first converts the source code text into a tag sequence and then generates the source code word vector according to the tag sequence.
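The check-then-convert step can be sketched as follows. The source does not specify the tokenization rule in this passage, so the regular expression, the function name `to_tag_sequence`, and the convention that an already-converted tag sequence is represented as a Python list are all assumptions made for illustration.

```python
import re

# Hypothetical tokenizer: identifiers, numbers, a few two-character
# operators, then any other single non-space character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|\S")


def to_tag_sequence(code_text):
    """Return code_text as a tag sequence, converting it only if needed."""
    if isinstance(code_text, list):
        return code_text  # already a tag sequence
    return TOKEN_PATTERN.findall(code_text)


source_tags = to_tag_sequence("if (a + b == 10) return;")
error_tags = to_tag_sequence(["a", "+", "b"])
```

The same function can be applied to both the error code block and the source code text, so that word vectors are always generated from tag sequences.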

It will be appreciated that there is no fixed order in which the server acquires the error code block and the source code, although the error code block is typically acquired first.

After the server generates the error code block word vector and the source code word vector, step one may be performed, i.e., generating a combined word vector according to the error code block word vector and the source code word vector through an attention mechanism. The combined word vector is not shown in fig. 16, but the server inputs the combined word vector into a bi-directional LSTM network, i.e., a forward LSTM network and a reverse LSTM network, after generating the combined word vector.

Step one in fig. 16 is similar to step 502 in the respective embodiments corresponding to fig. 5, and detailed descriptions thereof are omitted here.

The server inputs the combined word vector into the bi-directional LSTM network to obtain a concatenated output vector, and on the other hand, the server also converts the error code word vector into an error code block vector. Then, the server predicts the starting position and the ending position of the error code according to the attention mechanism through the error code block vector and the spliced output vector, namely, the step two.

In this application, step two is similar to step 503 and step 504 in each embodiment corresponding to fig. 5, and detailed descriptions thereof are omitted here.
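Step two can be sketched as follows, under loudly stated assumptions. Each output vector receives a weight score, and its probability is that score divided by the total score. The exponential dot-product scoring, the hypothetical `start_query` and `end_query` vectors (standing in for the error code block vector after it has been combined with preset start and end parameters), and all names and numbers below are illustrative and are not taken from the source.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def position_probabilities(output_vectors, query):
    """Normalize per-position weight scores into probabilities."""
    weights = [math.exp(dot(o, query)) for o in output_vectors]
    total = sum(weights)
    return [w / total for w in weights]


def predict_span(output_vectors, start_query, end_query):
    """Pick the 1-based positions with the highest start and end probability."""
    start_probs = position_probabilities(output_vectors, start_query)
    end_probs = position_probabilities(output_vectors, end_query)
    start = max(range(len(start_probs)), key=start_probs.__getitem__) + 1
    end = max(range(len(end_probs)), key=end_probs.__getitem__) + 1
    return start, end, start_probs, end_probs


# Five made-up spliced output vectors and two made-up query vectors.
outputs = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.0]]
start, end, start_probs, end_probs = predict_span(outputs, [2.0, 0.0], [0.0, 2.0])
```

In this toy input the second position has the highest start probability and the fourth has the highest end probability, so the predicted segment spans positions 2 to 4.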

It will be appreciated that after the server obtains the start and end positions of the error code, the connected terminal device may be instructed to highlight the error code, as shown in fig. 3.

Fig. 17 is a schematic diagram of an application example of the code analysis method provided by the embodiment of the present application. The server obtains an error code block, a source code text 1 and a source code text 2, and converts each of them into a tag sequence. The server then performs algorithm prediction to calculate the target code segment of the source code text 1 and the target code segment of the source code text 2, highlights the source code text 1 and the source code text 2 according to their respective target code segments, and correspondingly highlights the source tag sequence 1 and the source tag sequence 2.

In the foregoing embodiments or application examples, the preset parameters may be obtained through training. The server first randomly initializes each parameter, then trains and optimizes each parameter in the method flow provided by the embodiment of the present application using the error code texts and the training code texts, and finally obtains the trained parameters. The training method is described below.

The server first acquires the error code texts and the training code texts, which can be code texts input in advance by an administrator of the developer platform. The administrator searches a plurality of code texts for error code texts corresponding to the error types to be trained. Typically, an error code text is only a fragment of code and is not complete, so the administrator needs to determine the whole section of (relatively complete) code in which the error code text is located, that is, the training code text, and needs to mark the start position and the end position of the error code text in the training code text. The administrator then inputs the error code text, the training code text, and the identifiers of the start position and the end position of the error code text in the training code text to the server, and the server thereby obtains them.

The server then trains the parameters in the overall method flow, with the task of allowing the network to learn the optimal parameters by minimizing the objective function. The objective function is:

$$L = -\sum \log P^{\mathrm{start}}(a_{\mathrm{start}}) - \sum \log P^{\mathrm{end}}(a_{\mathrm{end}});$$

where $a_{\mathrm{start}}$ represents the start position of the error code text in the training code text and $a_{\mathrm{end}}$ represents the end position of the error code text in the training code text, each of which can be represented by the corresponding position in the tag sequence. For example, if the tag sequence of the training code is A, B, C, D, E and the tag sequence of the error code is B, C, D, the start position is 2 and the end position is 4.
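The objective function can be evaluated as in the following minimal sketch. The function name `objective`, the tuple layout of a training example, and the probability values are assumptions made for illustration; positions are 1-based, as in the A, B, C, D, E example.

```python
import math


def objective(examples):
    """Sum of negative log-probabilities of the labelled start and end positions.

    Each example is (start_probs, end_probs, a_start, a_end), where the
    positions a_start and a_end are 1-based.
    """
    loss = 0.0
    for start_probs, end_probs, a_start, a_end in examples:
        loss -= math.log(start_probs[a_start - 1])
        loss -= math.log(end_probs[a_end - 1])
    return loss


# Training code A, B, C, D, E with error code B, C, D: start 2, end 4.
# The probabilities below are made-up model predictions.
confident = ([0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.6, 0.1], 2, 4)
uncertain = ([0.2, 0.2, 0.2, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2, 0.2], 2, 4)
loss_confident = objective([confident])
loss_uncertain = objective([uncertain])
```

A prediction that places more probability on the labelled positions yields a smaller loss, which is why minimizing the objective drives the predicted positions toward the labelled ones.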

The server uses a stochastic gradient descent algorithm to repeatedly compare the start and end positions of the error code predicted by the algorithm with the correct start and end positions ($a_{\mathrm{start}}$ and $a_{\mathrm{end}}$), and optimizes all of the models mentioned in the algorithm flow by continuously changing the values of the unknown parameters so as to minimize the objective function. Finally, the accuracy of the algorithm model's predictions on the training set is the highest, and model training is completed.

Fig. 18 is a diagram showing an interface during training of a server, and it can be seen that the server obtains each parameter in the whole trained method flow after multiple rounds of training optimization.

Fig. 19 is a schematic diagram of a device for code analysis according to an embodiment of the present application, where the device 1900 for code analysis according to an embodiment of the present application includes:

an obtaining unit 1901, configured to obtain N word vectors to be analyzed corresponding to a code text to be analyzed and an error word vector corresponding to an error code text, where the error code text represents a code text that matches the code text to be analyzed, and N is an integer greater than 1;

the processing unit 1902 is configured to obtain N output vectors corresponding to a combined word vector through a neural network model, where the combined word vector is generated according to a word vector to be analyzed and an error word vector;

the processing unit 1902 is further configured to calculate, according to the N output vectors and the error word vector, a start probability of each of the N output vectors in the code text to be analyzed, and a stop probability of each of the N output vectors in the code text to be analyzed;

the processing unit 1902 is further configured to determine a target code segment according to the start probability of each output vector in the code text to be analyzed and the end probability of each output vector in the code text to be analyzed;

a generating unit 1903, configured to generate a code analysis result of the code text to be analyzed according to the target code segment.

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 19, an embodiment of the present application further provides an optional embodiment of an apparatus for generating a code vector, where the processing unit 1902 is further configured to:

obtaining an output vector with highest initial probability;

obtaining an output vector with highest termination probability;

acquiring a code text to be analyzed;

acquiring a set error code text;

Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 19, an embodiment of the present application further provides an optional embodiment of a device for generating a code vector, where the combined word vector further includes a matching identifier, and the matching identifier includes a first matching identifier and a second matching identifier, where the first matching identifier is used to indicate that a to-be-analyzed tag corresponding to the combined word vector matches an error tag in the error tag sequence, and the second matching identifier is used to indicate that the to-be-analyzed tag corresponding to the combined word vector does not match an error tag in the error tag sequence.

Optionally, on the basis of the respective embodiments corresponding to fig. 19, an embodiment of the present application further provides an optional embodiment of a device for generating a code vector, where the combined word vector further includes a duty ratio analysis tag, and the duty ratio analysis tag identifies a duty ratio of the duty ratio analysis tag in the tag sequence to be analyzed.

Fig. 20 is a schematic diagram of a server structure according to an embodiment of the present application. The server 2000 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 2022 (e.g., one or more processors), a memory 2032, and one or more storage media 2030 (e.g., one or more mass storage devices) storing application programs 2042 or data 2044. The memory 2032 and the storage medium 2030 may be transitory or persistent storage. The program stored on the storage medium 2030 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processor 2022 may be arranged to communicate with the storage medium 2030 and to execute, on the server 2000, the series of instruction operations in the storage medium 2030.

The server 2000 may also include one or more power supplies 2026, one or more wired or wireless network interfaces 2050, one or more input/output interfaces 2058, and/or one or more operating systems 2041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 20.

In the embodiment of the present application, the CPU 2022 is specifically configured to perform the following steps:

N output vectors corresponding to the combined word vectors are obtained through the neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors;

in an embodiment of the present application, the CPU 2022 is further configured to perform the following steps:

obtaining an output vector with highest initial probability;

obtaining an output vector with highest termination probability;

acquiring a code text to be analyzed;

acquiring a set error code text;

The combined word vector also comprises a matching identifier, the matching identifier comprises a first matching identifier and a second matching identifier, the first matching identifier is used for indicating that the to-be-analyzed mark corresponding to the combined word vector is matched with the error mark in the error mark sequence, and the second matching identifier is used for indicating that the to-be-analyzed mark corresponding to the combined word vector is not matched with the error mark in the error mark sequence;

the combined word vector also includes a duty cycle analysis tag that identifies the duty cycle of the duty cycle analysis tag in the tag sequence to be analyzed.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

Claims

1. A method of code analysis, comprising:

the calculating, according to the N output vectors and the error word vector, a start probability of each output vector in the N output vectors in the code text to be analyzed, and a stop probability of each output vector in the code text to be analyzed, including:

2. The method of claim 1, wherein said determining the target code segment based on the start probability of each output vector in the code text to be analyzed and the end probability of each output vector in the code text to be analyzed comprises:

Acquiring the output vector with the highest initial probability;

acquiring the output vector with the highest termination probability;

3. The method of claim 1, wherein the obtaining, by the neural network model, N output vectors corresponding to a combined word vector, wherein the combined word vector is generated according to the word vector to be analyzed and the error word vector, includes:

4. The method of claim 3, wherein the obtaining N word vectors to be analyzed corresponding to the code text to be analyzed and the error word vector corresponding to the error code text comprises:

acquiring the code text to be analyzed;

acquiring the set error code text;

5. The method of claim 4, wherein the combined word vector further comprises a matching identifier, the matching identifier comprising a first matching identifier for indicating that the to-be-analyzed tag corresponding to the combined word vector matches the error tag in the error tag sequence, and a second matching identifier for indicating that the to-be-analyzed tag corresponding to the combined word vector does not match the error tag in the error tag sequence.

6. The method of claim 4, wherein the combined word vector further comprises a duty cycle analysis tag that identifies a duty cycle of the duty cycle analysis tag at the sequence of tags to be analyzed.

7. The method of claim 3, wherein the obtaining, by the neural network model, the N output vectors corresponding to the combined word vector comprises:

8. An apparatus for code analysis, comprising:

the processing unit is used for obtaining N output vectors corresponding to the combined word vectors through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors; calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed according to the N output vectors and the error word vector;

the processing unit is used for determining a target code segment according to the initial probability of each output vector in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed;

the generation unit is used for generating a code analysis result of the code text to be analyzed according to the target code segment;

The processing unit is specifically configured to:

9. The apparatus of claim 8, wherein the processing unit is further configured to:

acquiring the output vector with the highest initial probability;

acquiring the output vector with the highest termination probability;

10. The apparatus of claim 8, wherein the processing unit is further configured to:

11. The apparatus according to claim 10, wherein the acquisition unit is specifically configured to:

acquiring the code text to be analyzed;

acquiring the set error code text;

12. The apparatus of claim 11, wherein the combined word vector further comprises a match indicator, the match indicator comprising a first match indicator for indicating that the marker to be analyzed corresponding to the combined word vector matches the error marker in the error marker sequence, and a second match indicator for indicating that the marker to be analyzed corresponding to the combined word vector does not match the error marker in the error marker sequence.

13. The apparatus of claim 11, wherein the combined word vector further comprises a duty cycle analysis tag that identifies a duty cycle of the duty cycle analysis tag at the sequence of tags to be analyzed.

14. The apparatus of claim 10, wherein the processing unit is further configured to:

15. A server, the server comprising: memory, transceiver, processor, and bus system;

wherein the memory is used for storing programs;

N output vectors corresponding to the combined word vectors are obtained through a neural network model, wherein the combined word vectors are generated according to the word vectors to be analyzed and the error word vectors; calculating the initial probability of each output vector in the N output vectors in the code text to be analyzed and the termination probability of each output vector in the code text to be analyzed according to the N output vectors and the error word vector;

determining the termination probability according to a termination weight fraction corresponding to the j-th output vector and the termination weight total fraction, wherein j is an integer greater than or equal to 1 and less than or equal to N;

16. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of code analysis of any of claims 1-7.