CN110928546A

CN110928546A - Method, apparatus, electronic device, medium, and program for determining existence of dependency violations

Info

Publication number: CN110928546A
Application number: CN201811101865.1A
Authority: CN
Inventors: 韩克�; 高亮
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2018-09-20
Filing date: 2018-09-20
Publication date: 2020-03-27
Also published as: WO2020058120A1

Abstract

The invention relates to a method, an apparatus, an electronic device, a medium, and a program for determining whether a dependency violation exists. The method comprises the following steps. First, the architecture design document from which the source code is developed is input into an entity extraction model, which extracts a word vector for each word in the document; the words include entity words and non-entity words. Next, each sentence of the architecture design document is converted into sequence data consisting of the word vectors of its words, in the order in which they appear in the sentence. The sequence data of each sentence is input into a relationship extraction model, which extracts the relationships between all entities in the document. Based on the extracted relationships, dependency design rules are generated that represent the entities in the architecture design document and the relationships between them. The source code developed from the architecture design document is converted into a first dependency tree. Finally, the first dependency tree is compared with the dependency design rules to determine whether a dependency violation exists in the source code.

Description

Method, apparatus, electronic device, medium, and program for determining existence of dependency violations

Technical Field

The present invention relates generally to the field of software engineering, and more particularly, to a method, apparatus, electronic device, computer-readable medium, and program for determining the presence of dependency violations in source code.

Background

Software architecture design documents typically use natural language to define dependencies between modules, components, layers, classes, methods, subsystems, and the like. However, the dependencies actually implemented in software code developed from such a document may differ from those defined in it.

It is therefore desirable to provide a method for checking whether software code developed from a software architecture design document violates the dependencies defined in that document.

Disclosure of Invention

The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention, nor to delimit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.

According to one aspect of the invention, a method for determining whether a dependency violation exists in source code comprises: inputting the architecture design document from which the source code is developed into a pre-stored entity extraction model to extract a word vector for each word in the document, the words including entity words and non-entity words; converting each sentence of the document into sequence data consisting of the word vectors of its words in sentence order, and inputting the sequence data of each sentence into a pre-stored relationship extraction model to extract the relationships between all entities in the document; generating, based on the extracted relationships, dependency design rules representing the entities in the document and the relationships between them; converting the source code developed from the document into a first dependency tree; and comparing the first dependency tree with the dependency design rules to determine whether a dependency violation exists in the source code.

In this manner, by comparing the dependency tree converted from the source code with the dependency design rules generated from the architecture design document, it can be determined whether the entities and inter-entity relationships implemented in the source code are inconsistent with those defined in the architecture design document, i.e., whether a dependency violation exists.

Preferably, in one example of the above aspect, after the dependency design rules representing the entities in the architecture design document and the relationships between them are generated, the method further comprises converting the dependency design rules into a second dependency tree, and comparing the first dependency tree with the dependency design rules comprises comparing the first dependency tree with the second dependency tree.

Converting the dependency design rules into a second dependency tree and then comparing the two trees makes the comparison more intuitive and convenient.

Preferably, in one example of the above aspect, each dependency design rule is formatted as a triple (entity 1, entity 2, relationship between entity 1 and entity 2).

In this manner, the dependency design rules in the architecture design document can be represented in a clearer form.

Preferably, in one example of the above aspect, converting the source code developed from the architecture design document into the first dependency tree comprises: scanning the source code with a static code analysis tool to obtain a dependency structure matrix, and constructing, based on the dependency structure matrix, a dependency tree representing the dependencies between the entities included in the architecture design document. Entities may include modules, components, layers, classes, methods, subsystems, and the like.

In this manner, the source code developed from the architecture design document can be converted into a dependency tree, which facilitates the comparison.

Preferably, in one example of the above aspect, scanning the source code comprises: scanning source code stored in a file system or scanning source code stored in a version control system.

In this manner, source code stored in a file system can be scanned, and different versions of the source code can be scanned from a version control system as needed.

Preferably, in an example of the above aspect, the entity extraction model and the relationship extraction model are obtained in advance by training a neural network on a training data set consisting of a plurality of labeled architecture design documents.

In this manner, the neural network may be trained to derive an entity extraction model and a relationship extraction model.

According to another aspect of the invention, there is provided an apparatus for determining whether a dependency violation exists in source code, comprising: an entity extraction unit configured to input the architecture design document from which the source code is developed into a pre-stored entity extraction model to extract a word vector for each word in the document, the words including words representing entities and words representing non-entities; a relationship extraction unit configured to convert each sentence of the document into sequence data consisting of the word vectors of its words in sentence order, and to input the sequence data of each sentence into a pre-stored relationship extraction model to extract the relationships between the entities in the document; a dependency design rule generation unit configured to generate, based on the extracted relationships, dependency design rules representing the entities in the document and the relationships between them; a first dependency tree conversion unit configured to convert the source code developed from the document into a first dependency tree; and a dependency violation determination unit configured to compare the first dependency tree with the dependency design rules to determine whether a dependency violation exists in the source code.

Preferably, in one example of the above aspect, the apparatus further comprises a second dependency tree conversion unit configured to convert the dependency design rules into a second dependency tree, wherein the dependency violation determination unit is further configured to compare the first dependency tree with the second dependency tree.

Preferably, in one example of the above aspect, the dependency design rule generation unit is further configured to format the dependency design rules as triples (entity 1, entity 2, relationship between entity 1 and entity 2).

Preferably, in one example of the above aspect, the first dependency tree conversion unit is further configured to scan the source code with a static code analysis tool to obtain a dependency structure matrix, and to construct, based on the dependency structure matrix, a dependency tree representing the dependencies between the entities included in the architecture design document. Entities may include modules, components, layers, classes, methods, subsystems, and the like.

Preferably, in one example of the above aspect, the first dependency tree conversion unit is further configured to scan source code stored in the file system or scan source code stored in the version control system.

Preferably, in an example of the above aspect, the entity extraction model and the relationship extraction model are obtained in advance by training a neural network on a training data set consisting of a plurality of labeled architecture design documents.

According to another aspect of the present invention, there is provided an electronic device including: at least one processor; and a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the electronic device to perform the method for determining whether a dependency violation exists in source code as described above.

According to another aspect of the invention, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause a machine to perform the method for determining whether a dependency violation exists in source code as described above.

According to another aspect of the invention, there is provided a computer program comprising computer-executable instructions which, when executed, cause at least one processor to perform the method as described above.

Drawings

A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same or similar reference numerals.

FIG. 1 is a flow diagram of a method 100 for determining the presence of dependency violations in source code according to one embodiment of the invention;

FIG. 2 is a schematic diagram of a neural network architecture for training an entity extraction model;

FIG. 3 is a schematic diagram of a neural network structure for training a relationship extraction model;

FIG. 4 is a flow diagram of a method 400 for determining the presence of a dependency violation in source code according to another embodiment of the invention;

FIG. 5 illustrates a block diagram of an apparatus 500 for determining the presence of dependency violations in source code, according to one embodiment of the invention;

FIG. 6 depicts a block diagram of an exemplary configuration of an apparatus 600 for determining the presence of a dependency violation in source code, according to another embodiment of the invention; and

FIG. 7 illustrates a block diagram of an electronic device 700 for determining the presence of a dependency violation in source code in accordance with one embodiment of the present invention.

Reference numerals

100, 400: method for determining the presence of dependency violations in source code

S102, S104, S106, S108, S110, S402, S404, S406, S407, S408, S410: steps

202: input layer

204: hidden layer, weight matrix W (V × N)

206: output layer, weight matrix W' (N × V)

302: input layer

304: convolutional layer

306: max-pooling layer

308: fully connected layer

310, 312: entity word vectors

500, 600: apparatus for determining the presence of dependency violations in source code

502, 602: entity extraction unit

504, 604: relationship extraction unit

506, 606: dependency design rule generation unit

508, 608: first dependency tree conversion unit

510, 610: dependency violation determination unit

607: second dependency tree conversion unit

700: electronic device

702: processor

704: memory

Detailed Description

The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.

As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, explicit or implicit, may be given below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.

The invention provides a solution for determining whether source code developed from an architecture design document contains dependency violations with respect to that document. Dependency design rules between modules, components, layers, classes, methods, subsystems, and the like are first extracted from the architecture design document; a dependency tree of the software code is then generated by scanning the code developed according to the document; finally, the dependency tree is mapped to and compared with the dependency design rules to determine whether the developed source code violates the dependencies defined in the document. As will be understood by those skilled in the art, a dependency violation as used herein is determined as follows: the entities in the source code developed from the architecture design document, and the relationships between them, are compared one by one with the entities defined in the architecture design document and the relationships between them. If any inconsistency exists, a dependency violation exists; otherwise, no dependency violation exists.

Before describing the method according to exemplary embodiments of the present invention, definitions of two related terms are given.

Natural Language Processing (NLP): natural language processing is a subfield of computer science and artificial intelligence concerned with the interaction between computers and human (natural) languages, in particular with how to program computers to process and analyze large amounts of natural language data in order to recognize, understand, and generate natural language.

Version Control System (VCS): version control systems typically run as stand-alone applications, but version control is also embedded in various types of software, such as word processors, spreadsheets, collaborative Web documents, and content management systems (e.g., the page history of Wikipedia). Version control allows a document to be reverted to a previous version, which is important for letting editors track each other's edits, correct mistakes, and defend against vandalism and spam.

A method and an apparatus according to embodiments of the present invention for determining whether source code contains dependency violations with respect to its architecture design document will now be described with reference to the accompanying drawings.

FIG. 1 is a flow diagram of a method 100 for determining the presence of dependency violations in source code according to one embodiment of the invention.

As shown in FIG. 1, in block S102, the architecture design document from which the source code is developed is input into a pre-stored entity extraction model to extract a word vector for each word in the document, the words including both entity words and non-entity words.

An unknown-word vector may also be defined, so that every word whose category cannot be determined by the entity extraction model is mapped to this single unknown-word vector.
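As a minimal sketch of the unknown-word fallback, assuming the trained word vectors are held in a plain dictionary, every word missing from that dictionary can resolve to one shared vector. The token `<UNK>` and the function `lookup_vector` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

UNK = "<UNK>"  # hypothetical key for the shared unknown-word vector


def lookup_vector(word: str, embeddings: Dict[str, List[float]]) -> List[float]:
    """Return the word's vector, or the shared unknown-word vector if the
    entity extraction model has no vector for this word."""
    return embeddings.get(word, embeddings[UNK])


embeddings = {
    "module": [0.9, 0.1],
    "calls": [0.2, 0.8],
    UNK: [0.0, 0.0],  # one vector shared by all unknown words
}
# "Logger" was never seen, so it maps to the unknown-word vector.
unknown_vec = lookup_vector("Logger", embeddings)
```

Sharing a single vector keeps the input dimension fixed no matter how many unseen words a new design document contains.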

In block S104, each sentence of the architecture design document is converted into sequence data consisting of the word vectors of its words, in the order in which the words appear in the sentence, and the sequence data of each sentence is input into a pre-stored relationship extraction model to extract the relationships between all entities in the document.

The entity extraction model and the relationship extraction model used in the method according to an embodiment of the present invention may be obtained by training neural networks. The training method for the entity extraction model according to an embodiment of the present invention is described first.

First, labels are defined for the entity categories (e.g., module, component, layer, class, method, subsystem) and for the relationships between entities (e.g., containment, invocation, no relation) in a number of architecture design documents. Every word in these documents is then labeled. The words include entity words and non-entity words (i.e., all words other than entity words): each non-entity word is labeled with the word itself, and each entity word is labeled with its entity category, such as module, component, layer, class, method, or subsystem. Each sentence is labeled with the relationship, such as containment, invocation, or no relation, between the entities it contains. The labeled words and sentences are used as the training data set.

Specifically, each architecture design document is divided into a sentence set, containing all sentences in the document, and a word set, containing all distinct words that occur in the document. Each word carries its own label as well as the labels of the context words in the sentence containing it.

The words are then converted into vectors using one-hot encoding. Each one-hot word vector has dimension V × 1, where V is the number of distinct words in the training set.

The one-hot inputs of the entity extraction model are then converted into word embeddings (distributed representations of words), i.e., numeric vectors representing the words. The meaning of a word is inferred from its context (C denotes the number of context words taken before and after the target word), on the assumption that words occurring in similar contexts have similar meanings.

Preferably, the CBOW (continuous bag-of-words) algorithm of Word2Vec can be used as the training algorithm for the entity extraction model. FIG. 2 is a schematic diagram of a neural network architecture for training the entity extraction model.
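The split into a sentence set and a word set can be sketched as follows. This assumes a naive rule that a sentence ends at `.`, `!`, or `?` followed by whitespace, and that a word is a run of letters, digits, or underscores; real design documents would need a proper tokenizer. The function name `split_document` is hypothetical.

```python
import re
from typing import List, Tuple


def split_document(text: str) -> Tuple[List[List[str]], List[str]]:
    """Split a design document into a sentence set (each sentence as a word
    list) and a word set (distinct words, in first-seen order)."""
    # Naive sentence boundary: ., ! or ? followed by whitespace (an assumption).
    raw_sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    sentences = [re.findall(r"[A-Za-z0-9_]+", s) for s in raw_sentences]
    word_set: List[str] = []
    seen = set()
    for sentence in sentences:
        for word in sentence:
            if word not in seen:  # keep each distinct word once
                seen.add(word)
                word_set.append(word)
    return sentences, word_set


doc = ("The UI layer calls the Service layer. "
       "The Service layer contains the Repository class.")
sentences, word_set = split_document(doc)
```

The splitting is case-sensitive here, so "The" and "the" count as different words; whether to normalize case is a design choice the description leaves open.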
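A one-hot encoder over the word set might look like the sketch below. For readability it stores each V × 1 column vector as a flat Python list of length V. The function name `one_hot_encoder` is hypothetical.

```python
from typing import Dict, List


def one_hot_encoder(word_set: List[str]) -> Dict[str, List[int]]:
    """Map every word to a V-dimensional one-hot vector, V = len(word_set).
    Position i is 1 for the i-th word of the word set and 0 elsewhere."""
    V = len(word_set)
    return {word: [1 if j == i else 0 for j in range(V)]
            for i, word in enumerate(word_set)}


vectors = one_hot_encoder(["UI", "layer", "calls", "Service"])
# vectors["calls"] is [0, 0, 1, 0]
```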
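Collecting the context of each target word can be sketched as below. Here `window` words are taken on each side within the same sentence, so away from sentence boundaries C = 2 × window; this choice of window and the function name `context_windows` are assumptions for illustration.

```python
from typing import List, Tuple


def context_windows(sentence: List[str], window: int) -> List[Tuple[List[str], str]]:
    """For each target word, return (context words, target word), where the
    context is up to `window` words before and after the target."""
    pairs = []
    for i, target in enumerate(sentence):
        before = sentence[max(0, i - window):i]
        after = sentence[i + 1:i + 1 + window]
        pairs.append((before + after, target))
    return pairs


pairs = context_windows(["UI", "layer", "calls", "Service", "layer"], window=1)
# pairs[2] is (["layer", "Service"], "calls")
```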

In FIG. 2, at the input layer 202, the word set is traversed, and for each word in the set (the target word), the one-hot vectors of its C context words are used as the input, so the input has dimension C × V. Each sentence containing the target word contributes one such input, so the number of inputs equals the number of sentences that contain the target word.

At the hidden layer 204, multiplying the one-hot vectors of the C context words by a weight matrix W of dimension V × N yields a C × N matrix. Here, N is the user-defined dimension of the word embedding and can be set as needed; a larger N allows the word vectors to encode more information. Summing the C context word vectors obtained through W yields a 1 × N vector.

At the output layer 206, this 1 × N vector is multiplied by another weight matrix W' of dimension N × V to obtain a 1 × V vector, which is the prediction of the one-hot vector of the target word. Preferably, hierarchical softmax may be used to turn this result into the predicted vector representing the target word. In this vector, the largest element represents the probability that the target word belongs to an entity.

During back propagation, a loss function is computed by comparing the predicted vector with the true label of the word; the weights W and W' are then updated with a gradient descent algorithm based on this loss, and the training process is repeated until the loss function converges, yielding the trained entity extraction model.

Those skilled in the art can understand the specific process of training the entity extraction model from the above description. It will also be understood that, in the method of the present invention, an entity extraction model for converting the words of the architecture design document into word vectors may simply be stored in advance; the model is not limited to one generated by the training method described above.
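The forward pass and one gradient-descent update described above can be sketched in plain Python. This is an illustrative toy, not the patent's implementation: it swaps in a full softmax for hierarchical softmax, uses cross-entropy against the one-hot target as the loss, and follows the description in summing (rather than averaging) the context rows of W. The class `CBOW` and its methods are hypothetical names.

```python
import math
import random
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class CBOW:
    """Toy CBOW: W is V x N (input embeddings), W2 is N x V (the matrix W')."""

    def __init__(self, V: int, N: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.V, self.N = V, N
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]

    def forward(self, context_ids: List[int]) -> Tuple[List[float], List[float]]:
        # A one-hot row times W selects one row of W, so summing the selected
        # rows equals summing the C x N hidden matrix into a 1 x N vector.
        h = [sum(self.W[c][n] for c in context_ids) for n in range(self.N)]
        # (1 x N) times (N x V) gives the 1 x V score vector.
        scores = [sum(h[n] * self.W2[n][v] for n in range(self.N)) for v in range(self.V)]
        return h, softmax(scores)

    def train_step(self, context_ids: List[int], target_id: int, lr: float = 0.1) -> float:
        h, probs = self.forward(context_ids)
        loss = -math.log(probs[target_id])  # cross-entropy vs. the one-hot target
        # Gradient of the loss w.r.t. the scores: probs - one_hot(target).
        err = [p - (1.0 if v == target_id else 0.0) for v, p in enumerate(probs)]
        # Gradient w.r.t. h, computed before W2 is modified.
        grad_h = [sum(err[v] * self.W2[n][v] for v in range(self.V)) for n in range(self.N)]
        for n in range(self.N):
            for v in range(self.V):
                self.W2[n][v] -= lr * h[n] * err[v]
        # Every context row contributed to h directly, so each receives grad_h.
        for c in context_ids:
            for n in range(self.N):
                self.W[c][n] -= lr * grad_h[n]
        return loss


model = CBOW(V=5, N=3)
first_loss = model.train_step([0, 2], target_id=1)
for _ in range(200):
    last_loss = model.train_step([0, 2], target_id=1)
_, probs = model.forward([0, 2])
```

Repeating `train_step` plays the role of "repeat until the loss converges"; a real model would iterate over all (context, target) pairs from the corpus rather than a single pair.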

The following describes a method for training a relationship extraction model according to an embodiment of the present invention.

In the method of the present invention, preferably, a CNN (convolutional neural network) may be employed to train the relationship extraction model. FIG. 3 shows a schematic diagram of a neural network structure for training a relationship extraction model.

At the input layer 302, each word of a sentence in the architecture design document is represented by the word vector extracted by the entity extraction model, so the sentence can be represented as a matrix in which each row is a word and each column is a dimension of the word vector. Denoting the length of the sentence (the number of words it contains) by M and the word-vector dimension by N, a sentence is represented as an M × N matrix, which serves as the input.

In FIG. 3, reference numerals 310 and 312 schematically represent the word vectors of two entities.
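Building the M × N input matrix from a sentence can be sketched as stacking word vectors row by row, reusing the unknown-word fallback for unseen words. The function `sentence_matrix` and the `<UNK>` key are hypothetical.

```python
from typing import Dict, List


def sentence_matrix(sentence: List[str], embeddings: Dict[str, List[float]],
                    unk: str = "<UNK>") -> List[List[float]]:
    """Stack a sentence's word vectors into an M x N matrix: one row per word
    (M words), one column per embedding dimension (N)."""
    # list(...) copies each vector so rows never alias the embedding table.
    return [list(embeddings.get(word, embeddings[unk])) for word in sentence]


emb = {"UI": [1.0, 0.0], "calls": [0.0, 1.0], "Service": [0.5, 0.5], "<UNK>": [0.0, 0.0]}
X = sentence_matrix(["UI", "layer", "calls", "Service"], emb)
# X is 4 x 2; "layer" is unknown here, so its row is the unknown-word vector.
```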

At the convolutional layer 304, convolution kernels slide over the input matrix, and each kernel combines the information of consecutive words in the sequence.

The max-pooling layer 306 then applies 1-max pooling, reducing the feature map produced by each convolution kernel to a single number, its maximum value.

Finally, at the fully connected layer 308, the resulting feature vector is mapped to relationship classes, preferably using a softmax function.

In the resulting probability vector, the index of the largest element is the relationship category of the sentence.

During back propagation, a loss function is computed by comparing the predicted probability vector with the true label of the sentence. The weights are updated based on this loss using a gradient descent algorithm, and the training process is repeated until the loss function converges, yielding the trained relationship extraction model.
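The inference path of the relationship extraction CNN (convolution, 1-max pooling, fully connected layer, softmax) can be sketched without any framework. Several details are assumptions not given in the description: kernels span the full word-vector width and move one word at a time, a ReLU follows each convolution, and the weights below are hand-picked for illustration rather than trained. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def conv_1max(X: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide a k x N kernel down an M x N sentence matrix (one word per step),
    apply ReLU, and keep only the largest response (1-max pooling)."""
    k, M, N = len(kernel), len(X), len(X[0])
    feature_map = []
    for i in range(M - k + 1):
        s = bias + sum(kernel[r][c] * X[i + r][c] for r in range(k) for c in range(N))
        feature_map.append(max(0.0, s))  # ReLU (assumed activation)
    return max(feature_map)


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify_relation(X: Matrix, kernels: List[Matrix], fc_weights: Matrix,
                      fc_bias: List[float], labels: List[str]) -> Tuple[str, List[float]]:
    features = [conv_1max(X, K) for K in kernels]  # one number per kernel
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(fc_weights, fc_bias)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of largest element
    return labels[best], probs


X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # 3 words, 2-dim vectors
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
label, probs = classify_relation(
    X, kernels,
    fc_weights=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    fc_bias=[0.0, 0.0, 0.0],
    labels=["calls", "contains", "unrelated"],
)
```

With these weights the two pooled features are 2.0 and 1.5, so the first class scores highest.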

The specific process of training to obtain the relationship extraction model can be understood by those skilled in the art through the contents described above. Furthermore, it will be understood by those skilled in the art that, in the method of the present invention, a relationship extraction model for extracting the relationship between entities in the architecture design document may also be stored in advance, and the relationship extraction model is not limited to being generated by the training method described above.

As described above, the entity extraction model and the relationship extraction model make it possible to extract the entities included in the architecture design document and the relationships between them. Next, in block S106, dependency design rules representing the entities included in the architecture design document and the relationships between them may be generated based on the extracted relationships.

In one example, the dependency design rules may be formatted as (entity 1, entity 2, relationship between entity 1 and entity 2) triples.
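A sketch of turning per-sentence extraction results into a rule set of triples follows. Two choices here are assumptions rather than part of the description: duplicate statements of the same rule are merged, and pairs classified as unrelated are dropped. `Rule` and `build_rules` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    """Dependency design rule: (entity 1, entity 2, relationship)."""
    entity1: str
    entity2: str
    relation: str


def build_rules(extracted: Iterable[Tuple[str, str, str]]) -> Set[Rule]:
    """De-duplicate extracted triples and drop 'unrelated' pairs (assumption)."""
    return {Rule(a, b, r) for a, b, r in extracted if r != "unrelated"}


rules = build_rules([
    ("UI", "Service", "calls"),
    ("Service", "Repository", "calls"),
    ("UI", "Repository", "unrelated"),
    ("UI", "Service", "calls"),  # the same rule stated twice in the document
])
```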

In block S108, source code developed based on the architectural design document is converted into a first dependency tree.

A dependency tree converted from the source code shows the dependency relationships between the entities in the source code, which a person skilled in the art can read directly from the tree.

Specifically, converting the source code developed from the architecture design document into the first dependency tree may include: scanning the source code with a static code analysis tool to obtain a dependency structure matrix (DSM), and constructing, based on the dependency structure matrix, a dependency tree representing the dependencies between the entities included in the architecture design document.
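Assuming the static analysis tool has already produced a DSM as a list of entity names plus a square count matrix, the tree construction can be sketched as below. DSM orientation differs between tools, so the convention used here (`dsm[i][j] > 0` means entity i depends on entity j) is an assumption, and the adjacency-dictionary form of the tree is a hypothetical representation.

```python
from typing import Dict, List


def dsm_to_tree(entities: List[str], dsm: List[List[int]]) -> Dict[str, List[str]]:
    """Build a dependency tree in adjacency form from a dependency structure
    matrix. Assumed convention: dsm[i][j] > 0 means entities[i] depends on
    entities[j]; the diagonal (self-dependency) is ignored."""
    tree: Dict[str, List[str]] = {}
    for i, source in enumerate(entities):
        tree[source] = [entities[j] for j, count in enumerate(dsm[i])
                        if count > 0 and j != i]
    return tree


entities = ["UI", "Service", "Repository"]
dsm = [
    [0, 3, 1],  # UI -> Service (3 references), UI -> Repository (1 reference)
    [0, 0, 2],  # Service -> Repository (2 references)
    [0, 0, 0],  # Repository depends on nothing
]
first_tree = dsm_to_tree(entities, dsm)
```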

In a method according to an embodiment of the present invention, in addition to scanning source code stored in a file system, source code stored in a Version Control System (VCS) may also be scanned.

The specific process of scanning the source code with a static code analysis tool, such as LATTIX, and converting it into a dependency tree is known to those skilled in the art and is not described in detail herein.

Finally, in block S110, the first dependency tree is compared to the dependency design rules to determine whether a dependency violation exists in the source code.

Specifically, the entities in the first dependency tree converted from the source code, and the relationships between them, may be compared one by one with the entities and relationships in the dependency design rules generated from the architecture design document. If any inconsistency is found, a dependency violation is considered to exist; otherwise, no dependency violation is considered to exist.
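The one-by-one comparison can be sketched as two set differences between the dependencies found in code and the dependencies declared by the rules. The sketch assumes that every edge in the code's dependency tree corresponds to a rule with one chosen relationship label (here "calls"); that mapping and the function name `find_violations` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_violations(first_tree: Dict[str, List[str]],
                    rules: Set[Tuple[str, str, str]],
                    dep_relation: str = "calls") -> Dict[str, Set[Tuple[str, str]]]:
    """Compare code dependencies with design rules.
    Assumption: each code edge should match a rule labeled `dep_relation`."""
    actual = {(src, dst) for src, dsts in first_tree.items() for dst in dsts}
    designed = {(a, b) for a, b, rel in rules if rel == dep_relation}
    return {
        "undeclared": actual - designed,  # in the code, not in the design document
        "missing": designed - actual,     # in the design document, not in the code
    }


first_tree = {"UI": ["Service", "Repository"], "Service": ["Repository"], "Repository": []}
rules = {("UI", "Service", "calls"), ("Service", "Repository", "calls")}
violations = find_violations(first_tree, rules)
has_violation = any(violations.values())
# The UI layer reaches the Repository directly, bypassing the Service layer.
```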

FIG. 4 is a flow diagram illustrating a method 400 for determining the presence of a dependency violation in source code according to another embodiment of the invention.

In block S402 of FIG. 4, the architecture design document from which the source code is developed is input into a pre-stored entity extraction model to extract a word vector for each word in the document, the words including entity words and non-entity words. In block S404, each sentence of the document is converted into sequence data consisting of the word vectors of its words in sentence order, and the sequence data of each sentence is input into a pre-stored relationship extraction model to extract the relationships between the entities in the document. In block S406, dependency design rules representing the entities in the document and the relationships between them are generated based on the extracted relationships. In block S408, the source code developed from the architecture design document is converted into a first dependency tree.

It can be seen that the processing in blocks S402, S404, S406, and S408 in the method 400 in fig. 4 is similar to the processing in blocks S102, S104, S106, and S108 in fig. 1, and is not described again here.

After block S406, in block S407, the dependency design rules obtained in block S406 are converted into a second dependency tree.

The specific operation of converting the dependency design rules into the second dependency tree is known to those skilled in the art and is not described in detail herein.
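Since the conversion is left to the skilled reader, here is one possible sketch. It assumes a hypothetical tree shape, keyed by entity, then by relationship, listing related entities in sorted order, so that trees can be compared deterministically. The function `rules_to_tree` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, Dict[str, List[str]]]


def rules_to_tree(rules: Set[Tuple[str, str, str]]) -> Tree:
    """Convert (entity 1, entity 2, relationship) triples into a tree:
    entity -> relationship -> sorted list of related entities."""
    tree: Tree = {}
    for a, b, rel in rules:
        tree.setdefault(a, {}).setdefault(rel, []).append(b)
        tree.setdefault(b, {})  # leaf entities appear as nodes too
    for relations in tree.values():
        for targets in relations.values():
            targets.sort()  # set iteration order is arbitrary, so normalize
    return tree


second_tree = rules_to_tree({
    ("UI", "Service", "calls"),
    ("Service", "Repository", "calls"),
    ("Service", "OrderService", "contains"),
})
```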

In block S410, the first dependency tree generated in block S408 is compared to the second dependency tree generated in block S407 to determine whether a dependency violation exists in the source code.

In particular, the entities and relationships between the entities in the first dependency tree may be compared one-to-one with the entities and relationships between the entities in the second dependency tree to determine whether a dependency violation exists.
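A tree-to-tree comparison can be sketched by flattening both trees into triples and taking set differences, plus a symmetric difference on the entity nodes. This assumes both trees use the same hypothetical shape (entity -> relationship -> entities), which in turn assumes the code-side tree also labels each dependency with a relationship type. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, Dict[str, List[str]]]
Triple = Tuple[str, str, str]


def edges(tree: Tree) -> Set[Triple]:
    """Flatten entity -> relationship -> [entities] into (a, b, rel) triples."""
    return {(a, b, rel) for a, rels in tree.items()
            for rel, targets in rels.items() for b in targets}


def compare_trees(first: Tree, second: Tree) -> Tuple[Set[str], Set[Triple], Set[Triple]]:
    """Return (entities present in only one tree,
    relationships only in the code tree, relationships only in the design tree)."""
    entity_diff = set(first) ^ set(second)
    return entity_diff, edges(first) - edges(second), edges(second) - edges(first)


first = {"UI": {"calls": ["Service", "Repository"]},
         "Service": {"calls": ["Repository"]}, "Repository": {}}
second = {"UI": {"calls": ["Service"]},
          "Service": {"calls": ["Repository"]}, "Repository": {}}
entity_diff, extra, missing = compare_trees(first, second)
violation = bool(entity_diff or extra or missing)
```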

In this embodiment, comparing the first dependency tree generated from the source code with the second dependency tree generated from the architecture design document makes it more intuitive to determine whether a dependency violation exists in the source code.

FIG. 5 illustrates a block diagram of an apparatus 500 for determining the presence of dependency violations in source code according to one embodiment of the invention. As shown in FIG. 5, the apparatus 500 for determining whether a dependency violation exists in source code includes an entity extraction unit 502, a relationship extraction unit 504, a dependency design rule generation unit 506, a first dependency tree conversion unit 508, and a dependency violation determination unit 510.

The entity extraction unit 502 is configured to input the architecture design document from which the source code is developed into a pre-stored entity extraction model to extract a word vector for each word in the document, the words including words representing entities and words representing non-entities.

The relationship extraction unit 504 is configured to extract the relationship between entities included in the architecture design document by converting each sentence into sequence data expressed by word vectors of words in the order of the words in each sentence of the architecture design document, and inputting the converted sequence data of each sentence into a relationship extraction model stored in advance, respectively.

The dependency design rule generation unit 506 is configured to generate, based on the extracted relationships between the entities, dependency design rules representing the entities included in the architecture design document and the relationships between them.

The first dependency tree conversion unit 508 is configured to convert source code developed based on the architecture design document into a first dependency tree.

The dependency violation determination unit 510 is configured to compare the first dependency tree to the dependency design rules to determine whether a dependency violation exists in the source code.
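Where no second dependency tree is built, the comparison against the dependency design rules might look like the following Python sketch. It assumes (hypothetically) that the first dependency tree maps each entity to the set of entities it depends on and that the rules are (entity 1, entity 2, relationship) triples. Only the entity pair is checked, on the assumption that the tree obtained from static analysis records that a dependency exists but not its design-level relationship label. The function name is illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def check_against_rules(first_tree: Dict[str, Set[str]],
                        rules: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return the (source, target) dependencies of the first dependency
    tree that no (entity 1, entity 2, relationship) rule allows."""
    # Only the entity pair is checked; the relationship label is ignored
    # because the code-side tree is assumed not to carry one.
    allowed = {(entity1, entity2) for entity1, entity2, _ in rules}
    return sorted((source, target)
                  for source, targets in first_tree.items()
                  for target in targets
                  if (source, target) not in allowed)


# Illustrative data only.
rules = [("ui", "service", "depends on"),
         ("service", "repository", "depends on")]
first_tree = {"ui": {"service", "repository"},
              "service": {"repository"},
              "repository": set()}
print(check_against_rules(first_tree, rules))
```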

The dependency design rule generation unit 506 is further configured to format the dependency design rules as triples of the form (entity 1, entity 2, relationship between entity 1 and entity 2).
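A minimal sketch of the triple format follows, assuming a Python `NamedTuple` as the container and assuming that identical relationships extracted from different sentences should be merged into one rule. The class and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DependencyRule(NamedTuple):
    """One dependency design rule: (entity 1, entity 2, relationship)."""
    entity1: str
    entity2: str
    relationship: str


def format_rules(relations: Iterable[Tuple[str, str, str]]) -> List[DependencyRule]:
    """Format extracted relationships as rule triples, dropping duplicates
    (assumed behaviour) while keeping first-seen order."""
    seen = set()
    rules: List[DependencyRule] = []
    for entity1, entity2, relationship in relations:
        rule = DependencyRule(entity1, entity2, relationship)
        if rule not in seen:
            seen.add(rule)
            rules.append(rule)
    return rules


# Illustrative relationships, e.g. as extracted from three sentences.
extracted = [("UI", "Service", "depends on"),
             ("Service", "Repository", "depends on"),
             ("UI", "Service", "depends on")]
rules = format_rules(extracted)
print(rules)
```

Because a `NamedTuple` compares equal to a plain tuple with the same fields, such rules can be checked directly against tuples produced elsewhere in the pipeline.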

The first dependency tree conversion unit 508 is further configured to scan the source code with a static code analysis tool to obtain a dependency structure matrix, and to construct, based on the dependency structure matrix, a dependency tree representing the dependencies among the entities included in the architecture design document.
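The construction step might be sketched as follows. DSM conventions differ between tools. This sketch assumes that a nonzero cell `matrix[i][j]` means that element `i` depends on element `j`, and it keeps only elements whose names match entities of the architecture design document. The labels, matrix values, and function name are illustrative, and the resulting tree is represented as an adjacency map.

```python
from typing import Dict, List, Set


def dsm_to_dependency_tree(labels: List[str], matrix: List[List[int]],
                           design_entities: Set[str]) -> Dict[str, Set[str]]:
    """Build an adjacency map (entity -> entities it depends on) from a DSM.

    Assumed convention: matrix[i][j] > 0 means labels[i] depends on labels[j].
    Elements that are not entities of the design document are skipped.
    """
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("DSM must be square and match the number of labels")
    tree: Dict[str, Set[str]] = {
        name: set() for name in labels if name in design_entities}
    for i, source in enumerate(labels):
        if source not in design_entities:
            continue
        for j, target in enumerate(labels):
            if i != j and matrix[i][j] > 0 and target in design_entities:
                tree[source].add(target)
    return tree


# Illustrative DSM; cell values are dependency counts.
labels = ["ui", "service", "repository", "utils"]
matrix = [[0, 3, 1, 2],
          [0, 0, 4, 1],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
tree = dsm_to_dependency_tree(labels, matrix, {"ui", "service", "repository"})
print(tree)
```

Here `utils` is dropped because it is not named in the (hypothetical) design document, so dependencies on it are neither checked nor reported.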

The first dependency tree conversion unit 508 is further configured to scan source code stored in a file system or source code stored in a version control system.
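As a hedged illustration of the two scanning sources, the following Python sketch collects the files to be handed to a static analysis tool either by walking a directory in the file system or by asking a Git working copy for its tracked files. Git is used only as one example of a version control system, and the extension filter and function names are assumptions.

```python
import os
import subprocess
from typing import List, Tuple

# Assumed filter; adjust to the languages actually analysed.
SOURCE_EXTENSIONS: Tuple[str, ...] = (".java", ".py", ".c", ".cpp", ".h")


def files_from_file_system(root: str) -> List[str]:
    """List source files under root (paths relative to root)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SOURCE_EXTENSIONS):
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def files_from_git(repo: str) -> List[str]:
    """List source files tracked in the Git working copy at repo."""
    result = subprocess.run(["git", "-C", repo, "ls-files"],
                            capture_output=True, text=True, check=True)
    return sorted(path for path in result.stdout.splitlines()
                  if path.endswith(SOURCE_EXTENSIONS))
```

Reading from the version control system rather than the file system restricts the scan to committed, tracked files, which avoids analysing build output or local scratch files.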

The entity extraction model and the relationship extraction model are obtained in advance by training a neural network, using a plurality of labeled architecture design documents as the training data set.

FIG. 6 illustrates a block diagram of an exemplary configuration of an apparatus 600 for determining the presence of a dependency violation in source code, according to another embodiment of the invention.

In the example shown in FIG. 6, the apparatus 600 includes an entity extraction unit 602, a relationship extraction unit 604, a dependency design rule generation unit 606, a second dependency tree conversion unit 607, a first dependency tree conversion unit 608, and a dependency violation determination unit 610.

The configurations of the entity extraction unit 602, the relationship extraction unit 604, the dependency design rule generation unit 606, and the first dependency tree conversion unit 608 of the apparatus 600 are similar to those of the entity extraction unit 502, the relationship extraction unit 504, the dependency design rule generation unit 506, and the first dependency tree conversion unit 508 of the apparatus 500 shown in FIG. 5, and are not described again here.

The apparatus 600 for determining the existence of a dependency violation in source code shown in FIG. 6 further includes a second dependency tree conversion unit 607, which is configured to convert the dependency design rules generated by the dependency design rule generation unit 606 into a second dependency tree.
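The conversion performed by the second dependency tree conversion unit 607 could be sketched as follows, under the assumptions that the rules are (entity 1, entity 2, relationship) triples and that the tree is held as a nested mapping from entity 1 to its dependencies. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def rules_to_tree(rules: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Convert (entity 1, entity 2, relationship) triples into a nested
    mapping entity 1 -> {entity 2: relationship}.

    Every entity becomes a node, including entities that depend on
    nothing. If two triples share the same entity pair, the later one wins.
    """
    tree: Dict[str, Dict[str, str]] = {}
    for entity1, entity2, relationship in rules:
        tree.setdefault(entity1, {})[entity2] = relationship
        tree.setdefault(entity2, {})  # make leaf entities explicit nodes
    return tree


# Illustrative rules only.
rules = [("ui", "service", "depends on"),
         ("service", "repository", "depends on")]
print(rules_to_tree(rules))
```

Keeping leaf entities as explicit nodes lets a node-by-node comparison with the first dependency tree also detect entities that appear in only one of the two trees.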

The dependency violation determination unit 610 is configured to compare the first dependency tree produced by the first dependency tree conversion unit 608 with the second dependency tree produced by the second dependency tree conversion unit 607 to determine whether a dependency violation exists in the source code.

The details of the operation and function of the various portions of the apparatuses 500 and 600 for determining the presence of dependency violations in source code may be, for example, the same as or similar to the relevant portions of the method for determining the presence of dependency violations in source code according to embodiments of the present invention described above in connection with FIGS. 1-4, and will not be described in detail here.

It should be noted that the structures of the apparatuses 500 and 600 for determining the existence of a dependency violation in source code, and of their constituent units, shown in FIGS. 5-6 are merely exemplary, and those skilled in the art may modify the structural block diagrams shown in FIGS. 5-6 as needed.

Embodiments of methods and apparatus for determining the presence of dependency violations in source code according to the present application are described above with reference to FIGS. 1-6. The above-described means for determining the presence of a dependency violation in source code may be implemented in hardware, software, or a combination of hardware and software.

In the present application, the apparatuses 500 and 600 for determining the presence of dependency violations in source code may be implemented using an electronic device. FIG. 7 illustrates a block diagram of an electronic device 700 for determining the presence of a dependency violation in source code in accordance with one embodiment of the present invention. According to one embodiment, the electronic device 700 may include at least one processor 702, which executes at least one computer-readable instruction (i.e., an element described above as being implemented in software) stored or encoded in a computer-readable storage medium (i.e., the memory 704).

In one embodiment, computer-executable instructions are stored in the memory 704 that, when executed, cause the at least one processor 702 to: input an architecture design document for developing source code into a pre-stored entity extraction model to extract a word vector for each word included in the architecture design document, the words including words representing entities and words representing non-entities; convert each sentence of the architecture design document into sequence data expressed by the word vectors of its words, in the order of the words in the sentence, and input the sequence data of each sentence separately into a pre-stored relationship extraction model to extract the relationships between the entities included in the architecture design document; generate, based on the extracted relationships between the entities, dependency design rules representing the entities included in the architecture design document and the relationships between them; convert source code developed based on the architecture design document into a first dependency tree; and compare the first dependency tree to the dependency design rules to determine whether a dependency violation exists in the source code.

It should be appreciated that the computer-executable instructions stored in the memory 704, when executed, cause the at least one processor 702 to perform the various operations and functions described above in connection with FIGS. 1-4 in the various embodiments of the present invention.

According to one embodiment, a program product, such as a non-transitory machine-readable medium, is provided. The non-transitory machine-readable medium may store instructions (i.e., the elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 1-4 in the various embodiments of the present application.

According to one embodiment, there is provided a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in connection with FIGS. 1-4 in the various embodiments of the present application.

The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for determining the presence of dependency violations in source code, comprising:

inputting an architecture design document for developing source code into a pre-stored entity extraction model to extract a word vector of each word included in the architecture design document, wherein the words include words representing entities and words representing non-entities;

converting each sentence of the architecture design document into sequence data expressed by the word vectors of its words, in the order of the words in the sentence, and inputting the sequence data of each sentence separately into a pre-stored relationship extraction model to extract the relationships among all entities in the architecture design document;

generating, based on the extracted relationships between the entities, dependency design rules representing the entities included in the architectural design document and the relationships between the entities;

converting source code developed based on the architectural design document into a first dependency tree; and

comparing the first dependency tree to the dependency design rules to determine whether a dependency violation exists in the source code.

2. The method of claim 1, wherein,

after generating the dependency design rules representing the entities contained in the architectural design document and the relationships between the entities, the method further comprises converting the dependency design rules into a second dependency tree; and

comparing the first dependency tree to the dependency design rules comprises comparing the first dependency tree to the second dependency tree.

3. The method of claim 1, wherein the dependency design rules are formatted as triples, each comprising entity 1, entity 2, and a relationship between entity 1 and entity 2.

4. The method of any of claims 1 to 3, wherein converting source code developed based on the architectural design document into a first dependency tree comprises:

scanning the source code with a static code analysis tool to obtain a dependency structure matrix, and constructing, based on the dependency structure matrix, a dependency tree representing the dependencies among the entities included in the architectural design document.

5. The method of claim 4, wherein scanning the source code comprises: scanning source code stored in a file system or scanning source code stored in a version control system.

6. The method of any one of claims 1 to 5, wherein the entity extraction model and the relationship extraction model are pre-trained using a neural network with a plurality of labeled architectural design documents as training data sets.

7. An apparatus (500, 600) for determining the presence of dependency violations in source code, comprising:

an entity extraction unit (502, 602) configured to input an architectural design document for developing source code into a pre-stored entity extraction model to extract a word vector for each word included in the architectural design document, the word including words representing entities and words representing non-entities;

a relation extraction unit (504, 604) configured to convert each sentence of the architectural design document into sequence data expressed by word vectors of words in an order of the words in the sentence, and input the converted sequence data of each sentence into a pre-stored relation extraction model, respectively, to extract relations between all entities in the architectural design document;

a dependency design rule generation unit (506, 606) configured to generate, based on the extracted relationships between the entities, dependency design rules representing the entities included in the architectural design document and the relationships between the entities;

a first dependency tree transformation unit (508, 608) configured to transform source code developed based on the architectural design document into a first dependency tree; and

a dependency violation determination unit (510, 610) configured to compare the first dependency tree to the dependency design rule to determine whether a dependency violation exists in the source code.

8. An electronic device (700), comprising:

at least one processor (702); and

a memory (704) coupled with the at least one processor (702), the memory (704) having instructions stored therein, which when executed by the at least one processor (702) cause the electronic device (700) to perform the method of any of claims 1-6.

9. A non-transitory machine-readable medium having stored thereon computer-executable instructions that, when executed, cause at least one processor to perform the method of any of claims 1-6.

10. A computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of any one of claims 1 to 6.