CN116483314A

CN116483314A - Automatic intelligent activity diagram generation method

Info

Publication number: CN116483314A
Application number: CN202310018314.3A
Authority: CN
Inventors: 许斌; 俞文军; 崔秋兰; 殷史弘; 亓晋; 孙雁飞
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2023-07-25

Abstract

The invention discloses an automated intelligent activity diagram generation method comprising the following steps: step 1, normalizing a pre-acquired requirement text by using grammar rules and a neural network technique to obtain a normalized requirement text; step 2, processing the normalized requirement text by using a pre-trained AT-SRnn model and a dependency syntax analysis method to obtain the activity object and the dependency relationship between the activity object and the words of the requirement text; step 3, extracting UML elements from the requirement text by using the dependency relationships among its words and the activity diagram element identification rules; and step 4, generating an activity diagram from the UML elements and the activity objects. The invention proposes a synonym replacement method based on a model named GASR, which improves the accuracy of activity diagram generation, and introduces an AT-SRnn model to identify activity objects. Because the requirement text is normalized before the activity diagram is generated from it, the accuracy of activity diagram generation is improved.

Description

Automatic intelligent activity diagram generation method

Technical Field

The invention relates to an automatic intelligent activity diagram generation method, and belongs to the technical field of natural language processing.

Background

Generating Unified Modeling Language (UML) diagrams from natural language requirements is considered a complex and challenging task. Natural language requirements are often ambiguous, which makes them difficult to interpret and understand. In software development, the software requirement specification is written in natural language, and analyzing it is a critical task that takes a great deal of time and effort. Natural language processing techniques are therefore needed as an aid in analyzing and processing natural language requirements.

The current state-of-the-art approach to automatic activity diagram generation normalizes the text with a rule-based method, processes it with an existing tool such as the Stanford parser, and then applies rules to identify activity diagram elements and generate the activity diagram. However, this approach does not consider synonym replacement in the text, nor does it address the fact that data sets in the software requirements field are small, which causes the trained models to overfit.

1. "Constructing Activity Diagrams from Arabic User Requirements using Natural Language Processing Tool" proposes a method for building an activity diagram from Arabic user requirements using the MADA+TOKAN parser. The paper only imposes writing restrictions on the original requirement text before analyzing and processing it to generate the activity diagram, which reduces the accuracy of the final activity diagram.

2. "Static UML Model Generator from Analysis of Requirements (SUGAR)" proposes a tool of the same name, which generates use case models and class models from natural language requirements. Although the paper normalizes the initial requirement text, it does not consider synonym replacement, which reduces the accuracy of diagram generation to a certain extent.

3. "Generating UML Use Case and Activity Diagrams Using NLP Techniques and Heuristics Rules" proposes a method based on NLP techniques and heuristic rules to generate use case and activity diagrams from informal requirement text. The paper uses the Stanford CoreNLP toolkit to analyze and process the requirement text. However, the toolkit's models are trained on general-purpose data sets, so applying them directly to text processing in the software requirements field results in lower accuracy.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing an automated intelligent activity diagram generation method. It involves natural language processing, deep learning, neural networks, activity diagram generation and related technologies, and in particular a text normalization method that combines rules with a GASR model. This addresses the problem that rule-based methods cannot take the relations between text contexts into account and may therefore make substitution errors during synonym replacement. The method is significant for improving the accuracy of activity diagram generation.

In order to achieve the above object, the present invention provides an automated intelligent activity diagram generating method, comprising:

step 1, normalizing a pre-acquired requirement text by using grammar rules and a neural network technique to obtain a normalized requirement text;

step 2, processing the normalized requirement text by using a pre-trained AT-SRnn model and a dependency syntax analysis method to obtain the activity object and the dependency relationship between the activity object and the words of the requirement text;

step 3, extracting UML elements from the requirement text by using the dependency relationships among its words and the activity diagram element identification rules;

step 4, generating an activity diagram from the UML elements and the activity objects.

Preferably, step 1, normalizing a pre-acquired requirement text by using grammar rules and a neural network technique to obtain a normalized requirement text, is implemented by the following steps:

step 1.1, modifying the pre-acquired requirement text by using grammar rules to obtain a modified requirement text;

step 1.2, performing synonym replacement on the modified requirement text by using a neural network technique to obtain the normalized requirement text.

Preferably, step 1.1, modifying the pre-acquired requirement text by using grammar rules to obtain the modified requirement text, is implemented by the following steps:

deleting sentences without verbs from the pre-acquired requirement text;

deleting modifiers and structural auxiliary words from the sentences of the pre-acquired requirement text;

replacing object pronouns in the pre-acquired requirement text with the names of the objects they refer to;

converting passive sentences in the pre-acquired requirement text into active sentences;

treating "," and ";" in the pre-acquired requirement text as ".", and replacing ":" with "is";

splitting sentences in the pre-acquired requirement text that are joined by coordinating conjunctions into several sentences, and, if a sentence has only one subject but two or more verb-object phrases, splitting it into several sentences, thereby obtaining the modified requirement text.

Preferably, performing synonym replacement on the modified requirement text by using a neural network technique to obtain the normalized requirement text is implemented by the following steps:

identifying, through a rule base, all words in the modified requirement text that need synonym replacement, to obtain a text sequence containing those words;

acquiring the start and end positions, in the modified requirement text, of all words that need synonym replacement;

inputting the text sequence containing the words that need synonym replacement into a GASR model to perform synonym replacement, and obtaining the normalized requirement text.

Preferably, the GASR model comprises an embedding layer, a bidirectional gated recurrent unit (BGRU), a gated recurrent unit (GRU), a local multi-head attention module, a linear block, and a Softmax classifier;

at the encoder end, the embedding layer uses word embedding to extract the basic features of the text sequence;

the BGRU extracts the context-related features of the text sequence from the text sequence containing the words that need synonym replacement;

at the decoder end, the embedding layer uses word embedding to extract the basic features of the text sequence, giving a first feature;

the GRU extracts a second feature from the first feature;

the context-related features of the text sequence and the second feature are sent to the local multi-head attention module, which computes local attention features to give a third feature;

the linear block fuses the third feature with the context-related features of the text sequence to obtain a fused feature result; the fused feature result is input into the Softmax classifier, which outputs the normalized words;

the normalized words are inserted into the corresponding positions in the modified requirement text, according to the start and end positions of the words that need synonym replacement, to obtain the normalized requirement text.

Preferably, step 2, processing the normalized requirement text by using a pre-trained AT-SRnn model and a dependency syntax analysis method to obtain the activity object and the dependency relationship between the activity object and the words of the requirement text, is implemented by the following steps:

step 2.1, segmenting the normalized requirement text into a plurality of complete sentences;

step 2.2, decomposing the complete sentences in the requirement text into words;

step 2.3, attaching the corresponding tags to the words in the complete sentences to obtain the part-of-speech-tagged requirement text;

step 2.4, inputting the part-of-speech-tagged requirement text into the pre-trained AT-SRnn model and extracting the activity objects;

step 2.5, using a dependency syntax analysis method to identify, for each extracted activity object, its dependency relationships with the other words in the requirement text, and obtaining the syntactically analyzed requirement text.

Preferably, step 2.4, inputting the part-of-speech-tagged requirement text into the pre-trained AT-SRnn model and extracting the activity objects, is implemented by the following steps:

the AT-SRnn model comprises an embedding layer, a sliced recurrent neural network (SRnn), an attention layer, and a CRF layer, connected in that order;

step 2.4.1, embedding the sentences of the requirement text with the embedding layer to obtain sentence vectors;

step 2.4.2, slicing each sentence vector into a plurality of subsequences with the sliced recurrent neural network SRnn;

step 2.4.3, obtaining the contextual relations among the subsequences with the attention layer, and assigning different weights according to the importance of the characters;

step 2.4.4, finally computing the conditional probability with the CRF layer, and, among the scores of the tags corresponding to each word, selecting the tag with the highest score as the activity object.

Preferably, the AT-SRnn model is obtained by pre-training, which is implemented by the following steps:

acquiring a data set from the software requirements field, the data set comprising software requirement design and analysis texts, software projects, software requirement texts of UML models, and the ground-truth activity objects of actual application projects;

inputting the data set into the constructed AT-SRnn model, which outputs predicted activity objects;

adding perturbations to the data set by an adversarial training method, iteratively computing the difference between the predicted and the ground-truth activity objects with a cross-entropy loss function, and updating the weights of the constructed AT-SRnn model;

if the difference converges to a certain value, judging the constructed AT-SRnn model to be qualified and outputting it as the final AT-SRnn model.

Preferably, the activity diagram element identification rules cover activity nodes, decision nodes, execution order/control flows, the start node, and the end node;

for decision nodes, conditional adverbs such as "if/provided" are extracted from the adverb tags in the syntactically analyzed requirement text and mapped to a conditional particle;

sentences following the conditional particles in the syntactically analyzed requirement text are extracted and mapped to conditional answers.

Preferably, step 4, generating an activity diagram from the UML elements and the activity objects, is implemented by the following steps:

step 4.1, creating as many lanes as there are activity object tags, and labeling the activity objects above the lanes in order;

step 4.2, constructing a solid circle in any lane to obtain the start node;

step 4.3, traversing the syntactically analyzed requirement text and, whenever an activity object is identified, judging whether the sentence containing it has an activity node or a decision node; if an activity node exists, constructing a rounded rectangle in the lane of that activity object, named with the verb-object phrase;

if a decision node exists, constructing a diamond in the lane of that activity object, named with the conditional sentence;

the decision node is given two outgoing control flows, representing a true branch and a false branch respectively, and it is judged whether an activity node or a decision node exists in the conditional answer;

step 4.4, continuing to traverse the syntactically analyzed requirement text and judging whether any unidentified activity object remains; if so, executing step 4.3 again, and if not, proceeding to step 4.5;

step 4.5, converting the sentence-final punctuation mark (for example, the full stop ".") of each sentence in the syntactically analyzed requirement text into a control flow;

step 4.6, constructing a concentric circle to obtain the end node, and generating the activity diagram.

The invention has the beneficial effects that:

The invention proposes a synonym replacement method based on a model named GASR, which improves the accuracy of activity diagram generation, and introduces an AT-SRnn model to identify activity objects.

The requirement text is normalized first, and the activity diagram is then generated from the processed requirement text, which improves the accuracy of activity diagram generation.

In addition to applying rules to normalize the requirement text, the invention uses the GASR model to replace synonyms in the text, which further improves the accuracy of activity diagram generation.

The invention proposes an activity object recognition model named AT-SRnn, which combines adversarial training with a sliced recurrent neural network and effectively improves the generalization ability, robustness, and training speed of the model.

Drawings

FIG. 1 is a block diagram of a GASR model in accordance with the invention;

FIG. 2 is a block diagram of an AT-SRnn model in accordance with the present invention;

FIG. 3 is a flow chart of the present invention;

FIG. 4 is a block diagram of the automated intelligent activity diagram generation system of the present invention.

Detailed Description

The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

Example 1

1. Conventional activity diagram generation techniques do not consider synonym replacement. However, software requirement texts are written in a fairly colloquial style, their wording is not uniform, and many synonyms occur, so using the unprocessed text directly reduces the accuracy of activity diagram element identification. To address this, synonym replacement is first performed on the requirement text: all words that need synonym replacement are matched through a dictionary and a rule base and labeled, the labeled text is input into the GASR model, and the model identifies the words that need synonym replacement from the labels and outputs the replacement words.

2. To address the problem that conventional synonym replacement models cannot extract effective features well, the invention proposes a synonym replacement model named GASR, which combines a bidirectional GRU with an attention mechanism and improves prediction accuracy.

3. To address the overfitting caused by the small size of data sets in the software requirements field, an activity object recognition model named AT-SRnn is proposed. The model combines adversarial training with a sliced recurrent neural network, which effectively improves its generalization ability, robustness, and training speed.

In view of the shortcomings described in the background, the invention provides an automated intelligent activity diagram generation method and system. The specific steps are as follows:

step 1, normalizing a requirement text by using grammar rules and a neural network technique to obtain a normalized requirement text, with the following specific steps:

modifying the requirement text by using grammar rules to obtain a modified requirement text;

performing synonym replacement on the modified requirement text by using a neural network technique to obtain the normalized requirement text.

Modifying the requirement text with grammar rules to obtain the modified requirement text is implemented by the following steps:

Step 1.1, first normalizing the requirement text by using grammar rules, as follows:

To restructure natural language text and convert complex requirements into simpler sentences, a set of grammar rules must be formulated. When a user writes a requirement text, the text is normalized with this rule set, which improves the accuracy of information extraction from the requirement text, saves requirement analysis time, and facilitates the extraction of UML elements. The requirement text is normalized according to the following grammar rules:

(1) Sentences without verbs in the requirement text are deleted.

(2) Modifiers and structural auxiliary words (typically the Chinese particles "的", "地" and "得") in the sentences of the requirement text are deleted.

(3) Object pronouns are replaced with the names of the objects they refer to.

(4) Passive sentences in the requirement text are converted into active sentences.

(5) The punctuation marks "," and ";" in the requirement text are treated as ".", and ":" is replaced with "is", that is, the text is divided into several independent sentences. The marks ".", "?", "%" and "!", quotation marks, brackets, ellipses, dashes, and special names in the requirement text remain unchanged.

(6) Sentences joined by coordinating conjunctions such as "and" are split. For example, a sentence of the form "subject + verb 1 + object 1 and verb 2 + object 2" is rewritten as "subject + verb 1 + object 1" and "subject + verb 2 + object 2".

(7) If a sentence has only one subject but two or more verb-object phrases, it is split into several sentences, each pairing the subject with one of the verb-object phrases.
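Rules (6) and (7) can be illustrated with a minimal sketch. It is a toy stand-in rather than the rule set of the invention: it assumes English text, assumes the subject is the first word of the sentence, and assumes the phrases are joined by "and". The function name `split_coordinated` is hypothetical.

```python
import re

# Hypothetical toy rule: "<subject> <verb1 object1> and <verb2 object2>."
# Assumption: the subject is the first word; real text would need a parser.
COORD = re.compile(r"\s+and\s+")

def split_coordinated(sentence: str) -> list[str]:
    """Split one subject with several verb-object phrases into separate sentences."""
    body = sentence.strip().rstrip(".")
    subject, _, predicate = body.partition(" ")
    if not predicate:
        # Nothing follows the first word, so there is nothing to split.
        return [sentence.strip()]
    phrases = [p for p in COORD.split(predicate) if p]
    # Pair the shared subject with every verb-object phrase.
    return [f"{subject} {phrase}." for phrase in phrases]

print(split_coordinated("User enters password and clicks login."))
```

A production rule set would identify the subject and the verb-object phrases from a syntactic parse instead of relying on word position.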

Performing synonym replacement on the modified requirement text with a neural network technique to obtain the normalized requirement text is implemented by the following steps:

step 1.2, identifying, through a rule base, all words in the modified requirement text that need synonym replacement, to obtain a text sequence containing those words;

step 1.3, inputting the text sequence containing the words that need synonym replacement into a GASR model to perform synonym replacement, and obtaining the normalized requirement text.

Step 1.3 is implemented as follows:

the GASR model comprises an embedding layer, a bidirectional gated recurrent unit (BGRU), a gated recurrent unit (GRU), a local multi-head attention module, a linear block, and a Softmax classifier;

at the encoder end, the embedding layer uses word embedding to extract the basic features of the text sequence from the text sequence containing the words that need synonym replacement;

the context-related features of the text sequence and the second feature are sent to the local multi-head attention module (Multihead Attention), which computes local attention features to give a third feature;

the linear block (Linear & CRF) fuses the third feature with the context-related features to obtain a fused feature result;

the fused feature result is input into the Softmax classifier, which outputs the normalized words.

Specifically, in step 1.2, synonym replacement is performed on the modified requirement text by using the GASR model.

First, for an input sentence, all words that need synonym replacement are matched through a rule base and labeled; the rule base is a set of regular expressions. For the input sequence, all candidate words that need synonym replacement are matched through a prior-art rule base, and their start and end positions are returned, finally giving a text sequence containing the words that need synonym replacement. This text sequence is then input into the GASR model.
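The matching step described above can be sketched as follows. The regular expressions, the `CANONICAL` lookup table, and the function names are illustrative assumptions. In particular, the lookup table merely stands in for the GASR model, which would predict the normalized word from context.

```python
import re

# Hypothetical rule base: each regular expression matches variants of one concept.
RULE_BASE = [
    re.compile(r"\b(?:log in|sign in|log on)\b"),
    re.compile(r"\b(?:client|customer)\b"),
]

# Stand-in for the GASR model: a fixed lookup from variant to normalized word.
CANONICAL = {"log in": "login", "sign in": "login", "log on": "login",
             "client": "user", "customer": "user"}

def find_spans(text):
    """Return sorted (start, end) positions of every word needing replacement."""
    spans = [m.span() for rule in RULE_BASE for m in rule.finditer(text)]
    return sorted(spans)

def normalize(text):
    """Insert the normalized word at each span, right to left so offsets stay valid."""
    for start, end in reversed(find_spans(text)):
        text = text[:start] + CANONICAL[text[start:end]] + text[end:]
    return text

print(find_spans("The customer can sign in."))
print(normalize("The customer can sign in."))
```

Replacing from the last span to the first means the recorded start and end positions of earlier spans remain correct even when a replacement changes the text length.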

Step 1.3, inputting the text sequence containing the words that need synonym replacement into the GASR model to perform synonym replacement and obtain the normalized requirement text, is implemented as follows:

The GASR model is shown in FIG. 1. The encoder extracts the basic features of the source text sequence by word embedding and obtains the context-related features of the text sequence through a BGRU (bidirectional gated recurrent unit).

At the decoder end, the basic features of the text sequence are extracted by word embedding. The word embedding output is passed through a unidirectional GRU to extract further features. The GRU output and the BGRU output obtained by the encoder are then sent to the local multi-head attention module (Multihead Attention) to compute local attention features, and a linear block (Linear & CRF) fuses the output of the multi-head attention module with the GRU output. The fused result is sent to a classifier (Softmax), which outputs the normalized words. (The encoder input is a sequence of characters; the decoder input is a sequence of word indices.)

Step 2, processing the normalized requirement text by using the AT-SRnn model and a dependency syntax analysis method to obtain the activity object and the dependency relationship between the activity object and the words of the requirement text, is implemented by the following steps:

step 2.1, segmenting the normalized requirement text into a plurality of complete sentences;

step 2.4, inputting the part-of-speech-tagged requirement text into the AT-SRnn model and extracting the activity objects;

step 2.5, using a dependency syntax analysis method to identify, for each extracted activity object, its dependency relationships with the other words in the requirement text.

Further, in this embodiment, the AT-SRnn model is obtained by pre-training, which is implemented by the following steps:

a data set from the software requirements field is acquired,

if the difference converges to a certain value, the constructed AT-SRnn model is judged to be qualified and is output as the final AT-SRnn model.

Specifically, the text is processed with natural language processing techniques. First, the text is segmented into sentences, and the sentences are split into word-level data structures. Next, part-of-speech tagging attaches tags to the words in the text (noun tags, verb tags, adjective tags, adverb tags, preposition tags, numeral tags, and so on). The AT-SRnn model then identifies activity objects in the key sentences (nouns that can act as activity objects, such as user, system, back end, and administrator). Finally, a dependency syntax analysis method identifies the dependency relationships between each extracted activity object and the other words in the requirement text.

Step 2.4, inputting the part-of-speech-tagged requirement text into the AT-SRnn model and extracting the activity objects, is implemented as follows:

To solve the overfitting caused by the small size of data sets in the software requirements field, a model named AT-SRnn (an activity object recognition model based on adversarial training and a sliced recurrent neural network) is proposed. The model mainly comprises an embedding layer, a sliced recurrent neural network (SRnn), an attention layer, and a CRF layer. During training, an adversarial training method adds perturbations to the sentence vectors to construct an adversarial sample R, which improves the robustness and generalization ability of the model.

Embedding layer:

The embedding layer performs an embedding operation on the input sentence, converting it into a vector representation to obtain the sentence vector.

Adversarial training method:

Adversarial training improves the generalization ability and robustness of a model by adding noise. It usually adds adversarial perturbations at the embedding layer. The sentence vector is expressed as x = (x_1, x_2, x_3, …, x_i, …, x_n), and an adversarial perturbation r_ap is added to it. The calculation formula is as follows:

where x_i denotes the word vector of the input sample, g denotes the gradient with respect to x_i, L denotes the classification loss function, and m is a hyperparameter controlling the perturbation strength.

The adversarial sample R is calculated as:

R = x + r_ap
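The perturbation formula itself is not reproduced in the text. A minimal sketch of one common choice is given below, under the assumption that the perturbation follows the fast gradient method, in which the gradient g of the loss L is scaled to length m: r = m · g / ‖g‖₂. This scaling, the function name `adversarial_sample`, and the use of plain Python lists are all assumptions rather than details taken from the invention.

```python
import math

def adversarial_sample(x, grad, m=1.0):
    """Return R = x + r, where r = m * g / ||g||_2 (assumed FGM-style scaling).

    x    -- embedding values of the input sample
    grad -- gradient g of the classification loss L with respect to x
    m    -- hyperparameter controlling the perturbation strength
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no perturbation direction; return x unchanged.
        return list(x)
    return [xi + m * gi / norm for xi, gi in zip(x, grad)]

print(adversarial_sample([1.0, 2.0], [3.0, 4.0], m=0.5))
```

In a training loop the gradient would come from back-propagating the loss on the clean sample, and the loss on R would then be added before the weights are updated.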

Sliced recurrent neural network SRnn:

Adding the sliced recurrent neural network to the model increases training speed. Compared with a standard RNN, the sliced recurrent neural network (SRnn) can obtain high-level information through a multi-layer network with only a small number of additional parameters, and the input sequence can be sliced into multiple subsequences so that they are computed in parallel.
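The slicing idea can be sketched as below. Only the slicing of the input is shown, without the recurrent units themselves. The assumption of equal-length slices and the function names `slice_sequence` and `slice_recursively` are illustrative choices.

```python
def slice_sequence(seq, n):
    """Split seq into n contiguous sub-sequences of equal length."""
    if n <= 0 or not seq or len(seq) % n != 0:
        raise ValueError("seq must be non-empty and its length a multiple of n")
    size = len(seq) // n
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def slice_recursively(seq, n, levels):
    """Apply the slicing `levels` times, giving a nested list of sub-sequences."""
    if levels == 0:
        return seq
    return [slice_recursively(part, n, levels - 1) for part in slice_sequence(seq, n)]

print(slice_recursively(list(range(8)), 2, 2))
```

Because the innermost sub-sequences do not depend on one another, a recurrent unit can process them in parallel, and higher layers then combine their final states.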

Attention layer:

The attention layer captures the contextual relations between characters and assigns weights according to their importance: characters that play an important role receive larger weights, and uninformative characters receive smaller weights.
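The weighting behaviour described above can be sketched as follows. The text does not say how the importance scores are computed, so the sketch assumes they are already given and normalizes them with a softmax, which is a common but assumed choice. The function names are hypothetical.

```python
import math

def attention_weights(scores):
    """Softmax over relevance scores: a larger score gives a larger weight; weights sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(vectors, weights):
    """Combine per-character feature vectors using the attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]

weights = attention_weights([2.0, 0.0, 0.0])
print(weights)
print(weighted_sum([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))
```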

CRF layer for calculating the conditional probability:

The CRF layer is a model that computes conditional probability and can learn the transition rules between tags in order to make reasonable predictions. For the input sequence x_Att = (x_1, x_2, x_3, …, x_n) and the corresponding tag sequence y = (y_1, y_2, y_3, …, y_n), the score is calculated as follows:

Here A denotes the transition score matrix; the other quantity in the formula is the probability that the i-th word is tagged y_i.

For each word, the AT-SRnn model selects the tag with the highest score among the scores of its candidate tags as the final result, and finally outputs the keywords (activity objects).
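The scoring formula is not reproduced in the text. The sketch below assumes the usual linear-chain CRF form, in which the score of a tag sequence is the sum of per-word tag scores and of transition scores between neighbouring tags, and it finds the best sequence with the Viterbi algorithm. The names `emissions`, `transitions`, `sequence_score` and `viterbi` are hypothetical, and start and end transitions are omitted for brevity.

```python
def sequence_score(emissions, transitions, tags):
    """Score of a tag path: per-word tag scores plus transitions between neighbours."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score

def viterbi(emissions, transitions):
    """Return the highest-scoring tag path by dynamic programming."""
    n_tags = len(emissions[0])
    best = list(emissions[0])   # best score of any path ending in each tag
    back = []                   # back-pointers, one list per later position
    for emit in emissions[1:]:
        # For every current tag b, remember the best previous tag a.
        prev = [max(range(n_tags), key=lambda a: best[a] + transitions[a][b])
                for b in range(n_tags)]
        best = [best[prev[b]] + transitions[prev[b]][b] + emit[b]
                for b in range(n_tags)]
        back.append(prev)
    tag = max(range(n_tags), key=lambda t: best[t])
    path = [tag]
    for prev in reversed(back):  # follow the back-pointers to the first word
        tag = prev[tag]
        path.append(tag)
    return path[::-1]

emissions = [[1.0, 3.0], [2.0, 0.0]]      # emissions[i][t]: score of tag t for word i
transitions = [[0.5, 0.0], [-2.0, 0.0]]   # transitions[a][b]: score of tag a followed by b
print(viterbi(emissions, transitions))
```

In this toy input the first word on its own prefers tag 1, yet the transition scores make the path of two 0 tags the overall best, which shows why the transition matrix matters.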

Step 3: UML elements are extracted with a set of activity diagram element identification rules. The UML elements of the activity diagram are identified from the dependency relationships among the words combined with the following rules:

Activity node:

1. "Verb tags" are identified in the syntactically analyzed requirement text and used as activities.

2. The verb-object phrase in the syntactically analyzed requirement text is used as the activity name.

3. "Activity nodes" in the activity diagram are represented by rounded rectangles.

Decision node:

1. If the structure of a sentence in the syntactically analyzed requirement text is "if/provided + noun (subject) + verb (predicate) + noun (object)", "if/provided + noun (subject) + verb (predicate)", or "if/provided + verb (predicate) + noun (object)", the sentence is mapped to a decision node.

2. Adverbs such as "if/provided" are extracted from the "adverb tags" in the syntactically analyzed requirement text and mapped to a conditional particle.

3. "Yes/no" in the syntactically analyzed requirement text is mapped to a conditional example.

4. The sentence following the "conditional particle" in the syntactically analyzed requirement text is extracted and mapped to a conditional sentence.

5. Conjunctions such as "then" in the syntactically analyzed requirement text are extracted and mapped to answer particles.

6. Sentences following the "conditional particles" in the syntactically analyzed requirement text are extracted and mapped to conditional answers.

7. If the next sentence in the syntactically analyzed requirement text contains no "otherwise/else", the decision is a simple flow without any decision box or branching.

8. If the next sentence in the syntactically analyzed requirement text contains "otherwise/else", arrows split the flow into a true branch and a false branch: the words after "then" denote the true branch, and the words after "otherwise/else" denote the false branch.

9. A "decision node" in the activity diagram is represented by a diamond and has one incoming control flow and at least two outgoing control flows.

Execution order/control flow:

1. The sentence-final punctuation mark (for example, the full stop ".") of each sentence in the syntactically analyzed requirement text is identified as a control flow.

2. Conjunctions expressing succession, such as "then/afterwards", that occur in sentences of the syntactically analyzed requirement text are identified as control flows.

Start node: the start node does not need to be extracted from the user requirements. It is constructed first when the activity diagram is drawn, is connected to exactly one node (the next activity element), and is represented by a solid circle.

End node: the end node does not need to be extracted from the user requirements. It must exist, is connected to the last activity element, and is represented by concentric circles.
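The activity node and decision node rules above can be sketched with a toy classifier. It is not the rule set of the invention. It assumes English sentences of the form "If <condition>, then <answer>", assumes the first word of any other sentence is the activity object, and uses the hypothetical names `DECISION` and `classify`.

```python
import re

# Toy English stand-ins for the conditional particle ("if/provided")
# and the answer particle ("then") named in the rules.
DECISION = re.compile(
    r"^(?:if|provided)\s+(?P<condition>.+?),?\s+then\s+(?P<answer>.+)$",
    re.IGNORECASE,
)

def classify(sentence):
    """Map one normalized sentence to a decision node or an activity node."""
    text = sentence.strip().rstrip(".;")
    match = DECISION.match(text)
    if match:
        return {"type": "decision",
                "condition": match.group("condition"),  # conditional sentence
                "answer": match.group("answer")}        # conditional answer
    # Assumption: the first word is the activity object and the remainder
    # is the verb-object phrase used as the activity name.
    actor, _, phrase = text.partition(" ")
    return {"type": "activity", "object": actor, "name": phrase}

print(classify("If password is valid, then system opens homepage."))
print(classify("User enters password."))
```

A fuller version would consult the part-of-speech tags and dependency relations produced in step 2 rather than surface patterns.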

Step 4, generating an activity diagram from the UML elements. The operation flow is shown in FIG. 3, and the specific steps are as follows:

Step 4.1, according to the activity objects identified by the AT-SRnn model, creating as many lanes as there are activity object tags, and labeling the nouns tagged as activity objects above the lanes in order.

Step 4.2, constructing a solid circle in any lane and naming it the start node.

Step 4.3, traversing the syntactically analyzed requirement text with each "activity object" as an entry point. When an "activity object" is identified, it is judged whether the sentence containing it carries an "activity node" or a "decision node" tag. If an "activity node" tag exists, a rounded rectangle is constructed in the lane of that "activity object", named with the verb-object phrase. If a "decision node" tag exists, a diamond is constructed in the lane of that "activity object", named with the conditional sentence. A "decision node" must have two outgoing control flows, representing a true branch and a false branch respectively, and it is also judged whether the "conditional answer" carries an "activity node" or a "decision node" tag.

Step 4.4, continuing to scan the syntactically analyzed requirement text and, taking each "activity object" as an entry point, drawing the activity relations in the activity diagram by the same procedure as step 4.3.

And 4.5, ending each sentence in the parsed requirement text. "translates into one control flow.

And 4.6, constructing a concentric circle named as an end node, and obtaining an activity map.
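The generation steps above can be illustrated with a short sketch. It assumes that the sentences have already been tagged and are supplied as dictionaries; the keys `actor`, `kind`, `name`, `true`, and `false`, and the function name `build_activity_diagram`, are hypothetical and serve only this illustration. Nested decisions inside a branch are not handled.

```python
def build_activity_diagram(sentences):
    """Build lanes, nodes, and control flows from tagged sentences.

    Each sentence is a dict with the assumed keys "actor" (active object),
    "kind" ("active" or "decision"), "name" (verb-object phrase or
    conditional sentence) and, for decisions, optional "true"/"false"
    branch dicts of the same shape.
    """
    lanes = []
    for s in sentences:  # step 4.1: one lane per active object, in order
        for part in (s, s.get("true"), s.get("false")):
            if part is not None and part["actor"] not in lanes:
                lanes.append(part["actor"])

    nodes, flows = [], []

    def add_node(kind, name, lane):
        nodes.append({"id": len(nodes), "kind": kind, "name": name, "lane": lane})
        return nodes[-1]["id"]

    # step 4.2: the start node may sit in any lane; the first one is used here
    start = add_node("start", "start node", lanes[0] if lanes else None)
    tails = [(start, "")]  # (node id, label) pairs awaiting an outgoing flow

    for s in sentences:  # steps 4.3 to 4.5
        nid = add_node(s["kind"], s["name"], s["actor"])
        flows.extend((src, nid, label) for src, label in tails)
        if s["kind"] != "decision":
            tails = [(nid, "")]
            continue
        tails = []
        for label in ("true", "false"):  # two outgoing flows per decision
            branch = s.get(label)
            if branch is None:
                tails.append((nid, label))
            else:
                bid = add_node(branch["kind"], branch["name"], branch["actor"])
                flows.append((nid, bid, label))
                tails.append((bid, ""))

    # step 4.6: the end node is connected to the last activity element(s)
    end = add_node("end", "end node", nodes[-1]["lane"])
    flows.extend((src, end, label) for src, label in tails)
    return {"lanes": lanes, "nodes": nodes, "flows": flows}
```

The returned structure is purely descriptive (lanes, nodes, labeled flows); drawing the shapes themselves is left to whatever rendering tool is used.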

The invention provides an automatic intelligent activity diagram generation system, which is shown in fig. 4, and comprises a text normalization module, a natural language processing module, an activity diagram element identification module and an activity diagram generation module.

Text normalization module: this module standardizes the input text. After the demand text has been normalized by applying the grammar rules, all words that require synonym replacement are matched using a dictionary and a rule base; the GASR model then performs the synonym replacement, and finally the replaced words are inserted at their corresponding positions in the text to form the output sentences of the module.

Natural language processing module: this module processes the text using natural language processing techniques. First, the text is segmented and split into sentences, and each sentence is decomposed into a word-level data structure. Part-of-speech tagging then labels the words in the text (noun tags, verb tags, adjective tags, adverb tags, preposition tags, numeral tags, and so on). Next, the AT-SRnn model identifies the active objects in the key sentences (nouns that can act as active objects, such as user, system, background, and administrator). Finally, based on the extracted active objects, dependency syntax analysis identifies the dependency relationships between the active objects and the other words in the demand text.
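The order of operations in the natural language processing module can be sketched as follows. This is a toy illustration under strong assumptions: `POS_LEXICON` stands in for a real part-of-speech tagger, the fixed set `ACTIVE_OBJECTS` stands in for the trained AT-SRnn model, whitespace tokenization is used (which would not work for Chinese text), and dependency syntax analysis is omitted entirely. All names are hypothetical.

```python
import re

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
POS_LEXICON = {"user": "noun", "system": "noun", "order": "noun",
               "submits": "verb", "checks": "verb", "the": "det"}
# Fixed set standing in for the AT-SRnn active-object recognizer (assumption).
ACTIVE_OBJECTS = {"user", "system", "administrator", "background"}

def preprocess(text):
    """Split text into sentences, tokenize, tag, and mark active objects."""
    result = []
    for sentence in re.split(r"[.。]", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        tokens = sentence.lower().split()
        tagged = [(tok, POS_LEXICON.get(tok, "unk")) for tok in tokens]
        actors = [tok for tok, pos in tagged
                  if pos == "noun" and tok in ACTIVE_OBJECTS]
        result.append({"sentence": sentence, "tagged": tagged, "actors": actors})
    return result
```

The sketch only shows how the stages feed one another: sentence splitting, word-level decomposition, tagging, and then active-object identification on the tagged words.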

The activity diagram element identification module: this module extracts the activity diagram elements, namely the start node, the active nodes, the decision nodes, the control flows, and the end node.

The activity diagram generation module: this module generates the activity diagram from the output of the activity diagram element identification module through the following steps:

Step 1: traverse the parsed demand text, identify the "active object" tags in the text, create the same number of lanes, and label the lanes in order with the nouns tagged as "active objects".

Step 2: construct a solid circle in any lane and name it the "start node".

Step 3: traverse the parsed demand text again.

Step 4: using each "active object" as an entry point, determine, when an "active object" is identified, whether the sentence containing it carries an "active node" or "decision node" tag. If it carries an "active node" tag, construct a rounded rectangle in the lane of that "active object" and name it with the verb-object phrase. If it carries a "decision node" tag, construct a diamond in the lane of that "active object" and name it with the conditional sentence. A "decision node" must have two outgoing control flows, representing the true branch and the false branch respectively; at the same time, determine whether the "conditional answer" carries an "active node" or "decision node" tag.

Step 5: continue scanning the parsed demand text and, using each "active object" as an entry point, draw the remaining activity relationships in the activity diagram by repeating the same steps.

Step 6: convert the period at the end of each sentence into a control flow.

Step 7: construct concentric circles as the "end node".

Example Two

Step 1: take the demand text as input and normalize it by applying grammar rules and a neural network technique.

The specific steps for normalizing the text with the rules and the neural network technique in step 1 are as follows:

Step 1.1: first normalize the text by applying rules, as follows.

The natural language text is restructured according to a defined set of grammar rules, converting complex requirements into simpler sentences. When a user writes a demand text, the text is normalized with this set of rules, which improves the accuracy of information extraction from the demand text, saves time in analyzing the requirements, and facilitates the extraction of UML elements. The demand text is normalized according to the following rules for Chinese:

(1) Sentences without verbs are deleted.

(2) Modifiers and structural particles in the sentence are deleted (typically the words used together with the Chinese structural particles "地" and "得").

(3) The object pronouns are replaced with object names.

(4) Passive sentences are converted into active sentences.

(5) Punctuation marks that separate parallel items in the demand text are treated as conjunctions expressing a parallel relation; ";" is treated as "."; and ":" is replaced with "is"; that is, the text is divided into several independent sentences. The marks ".", "?", "%", "!", quotation marks, brackets, "……", dashes, "< >", and proper-name marks remain unchanged.

(6) Sentences connected by conjunctions expressing a parallel relation, such as "and", are split. For example, a sentence of the form "subject verb 1 object 1 and verb 2 object 2" is rewritten as "subject verb 1 object 1." and "subject verb 2 object 2.".
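Two of the rules above lend themselves to a short illustration: the punctuation handling in rule (5) (semicolon treated as a period, colon replaced with "is") and the splitting of parallel sentences in rule (6). The sketch below works on English text for readability and assumes that the words of a sentence have already been given role labels; the labels "subj" and "conj" and both function names are hypothetical.

```python
import re

def apply_punctuation_rule(text):
    """Rule (5), in part: treat ';' as a period and replace ':' with 'is'."""
    text = re.sub(r"\s*[;；]\s*", ". ", text)
    text = re.sub(r"\s*[:：]\s*", " is ", text)
    return text

def split_parallel(tagged):
    """Rule (6): split 'S V1 O1 and V2 O2' into 'S V1 O1.' and 'S V2 O2.'.

    tagged: list of (word, role) pairs; the roles "subj" and "conj" are
    assumed labels, not tags defined by the specification.
    """
    subject = [word for word, role in tagged if role == "subj"]
    clauses, current = [], []
    for word, role in tagged:
        if role == "subj":
            continue
        if role == "conj":  # a parallel conjunction closes the current clause
            clauses.append(current)
            current = []
        else:
            current.append(word)
    clauses.append(current)
    return [" ".join(subject + clause) + "." for clause in clauses if clause]
```

For instance, a tagged sentence "user enters password and clicks login" yields the two sentences "user enters password." and "user clicks login.", each repeating the shared subject.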

Step 1.2: use the GASR model to perform synonym replacement.

First, all words in the input sentence that require synonym replacement are matched and labeled using a rule base, which is a set of regular expressions. The rule base is matched against the input sequence to find every word that may need synonym replacement and returns the start and end positions of each such word, finally yielding a character sequence in which the words to be replaced are marked. This marked character sequence is then input into the GASR model. The GASR model is shown in Fig. 1. On the encoder side, word embedding extracts the basic features of the source text sequence, and a bidirectional GRU (BGRU) produces the context-related features of the text sequence. On the decoder side, word embedding likewise extracts the basic features of the text sequence, and the result is passed through a unidirectional GRU. The GRU output obtained by the encoder and the initial text-related information are then sent to a local multi-head attention module (Multihead Attention) to compute local attention features, and a linear block (Linear & CRF) fuses the output of the local attention module with the GRU output. The fused result is sent to a classifier (Softmax), which outputs the normalized words. The encoder input is a character sequence, and the decoder input is a sequence of word indices.
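The rule-base matching described above, which returns the start and end positions of each word to be replaced and marks them in the character sequence, might look like the following sketch. The two regular expressions in `RULE_BASE` are invented for illustration, and the function names are hypothetical; the GASR model itself is not reproduced here.

```python
import re

# Hypothetical rule base: each regular expression matches words that should
# be replaced by a canonical synonym. The patterns are illustrative only.
RULE_BASE = [re.compile(r"\b(?:admin|sysadmin)\b"), re.compile(r"\bclient\b")]

def find_replacement_spans(sentence, rules=RULE_BASE):
    """Return sorted (start, end, word) spans of words needing replacement."""
    spans = []
    for pattern in rules:
        for match in pattern.finditer(sentence):
            spans.append((match.start(), match.end(), match.group()))
    return sorted(spans)

def mark_characters(sentence, spans):
    """Label each character 1 if it lies inside a span to be replaced, else 0."""
    flags = [0] * len(sentence)
    for start, end, _ in spans:
        flags[start:end] = [1] * (end - start)
    return flags
```

The per-character flags correspond to the marked character sequence that is passed on to the model; the binary labeling scheme is an assumption, since the text does not state how the marks are encoded.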

Step 1.3: finally, after the GASR model has performed synonym replacement on the words, insert the replaced words at their corresponding positions in the text to form the output sentences of the module.
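Inserting the replaced words back at their original positions can be sketched as below. The replacement words would come from the GASR model; here they are simply passed in, and the function name `splice_replacements` is hypothetical. The spans are assumed not to overlap.

```python
def splice_replacements(sentence, replacements):
    """Insert replaced words at the positions of the original words.

    replacements: list of (start, end, new_word) with non-overlapping spans
    that index into the original sentence.
    """
    out = sentence
    # Work from right to left so that earlier offsets remain valid even
    # when a replacement has a different length than the original word.
    for start, end, new_word in sorted(replacements, reverse=True):
        out = out[:start] + new_word + out[end:]
    return out
```

Applying the spans from right to left avoids having to recompute positions after each substitution.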

Step 2: process the text using natural language processing techniques. First, the text is segmented and split into sentences, and each sentence is decomposed into a word-level data structure. Part-of-speech tagging then labels the words in the text (noun tags, verb tags, adjective tags, adverb tags, preposition tags, numeral tags, and so on). Next, the AT-SRnn model identifies the active objects in the key sentences (nouns that can act as active objects, such as user, system, background, and administrator). Finally, based on the extracted active objects, dependency syntax analysis identifies the dependency relationships between the active objects and the other words in the demand text.

Step 3: after dependency syntax analysis of the demand text, extract the UML elements using a set of activity diagram element identification rules so that the activity diagram can then be generated. The following rules are defined for identifying the common elements of an activity diagram in a flow:

Active node:

1. "Verb tags" are identified in the parsed demand text and used to identify activities.

2. A verb-object phrase is used as the activity name.

3. An "active node" is represented by a rounded rectangle in the activity diagram.
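The active-node rules above can be approximated on part-of-speech-tagged tokens. The sketch below takes the first verb together with the words up to and including the next noun as the verb-object phrase. This adjacency heuristic is an assumption made for illustration; the method described here relies on dependency syntax analysis rather than word order, and the tag names and function name are hypothetical.

```python
def extract_activity_name(tagged):
    """Return the first verb-object phrase in a tagged sentence, or None.

    tagged: list of (word, tag) pairs with assumed tags such as "verb"
    and "noun". The phrase is the verb plus all following words up to and
    including the next noun.
    """
    for i, (word, tag) in enumerate(tagged):
        if tag != "verb":
            continue
        phrase = [word]
        for nxt, nxt_tag in tagged[i + 1:]:
            if nxt_tag == "verb":  # another verb starts before any object
                break
            phrase.append(nxt)
            if nxt_tag == "noun":
                return " ".join(phrase)
    return None
```

The returned phrase would serve as the name written inside the rounded rectangle of the corresponding active node.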

Decision node:

1. If the structure of a sentence is "if + noun (subject) + verb (predicate) + noun (object)", "if + noun (subject) + verb (predicate)", or "if + verb (predicate) + noun (object)", the sentence is mapped to a decision node.

2. Adverbs such as "if" are extracted from the "adverb tags" and mapped to a conditional particle.

3. Words such as "whether" and "whether or not" are also mapped to conditional particles.

4. The sentence after a "conditional particle" is extracted and mapped to a conditional sentence.

5. Conjunctions such as "then" are extracted and mapped to answer particles.

6. The sentence after an "answer particle" is extracted and mapped to a conditional answer.

7. If the next sentence does not contain "otherwise/else", the flow is treated as a simple flow without any decision block or branching.

8. If the next sentence contains "otherwise/else", arrows are used to split the flow into two branches, a true branch and a false branch: the words after "then" denote the true branch, and the words after "otherwise/else" denote the false branch.

9. A "decision node" is represented by a diamond in the activity diagram and has one incoming control flow and at least two outgoing control flows.
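The decision-node rules above map a conditional particle, a conditional sentence, an answer particle, and an optional "otherwise/else" part to a condition with a true branch and a false branch. A pattern-based sketch for English text follows; the regular expression `DECISION` and the function `parse_decision` are hypothetical, and a real implementation would work on the tagged Chinese text instead.

```python
import re

# "if" is the conditional particle, "then" the answer particle, and
# "otherwise"/"else" introduces the false branch. The pattern also accepts
# the false branch when it starts a following sentence.
DECISION = re.compile(
    r"^\s*if\s+(?P<cond>.+?),?\s+then\s+(?P<true>.+?)"
    r"(?:[,.;]?\s+(?:otherwise|else)\s*,?\s*(?P<false>.+?))?\s*\.?\s*$",
    re.IGNORECASE,
)

def parse_decision(sentence):
    """Return condition and branches of a conditional sentence, or None."""
    match = DECISION.match(sentence)
    if match is None:
        return None
    return {
        "condition": match.group("cond"),  # conditional sentence
        "true": match.group("true"),       # conditional answer (true branch)
        "false": match.group("false"),     # None when no "otherwise/else"
    }
```

When `false` is `None`, no "otherwise/else" part was found, which corresponds to the case the rules treat as a simple flow without branching.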

Execution order/control flow:

1. The period "." that ends each sentence in the parsed demand text is identified as a control flow.

2. Conjunctions expressing a sequential relation, such as "then", that appear in a sentence are identified as control flows.
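The two control-flow rules above amount to linking neighbouring steps wherever a sentence ends or a sequential conjunction appears. A minimal sketch for English text is given below; the function name `control_flows` is hypothetical, and conditional sentences ("if ... then ...") are assumed to have been handled by the decision-node rules beforehand.

```python
import re

def control_flows(text):
    """Split text into steps and link consecutive steps with control flows.

    A sentence-ending period and the sequential conjunction "then" each
    mark a boundary between two steps.
    """
    steps = []
    for sentence in re.split(r"[.。]", text):
        for part in re.split(r",?\s*\bthen\b\s*", sentence):
            part = part.strip()
            if part:
                steps.append(part)
    # One control flow between every pair of neighbouring steps.
    return steps, list(zip(steps, steps[1:]))
```

The flows from the start node to the first step and from the last step to the end node are not produced here; they follow from the start-node and end-node rules.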

Start node: this node does not need to be extracted from the user requirements. It is constructed first when the activity diagram is drawn, has exactly one connection, which leads to the next activity element, and is represented by a solid circle.

End node: this node does not need to be extracted from the user requirements. It must exist, is connected to the last activity element, and is represented by concentric circles.

Step 4: after the activity diagram elements have been identified and tagged according to the rules defined in step 3, generate the activity diagram. The specific steps are as follows:

Step 4.1: traverse the parsed demand text, identify the "active object" tags in the text, create the same number of lanes, and label the lanes in order with the nouns tagged as "active objects".

Step 4.2: construct a solid circle in any lane and name it the "start node".

Step 4.3: traverse the parsed demand text again.

Step 4.4: using each "active object" as an entry point, determine, when an "active object" is identified, whether the sentence carries an "active node" or "decision node" tag. If it carries an "active node" tag, construct a rounded rectangle in the lane of that "active object" and name it with the verb-object phrase. If it carries a "decision node" tag, construct a diamond in the lane of that "active object" and name it with the conditional sentence. A "decision node" must have two outgoing control flows, representing the true branch and the false branch respectively; at the same time, determine whether the "conditional answer" carries an "active node" or "decision node" tag.

Step 4.5: continue scanning the parsed demand text and, using each "active object" as an entry point, draw the remaining activity relationships in the activity diagram by repeating the same steps.

Step 4.6: convert the period at the end of each sentence into a control flow.

Step 4.7: construct concentric circles as the "end node".
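One way to obtain a drawable diagram from the steps above is to emit a textual description for a diagramming tool. The sketch below produces PlantUML activity-diagram text; PlantUML is an assumption of this illustration and is not named in the specification. The step dictionaries with the keys `actor`, `kind`, `name`, `true`, and `false`, and the function name `to_plantuml`, are likewise hypothetical. Branch activities are drawn in the lane of their decision node, which is a simplification.

```python
def to_plantuml(steps):
    """Render tagged steps as PlantUML activity-diagram text.

    steps: list of dicts with the assumed keys "actor", "kind", "name",
    plus "true"/"false" (dicts with a "name" key) for decision nodes.
    """
    lines = ["@startuml"]
    lane = None
    for index, step in enumerate(steps):
        if step["actor"] != lane:  # switch to the lane of this active object
            lane = step["actor"]
            lines.append(f"|{lane}|")
        if index == 0:
            lines.append("start")  # solid circle (start node)
        if step["kind"] == "decision":  # diamond with two branches
            lines.append(f"if ({step['name']}) then (true)")
            lines.append(f"  :{step['true']['name']};")
            lines.append("else (false)")
            lines.append(f"  :{step['false']['name']};")
            lines.append("endif")
        else:  # rounded rectangle (active node)
            lines.append(f":{step['name']};")
    lines += ["stop", "@enduml"]  # "stop" is drawn as the end node
    return "\n".join(lines)
```

The resulting text can be passed to a PlantUML renderer; control flows between consecutive elements are implied by their order in the text.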

Example Three

Step 1: take the demand text as input and normalize it by applying rules.

The text is normalized with the rules in step 1 as follows:

The natural language text is restructured according to a defined set of grammar rules, converting complex requirements into simpler sentences. When a user writes a demand text, the text is normalized with this set of rules, which improves the accuracy of information extraction from the demand text, saves time in analyzing the requirements, and facilitates the extraction of UML elements. The demand text is normalized according to a rule base, and the rule base for Chinese comprises:

(1) Sentences without verbs are deleted.

(3) The object pronouns are replaced with object names.

(4) Passive sentences are converted into active sentences.

(5) Punctuation marks that separate parallel items in the demand text are treated as conjunctions expressing a parallel relation; ";" is treated as "."; and ":" is replaced with "is"; that is, the text is divided into several independent sentences. The marks ".", "?", "%", "!", quotation marks, brackets, "……", dashes, "< >", and proper-name marks remain unchanged.

Step 2: process the text using natural language processing techniques. First, the text is segmented and split into sentences, and each sentence is decomposed into a word-level data structure. Part-of-speech tagging then labels the words in the text (noun tags, verb tags, adjective tags, adverb tags, preposition tags, numeral tags, and so on). Next, the AT-SRnn model identifies the active objects in the key sentences (nouns that can act as active objects, such as user, system, background, and administrator). Finally, dependency syntax analysis identifies the dependency relationships among the words.

Step 3: after dependency syntax analysis of the demand text, extract the UML elements using a set of activity diagram element identification rules so that the activity diagram can then be generated. The following rules are defined for identifying the common elements of an activity diagram in a flow:

Active node:

2. A verb-object phrase is used as the activity name.

Decision node:

3. Words such as "whether" and "whether or not" are also mapped to conditional particles.

Execution order/control flow:

1. The period "." that ends each sentence in the demand text is identified as a control flow.

Step 4: after the activity diagram elements have been identified and tagged, generate the activity diagram. The specific steps are as follows:

Step 4.1: traverse the entire parsed demand text, identify the "active object" tags in the text, create the same number of lanes, and label the lanes in order with the nouns tagged as "active objects".

Step 4.2: construct a solid circle in any lane and name it the "start node".

Step 4.3: traverse the parsed demand text again.

Step 4.6: convert the period at the end of each sentence into a control flow.

Step 4.7: construct concentric circles as the "end node".

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. An automated intelligent activity diagram generation method, comprising:

2. An automated intelligent activity diagram generation method according to claim 1, wherein,

step 1, normalizing a pre-acquired demand text by using grammar rules and a neural network technology to obtain a normalized demand text, wherein the normalized demand text is realized by the following steps:

3. An automated intelligent activity diagram generation method according to claim 2, wherein,

step 1.1, modifying a pre-acquired demand text by using grammar rules to obtain a modified demand text, wherein the modified demand text is realized by the following steps:

deleting sentences without verbs in the pre-acquired demand text;

the method comprises the steps of (1) regarding 'sum' in a pre-acquired demand text and 'taking'; "treat as period, will": "replace with" yes ";

splitting sentences connected by conjunctions representing parallel relations in the pre-acquired demand text to obtain a plurality of sentences,

if a sentence has only one subject and two or more verb-object phrases, splitting the sentence into a plurality of sentences to obtain the modified demand text.

4. An automated intelligent activity diagram generation method according to claim 2, wherein,

5. An automated intelligent activity diagram generation method according to claim 4, wherein,

carrying out feature fusion on the third feature and the context related feature of the text sequence by utilizing a linear block to obtain a fusion feature result;

6. An automated intelligent activity diagram generation method according to claim 1, wherein,

step 2, processing the normalized demand text by using an AT-SRnn model and a dependency syntax analysis method which are obtained by pre-training to obtain the dependency relationship between the active object and the demand text words, wherein the method is realized by the following steps:

step 2.4, inputting the part-of-speech-tagged demand text into the AT-SRnn model obtained by pre-training, and extracting the active objects;

and 2.5, identifying the dependency relationship between the active object and other words in the demand text according to the extracted active object by using a dependency syntactic analysis method, and obtaining the demand text after syntactic analysis.

7. The automated intelligent activity diagram generation method of claim 5, wherein,

step 2.4, inputting the part-of-speech-tagged demand text into the AT-SRnn model obtained by pre-training and extracting the active objects, is realized by the following steps:

8. The automated intelligent activity diagram generation method of claim 7, wherein,

the AT-SRnn model is obtained through pre-training, and is realized through the following steps:

acquiring a data set related to the software requirements field, wherein the data set comprises software requirement design and analysis texts, software projects, UML-model software requirement texts, and the ground-truth active objects of actual application projects;

Inputting a data set related to the software requirement field into a constructed AT-SRnn model, and outputting a predicted value of an active object by the AT-SRnn model;

9. The automated intelligent activity diagram generation method of claim 6, wherein,

the activity diagram element identification rule comprises an activity node, a decision node, an execution sequence/control flow, a starting node and an ending node;

10. The automated intelligent activity diagram generation method of claim 9, wherein,

step 4, generating the activity diagram by using the UML elements and the active objects, is realized by the following steps:

step 4.2, constructing a solid circle in any lane to obtain a starting node;

step 4.5, converting the period at the end of each sentence in the parsed demand text into a control flow;