CN116775954A - Function point extraction processing method and system

Info

Publication number
CN116775954A
Authority
CN
China
Prior art keywords
text, function, points, processed, functional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310540554.XA
Other languages
Chinese (zh)
Inventor
胡贝贝
樊志强
夏晓凯
刘禹
牛婵
陈方悦
孙悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
CETC Information Science Research Institute
Original Assignee
Beihang University
CETC Information Science Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University and CETC Information Science Research Institute
Priority to CN202310540554.XA
Publication of CN116775954A
Legal status: Pending (current)


Classifications

    • G06F 16/90344: Query processing by using string matching techniques (information retrieval)
    • G06F 8/10: Requirements analysis; specification techniques (arrangements for software engineering)
    • G06N 3/042: Knowledge-based neural networks; logical representations of neural networks
    • G06N 3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods

Abstract

The invention belongs to the technical field of software analysis and provides a function point extraction processing method and system. The method collects demand analysis texts, software design texts and system design texts available through public channels to obtain a data set, labels the function points in the sample data, and builds a knowledge base. Based on the established knowledge base, the KMP algorithm is used to perform character string matching between the function points in the knowledge base and the text to be processed, extracting the function points contained in the text and determining their category labels. The text to be processed is also input into a pre-built automatic extraction model, which outputs the function points it contains and the category each function point belongs to. Finally, a ranking evaluation calculation is performed on the function points extracted from the knowledge base and those extracted by the automatic extraction model, so as to screen the function points. The invention achieves a faster and more effective automatic function point extraction process and effectively avoids missing function points.

Description

Function point extraction processing method and system
Technical Field
The present invention relates to the field of software analysis technologies, and in particular, to a method and a system for extracting and processing a functional point.
Background
Function point analysis is a method for measuring the cost of software. Function points here refer to the five kinds of function points defined in the function point analysis method. At present, function points are usually extracted manually by experts, although in recent years some automatic function point extraction techniques have appeared that can extract function points from demand analysis texts. However, for function points that are only implicitly described in the demand analysis text, both expert manual extraction and existing automatic extraction methods miss function points to some extent. In addition, existing automatic extraction methods also lose function points because of the limited capacity of their models. There is therefore still considerable room for improvement in how to carry out automatic function point extraction and function point expansion more effectively.
Therefore, it is necessary to provide a functional point extraction processing method to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a function point extraction processing method and system, so as to solve the technical problems of missing function points in the existing manual and automatic extraction methods, and of how to carry out automatic function point extraction and function point expansion more effectively.
A first aspect of the present invention provides a function point extraction processing method, including: collecting demand analysis texts, software design texts and system design texts available through public channels to obtain a data set, labeling the function points in the sample data of the data set, and establishing a knowledge base; according to the established knowledge base, using the KMP algorithm to perform character string matching between the function points in the knowledge base and the text to be processed, so as to extract the function points contained in the text to be processed and determine their category labels; using a pre-built automatic extraction model to automatically extract function points from the text to be processed, inputting the text to be processed into the automatic extraction model and outputting the function points it contains and the category each function point belongs to, where the automatic extraction model is constructed based on the Bert-BiLSTM-CRF algorithm and its parameters are optimized several times during construction; and performing a ranking evaluation calculation on the function points extracted from the knowledge base and the function points extracted by the automatic extraction model, so as to screen the function points.
According to an alternative embodiment, constructing the automatic extraction model based on the Bert-BiLSTM-CRF algorithm includes: constructing a Bert layer, a BiLSTM layer and a CRF layer to form the automatic extraction model; determining the dimension of the word vectors to be generated according to the length n of the sample data in the training set, specifically inputting sample data of length n into the Bert layer and generating a first vector of dimension n × a specific value, where 0 < n ≤ 512 and the specific value lies in the range 700 < value ≤ 800; inputting the first vector into the BiLSTM layer to establish context links among the n vectors and obtain the sequence semantic information corresponding to the text to be processed; and inputting the sequence semantic information produced by the BiLSTM layer into the CRF layer, which outputs the function points contained in the sample data and the category each function point belongs to.
According to an alternative embodiment, a function point label set is established; according to the label set, a specified proportion of the sample data in the data set is labeled with function points to obtain a first data set, and the remaining sample data in the data set is used to generate pseudo labels to obtain a second data set. The function point label set includes the following labels: a first category label ILF, a second category label EIF, a third category label EI, a fourth category label EO, and a fifth category label EQ.
According to an alternative embodiment, the demand analysis texts in the data set are divided into a training set, a verification set and a test set according to a specific proportion; the number of preliminary training rounds and retraining rounds is determined from the sizes of the first data set and the second data set; the automatic extraction model is trained with the training set for the determined preliminary training rounds to obtain a preliminary automatic extraction model; and the preliminary automatic extraction model is additionally trained with the second data set for the determined retraining rounds.
According to an alternative embodiment, optimizing the model parameters during repeated model verification specifically includes updating the preliminary training rounds and retraining rounds according to how the accuracy and loss values change during training; optimizing the model parameters during model testing specifically includes updating the retraining rounds according to how the accuracy and loss values change during retraining.
According to an alternative embodiment, performing character string matching between the function points in the knowledge base and the text to be processed using the KMP algorithm to extract the function points contained in the text to be processed includes: determining the character string lengths of the function points and of the text sentences containing the function points in the knowledge base, representing them as pattern strings, and constructing a next array; determining the character string length of the text to be processed and representing it as a text string; matching each pattern string in the knowledge base against the text string of the text to be processed one by one, and determining the position of any matching failure, which is used to determine the starting position of the next match; and extracting the corresponding function points in the text to be processed when a pattern string in the knowledge base matches the text string of the text to be processed successfully.
According to an alternative embodiment, performing a ranking evaluation calculation on the function points extracted from the knowledge base and those extracted by the automatic extraction model to screen the function points includes: converting the function points extracted from the knowledge base and by the automatic extraction model into vectors based on function point credibility, inputting them into a pre-trained machine learning model and outputting credibility evaluation values; and ranking the function points according to the output credibility evaluation values so as to screen out the function points whose credibility evaluation value is greater than a specified value.
According to an alternative embodiment, knowledge extraction and knowledge relation extraction are performed on the function points and category labels extracted from the knowledge base and on the function points and categories extracted by the automatic extraction model, forming function point triples so as to construct a function point knowledge graph. Knowledge relation extraction is performed according to the internal relations between function points of different kinds and different operations in the text to be processed, yielding the following relations, which represent the unidirectional or bidirectional edges between adjacent entity nodes in the function point knowledge graph: dependency, inheritance, aggregation, action, generalization, synonymy, trigger, parallel, interaction, and coexistence.
According to an alternative embodiment, the function point expansion is performed according to the constructed function point knowledge graph.
A second aspect of the present invention provides a function point extraction processing system that executes the function point extraction processing method of the first aspect and includes: a building module, which collects demand analysis texts, software design texts and system design texts available through public channels to obtain a data set, labels the function points in the sample data, and builds a knowledge base; a first extraction module, which, according to the established knowledge base, uses the KMP algorithm to perform character string matching between the function points in the knowledge base and the text to be processed, so as to extract the function points contained in the text to be processed and determine their category labels; a second extraction module, which uses a pre-built automatic extraction model to automatically extract function points from the text to be processed, inputting the text into the model and outputting the function points it contains and the category each belongs to, where the automatic extraction model is constructed based on the Bert-BiLSTM-CRF algorithm and its parameters are optimized several times during construction; and a screening module, which performs a ranking evaluation calculation on the function points extracted from the knowledge base and those extracted by the automatic extraction model, so as to screen the function points.
A third aspect of the present invention provides an electronic apparatus, comprising: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect of the present invention.
A fourth aspect of the invention provides a computer readable medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the method according to the first aspect of the invention.
The embodiment of the invention has the following advantages:
Compared with the prior art, the invention builds its own knowledge base and extracts function points from it with the KMP algorithm, which matches and extracts the function points in the text to be processed more accurately, reduces the time complexity of string matching to linear time, effectively lowers the time complexity of the text matching process, improves text matching efficiency, and thereby improves the accuracy of function point extraction. Combined with the automatic extraction model constructed on the Bert-BiLSTM-CRF algorithm, function points can be extracted more comprehensively, the automatic extraction process becomes faster and more effective, and function points that are not in the knowledge base can still be extracted.
By performing knowledge extraction and knowledge relation extraction on existing function point analysis texts to form function point triples and build a function point knowledge graph, a more accurate knowledge graph containing the function point triples is obtained. When a text to be processed is received, the function points it contains are identified, and whether each identified function point is a search keyword determines whether to traverse the relation paths between directed entity nodes in the function point knowledge graph; a knowledge graph node queue is built during the traversal, and the function points are expanded according to this queue, so that automatic function point expansion is achieved faster and more effectively and missing function points are effectively avoided.
In addition, all reachable entity nodes are searched based on a BFS search algorithm, and the entity nodes in the knowledge graph node queue are updated with update parameters determined in real time, so that a more reliable knowledge graph node queue is obtained and the method is further optimized.
Drawings
FIG. 1 is a flowchart of an example of a functional point extraction processing method of the present invention;
FIG. 2 is a technical framework diagram of a functional point extraction processing method of the present invention;
FIG. 3 is a schematic diagram of an example of creating a set of function point labels in the function point extraction processing method of the present invention;
FIG. 4 is a schematic diagram of an example of performing functional point extraction by using a KMP algorithm based on a self-built knowledge base in the functional point extraction processing method of the present invention;
FIG. 5 is a schematic diagram of an example of string matching of the string to be matched (i.e., the text string of the text to be analyzed) of FIG. 4 with template strings in a knowledge base;
FIG. 6 is a block diagram of a function point extraction using an automatic extraction model in a function point extraction processing method according to the present invention;
FIG. 7 is a schematic diagram of an automatic extraction model in a functional point extraction processing method according to the present invention;
FIG. 8 is a graph illustrating the variation of accuracy of the training set during the preliminary training process (i.e., the training process corresponding to the preliminary training round) by the automatic extraction model in the function point extraction processing method according to the present invention;
FIG. 9 is a schematic diagram of a change in the loss value of the training set during a preliminary training process (i.e., a training process corresponding to a preliminary training round) by an automatic extraction model in the function point extraction processing method according to the present invention;
FIG. 10 is a graph showing the effect of the change in accuracy of the training set during the additional training process (i.e., the training process corresponding to the retraining round) by the automatic extraction model in the function point extraction processing method according to the present invention;
FIG. 11 is a graph showing the effect of the change of the loss value of the training set during the additional training process (i.e., the training process corresponding to the retraining round) by the automatic extraction model in the functional point extraction processing method according to the present invention;
FIG. 12 is a graph showing the effect of the change in accuracy of the verification set during the additional training process (i.e., the training process corresponding to the retraining round) by the automatic extraction model in the function point extraction processing method according to the present invention;
FIG. 13 is a schematic diagram of an example of functional point knowledge graph construction by extracting functional points using the functional point extraction processing method of the present invention;
FIG. 14 is a flow chart of an example of a step of performing a traversal of a relationship path between directed entity nodes in the functional point knowledge graph of FIG. 13;
FIG. 15 is a flowchart illustrating another example of performing the step of traversing the functional point knowledge graph of FIG. 13;
FIG. 16 is a block diagram of a functional point extraction processing system of the present invention;
FIG. 17 is a schematic diagram of an embodiment of an electronic device according to the present application;
fig. 18 is a schematic diagram of an embodiment of a computer readable medium according to the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In view of the above problems, the present application provides a function point extraction processing method that builds its own knowledge base and extracts function points from it with the KMP algorithm. This matches and extracts the function points in the text to be processed more accurately, reduces the time complexity of string matching to linear time, effectively lowers the time complexity of the text matching process, and improves the accuracy of function point extraction. The application further extracts function points with an automatic extraction model constructed on the Bert-BiLSTM-CRF algorithm, so that function points are extracted more comprehensively, the automatic extraction process is faster and more effective, and function points not in the knowledge base can still be extracted.
Fig. 1 is a flowchart showing steps of an example of the function point extraction processing method of the present application.
The following describes the present invention in detail with reference to fig. 1 to 15.
As shown in fig. 1, in step S101, a demand analysis text, a software design text and a system design text available in a public channel are collected to obtain a data set, and sample data in the data set is labeled with functional points to establish a knowledge base.
The data set is obtained by collecting demand analysis texts, software design texts and system design texts (e.g., about 900 documents) from public channels such as GitHub, GitLab, blogs and CNKI (China National Knowledge Infrastructure), together with the already-known function points.
Next, functional point labeling is performed on the sample data in the dataset (including in particular the demand analysis text, the software design text and the system design text), for example using the Doccano labeling platform, a knowledge base is built (see in particular fig. 2), and a first dataset is obtained for subsequent model training.
It should be noted that Doccano is a lightweight open-source data labeling platform; its automatic labeling function is used to label the function points of the sample data in the data set.
In one embodiment, a portion of the sample data in the data set (e.g., 112 thousand demand analysis text entries) is labeled with function points to obtain a first data set for subsequent model training.
Function point labeling is carried out using a named entity recognition approach. As shown in fig. 3, a function point label set is established; the label set includes information such as the name, number and color of each category label, which can, for example, be displayed on a visual interface and created, modified or deleted there.
Specifically, the set of function point labels includes the following function point labels: a first type of tag using ILF, a second type of tag using EIF, a third type of tag using EI, a fourth type of tag using EO, a fifth type of tag using EQ, see table 1 below for details.
Optionally, the demand analysis text is classified according to function point category and/or scene parameters, and the function point labels are created to form the function point label set.
TABLE 1
Category label | Name | Meaning
ILF | Internal logical file | Nouns related to internal logical files in the data functions, e.g. "commodity order table", "user information table"
EIF | External interface file | Nouns related to external interface files in the data functions, e.g. reference files stored for data exchange
EI | External input | Verbs related to external input in the transactional functions, e.g. "add", "modify", "delete"
EO | External output | Verbs related to external output in the transactional functions, e.g. "recommend", "export", "count", "print", "generate"
EQ | External query | Verbs related to external query in the transactional functions, e.g. "query", "get"
Table 1 shows the classification (category labels) of the function points, their names, and the meaning of each kind of function point.
It should be noted that, since internal logical files and external interface files do not share common features, the two are labeled with different category labels: entities related to internal logical files are labeled ILF and entities related to external interface files are labeled EIF.
It should be noted that the function point label set may also contain a sixth or seventh category label, or only three or four categories of labels; the foregoing is merely an illustrative optional example and is not to be construed as limiting the invention.
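As a concrete illustration only, the five category labels above could be represented as a small mapping before labeling begins; the sketch below is an assumed representation for illustration and is not a component of the labeling platform or of the claimed method.

    # Hypothetical representation of the function point label set (cf. Table 1).
    FUNCTION_POINT_LABELS = {
        "ILF": "internal logical file",    # first category label (data function)
        "EIF": "external interface file",  # second category label (data function)
        "EI":  "external input",           # third category label (transactional function)
        "EO":  "external output",          # fourth category label (transactional function)
        "EQ":  "external query",           # fifth category label (transactional function)
    }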
Using sample data labeled with category labels (i.e., demand analysis text labeled with category labels), a knowledge base is built. In other words, the knowledge base includes demand analysis text labeled with category labels.
For example, the knowledge base includes function points, text segments containing function points, text sentences containing function points.
Further, the labeled data (e.g., 112 thousand demand analysis text entries) and the unlabeled data (e.g., 243 thousand) stand in a certain ratio; the unlabeled sample data in the data set (i.e., the remaining sample data) is used to generate pseudo labels, resulting in a second data set used for subsequent additional training of the model.
In a preferred embodiment, to ensure data quality, only the demand analysis portion (i.e., a portion of the text data) is kept from each text in the data set, and the texts are preprocessed, including removing useless characters and removing picture links, web page links and the like with regular expressions, for example as sketched below.
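A minimal preprocessing sketch is shown below; the exact regular expressions are assumptions for illustration and are not the patterns used by the invention.

    import re

    def preprocess(text: str) -> str:
        """Clean a demand analysis snippet before labeling (illustrative only)."""
        # Remove picture links and web page links (assumed markdown-image and URL patterns).
        text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)   # markdown image links
        text = re.sub(r"https?://\S+", "", text)            # plain web links
        # Remove useless characters: keep CJK characters, letters, digits and basic punctuation.
        text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。、；：？！,.;:?!()（）\s]", "", text)
        return re.sub(r"\s+", " ", text).strip()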
Next, in step S102, according to the established knowledge base, the KMP algorithm is used to perform character string matching between the function points in the knowledge base and the text to be processed, so as to extract the function points contained in the text to be processed and determine their category labels.
Matching the function points in the knowledge base against the text to be processed with the KMP algorithm specifically includes the following steps, as shown in fig. 4.
Step S401: determining the character string length of the function points and the text sentences containing the function points in the knowledge base, using the mode string representation, and constructing a next array.
Specifically, each function point in the knowledge base is represented by a pattern string, and a next array is built from the pattern string. Each value of the next array records the length of the longest identical prefix and suffix of the substring ending at the current character. The function point "order" is, for example, represented by the pattern string ab.
It should be noted that an identical prefix and suffix means a prefix and a suffix that are equal to each other within the substring formed by the current character and all characters before it. For example, the substring ABCDAB has, from left to right, the identical prefix and suffix AB, so the current value is 2; the substring ABCDABD has no identical prefix and suffix, so the current value is 0; the substring ABCDA has the identical prefix and suffix A, so the current value is 1.
Step S402: the string length of the text to be processed is determined and represented using the text string.
A text to be processed is received, and function points are extracted from it; the text to be processed is, for example, a demand text to be analyzed. Specifically, the character string length of the text to be processed is determined, and the text is represented as a text string, i.e., the string to be matched, as shown in fig. 5. The text to be processed is indicated, for example, by abcmnfghab, deab or xkab, e.g., "commodity management includes shelving commodities, deleting commodities and modifying commodities", which is matched against the template string "...delete commodity" in the knowledge base.
In one embodiment, the text to be processed is, for example, deab (corresponding to "delete commodity").
Step S403: and carrying out character string matching on each mode string in the knowledge base and the text string of the text to be processed one by one, and determining a matching failure position for determining a starting position in next matching.
In a specific embodiment, for example, when the function point ab in the knowledge base is matched as a character string against the text to be processed deab: if the character a in the function point fails to match the character d in the text, the matching-failure position is a, and the position corresponding to a is the starting position of the next match; a is then matched against the character e in the text, and since this also fails, the position corresponding to a remains the starting position of the next match.
Step S404: and extracting corresponding function points in the text to be processed when the pattern string in the knowledge base is successfully matched with the text string of the text to be processed.
Next, the character a in the knowledge base is matched against the character a in the text to be processed; since this succeeds, the following character b is matched against the character b that follows a in the text. When that also succeeds, the match is complete: the text to be processed contains the function point corresponding to the pattern string ab, and that function point is extracted from the text to be processed.
For the matching process of the KMP algorithm, if the length of the text to be matched is n and the average length of the function points in the knowledge base is m, the matching time complexity for a single function point is O(n+m); with k function points in the knowledge base, the total search time complexity is O((n+m)·k). Compared with the O(n·m) search complexity of the existing method for a single function point, this reduces string matching to linear time complexity O(n+m), effectively lowering the time complexity of the text matching process and further reducing matching time. A minimal sketch of the matching procedure is given below.
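The following sketch illustrates the KMP matching step in Python; the function names and the toy knowledge base are assumptions for illustration, not the implementation disclosed by the invention.

    def build_next(pattern: str) -> list[int]:
        """next[i] = length of the longest identical prefix/suffix of pattern[:i+1]."""
        nxt = [0] * len(pattern)
        k = 0
        for i in range(1, len(pattern)):
            while k > 0 and pattern[i] != pattern[k]:
                k = nxt[k - 1]          # fall back to the previously recorded prefix length
            if pattern[i] == pattern[k]:
                k += 1
            nxt[i] = k
        return nxt

    def kmp_find(text: str, pattern: str) -> int:
        """Return the first match position of pattern in text, or -1; runs in O(n + m)."""
        nxt, k = build_next(pattern), 0
        for i, ch in enumerate(text):
            while k > 0 and ch != pattern[k]:
                k = nxt[k - 1]          # matching failure: restart from the recorded prefix
            if ch == pattern[k]:
                k += 1
            if k == len(pattern):
                return i - k + 1
        return -1

    # Toy knowledge base of function points with their category labels (assumed data).
    knowledge_base = {"delete commodity": "EI", "query commodity": "EQ"}
    to_process = "the system shall support delete commodity and query user operations"
    extracted = {fp: label for fp, label in knowledge_base.items()
                 if kmp_find(to_process, fp) != -1}
    print(extracted)    # {'delete commodity': 'EI'} under these assumptions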
Next, in step S103, a pre-built automatic extraction model is used to automatically extract function points from the text to be processed: the text is input into the automatic extraction model, which outputs the function points it contains and the category each function point belongs to. The automatic extraction model is constructed based on the Bert-BiLSTM-CRF algorithm, and its parameters are optimized several times during construction.
Based on the Bert-BiLSTM-CRF algorithm, the method for constructing the automatic extraction model is as follows.
First, the data set acquired in step S101 is divided into a training set, a verification set and a test set according to a specific ratio.
In an alternative embodiment, the first data set is divided into a training set, a verification set and a test set in the ratio 8:1:1 (i.e., the specific proportion), where the training set is used to train the automatic extraction model, the verification set is used to verify the model during training, and the test set is used for overall testing after training.
Furthermore, a second data set is formed from the unlabeled data in the data set. After the automatic extraction model has been trained with the training set, the second data set is used to generate pseudo labels, and the model is retrained with the pseudo-labeled sample data. Specifically, the automatic extraction model is trained with the training set to obtain a preliminary automatic extraction model, which is then run over the second data set to generate the pseudo labels, for example as sketched below.
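A minimal sketch of the 8:1:1 split and pseudo-label generation is given below; the helper names (split_dataset, make_pseudo_labels) and the predict() interface are hypothetical placeholders rather than components disclosed by the invention.

    import random

    def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
        """Split the labeled first data set into training / verification / test sets (8:1:1)."""
        random.Random(seed).shuffle(samples)
        n = len(samples)
        n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
        return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]

    def make_pseudo_labels(preliminary_model, unlabeled_texts):
        """Run the preliminary model over the unlabeled second data set to create pseudo labels."""
        pseudo = []
        for text in unlabeled_texts:
            labels = preliminary_model.predict(text)   # assumed prediction interface
            pseudo.append((text, labels))
        return pseudo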
For the construction of the automatic extraction model, a Bert layer, a BiLSTM layer and a CRF layer are specifically constructed to construct the automatic extraction model, and the trained automatic extraction model is used for automatically extracting the function points of the text to be processed, namely, inputting the text to be processed, and outputting the function points contained in the text to be processed and the category to which each function point belongs, see fig. 6 in particular.
The automatic extraction model is trained with the training set. The dimension of the word vectors to be generated is determined from the length n of the sample data in the training set; specifically, sample data of length n (such as a demand analysis text) is input into the Bert layer, which generates a first vector of dimension n × a specific value, where 0 < n ≤ 512 and the specific value lies in the range 700 < value ≤ 800, preferably 768.
Assuming that the sample data (in this example, a demand analysis text) has length n, the Bert layer generates an n × 768-dimensional word vector matrix, i.e., one 768-dimensional vector per word, which forms the first vector corresponding to the sample data.
It should be noted that the main purpose of the Bert layer is to generate word vectors for the downstream task (specifically, as input to the BiLSTM layer). Bert is a pre-trained model trained on massive data with a sophisticated training mechanism, so the word vectors it generates carry rich semantic information. For training the automatic extraction model (also called the Bert-BiLSTM-CRF model), the invention provides an effective semi-supervised architecture that makes good use of the collected demand analysis data and saves the labor cost of data labeling.
The first vector is then input into the BiLSTM layer, which processes it to establish context links between the n vectors, yielding the sequence semantic information corresponding to the sample data (in this example, a demand analysis text). The sequence semantic information produced by the BiLSTM layer is input into the CRF layer, which outputs the function points contained in the sample data and the category each function point belongs to.
It should be noted that the sequence information established by the Bert layer is only word embedding fused with positional encoding, which favors parallel computation in a large network but captures the temporal information of contextual dependencies only weakly. The automatic extraction model of the invention therefore uses a multi-layer construction: BiLSTM is a classical model for building temporal information over texts, constructs better contextual semantic information for the word vectors generated by the Bert layer, and can learn the semantic information on both sides of a text at the same time.
For example, the BiLSTM layer processes the n 768-dimensional vectors output by the Bert layer and establishes context links among them, outputting a second vector with the same dimensions as the first vector.
It should be noted that, within the BiLSTM layer, LSTM is a kind of recurrent neural network; its basic process and basic units are shown in fig. 7, where text information is fed into the basic units one step at a time together with the hidden output of the previous step. The parts inside the black rectangular box represent the basic neural unit of the LSTM, and the dashed lines represent the recurrence. The inputs of the current LSTM neuron come from the memory output of the previous neuron, the hidden state output, and the current word of the input sequence. The forget gate mainly ignores unimportant information carried over from the past sequence, the input gate mainly takes in the current word information, and the output gate outputs the semantic information that the current neuron forms by combining the current input word with the memory information. This gate mechanism is one of the reasons LSTM performs so well in natural language processing. The calculation formulas of the gates are given in (1), (2) and (3); the formulas use batch computation, assuming batch_size n and h hidden units.
I_t = sigmoid(X_t W_xi + H_{t-1} W_hi + b_i)   (1)
F_t = sigmoid(X_t W_xf + H_{t-1} W_hf + b_f)   (2)
O_t = sigmoid(X_t W_xo + H_{t-1} W_ho + b_o)   (3)
where I_t is the input gate at time t (t denotes the time step), X_t is the input vector at time t, H_{t-1} is the hidden state at time t-1, W_xi, W_hi, W_xf, W_hf, W_xo and W_ho are weight parameters, and b_i, b_f and b_o are bias parameters; F_t is the forget gate at time t and O_t is the output gate at time t. All weight parameters and bias parameters are distinct, which is what produces the different gates.
The memory of the current neuron combines the input information of the current word with the hidden information of the past sequence, using two activation functions. A candidate memory C̃_t is formed first; the information of the past sequence then passes through the forget gate, which keeps its important part, and this is combined with the candidate memory to obtain the memory of the current neuron, used as input to the next neuron, as shown in formulas (4) and (5), where ⊙ denotes the element-wise product of matrices:
C̃_t = tanh(X_t W_xc + H_{t-1} W_hc + b_c)   (4)
C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t   (5)
where C̃_t is the candidate memory cell at time t, W_xc and W_hc are weight parameters, and b_c is a bias parameter; C_t is the memory cell at time t, C_{t-1} is the memory cell at time t-1, F_t is the forget gate at time t, and I_t is the input gate at time t.
The formulas of the activation functions sigmoid and tanh are shown in (6) and (7):
sigmoid(x) = 1 / (1 + e^(-x))   (6)
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))   (7)
Through the above calculation, the output of the hidden state, that is, the output of the whole neuron is finally obtained, as shown in the formula (8).
H_t = O_t ⊙ tanh(C_t)   (8)
where H_t is the hidden state output at time t, O_t is the output gate at time t, and C_t is the memory cell at time t.
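As an illustration of formulas (1) to (8), a single LSTM time step can be sketched as below; this is a plain NumPy restatement of the equations for clarity, not the implementation used by the invention, and the dictionary keys for the weights are an assumed naming convention.

    import numpy as np

    def sigmoid(x):                       # formula (6)
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM time step; W and b hold the per-gate weight and bias parameters."""
        i_t = sigmoid(x_t @ W["xi"] + h_prev @ W["hi"] + b["i"])       # input gate, (1)
        f_t = sigmoid(x_t @ W["xf"] + h_prev @ W["hf"] + b["f"])       # forget gate, (2)
        o_t = sigmoid(x_t @ W["xo"] + h_prev @ W["ho"] + b["o"])       # output gate, (3)
        c_tilde = np.tanh(x_t @ W["xc"] + h_prev @ W["hc"] + b["c"])   # candidate memory, (4)
        c_t = f_t * c_prev + i_t * c_tilde                              # memory cell, (5)
        h_t = o_t * np.tanh(c_t)                                        # hidden state, (8)
        return h_t, c_t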
Compared with LSTM, the CRF model is a machine learning method particularly well suited to sequence labeling. Its main feature is that it considers the relationships among category labels: it automatically learns the transition probabilities between labels, models the whole sequence jointly, captures context better, and performs global optimization at prediction time to obtain the optimal solution. For example, with an input sequence of length n and k predefined labels in total, the CRF layer takes the output vector information of the BiLSTM layer (i.e., the second vector) as input, considers the whole sequence, and selects the optimal one of the k^n possible output paths, thereby determining the function points of the whole sequence, i.e., outputting the contained function points and the category label of each function point.
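A schematic sketch of the three-layer architecture is shown below, assuming PyTorch, the Hugging Face transformers package and the third-party pytorch-crf package; it only illustrates the layer stacking described above and is not the patented implementation, and the model name and hidden size are assumptions.

    import torch
    import torch.nn as nn
    from transformers import BertModel
    from torchcrf import CRF   # assumes the pytorch-crf package is installed

    class BertBiLSTMCRF(nn.Module):
        def __init__(self, num_labels: int, hidden: int = 256,
                     bert_name: str = "bert-base-chinese"):
            super().__init__()
            self.bert = BertModel.from_pretrained(bert_name)        # Bert layer: 768-dim word vectors
            self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                                  batch_first=True, bidirectional=True)  # BiLSTM layer
            self.fc = nn.Linear(2 * hidden, num_labels)              # emission scores per label
            self.crf = CRF(num_labels, batch_first=True)             # CRF layer

        def forward(self, input_ids, attention_mask, labels=None):
            emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            seq, _ = self.bilstm(emb)              # context links among the n word vectors
            emissions = self.fc(seq)
            mask = attention_mask.bool()
            if labels is not None:                 # training: negative log-likelihood loss
                return -self.crf(emissions, labels, mask=mask, reduction="mean")
            return self.crf.decode(emissions, mask=mask)   # prediction: best label path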
For the training process of the automatic extraction model, the number of preliminary training rounds and retraining rounds is determined from the sizes of the first and second data sets. The automatic extraction model is trained with the training set for the determined preliminary training rounds to obtain a preliminary automatic extraction model, which is then additionally trained (i.e., retrained) with the second data set for the determined retraining rounds.
As evaluation indices for the training process, the invention uses precision, recall and the F1 value (F1-score); real entities and their prediction results, and pseudo entities and their prediction results, all need to be considered, as shown in table 2.
TABLE 2
                         | Predicted true | Predicted false
Actually a true entity   | TP             | FN
Actually a pseudo entity | FP             | TN
Table 2 shows the confusion matrix of predicted and real entities.
Here TP is the number of items that are actually true entities and are predicted as true; FN is the number that are actually true entities but predicted as false; FP is the number that are actually pseudo entities but predicted as true; and TN is the number that are actually pseudo entities and predicted as false. The precision, recall and F1 value are calculated as shown in (9), (10) and (11):
Precision = TP / (TP + FP)   (9)
Recall = TP / (TP + FN)   (10)
F1 = 2 × Precision × Recall / (Precision + Recall)   (11)
For example, the TensorBoard visualization tool is used to monitor the changes in accuracy and loss on the training set and the change in accuracy on the verification set during both the preliminary training and the additional training of the automatic extraction model, so as to determine the number of preliminary training and retraining rounds, as shown in figs. 8-12, where the x-axis is the number of training steps. The number of training rounds is obtained by dividing the number of training steps by the number of steps per epoch over the training set. The preliminary training rounds can be obtained from figs. 8 and 9, and, starting from the preliminary rounds determined in fig. 8, the retraining rounds can be obtained in combination with figs. 9 to 12.
Without considering semi-supervision, only the training set split from the first data set is used; a relatively large epoch count is set for the automatic extraction model, global_step is taken as the x-axis, and the changes of the accuracy (training accuracy) and the loss value (training loss) during training are observed. As can be seen from fig. 8, after 14000 steps the accuracy and loss curves of model training tend to converge, and each epoch over the training set of the invention has, for example, 856 steps (step_num), so the number of preliminary training epochs is calculated as shown in (12):
epoch_1 = global_step / step_num_1 = 14000 / 856 ≈ 16   (12)
When semi-supervised training is considered, the number of training rounds over the training set split from the first data set is therefore set to epoch = 16 according to the above reasoning. Each sample in the second data set, whose size is roughly twice that of the first data set, is predicted with the preliminary automatic extraction model trained on the first data set. The changes of the accuracy, the loss value, and the accuracy on the verification set over the whole training process are shown in figs. 10 to 12. In each of the three figures, two dotted lines divide the curves into three blocks: the left part is the preliminary training process of 16 epochs (epoch_1) with 856 steps per epoch (step_num_1), the middle part is the retraining process with 1763 steps per epoch (step_num_2), and the model substantially converges after global_step reaches 70k. During retraining, the accuracy on the training data (specifically the second data set) and on the verification set both improve to a certain extent, and the model loss drops considerably. Considering that the retraining data consists only of sample data whose pseudo labels were obtained from the model's predictions and the knowledge base, the improvement of all three indices to different degrees verifies the effectiveness of the semi-supervised training of the invention. The number of retraining epochs (epoch_2) is calculated as shown in formula (13):
epoch_2 = (global_step - epoch_1 × step_num_1) / step_num_2   (13)
the automatic extraction model is trained by using a training set divided by the first data set according to the determined initial training round to obtain an initial automatic extraction model, and the initial automatic extraction model is additionally trained by using the second data set according to the determined retraining round to obtain a more accurate automatic extraction model.
It should be noted that the foregoing is merely illustrative of the present invention and is not to be construed as limiting thereof.
Next, in step S104, ranking evaluation calculation is performed on the function points extracted based on the knowledge base and the function points extracted using the automatic extraction model to screen the function points.
Specifically, based on function point credibility, the function points extracted from the knowledge base and those extracted by the automatic extraction model are converted into vectors and input into a pre-trained machine learning model, which outputs a credibility evaluation value. The machine learning model is, for example, a statistical language model.
The function points are ranked according to the output credibility evaluation values, and the function points whose credibility evaluation value is greater than the specified value are screened out. In this way the machine learning model identifies the credibility of the function points and the function points to be filtered out.
In other embodiments, the machine learning model may be an LSTM model. The regularities of the function points are learned by training the machine learning model. The credibility evaluation value is a negative number: the closer it is to 0, the higher the credibility; the further it is from 0, the lower the credibility.
For example, the credibility evaluation value of "add commodity" is -4.93 and that of "daily price commodity" is -6.00. The value of "add commodity" is larger, meaning it looks more like a function point, so "add commodity" is ranked before "daily price commodity"; function points whose credibility evaluation value is less than or equal to the specified value are then filtered out, i.e., the function points whose value is greater than the specified value are retained.
In one embodiment, for example, the function points are scored with the existing KenLM tool (an openly available tool); each function point is evaluated effectively, the focus being whether the input phrase matches the characteristics of a function point, for example whether "delete user" matches the characteristics of a function point better than "learn user".
It should be noted that the KenLM tool can quantify how much a phrase has the characteristics of a function point and score it accordingly (for example, computing its credibility). KenLM is trained first; its training data is all the function points (without labels) contained in the knowledge base, i.e., KenLM learns the regularities of function points from all input function points without considering their labels. Once KenLM has learned these regularities, all extracted function points are scored, ranked by score and output in order, so that they can be presented to the user more clearly; a sketch of this scoring step follows.
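A sketch of the scoring and ranking step using the kenlm Python bindings is shown below; the model file name and the threshold value are assumptions for illustration, and the KenLM model is assumed to have been trained beforehand on the function points in the knowledge base.

    import kenlm

    # Hypothetical path to a KenLM language model trained on the knowledge-base function points.
    lm = kenlm.Model("function_points.arpa")

    def rank_function_points(candidates, threshold=-6.0):
        """Score candidates, keep those above the threshold and return them best-first."""
        scored = [(fp, lm.score(fp, bos=True, eos=True)) for fp in candidates]
        kept = [(fp, s) for fp, s in scored if s > threshold]        # screen by credibility value
        return sorted(kept, key=lambda pair: pair[1], reverse=True)  # values closer to 0 rank higher

    print(rank_function_points(["add commodity", "daily price commodity"]))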
In another example, the function point extraction processing method of the invention further includes: performing knowledge extraction and knowledge relation extraction on the function points and category labels extracted from the knowledge base and on the function points and categories extracted by the automatic extraction model, forming function point triples so as to construct a function point knowledge graph. Knowledge relation extraction follows the internal relations between function points of different kinds and different operations in the text to be processed, and yields the following relations, which represent unidirectional or bidirectional edges between entity nodes: dependency, inheritance, aggregation, action, generalization, synonymy, trigger, parallel, interaction, and coexistence.
Function point expansion is then performed according to the constructed function point knowledge graph.
In a specific embodiment, the extracting of the functional point entity according to the functional point category specifically includes extracting a first type entity, a second type entity, a third type entity and a fourth type entity, where the entities are all functional point entities.
For example, a hash tree is used to identify each entity in the function point text (specifically including the function points extracted from the knowledge base, the function points extracted by the automatic extraction model, existing function point texts and the like), and first-category entities related to internal logical files in the data functions are extracted; a first-category entity is a noun related to an internal logical file, in particular tables and files in the database (see table 1), such as "commodity order table" and "user information table".
Next, second-category entities associated with external interface files in the data functions are extracted; a second-category entity is a noun associated with an external interface file, such as a reference file stored for data exchange.
Next, third-category entities related to external input in the transactional functions are extracted; these are verbs related to external input (i.e., to function point operations) such as "add", "modify" and "delete", and their category label is EI.
Similarly, third-category entities related to external query in the transactional functions are extracted; these are verbs related to external query (i.e., to function point operations) such as "query" and "get", and their category label is EQ.
In addition, third-category entities related to external output in the transactional functions are extracted; these are verbs related to external output (i.e., to function point operations) such as "recommend", "export", "count", "print" and "generate", and their category label is EO.
It should be noted that, in other embodiments, the functional point entity extraction may also be performed according to scene parameters. The present invention is not limited to the above description, and may be applied to any conventional function point analysis text.
Specifically, scene parameters are determined by the application domain. Taking e-commerce as an example, there are scene parameters related to orders, to commodities and to users; more specifically, for example, order table, order number, payment, inventory, refund, commodity table, user table and the like, see table 3.
TABLE 3
Table 3 shows an example of correspondence of scene parameters to respective entities.
In an alternative embodiment, an entity recognition algorithm performs "entity disambiguation" (resolving polysemy and similar problems) on the words corresponding to the extracted entities, so that synonyms with the same or similar semantics are recognized and a word and its synonyms are mapped to the same entity.
It should be noted that, due to the difference of the text expressions, different vocabulary expressions may have the same semantics, such as "add" and "increase", and then "add" and "increase" represent the same entity. For synonyms with the same semantics, see in particular table 4.
TABLE 4
Original entity | Synonymous entities
Adding orders | placing orders, placing purchase orders, creating orders
Adding commodity | issuing commodity, putting commodity on the shelf, commodity issuing
Deleting commodity | taking commodity off the shelf, commodity off the shelf
Querying commodity | searching commodity, commodity search, browsing commodity, commodity browsing
Table 4 is an example showing the relation between each entity (each original entity) and its synonymous entities.
Further, the method also comprises the step of extracting the combined entity to obtain a fourth type entity, wherein the fourth type entity is a combined entity of the verb and the noun. For example, a combination of a first class of entities with a third class of entities, a combination of a second class of entities with a third class of entities, and so on. For example, "query commodity", "add commodity", "newly created order", "retrieve commodity", "commodity retrieve", "browse commodity", "commodity browse", "add user", "delete user", "modify user", "query user", and the like.
Optionally, for the function point categories, a name dictionary corresponding to each category is also created (e.g., represented as a synonym map, where the keys of the name dictionary are synonymous entities and the values are original entities), see table 4 above.
Specifically, the name dictionary includes two columns, a key column and a value column, and has a corresponding entity relationship, wherein the key column is a synonymous entity, and the value column is an original entity (the original entity corresponding to each synonymous entity belonging to the same row in the key column can be seen in table 4). If the entities (i.e. the identified function points) have no synonymous relationship, the entities belonging to the same row and located in the key column and the value column are all original entities. If the entities (i.e., the identified functional points) have a synonymous relationship, the entity in the key column is the synonymous entity, and the entity in the value column and belonging to the same row as the synonymous entity (in the key column) is the original entity.
In a specific embodiment, the original entity is queried according to the synonymous entity in the name dictionary, and then the original entity in the functional point knowledge graph constructed later is amplified.
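As a minimal illustration of this lookup, the sketch below (in Python) assumes a small name dictionary drawn from Table 4; the dictionary contents and the helper name normalize_function_point are illustrative assumptions, not part of the original disclosure.

```python
# Illustrative sketch (assumption): a name dictionary whose keys are synonymous
# entities and whose values are original entities, as described for Table 4.
synonym = {
    "place order": "add order",
    "create order": "add order",
    "publish commodity": "add commodity",
    "put commodity on shelf": "add commodity",
    "take commodity off shelf": "delete commodity",
    "search commodity": "query commodity",
    "browse commodity": "query commodity",
}

def normalize_function_point(entity: str) -> str:
    """Map an identified function point to its original entity.

    If the entity has no synonymous relationship it is returned unchanged,
    mirroring the rule that key and value are then both the original entity.
    """
    return synonym.get(entity, entity)

# Usage: the original entity looked up here can then be used to augment the
# function point knowledge graph constructed later.
print(normalize_function_point("search commodity"))  # -> "query commodity"
print(normalize_function_point("query commodity"))   # -> "query commodity"
```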
It should be noted that the foregoing is merely illustrative of the present invention and is not to be construed as limiting thereof.
In this embodiment, knowledge relation extraction is performed on the function points extracted from the knowledge base, the function points extracted by the automatic extraction model, and the existing function point texts, so as to form function point triples consisting of entity nodes corresponding to function point entities (i.e., function point entity nodes) and the unidirectional or bidirectional relations between adjacent entity nodes, thereby constructing a function point knowledge graph.
Specifically, knowledge relations are extracted, for example, by a regular-matching method; all extracted knowledge relations are then analyzed statistically, the counts of the relation categories within a specified time period are computed, and the top-ranked relation categories, for example the top ten, are taken.
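A rough sketch of this statistical step is shown below; the regular expressions, the relation names, and the use of collections.Counter for the top-k selection are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative sketch (assumption): extract relation mentions by regular matching
# and keep the most frequent relation categories.
patterns = {
    "dependency": r"depends on|relies on",
    "triggering": r"triggers|causes",
    "synonymy": r"is also called|same as",
}

def top_relation_categories(sentences, k=10):
    counts = Counter()
    for sentence in sentences:
        for category, pattern in patterns.items():
            if re.search(pattern, sentence):
                counts[category] += 1
    # Take the relation categories ranked in the top k within the analyzed texts.
    return counts.most_common(k)
```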
In an alternative embodiment, extraction rules are configured according to the scene parameters. According to these extraction rules, knowledge relation extraction is performed on the function points extracted from the knowledge base, the function points extracted by the automatic extraction model, and the intrinsic links, in the existing function point texts, between function points of different kinds and different operations and between the various operations and different function points. The following relations are thereby obtained and used to represent the edge relations between two adjacent entity nodes in the function point knowledge graph, each edge relation being unidirectional or bidirectional; that is, different edge relations are represented by relation categories.
The relation categories specifically include dependency, inheritance, aggregation, action, generalization, synonymy, triggering, parallelism, interaction and coexistence, represented by α, μ, φ, δ, ε, θ, ω, ζ, η and λ respectively. See Table 5 below.
TABLE 5
Table 5 shows various relationship categories, meanings represented by various relationships, and representation symbols.
Next, function point triples (entity corresponding to a function point, relation, entity corresponding to a function point; see Table 6 for details) are formed from the extracted entities and the knowledge relations between them (including unidirectional and bidirectional relations), so as to construct the function point knowledge graph, where each entity corresponds to one entity node, i.e., a function point entity node.
TABLE 6
Table 6 is an example of triples showing different edge relationships.
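A minimal sketch of assembling such triples into a directed graph is given below; it assumes the networkx library, and the example triples and edge attributes are illustrative only (the relation symbols follow Table 5: α for dependency, θ for synonymy, ω for triggering).

```python
import networkx as nx

# Illustrative sketch (assumption): build a function point knowledge graph from
# (head entity, relation, tail entity) triples; each entity becomes one node.
triples = [
    ("add commodity", "α", "commodity form"),      # dependency
    ("add commodity", "θ", "publish commodity"),   # synonymy (bidirectional)
    ("query commodity", "ω", "browse commodity"),  # triggering
]

graph = nx.DiGraph()
for head, relation, tail in triples:
    graph.add_edge(head, tail, relation=relation)
    if relation == "θ":  # a synonymy edge is treated here as bidirectional
        graph.add_edge(tail, head, relation=relation)

print(graph.number_of_nodes(), graph.number_of_edges())
```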
Specifically, the relation graph in the constructed function point knowledge graph consists of entity nodes (namely, the function point entity nodes corresponding to the entities of the function points) and directed relation edges. See Fig. 13.
It should be noted that the functional point knowledge graph is an instantiation representation of functional points, and represents the inherent links between different functional point instances of a specific type of system (systems corresponding to different application fields).
In yet another example, upon receiving text to be processed, the contained function points are identified, and it is determined whether the identified function points are search keywords.
Preferably, before the function points contained in the text to be processed are identified, the following search keywords are determined according to the scene parameters and frequency of use: query, add, commodity, pay, refund, modify, delete, browse, retrieve, new, user, order, publish, inventory, put on shelf, take off shelf, manage. The determined keywords serve as the search keywords used as starting points for traversing each relation path in the knowledge graph.
Search keywords are screened from the extracted entities according to frequency of use and the scene parameters, yielding a keyword set; this set is matched against the function points in the text to be processed to determine whether search keywords are contained.
Each identified function point in the text to be processed is matched against the keyword set to determine whether it is a search keyword, which in turn is used to determine the starting nodes for traversing the function point knowledge graph.
For example, starting from the first position of the identified function point in the text to be processed, a text matching method is used to determine whether the function point is a search keyword. Suppose the input is "add order": the function point is compared with every search keyword in turn. Starting from the first character (or word) of "add order", a substring whose length equals that of the current keyword is taken (for the first keyword "query", of length 2 in the original text, this yields "add") and compared with that keyword, and each keyword is tried in order. When a match succeeds, the matched keyword is determined; when no match succeeds, the identified function point is determined not to be a search keyword.
For example, "add" is not equal to "query", so the search keyword "query" does not match. The second search keyword "add" is then tried; the function point "add" in the input matches the search keyword "add", so the identified function point "add" is determined to be a search keyword. The same method further determines that the identified function point "order" is a search keyword. It is thus determined that the identified function points include search keywords.
As shown in fig. 13, the text to be processed is exemplified as "add commodity".
When the "added commodity" is received, determining whether the "added commodity" contains keywords, wherein the "added commodity" contains the "added" and the "commodity", and determining that the identified functional point contains the search keywords "added" and the "commodity" by using the text matching method. The starting node of traversing the functional point knowledge graph is determined according to "add", "commodity" (i.e. search keyword), such as using the starting point corresponding to "add commodity" in fig. 13. Starting from the starting point, traversing the functional point knowledge graph.
When the identified functional point is a search keyword, executing a step of traversing a relation path between directed entity nodes in the functional point knowledge graph, and searching all reachable entity nodes based on a BFS search algorithm in the process of traversing the functional point knowledge graph to establish a knowledge graph node queue, wherein the establishment of the knowledge graph node queue comprises the following steps: and updating the entity nodes in the knowledge graph node queue.
Specifically, when the identified functional point is a search keyword, determining a starting node traversing the functional point knowledge graph according to the search keyword so as to start executing the step of traversing the relation path between the directed entity nodes in the functional point knowledge graph.
When the identified function point is a search keyword, a step of traversing a relation path between directed entity nodes in the function point knowledge graph is executed, as shown in fig. 14, and specifically includes the following steps.
Step S1201: determine the function point entity nodes corresponding to the search keywords in the function point knowledge graph, and take the determined function point entity nodes as starting nodes.
Step S1202: repeatedly traverse, starting from the starting point, the relation paths in the function point knowledge graph that contain the search keyword, until all relevant relation paths have been traversed.
Specifically, starting from the starting node, the adjacent entity nodes it points to are visited, then the adjacent entity nodes those nodes point to, and so on, until all reachable entity nodes have been visited; whether each visited entity node can be added to the knowledge graph node queue is then judged one by one, so as to establish the knowledge graph node queue.
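A minimal BFS sketch over such a directed graph (e.g., a networkx DiGraph as in the earlier sketch) might look like this; whether a visited node may be added to the queue is left to a placeholder predicate, since the trusted-threshold test is described separately below.

```python
from collections import deque

# Illustrative sketch (assumption): breadth-first traversal of the function point
# knowledge graph from a starting node, collecting visited entity nodes into a
# knowledge graph node queue.
def bfs_node_queue(graph, start, can_add=lambda node: True):
    visited = {start}
    frontier = deque([start])
    node_queue = []
    while frontier:
        node = frontier.popleft()
        if can_add(node):
            node_queue.append(node)               # candidate for the node queue
        for neighbor in graph.successors(node):   # adjacent nodes pointed to
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return node_queue
```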
In an alternative embodiment, when it is determined that an identified function point is not a search keyword, the function point is matched against the synonyms of all search keywords to determine again whether it is a synonym of any search keyword. When the function point is determined to be a synonym of some search keyword, it is finally determined to be a search keyword.
In yet another example, upon receiving the text to be processed, the contained function points are identified, and the starting node for the traversal is determined from the identified function points in order to execute the step of traversing the relation paths between directed entity nodes in the function point knowledge graph. Specifically, the relation paths in the function point knowledge graph that start from the starting point and contain the search keyword are traversed repeatedly until all relevant relation paths have been traversed, i.e., essentially step S1202 in Fig. 14. Since the steps in this example are substantially the same as step S1202 in Fig. 14, the description of the identical parts is omitted.
It should be noted that the foregoing is merely illustrative of the present invention and is not to be construed as limiting thereof.
To further optimize the function point amplification method of the present invention, so that all function points with high reliability can be obtained even when the number of paths from one function point to the remaining function points is large, the amplification method is further optimized in terms of the credibility of each relation path traversed during function point amplification and the application scenario.
In yet another example, the step of traversing a relationship path between directed entity nodes in the functional point knowledge graph is performed based on a BFS search algorithm. As shown in fig. 15, the following steps are specifically included.
Step S1301: based on the BFS search algorithm, all reachable entity nodes are searched.
To determine all reachable entity nodes, each entity node in the function point knowledge graph is assigned a trusted threshold, which is used to decide whether a visited entity node counts as reachable. The cumulative weight value of each visited entity node is calculated with expression (14) below, and the calculated cumulative weight value of each entity node is compared with that node's trusted threshold to determine whether the visited entity node is reachable, so as to establish the knowledge graph node queue:
P_{N_1 N_n} = w_1 · w_2 · w_3 ⋯ w_{n-1}    (14)
where P_{N_1 N_n} denotes the cumulative weight value of entity node N_n; n denotes the number of edge relations (10 in this example); w_1 · w_2 · w_3 ⋯ w_{n-1} is the product of the weights of the n-1 edges traversed from entity node N_1 to entity node N_n; w_1 is the weight of the 1st edge traversed starting from N_1, w_2 the weight of the 2nd edge, w_3 the weight of the 3rd edge, and w_{n-1} the weight of the (n-1)-th edge, i.e., the last edge.
The trusted threshold is obtained, for example, by statistical analysis of historical data of each specified field, or by expert setting, or the like.
Step S1302: determine update parameters in real time for updating the entity nodes in the knowledge graph node queue.
Specifically, the cumulative weight value of each entity node on each relation path is calculated and compared with a preset threshold (i.e., the trusted threshold), and the relation path with the largest cumulative weight value (i.e., the maximum relation path) is determined. Entity nodes whose calculated cumulative weight value is greater than or equal to the preset threshold are added to the knowledge graph node queue, entity nodes below the preset threshold are removed, and entity nodes that lie on the maximum relation path but are not yet in the queue are also added. The relevant entity nodes in the knowledge graph node queue are thus updated in real time, yielding a knowledge graph node queue comprising a plurality of entity nodes (i.e., comprising a set of function points).
The cumulative weight value of each entity node is calculated using expression (14) above. For example, each edge along a relation path starting from the starting node is assigned a weight value in the interval (0, 1), and the cumulative weight value of an entity node is the product of the weight values of the edge relations along the whole relation path from the starting node to that entity node.
It should be noted that the cumulative weight value of an entity node represents the product of the weights along the relation path traversed (visited or searched) from the starting node to that entity node (e.g., the entity node "modify ILF"). For the function point represented by that entity node, the cumulative weight value expresses its degree of reliability: the larger the cumulative weight value, the higher the reliability. When an entity node can be reached from the starting node along two relation paths, it has two cumulative weight values, and the relation path with the larger cumulative weight value, i.e., the maximum relation path, is taken, because the function point represented by an entity node on the maximum relation path has higher reliability.
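The cumulative weight of expression (14) and the threshold test could be sketched as follows; the edge weights, thresholds, and function names are illustrative assumptions, and the graph is again assumed to be a networkx DiGraph with a weight attribute on each edge.

```python
# Illustrative sketch (assumption): the cumulative weight of a node is the product
# of the edge weights along the relation path from the starting node, as in (14).
def cumulative_weight(graph, path):
    weight = 1.0
    for head, tail in zip(path, path[1:]):
        weight *= graph[head][tail].get("weight", 1.0)
    return weight

def keep_reachable(graph, paths, trusted_threshold):
    """Keep the end node of each path whose cumulative weight meets its threshold.

    When several paths reach the same node, the path with the largest cumulative
    weight (the maximum relation path) is the one that decides.
    """
    best = {}
    for path in paths:
        node = path[-1]
        best[node] = max(best.get(node, 0.0), cumulative_weight(graph, path))
    return [n for n, w in best.items() if w >= trusted_threshold.get(n, 1.0)]
```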
In an alternative embodiment, the scene parameters (i.e., the update parameters) are determined from the text to be processed and the function points it contains. Specifically, it is identified whether the text to be processed contains a scene identifier, and when it does, it is determined that the entity nodes in the knowledge graph node queue should be updated.
For example, according to the scene parameters (specifically, parameters related to e-commerce, social media, gaming and entertainment, etc., such as orders and user accounts), the trusted threshold of each entity node in the function point knowledge graph is updated in real time, and the values of the edge relations corresponding to the scene parameters (specifically, the scene identifiers), i.e., the values of the edge relations represented by α, μ, φ, δ, ε, θ, ω, ζ, η and λ, are determined in real time, for example under expert guidance or from the average of historical data over a specified period. Then, using the trusted threshold of each entity node updated in real time, whether each visited entity node is added to the knowledge graph node queue is judged one by one, so as to establish the knowledge graph node queue.
It should be noted that, in other embodiments, the application scenario parameters and the number of all relationship paths related to the start node (i.e., the update parameters) are determined in real time. The foregoing is illustrative only and is not to be construed as limiting the invention.
In an alternative embodiment, before the step of traversing the relation paths between directed entity nodes in the function point knowledge graph is executed, the triple file corresponding to each triple in the function point knowledge graph is preprocessed to obtain the set of all entity nodes (represented, for example, by nodes) and the class label corresponding to each entity class, and the name dictionary is constructed; see Table 4.
By repeatedly traversing, starting from the starting point, the relation paths in the function point knowledge graph that contain the search keywords until all relevant relation paths have been traversed, the traversal step can be completed more effectively and all relevant relation paths can be obtained more quickly.
Function point amplification is then carried out according to the established knowledge graph node queue.
A new function point set is obtained from the updated knowledge graph node queue and output, completing function point amplification.
Furthermore, the drawings are only schematic illustrations of processes involved in a method according to an exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily understood that the processes shown in the figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Compared with the prior art, the present invention, by building its own knowledge base and extracting function points from it with the KMP algorithm, can match and extract the function points in the text to be processed more accurately, reduce the string matching complexity to linear time, effectively reduce the time complexity of text matching, and thereby improve both text matching efficiency and the accuracy of function point extraction. Combined with an automatic extraction model constructed on the Bert-BiLSTM-CRF algorithm, function points can be extracted more comprehensively, the automatic extraction process can be realized more quickly and effectively, and function points not present in the knowledge base can also be extracted.
By performing knowledge extraction and knowledge relation extraction on existing function point analysis texts to form function point triples and construct a function point knowledge graph, a more accurate function point knowledge graph comprising function point triples can be obtained. When a text to be processed is received, the contained function points are identified and it is determined whether they are search keywords, which decides whether to execute the step of traversing the relation paths between directed entity nodes in the function point knowledge graph; a knowledge graph node queue is established during the traversal, and function point amplification is carried out according to the established queue, so that the automatic amplification of function points can be realized more quickly and effectively and the problem of missing function points can be effectively avoided.
In addition, by searching all reachable entity nodes based on the BFS search algorithm and updating the entity nodes in the knowledge graph node queue with update parameters determined in real time, a knowledge graph node queue with higher reliability can be obtained, further optimizing the method.
The following are system embodiments of the present invention that may be used to perform method embodiments of the present invention. For details not disclosed in the system embodiments of the present invention, please refer to the method embodiments of the present invention.
Fig. 16 is a schematic structural view of an example of the functional point extraction processing system according to the present invention.
Referring to fig. 16, a second aspect of the disclosure provides a functional point extraction processing system 900, which adopts the functional point extraction processing method according to the first aspect of the present invention.
Specifically, the functional point extraction processing system 900 includes a building module 910, a first extraction module 920, a second extraction module 930, and a screening module 940.
The establishing module 910 is configured to collect a demand analysis text, a software design text and a system design text available in a public channel, obtain a data set, perform functional point labeling on sample data in the data set, and establish a knowledge base.
The first extraction module 920 performs, according to the established knowledge base, string matching between the function points in the knowledge base and the text to be processed using the KMP algorithm, so as to extract the function points contained in the text to be processed and determine their category labels.
The second extraction module 930 is configured to automatically extract function points from the text to be processed using a pre-constructed automatic extraction model: the text to be processed is input into the automatic extraction model, which outputs the function points contained in the text and the category to which each function point belongs. The automatic extraction model is constructed based on the Bert-BiLSTM-CRF algorithm, and the model parameters are optimized multiple times during its construction.
The screening module 940 is configured to perform ranking evaluation calculation on the function points extracted based on the knowledge base and the function points extracted using the automatic extraction model to screen the function points.
In a specific embodiment, a function point label set is established; according to this label set, a specified proportion of the sample data in the data set is labeled with function points to obtain a first data set, and the remaining sample data in the data set is used to generate pseudo labels to obtain a second data set. The function point label set contains the following function point labels: a first-class label denoted ILF, a second-class label denoted EIF, a third-class label denoted EI, a fourth-class label denoted EO, and a fifth-class label denoted EQ.
The requirement analysis texts in the data set are divided into a training set, a validation set and a test set in a specific proportion, and the number of preliminary training rounds and retraining rounds is determined from the sizes of the first data set and the second data set.
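A simple split along these lines could be sketched as follows; the 8:1:1 proportion and the function name are assumptions, since the specific proportion is not fixed here.

```python
import random

# Illustrative sketch (assumption): split labeled requirement-analysis samples
# into training, validation and test sets in an assumed 8:1:1 proportion.
def split_dataset(samples, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    train_end, val_end = int(0.8 * n), int(0.9 * n)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]
```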
In an alternative embodiment, constructing the automatic extraction model based on the Bert-BiLSTM-CRF algorithm includes constructing a Bert layer, a BiLSTM layer and a CRF layer. The dimension of the word vectors to be generated is determined according to the length n of the sample data in the training set: specifically, sample data of length n is input into the Bert layer, which generates a first vector of dimension n by a specific value (i.e., each of the n tokens is represented by a vector of that dimension), where 0 < n ≤ 512 and the specific value lies in the range 700 < value ≤ 800, preferably 768.
The first vector is then input into the BiLSTM layer to establish contextual connections among the n vectors, so as to obtain the sequence semantic information corresponding to the text to be processed.
The sequence semantic information obtained by the BiLSTM layer is then input into the CRF layer, which outputs the function points contained in the sample data and the category to which each function point belongs.
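A minimal sketch of such a Bert-BiLSTM-CRF tagger is given below; it assumes the Hugging Face transformers and pytorch-crf packages, an assumed BiLSTM hidden size of 256, and an assumed BIO-style tag set over the ILF/EIF/EI/EO/EQ labels, none of which are fixed by the original text.

```python
import torch
import torch.nn as nn
from transformers import BertModel   # assumes the Hugging Face transformers package
from torchcrf import CRF             # assumes the pytorch-crf package

# Illustrative sketch (assumption): a Bert layer producing 768-dimensional vectors,
# a BiLSTM layer building context links, and a CRF layer emitting function point
# tags (e.g. B-ILF, I-ILF, B-EI, ..., O).
class FunctionPointTagger(nn.Module):
    def __init__(self, num_tags: int, bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.bilstm = nn.LSTM(input_size=768, hidden_size=256,
                              bidirectional=True, batch_first=True)
        self.emission = nn.Linear(512, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        context, _ = self.bilstm(hidden)          # sequence semantic information
        emissions = self.emission(context)
        mask = attention_mask.bool()
        if tags is not None:                      # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: tag sequences
```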
According to the determined number of preliminary training rounds, the automatic extraction model is trained with the training set to obtain a preliminary automatic extraction model; the preliminary automatic extraction model is then additionally trained with the second data set according to the determined number of retraining rounds.
Optimizing the model parameters during repeated model validation specifically includes updating the preliminary training rounds and the retraining rounds according to how the accuracy and loss values change during training.
Optimizing the model parameters during model testing specifically includes updating the retraining rounds according to how the accuracy and loss values change during retraining.
In an alternative embodiment, performing string matching between the text to be processed and the function points in the knowledge base with the KMP algorithm, so as to extract the function points contained in the text to be processed, includes: determining the string lengths of the function points and of the text sentences containing the function points in the knowledge base, representing them as pattern strings, and constructing the next array; determining the string length of the text to be processed and representing it as the text string; matching each pattern string in the knowledge base against the text string of the text to be processed one by one, and determining the position of a matching failure so as to determine the starting position of the next match; and, when a pattern string in the knowledge base matches the text string of the text to be processed successfully, extracting the corresponding function point in the text to be processed.
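A minimal KMP sketch along these lines follows; the function names are assumptions, and the knowledge base is represented simply as a list of function point strings.

```python
# Illustrative sketch (assumption): KMP matching of a knowledge-base function point
# (pattern string) against the text to be processed (text string).
def build_next(pattern: str) -> list[int]:
    """next[i] is the length of the longest proper prefix of pattern[:i+1] that is
    also a suffix, used to decide where to resume after a mismatch."""
    nxt = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = nxt[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        nxt[i] = k
    return nxt

def kmp_contains(text: str, pattern: str) -> bool:
    nxt, k = build_next(pattern), 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = nxt[k - 1]        # jump according to the next array on failure
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):     # full match: the function point is present
            return True
    return False

# Usage: extract every knowledge-base function point found in the text to be processed.
def extract_function_points(text, knowledge_base_points):
    return [p for p in knowledge_base_points if p and kmp_contains(text, p)]
```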
Next, a ranking evaluation is computed over the function points extracted from the knowledge base and the function points extracted by the automatic extraction model, in order to screen them. Specifically, based on function point credibility, the function points extracted from the knowledge base and those extracted by the automatic extraction model are converted into vectors and input into a pre-trained machine learning model, which outputs a credibility evaluation value; the function points are then sorted by their credibility evaluation values, and the function points whose credibility evaluation value is greater than a specified value are screened out.
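A rough sketch of this screening step is given below; score_model.predict and vectorize stand in for the pre-trained machine learning model and the vector conversion, and are assumptions.

```python
# Illustrative sketch (assumption): rank candidate function points by the credibility
# evaluation value output by a pre-trained scoring model, then keep those whose value
# exceeds a specified value. `score_model` and `vectorize` are placeholders.
def screen_function_points(candidates, score_model, vectorize, specified_value=0.5):
    scored = [(fp, float(score_model.predict([vectorize(fp)])[0])) for fp in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)   # ranking evaluation
    return [fp for fp, score in scored if score > specified_value]
```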
Knowledge extraction and knowledge relation extraction are performed on the function points and category labels extracted from the knowledge base and on the function points and categories extracted by the automatic extraction model, to form function point triples and construct the function point knowledge graph. Knowledge relation extraction is performed on the intrinsic links between function points of different kinds and different operations in the text to be processed, yielding the following relations used to represent the unidirectional or bidirectional edges between entity nodes: dependency, inheritance, aggregation, action, generalization, synonymy, triggering, parallelism, interaction, coexistence.
Function point amplification is then performed according to the constructed function point knowledge graph.
Note that, since the content of the function point extraction processing method in this system embodiment is substantially the same as that in the method embodiment, the description of the same parts is omitted.
As with the method embodiment, compared with the prior art, building a knowledge base and extracting function points from it with the KMP algorithm allows the function points in the text to be processed to be matched and extracted more accurately, reduces the string matching complexity to linear time, effectively reduces the time complexity of text matching, and thereby improves both matching efficiency and extraction accuracy; combined with the automatic extraction model constructed on the Bert-BiLSTM-CRF algorithm, function points can be extracted more comprehensively and more quickly, including function points not present in the knowledge base.
Performing function point knowledge extraction and knowledge relation extraction on existing function point analysis texts to form function point triples and construct a function point knowledge graph yields a more accurate knowledge graph comprising function point triples. When a text to be processed is received, the contained function points are identified and it is determined whether they are search keywords, which decides whether to execute the step of traversing the relation paths between directed entity nodes in the function point knowledge graph; a knowledge graph node queue is established during the traversal and function point amplification is carried out according to it, so that automatic amplification of function points is realized more quickly and effectively and the problem of missing function points is effectively avoided.
In addition, searching all reachable entity nodes based on the BFS search algorithm and updating the entity nodes in the knowledge graph node queue with update parameters determined in real time yields a knowledge graph node queue with higher reliability, further optimizing the system.
Fig. 17 is a schematic structural view of an embodiment of an electronic device according to the present invention.
As shown in fig. 17, the electronic device is in the form of a general purpose computing device. The processor may be one or a plurality of processors and work cooperatively. The invention does not exclude that the distributed processing is performed, i.e. the processor may be distributed among different physical devices. The electronic device of the present invention is not limited to a single entity, but may be a sum of a plurality of entity devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executable by the processor to enable an electronic device to perform the method, or at least some of the steps of the method, of the present invention.
The memory includes volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may be non-volatile memory, such as Read Only Memory (ROM).
Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for exchanging data between the electronic device and an external device. The I/O interface may be a bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
It should be understood that the electronic device shown in fig. 17 is only one example of the present invention, and the electronic device of the present invention may further include elements or components not shown in the above examples. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a man-machine interaction element such as a button, a keyboard, and the like. The electronic device may be considered as covered by the invention as long as the electronic device is capable of executing a computer readable program in a memory for carrying out the method or at least part of the steps of the method.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, as shown in fig. 18, the technical solution according to the embodiment of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several commands to cause a computing device (may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the embodiment of the present invention.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable storage medium can also be any readable medium that can communicate, propagate, or transport the program for use by or in connection with the command execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The computer-readable medium carries one or more programs (e.g., computer-executable programs) which, when executed by one of the devices, cause the computer-readable medium to implement the data interaction methods of the present disclosure.
Those skilled in the art will appreciate that the modules may be distributed throughout several devices as described in the embodiments, and that corresponding variations may be implemented in one or more devices that are unique to the embodiments. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and which includes several commands to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The exemplary embodiments of the present invention have been particularly shown and described above. It is to be understood that this invention is not limited to the precise arrangements, instrumentalities and instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. The functional point extraction processing method is characterized by comprising the following steps:
acquiring a demand analysis text, a software design text and a system design text which are available in a public channel, obtaining a data set, carrying out functional point labeling on sample data in the data set, and establishing a knowledge base;
according to the established knowledge base, performing character string matching on the function points in the knowledge base and the text to be processed by adopting a KMP algorithm to extract the function points contained in the text to be processed and determine class labels of the function points;
using a pre-constructed automatic extraction model to automatically extract functional points of a text to be processed, inputting the text to be processed into the automatic extraction model, and outputting the functional points contained in the text to be processed and the category to which each functional point belongs; the method comprises the steps of constructing an automatic extraction model based on a Bert-BiLSTM-CRF algorithm; in the process of constructing the automatic extraction model, optimizing model parameters for a plurality of times;
and performing sequencing evaluation calculation on the function points extracted based on the knowledge base and the function points extracted by using the automatic extraction model so as to screen the function points.
2. The method for extracting and processing a functional point according to claim 1, wherein the constructing an automatic extraction model based on the Bert-BiLSTM-CRF algorithm includes:
Constructing a Bert layer, a BiLSTM layer and a CRF layer to construct the automatic extraction model;
according to the length n of sample data in a training set, determining the dimension of a word vector to be generated, specifically inputting the sample data with the length n into a Bert layer, generating a first vector with the dimension of n being a specific value, wherein the range of n is more than 0 and less than or equal to 512, and the range of the specific value is more than 700 and less than or equal to 800;
inputting the first vector into a BiLSTM layer, and processing to establish context relation among n vectors so as to obtain sequence semantic information corresponding to a text to be processed;
and inputting the sequence semantic information obtained by the BiLSTM layer into the CRF layer, and outputting the function points contained in the sample data and the category of the function points to which each function point belongs.
3. The method for extracting and processing a feature point according to claim 1, wherein,
establishing a functional point label set, carrying out functional point labeling on sample data with specified proportion in the data set according to the functional point label set to obtain a first data set, and using the residual sample data in the data set to generate a pseudo label to obtain a second data set, wherein the functional point label set comprises the following functional point labels:
A first type of tag using ILF, a second type of tag using EIF, a third type of tag using EI, a fourth type of tag using EO, a fifth type of tag using EQ.
4. The method for extracting and processing a function point according to claim 3, wherein,
dividing the data set demand analysis text into a training set, a verification set and a test set according to a specific proportion;
determining a preliminary training round and a retraining round according to the number of the first data sets and the number of the second data sets;
according to the determined preliminary training rounds, training the automatic extraction model by using the training set to obtain a preliminary automatic extraction model;
and performing additional training on the preliminary automatic extraction model by using the second data set according to the determined retraining round.
5. The method for extracting and processing a feature point according to claim 4, wherein,
optimizing model parameters in a multi-time model verification process, wherein the method specifically comprises the steps of updating a preliminary training round and a retraining round according to the change conditions of accuracy and loss values in the training process;
optimizing model parameters in the model test process specifically comprises updating retraining rounds according to the change conditions of accuracy and loss values in the retraining process.
6. The method for extracting and processing function points according to claim 1, wherein the performing character string matching between the function points in the knowledge base and the text to be processed by using the KMP algorithm to extract the function points included in the text to be processed comprises:
determining the lengths of the functional points and the character strings of the text sentences containing the functional points in the knowledge base, using the mode string representation, and constructing a next array;
determining the character string length of the text to be processed, and representing the character string by using the text string;
performing character string matching on each mode string in the knowledge base and the text string of the text to be processed one by one, and determining a matching failure position for determining a starting position in next matching; and extracting corresponding function points in the text to be processed when the pattern string in the knowledge base is successfully matched with the text string of the text to be processed.
7. The method according to claim 1, wherein the sorting evaluation calculation is performed on the function points extracted based on the knowledge base and the function points extracted using the automatic extraction model to screen the function points, comprising:
vector conversion is carried out on the functional points extracted based on the knowledge base and the functional points extracted by using the automatic extraction model by using the credibility of the functional points, a pre-trained machine learning model is input, and a credibility evaluation value is output;
And sequencing the function points according to the output credible evaluation values to screen out the function points with the credible evaluation values larger than the appointed value.
8. The method for extracting and processing a feature point according to claim 1, wherein,
performing knowledge extraction and knowledge relation extraction on the function points and category labels extracted based on the knowledge base, and the function points and the categories which are extracted by using the automatic extraction model to form a function point triplet so as to construct a function point knowledge graph; the method comprises the steps of extracting knowledge relations according to internal relations between functional points representing different types and different operations in a text to be processed, and obtaining the following various relations for representing unidirectional or bidirectional edges between adjacent entity nodes in a functional point knowledge graph: dependency, inheritance, aggregation, action, generalization, synonym, trigger, parallelism, interaction, coexistence.
9. The method for extracting and processing a feature point according to claim 8, wherein,
and performing functional point expansion according to the constructed functional point knowledge graph.
10. A function point extraction processing system employing the function point extraction processing method according to any one of claims 1 to 9, characterized by comprising:
The system comprises a building module, a knowledge base and a data set, wherein the building module collects a demand analysis text, a software design text and a system design text which are available in a public channel to obtain the data set, carries out functional point labeling on sample data in the data set and builds the knowledge base;
the first extraction module is used for carrying out character string matching on the function points in the knowledge base and the text to be processed by adopting a KMP algorithm according to the established knowledge base so as to extract the function points contained in the text to be processed and determine the category labels of the function points;
the second extraction module is used for automatically extracting the function points of the text to be processed by using a pre-built automatic extraction model, inputting the text to be processed into the automatic extraction model, and outputting the function points contained in the text to be processed and the category to which each function point belongs; the method comprises the steps of constructing an automatic extraction model based on a Bert-BiLSTM-CRF algorithm; in the process of constructing the automatic extraction model, optimizing model parameters for a plurality of times;
and the screening module is used for carrying out sequencing evaluation calculation on the function points extracted based on the knowledge base and the function points extracted by using the automatic extraction model so as to screen the function points.
CN202310540554.XA 2023-05-15 2023-05-15 Function point extraction processing method and system Pending CN116775954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310540554.XA CN116775954A (en) 2023-05-15 2023-05-15 Function point extraction processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310540554.XA CN116775954A (en) 2023-05-15 2023-05-15 Function point extraction processing method and system

Publications (1)

Publication Number Publication Date
CN116775954A true CN116775954A (en) 2023-09-19

Family

ID=87988619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310540554.XA Pending CN116775954A (en) 2023-05-15 2023-05-15 Function point extraction processing method and system

Country Status (1)

Country Link
CN (1) CN116775954A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination