CN112579444B

CN112579444B - Automatic analysis modeling method, system, device and medium based on text cognition

Info

Publication number: CN112579444B
Application number: CN202011437720.6A
Authority: CN
Inventors: 黄翰; 刘雨瑶; 王业超; 黄俊聪
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2024-05-07
Anticipated expiration: 2040-12-10
Also published as: CN112579444A

Abstract

The invention discloses an automatic analysis modeling method, system, device and medium based on text cognition. The method comprises the following steps: acquiring a document, screening the sentences in the document, and obtaining the sentences related to diagram element extraction; constructing a domain dictionary according to predefined rules; performing semantic analysis on the sentences related to diagram element extraction by using a dependency parsing tool in combination with the constructed domain dictionary to obtain a semantic analysis result; formulating diagram element extraction rules based on the semantic analysis; extracting diagram elements from the semantic analysis result according to the formulated extraction rules; and storing the extracted diagram elements in the form of a use case diagram. The invention realizes automatic requirements analysis modeling and can automatically derive the corresponding UML use case diagram from a requirements document, thereby improving the accuracy of requirements analysis modeling and the efficiency of software development. It is widely applicable to the fields of natural language processing and software engineering requirements analysis.

Description

Automatic analysis modeling method, system, device and medium based on text cognition

Technical Field

The invention relates to the fields of natural language processing and software engineering requirements analysis, and in particular to an automatic analysis modeling method, system, device and medium based on text cognition.

Background

Requirements analysis is an important part of the software development process. It is essential to understand the user's real intention accurately from the requirements document and to identify the users and functional requirements of each subsystem. However, conventional manual analysis suffers from inaccuracy, incompleteness and ambiguity. These problems arise because requirements documents are often not written in a standard way, natural language is ambiguous, and different people understand the same expression differently.

Existing automatic modeling methods require that requirements be written with a prescribed structure and prescribed sentence patterns at the requirements writing stage. The format is rigid and the sentence patterns are limited, which makes it difficult to express complex and rich requirements. Each set of parsing rules can handle only one writing style of requirements document. When the sentence patterns change, the parsing rules must be changed accordingly, so usability is low.

Therefore, a method for automatically checking and parsing requirements documents is needed that can judge whether each sentence in a requirements document is related to diagram element extraction and automatically derive the UML diagram elements. Such a method avoids the inaccuracy and incompleteness of manual analysis and improves the efficiency of analyzing requirements documents; it also overcomes the insufficient generality of traditional automatic modeling methods.

Natural language processing is the theory and technology of using machines to process human language. It takes language as the object of computation and studies the corresponding algorithms, with the aim of interacting with machine systems in natural language so that information can be managed more efficiently and conveniently. The key to natural language processing is to let the computer "understand" natural language.

For automatically generating UML use case diagrams (which are composed of diagram elements), some automatic parsing tools already exist in China and abroad. Most foreign parsing tools apply only to English and not to Chinese requirements documents. Domestic automatic parsing tools can parse only structured requirements documents and cannot process semi-structured or non-standard documents.

Term interpretation:

Diagram elements: the elements necessary to generate a use case diagram, such as "user" and "operation".

Text cognition: using a computer to automatically detect whether known types of knowledge points occur in a text and what logical relations hold among them; that is, analyzing and understanding the words and sentences of the text, grasping the logical relations between sentences, and understanding the subject of the whole text.

Disclosure of Invention

In order to solve at least one of the technical problems existing in the prior art to a certain extent, the invention aims to provide an automatic analysis modeling method, system, device and medium based on text cognition.

The technical scheme adopted by the invention is as follows:

An automatic analysis modeling method based on text cognition comprises the following steps:

acquiring a document, screening the sentences in the document, and obtaining the sentences related to diagram element extraction;

constructing a domain dictionary according to predefined rules;

performing semantic analysis on the sentences related to diagram element extraction by using a dependency parsing tool in combination with the constructed domain dictionary to obtain a semantic analysis result;

formulating diagram element extraction rules based on the semantic analysis;

extracting diagram elements from the semantic analysis result according to the formulated extraction rules;

and storing the extracted diagram elements in the form of a use case diagram.

Further, screening the sentences in the document to obtain the sentences related to diagram element extraction includes:

acquiring a training set for training a model;

training with the BERT tool on the training set to obtain a classification model for short text classification;

classifying the sentences in the document with the classification model, screening the classification result, and obtaining the sentences related to diagram element extraction.

Further, constructing the domain dictionary according to the predefined rules includes:

collecting technical terms and acquiring classification information for each technical term, wherein the classification information comprises synonym information, variant information and brief description information;

establishing a configuration file, recording the classification information as a JSON string, and recording the information of each technical term according to a predefined format;

and, according to the information recorded in the configuration file, using the jieba word segmentation tool to segment the technical terms and construct the domain dictionary.

Further, performing semantic analysis on the sentences related to diagram element extraction with the dependency parsing tool to obtain the semantic analysis result includes:

performing semantic analysis on the screened sentences with the HanLP dependency parsing tool to obtain the semantic analysis result, wherein the semantic analysis result comprises the subject and the main verb of each sentence.

Further, formulating the diagram element extraction rules based on the semantic analysis includes:

analyzing, according to the sentence patterns used in requirements text, which sentence components can serve as user diagram elements and operation diagram elements;

and writing a corresponding extraction rule for each sentence pattern combination and the diagram element extraction manner corresponding to that combination.

Further, extracting the diagram elements from the semantic analysis result according to the formulated extraction rules includes:

storing the extracted diagram elements in the form usecase(actor, function);

wherein actor denotes a user in the use case diagram, and function denotes an operation that the user can perform.

Further, storing the extracted diagram elements in the form of a use case diagram includes:

taking the system name of the subsystem corresponding to the diagram elements as the key, and converting the use case diagram elements of that subsystem, namely all users and use cases contained in the subsystem, into a JSON string that is stored as the value.

The invention adopts another technical scheme that:

An automatic analytical modeling system based on text cognition, comprising:

a sentence screening module, used for acquiring a document, screening the sentences in the document, and obtaining the sentences related to diagram element extraction;

a dictionary construction module, used for constructing a domain dictionary according to predefined rules;

a semantic analysis module, used for performing semantic analysis on the sentences related to diagram element extraction by using a dependency parsing tool in combination with the constructed domain dictionary to obtain a semantic analysis result;

a rule formulation module, used for formulating diagram element extraction rules based on the semantic analysis;

an element extraction module, used for extracting diagram elements from the semantic analysis result according to the formulated extraction rules;

and a storage module, used for storing the extracted diagram elements in the form of a use case diagram.

The invention adopts another technical scheme that:

an automatic analysis modeling apparatus based on text cognition, comprising:

At least one processor;

At least one memory for storing at least one program;

The at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.

The invention adopts another technical scheme that:

a storage medium having stored therein processor executable instructions which when executed by a processor are for performing the method as described above.

The beneficial effects of the invention are as follows: the invention realizes automatic requirements analysis modeling and can automatically derive the corresponding UML use case diagram from a requirements document, thereby improving the accuracy of requirements analysis modeling and the efficiency of software development.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made with reference to the accompanying drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and other drawings may be obtained according to these drawings without the need of inventive labor for those skilled in the art.

Fig. 1 is a schematic flow chart of an automatic analysis modeling method based on text cognition in an embodiment of the invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.

In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.

As shown in fig. 1, the present embodiment provides an automatic analysis modeling method based on text cognition, including, but not limited to, the following steps:

S1, screening each sentence in the entire requirements document: classifying the sentences in the document with a classification model based on BERT Chinese short text classification combined with sentence pattern screening, and judging whether each sentence is related to diagram element extraction.

S2, constructing a domain dictionary according to predefined rules.

S3, using the classification result obtained in step S1, extracting the sentences that may be related to diagram elements, and performing semantic analysis on these sentences with the HanLP dependency parsing tool based on the domain dictionary constructed in step S2, to obtain components such as the subject and the main verb of each sentence.

S4, formulating generalized diagram element extraction rules based on the semantic analysis.

S5, extracting the diagram elements according to the diagram element extraction rules formulated in S4 and the dependency parsing result obtained in S3.

S6, storing the classified and integrated elements obtained in S5 in the form of a use case diagram.

Further as an optional embodiment, the document sentence screening step includes:

S11, collecting requirements documents in the field of software engineering, and extracting the elements of the use case diagram corresponding to each document as the test basis. Specifically, statements irrelevant to diagram element extraction, such as the "table of contents" and "background" sections, are removed, and the sentences related to diagram element extraction are extracted together with the corresponding use case diagrams.

S12, collecting other documents, such as news articles, as negative samples of the model training data;

S13, using the BERT tool with the bert-base-chinese pre-trained model, training the short text classification model on the sentences related to diagram element extraction and on the other documents.

S14, collecting the general sentence patterns of sentences related to diagram element extraction; for example, any sentence related to diagram element extraction must include a subject (nominative case), an agent case, and the like. The classification result of step S13 is screened accordingly, and only the sentences that conform to these sentence patterns proceed to the next operation.
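The two-stage screening of steps S13 and S14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the trained BERT classifier and the parser are replaced by toy stand-ins (`toy_classifier`, `toy_roles`), and the set of required roles is an assumption chosen for the example.

```python
from typing import Callable, List, Set

# Roles a sentence must contain to pass the sentence pattern screen.
# The text names the agent as an example; this exact set is an assumption.
REQUIRED_ROLES: Set[str] = {"AGT"}


def screen_sentences(
    sentences: List[str],
    is_relevant: Callable[[str], bool],
    roles_of: Callable[[str], Set[str]],
) -> List[str]:
    """Keep sentences that pass the classifier and then the pattern screen."""
    kept = []
    for sentence in sentences:
        if not is_relevant(sentence):  # stage 1: short text classifier (S13)
            continue
        if not REQUIRED_ROLES <= roles_of(sentence):  # stage 2: pattern screen (S14)
            continue
        kept.append(sentence)
    return kept


# Toy stand-ins so the sketch runs without a trained model or a parser.
TOY_ROLES = {
    "Students can select courses through the course selection system.": {"AGT", "ROOT", "PAT"},
    "Table of contents": set(),
    "Courses are selected every term.": {"ROOT", "PAT"},
}


def toy_classifier(sentence: str) -> bool:
    """Hypothetical replacement for the trained short text classifier."""
    return sentence != "Table of contents"


def toy_roles(sentence: str) -> Set[str]:
    """Hypothetical replacement for the roles found by the parser."""
    return TOY_ROLES.get(sentence, set())


kept = screen_sentences(list(TOY_ROLES), toy_classifier, toy_roles)
print(kept)
```

Passing the classifier and the role lookup as callables keeps the screening logic independent of any particular model or parsing library.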

Further as an optional embodiment, constructing the domain dictionary includes:

S21, collecting, according to expert knowledge (that is, the technical terms summarized by experts in the field of software engineering from their working experience), the technical terms commonly used in the field, together with the synonyms, abbreviations, variants, brief description, part of speech and other information of each term;

S22, establishing a configuration file in which the information is recorded as a JSON string;

S23, entering the information of each technical term according to a predefined format.

S24, downloading the jieba word segmentation toolkit, and writing the words of the domain dictionary into a user dictionary text file in the form <word, part of speech>, wherein each synonym, abbreviation and variant of a word is stored as a separate entry; then using the jieba word segmentation tool to perform word segmentation and part-of-speech tagging.
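Steps S22 to S24 can be sketched as below. The JSON field names (`terms`, `word`, `pos`, `synonyms`, `abbreviations`, `variants`, `description`) and the example term are hypothetical, since the text does not give the predefined format. The output lines follow jieba's user dictionary layout of one entry per line (word, optional frequency, optional part of speech, separated by spaces); jieba itself is not imported, so the sketch runs with the standard library only.

```python
import json
from typing import List

# Hypothetical configuration content; all field names are illustrative only.
CONFIG_JSON = """
{
  "terms": [
    {"word": "选课系统", "pos": "n", "synonyms": ["选课平台"],
     "abbreviations": [], "variants": [],
     "description": "course selection system"}
  ]
}
"""


def to_user_dict_lines(config_text: str) -> List[str]:
    """Flatten each term and its alternative forms into 'word pos' lines."""
    config = json.loads(config_text)
    lines: List[str] = []
    seen = set()
    for term in config["terms"]:
        forms = [
            term["word"],
            *term.get("synonyms", []),
            *term.get("abbreviations", []),
            *term.get("variants", []),
        ]
        for form in forms:  # every alternative form becomes its own entry
            if form not in seen:
                seen.add(form)
                lines.append(f"{form} {term['pos']}")
    return lines


lines = to_user_dict_lines(CONFIG_JSON)
print("\n".join(lines))
# The lines could then be written to a text file and loaded with
# jieba.load_userdict(path) before segmentation and part-of-speech tagging.
```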

Further as an optional embodiment, the semantic analysis comprises the following steps:

S31, performing the following steps on the word segmentation and part-of-speech tagging result of each sentence obtained in step S1;

S32, using the HanLP dependency parsing tool to obtain the agent relation, the patient relation and the like in the sentence;

Regarding the agent relation: the subject of an agent relation generally serves as the user among the diagram elements.

Regarding the patient relation: the object of a patient relation is generally the thing operated on, such as a system or a piece of software.

Regarding the root node: the root node generally serves as the specific operation performed by the user.

For each sentence pattern, a parsing manner is defined, so that modeling elements can be extracted from each sentence pattern.

For example, for the sentence "Students can select courses through the course selection system", the semantic analysis result is:

students: AGT
can
through
course selection system
select: ROOT
courses: PAT

Further as an optional embodiment, the step of formulating the UML diagram element extraction rules includes:

S41, analyzing, according to the ways in which requirements are expressed, which components of a sentence may serve as user diagram elements and operation diagram elements. The analysis result must not depend on specific words; for example, it must not contain a phrase such as "execute ... operation", and must consist entirely of the relations or roles in the semantic analysis result.

S42, writing a corresponding extraction rule for each possible sentence pattern combination and its corresponding diagram element extraction manner.

For example: { AGT: user, ROOT+PAT: operation }
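As an illustration of how such a rule can be applied, the sketch below encodes the example rule and the example parse of the course selection sentence. The data layout is an assumption made for this sketch: the rule is stored inverted (diagram element to role labels), the parse is a list of (word, label) pairs, and `None` marks words for which the text gives no label.

```python
from typing import Dict, List, Optional, Tuple

# Parse of the example sentence as (word, label) pairs.
PARSE: List[Tuple[str, Optional[str]]] = [
    ("students", "AGT"),
    ("can", None),
    ("through", None),
    ("course selection system", None),
    ("select", "ROOT"),
    ("courses", "PAT"),
]

# The example rule: the AGT word is the user; ROOT followed by PAT is the operation.
RULE: Dict[str, List[str]] = {"user": ["AGT"], "operation": ["ROOT", "PAT"]}


def apply_rule(
    parse: List[Tuple[str, Optional[str]]],
    rule: Dict[str, List[str]],
) -> Optional[Dict[str, str]]:
    """Return the extracted diagram elements, or None if a label is missing."""
    by_label = {label: word for word, label in parse if label is not None}
    elements: Dict[str, str] = {}
    for element, labels in rule.items():
        if any(label not in by_label for label in labels):
            return None  # the sentence does not contain every role the rule needs
        elements[element] = " ".join(by_label[label] for label in labels)
    return elements


elements = apply_rule(PARSE, RULE)
print(elements)  # {'user': 'students', 'operation': 'select courses'}
```

Because the rule refers only to role labels and never to specific words, the same rule applies to any sentence with an agent, a root verb and a patient.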

Further as an optional embodiment, the step of extracting the diagram elements includes:

S51, for the dependency parsing result of each sentence obtained in step S3, looping over the generalized diagram element extraction rules obtained in step S4 to find the matching sentence pattern rule, and extracting the possible diagram elements according to that rule. The matching criterion is that a sentence is considered to match a rule if the sentence contains the elements of the rule. If a sentence matches a plurality of rules, the matching degree is determined as follows:

Assume that the sentence s contains n elements (s_1, s_2, ..., s_n), that C_R denotes the number of elements in the rule R, and that F(s, R) denotes the matching degree between the sentence s and the rule R. If a sentence matches a plurality of rules, the rule with the highest matching degree is selected for the next operation.

S52, extracting the diagram elements in the sentence according to the diagram element extraction manner in the rule.

S53, storing the result in the form usecase(actor, function), where actor denotes a user in the use case diagram and function denotes an operation that the user can perform.
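Steps S51 to S53 can be sketched as follows. The formula for the matching degree F(s, R) is not reproduced in this text, so the sketch substitutes an assumed score, the coverage ratio C_R / n for rules whose elements all occur in the sentence and 0 otherwise. The names `Rule`, `UseCase`, `matching_degree` and `extract` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class UseCase(NamedTuple):
    actor: str      # a user in the use case diagram
    function: str   # an operation that the user can perform


class Rule(NamedTuple):
    actor_labels: Tuple[str, ...]
    function_labels: Tuple[str, ...]


def rule_labels(rule: Rule) -> Set[str]:
    """All role labels that the rule refers to (its C_R elements)."""
    return set(rule.actor_labels) | set(rule.function_labels)


def matching_degree(parse: Dict[str, str], rule: Rule) -> float:
    """Assumed stand-in for F(s, R): C_R / n if the rule matches, else 0."""
    labels = rule_labels(rule)
    if not parse or not labels <= set(parse):
        return 0.0
    return len(labels) / len(parse)


def extract(parse: Dict[str, str], rules: List[Rule]) -> Optional[UseCase]:
    """Pick the best-matching rule (S51) and build usecase(actor, function) (S52, S53)."""
    best = max(rules, key=lambda r: matching_degree(parse, r), default=None)
    if best is None or matching_degree(parse, best) == 0.0:
        return None
    actor = " ".join(parse[label] for label in best.actor_labels)
    function = " ".join(parse[label] for label in best.function_labels)
    return UseCase(actor, function)


# Label-to-word view of the example sentence.
parse = {"AGT": "students", "ROOT": "select", "PAT": "courses"}
rules = [
    Rule(("AGT",), ("ROOT",)),        # covers 2 of the 3 elements
    Rule(("AGT",), ("ROOT", "PAT")),  # covers all 3 elements
]
use_case = extract(parse, rules)
print(use_case)
```

Under the assumed score, the second rule wins because it accounts for more of the sentence, which yields the more specific operation "select courses".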

Further as an optional embodiment, storing in the form of a use case diagram includes:

S61, identifying the UML diagram elements according to the rules for the matching result obtained in step S5;

S62, integrating all use cases and users under each subsystem according to the subsystem items obtained in step S5.

S63, checking whether the boundaries under the same subsystem are identical; if they are not, judging whether the boundaries differ only because of different wording; if the difference is not merely one of wording, selecting the boundary name that occurs most often and issuing a warning that the document may contain an incorrect expression.

S64, according to a predefined format, taking the system name of each subsystem as the key and the corresponding diagram elements as the value, and converting the value into a JSON string for storage.
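Steps S62 to S64 can be sketched as below. The record layout, the JSON field names (`boundary`, `actors`, `usecases`) and the `aliases` table used to recognize boundary names that differ only in wording are all assumptions made for this illustration; the text does not specify the predefined format.

```python
import json
import warnings
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (subsystem, boundary name, actor, function); a hypothetical record layout.
Record = Tuple[str, str, str, str]


def build_store(
    records: Iterable[Record],
    aliases: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Group elements by subsystem and serialize each group as a JSON string."""
    aliases = aliases or {}
    boundary_counts = defaultdict(Counter)
    elements = defaultdict(lambda: {"actors": [], "usecases": []})
    for subsystem, boundary, actor, function in records:
        # Names that differ only in wording are mapped to one canonical name.
        boundary_counts[subsystem][aliases.get(boundary, boundary)] += 1
        entry = elements[subsystem]
        if actor not in entry["actors"]:
            entry["actors"].append(actor)
        usecase = {"actor": actor, "function": function}
        if usecase not in entry["usecases"]:
            entry["usecases"].append(usecase)

    store: Dict[str, str] = {}
    for subsystem, entry in elements.items():
        counts = boundary_counts[subsystem]
        if len(counts) > 1:  # a real conflict: keep the majority name and warn
            warnings.warn(
                f"subsystem {subsystem!r} has conflicting boundary names: {sorted(counts)}"
            )
        boundary = counts.most_common(1)[0][0]
        store[subsystem] = json.dumps({"boundary": boundary, **entry}, ensure_ascii=False)
    return store


records = [
    ("course selection", "Course Selection System", "student", "select courses"),
    ("course selection", "Course Selection System", "teacher", "publish courses"),
    ("course selection", "Grade System", "student", "drop courses"),
]
store = build_store(records)
print(store["course selection"])
```

Here the conflicting name "Grade System" occurs once against two occurrences of "Course Selection System", so the latter is kept and a warning is raised.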

In summary, this embodiment addresses requirements documents in the field of software engineering. When requirements analysts analyze a requirements document manually, the analysis results may be inconsistent because each person understands the text differently. This embodiment realizes automatic requirements analysis modeling and can automatically derive the corresponding UML use case diagram from a requirements document, thereby improving the accuracy of requirements analysis modeling and the efficiency of software development. Traditional automatic analysis modeling methods require the requirements text to be written with fixed sentence patterns and a fixed structure, which is inflexible; they also require a large number of diagram element extraction rules, which must be rewritten whenever the sentence patterns change. By contrast, the automatic analysis modeling method based on text cognition provided by this embodiment can parse requirements text of any style and model it automatically.

The embodiment also provides an automatic analysis modeling system based on text cognition, which comprises:

The text cognition-based automatic analysis modeling system can execute any combination implementation steps of the text cognition-based automatic analysis modeling method provided by the method embodiment of the invention, and has corresponding functions and beneficial effects.

The embodiment also provides an automatic analysis modeling device based on text cognition, which comprises:

At least one processor;

At least one memory for storing at least one program;

The text cognition-based automatic analysis modeling device provided by the embodiment of the invention can be used for executing any combination implementation steps of the text cognition-based automatic analysis modeling method provided by the embodiment of the method, and has the corresponding functions and beneficial effects.

Embodiments of the present application also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 1.

The embodiment also provides a storage medium which stores instructions or programs capable of executing the text cognition-based automatic analysis modeling method provided by the embodiment of the method, and when the instructions or programs are run, the method can execute any combination implementation steps of the embodiment of the method, and the method has corresponding functions and beneficial effects.

In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

In the description of this specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiment of the present application has been described in detail, the present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims

1. An automatic analysis modeling method based on text cognition, characterized by comprising the following steps:

S1, acquiring a document, screening the sentences in the document, and obtaining the sentences related to diagram element extraction;

wherein screening the sentences in the document in step S1 comprises:

S11, collecting requirements documents in the field of software engineering, and extracting the elements of the use case diagram corresponding to each document as the test basis;

S12, collecting other documents as negative samples of the model training data;

S13, using the BERT tool with the bert-base-chinese pre-trained model, training a short text classification model on the sentences related to diagram element extraction;

S14, screening the classification result, and performing the next operation on the sentences that conform to the sentence patterns;

S2, constructing a domain dictionary according to predefined rules;

S3, performing semantic analysis on the sentences related to diagram element extraction by using a dependency parsing tool in combination with the constructed domain dictionary to obtain a semantic analysis result;

S4, formulating diagram element extraction rules based on the semantic analysis;

S5, extracting diagram elements from the semantic analysis result according to the formulated extraction rules;

wherein step S5 comprises:

S51, for the obtained semantic analysis result, looping over the obtained diagram element extraction rules to find the matching sentence pattern rule, and extracting the diagram elements according to that rule; wherein the matching criterion is: if the sentence contains the elements of the rule, the sentence is considered to match the rule; if the sentence matches a plurality of rules, the rule with the highest matching degree is selected for the next operation;

S52, extracting the diagram elements in the sentence according to the diagram element extraction manner in the rule;

S53, storing the result in the form usecase(actor, function), wherein actor denotes a user in the use case diagram and function denotes an operation that the user can perform;

S6, storing the extracted diagram elements in the form of a use case diagram;

wherein step S6 comprises:

S62, integrating all use cases and users under each subsystem according to the subsystem items obtained in step S5;

S63, checking whether the boundaries under the same subsystem are identical; if they are not, judging whether the boundaries differ only because of different wording; if the difference is not merely one of wording, selecting the boundary name that occurs most often and issuing a warning that the document may contain an incorrect expression;

2. The automatic analysis modeling method based on text cognition of claim 1, wherein constructing the domain dictionary according to the predefined rules comprises:

3. The automatic analysis modeling method based on text cognition of claim 1, wherein performing semantic analysis on the sentences related to diagram element extraction with the dependency parsing tool to obtain the semantic analysis result comprises:

4. An automatic analysis modeling system based on text cognition for performing the automatic analysis modeling method based on text cognition of any one of claims 1-3, comprising:

5. An automatic analysis modeling apparatus based on text cognition, comprising:

At least one processor;

At least one memory for storing at least one program;

The at least one program, when executed by the at least one processor, causes the at least one processor to implement the automatic analysis modeling method based on text cognition of any one of claims 1-3.

6. A storage medium having stored therein a processor executable program, which when executed by a processor is adapted to carry out the method of any one of claims 1-3.