CN112579444A

CN112579444A - Text cognition-based automatic analysis modeling method, system, device and medium

Info

Publication number: CN112579444A
Application number: CN202011437720.6A
Authority: CN
Inventors: 黄翰; 刘雨瑶; 王业超; 黄俊聪
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2021-03-30

Abstract

The invention discloses a text cognition-based automatic analysis modeling method, system, device and medium. The method comprises the following steps: acquiring a document and screening its sentences to obtain the sentences related to primitive extraction; constructing a domain dictionary according to predefined rules; performing semantic analysis on the sentences related to primitive extraction using a dependency syntax analysis tool in combination with the constructed domain dictionary, to obtain a semantic analysis result; formulating primitive extraction rules based on the semantic analysis; extracting primitives from the semantic analysis result according to the formulated extraction rules; and storing the extracted primitives in the form of a use case diagram. The invention realizes automatic requirement analysis modeling: it can automatically derive the corresponding UML use case diagram from a requirement document, thereby improving the accuracy of requirement analysis modeling and the efficiency of software development, and it can be widely applied in the fields of natural language processing and software engineering requirement analysis.

Description

Text cognition-based automatic analysis modeling method, system, device and medium

Technical Field

The invention relates to the field of natural language processing and software engineering requirement analysis, in particular to an automatic analysis modeling method, system, device and medium based on text cognition.

Background

Requirement analysis is an important part of the software development process. Whether the real intention of the user can be understood accurately from a requirement document is critical to identifying the users and the functional requirements of each subsystem. However, traditional manual identification suffers from inaccuracy, incompleteness and ambiguity. These problems arise because requirement documents are not written in a standardized way, natural language expression is ambiguous, and different people understand the same text differently.

Existing automatic modeling methods require that requirements be written in a prescribed structure and sentence pattern at the requirement-writing stage. The format is rigid and the sentence patterns are limited, which makes it difficult to express complex and rich requirements. Each set of parsing rules can handle requirement documents of only one writing style. When the sentence pattern changes, the parsing rules must change accordingly, so usability is low.

Therefore, a method for automatically checking and analyzing requirement documents is needed that can judge whether a sentence in a requirement document is related to primitive extraction and can automatically derive the UML diagram elements. Such a method avoids the inaccuracy and incompleteness of manual analysis and improves the efficiency of analyzing requirement documents, while also overcoming the lack of generality of traditional automatic modeling methods.

Natural language processing is the theory and technique of processing human language using machines. It treats language as the object of computation and studies the corresponding algorithms, with the aim of enabling human-machine interaction in natural language and thereby more efficient and convenient information management. The key to natural language processing is to let the computer "understand" natural language.

Some automatic analysis tools for generating UML use case diagrams (which consist of primitives) already exist in China and abroad. Most foreign parsing tools apply only to English and cannot handle Chinese requirement documents. Domestic automatic analysis tools can analyze only structured requirement documents and cannot process semi-structured or non-standard documents.

Interpretation of terms:

Primitives (diagram elements): the elements necessary to generate a use case diagram, such as "user" and "operation".

Text cognition: the automatic detection by a computer of whether known knowledge point types exist in a text and of the logical relationships among those knowledge points; that is, analyzing and understanding the words and sentences of the text, grasping the logical relationships among sentences, and understanding the subject of the whole document.

Disclosure of Invention

In order to solve at least one of the technical problems in the prior art to a certain extent, the invention aims to provide a text cognition-based automatic analysis modeling method, system, device and medium.

The technical scheme adopted by the invention is as follows:

A text cognition-based automatic analysis modeling method comprises the following steps:

acquiring a document, screening the sentences in the document, and obtaining the sentences related to primitive extraction;

constructing a domain dictionary according to predefined rules;

performing semantic analysis on the sentences related to primitive extraction using a dependency syntax analysis tool in combination with the constructed domain dictionary, to obtain a semantic analysis result;

formulating primitive extraction rules based on the semantic analysis;

extracting primitives from the semantic analysis result according to the formulated extraction rules;

and storing the extracted primitives in the form of a use case diagram.

Further, screening the sentences in the document to obtain the sentences related to primitive extraction includes:

acquiring a training set for training a model;

training a short-text classification model with BERT on the training set;

and classifying the sentences in the document with the classification model, screening the classification result, and obtaining the sentences related to primitive extraction.

Further, constructing the domain dictionary according to predefined rules includes:

collecting technical terms and acquiring the classification information of each term, wherein the classification information comprises synonym information, variant information and brief description information;

creating a configuration file, recording the classification information as a JSON string, and entering the information of each technical term in a predefined format;

and, according to the information entered in the configuration file, performing word segmentation on the technical terms with the jieba word segmentation tool to construct the domain dictionary.

Further, performing semantic analysis on the sentences related to primitive extraction using the dependency syntax analysis tool to obtain a semantic analysis result includes:

performing semantic analysis on the screened sentences using the dependency syntax analysis tool of HanLP to obtain a semantic analysis result, wherein the semantic analysis result comprises the agent and the main verb of each sentence.

Further, formulating the primitive extraction rules based on the semantic analysis includes:

analyzing, from the sentences in which requirements are described, which components of a sentence can serve as user primitives and operation primitives;

and writing a corresponding extraction rule for each sentence-pattern combination and the primitives corresponding to that combination.

Further, extracting primitives from the semantic analysis result according to the formulated extraction rules includes:

storing the extracted primitives in the form usecase(actor, function);

wherein actor denotes the user in the use case diagram, and function denotes an operation that the user can perform.

Further, storing the extracted primitives in the form of a use case diagram includes:

taking the system name of the subsystem to which the primitives belong as the key and the use case diagram elements of that subsystem, namely all users and use cases it contains, as the value, and converting the key-value pairs into a JSON string for storage.

The other technical scheme adopted by the invention is as follows:

A text cognition-based automatic analysis modeling system, comprising:

a sentence screening module, used for acquiring a document, screening the sentences in the document, and obtaining the sentences related to primitive extraction;

a dictionary construction module, used for constructing a domain dictionary according to predefined rules;

a semantic analysis module, used for performing semantic analysis on the sentences related to primitive extraction using a dependency syntax analysis tool in combination with the constructed domain dictionary, to obtain a semantic analysis result;

a rule formulation module, used for formulating primitive extraction rules based on the semantic analysis;

an element extraction module, used for extracting primitives from the semantic analysis result according to the formulated extraction rules;

and a storage module, used for storing the extracted primitives in the form of a use case diagram.

The other technical scheme adopted by the invention is as follows:

An automatic analysis modeling device based on text cognition, comprising:

at least one processor;

at least one memory for storing at least one program;

wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.

The other technical scheme adopted by the invention is as follows:

a storage medium having stored therein processor-executable instructions for performing the method as described above when executed by a processor.

The beneficial effects of the invention are as follows: the invention realizes automatic requirement analysis modeling and can automatically derive the corresponding UML use case diagram from a requirement document, thereby improving the accuracy of requirement analysis modeling and the efficiency of software development.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made on the drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic flow chart of an automatic analysis modeling method based on text cognition in an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more and "a plurality" means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, whereas terms such as "above", "below" and "within" are understood as including it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or their order.

In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.

As shown in Fig. 1, the present embodiment provides an automatic analysis modeling method based on text cognition, which includes, but is not limited to, the following steps:

S1, screening each sentence in the whole requirement document: the sentences in the document are classified using a BERT-based Chinese short-text classification model together with sentence-pattern screening, and each sentence is judged as related or unrelated to primitive extraction.

S2, constructing a domain dictionary according to predefined rules.

S3, using the screening result obtained in step S1, for each sentence that is likely to be related to primitive extraction, performing semantic analysis with the dependency syntax analysis tool of HanLP on the basis of the domain dictionary constructed in step S2, to obtain components such as the agent and the main verb of the sentence.

S4, formulating generalized primitive extraction rules based on the semantic analysis.

S5, extracting the primitives according to the extraction rules formulated in S4 and the dependency syntax analysis result obtained in S3.

S6, storing the classified and consolidated elements obtained in S5 in the form of a use case diagram.

Further, as an optional implementation, the document sentence screening step includes:

S11, collecting requirement documents in the field of software engineering, and extracting the elements of the use case diagram corresponding to each document to serve as a test basis. Specifically, sentences irrelevant to primitive extraction, such as those under "table of contents" or "background", are removed, and the sentences related to primitive extraction are extracted together with the corresponding use case diagrams.

S12, collecting other documents, such as news articles, as negative samples for the model training data.

S13, training a short-text classification model with BERT, using the bert-base-chinese pre-trained model, on the sentences related to primitive extraction and the other documents.

S14, collecting the general sentence patterns of sentences related to primitive extraction; for example, every such sentence must contain an agent, a patient, and so on. The classification result of S13 is screened against these patterns, and only sentences that conform to a pattern proceed to the next operation.
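The sentence-pattern screening of S14 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes each sentence already carries the set of semantic role labels found in it, and it represents the S13 classifier only by a precomputed `is_relevant` flag. The names `REQUIRED_ROLES`, `ClassifiedSentence` and `screen` are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: a sentence pattern is described by the role labels it must contain.
REQUIRED_ROLES = frozenset({"AGT", "PAT"})


@dataclass
class ClassifiedSentence:
    text: str
    is_relevant: bool  # stand-in for the output of the S13 classifier
    roles: frozenset = field(default_factory=frozenset)


def screen(sentences, required=REQUIRED_ROLES):
    """Keep sentences the classifier accepted and that contain every required role."""
    return [s for s in sentences if s.is_relevant and required <= s.roles]


candidates = [
    ClassifiedSentence(
        "Students can select courses through the course selection system.",
        True,
        frozenset({"AGT", "PAT"}),
    ),
    ClassifiedSentence("Table of contents", False),
    ClassifiedSentence("The system is stable.", True, frozenset({"AGT"})),
]
kept = screen(candidates)
```

Only the first candidate survives: the second is rejected by the classifier flag and the third lacks a patient.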

Further, as an optional implementation, constructing the domain dictionary includes:

S21, collecting the technical terms commonly used in the domain, together with information such as the synonyms, abbreviations, variants, brief description and part of speech of each term, according to expert knowledge (that is, technical terms summarized by software engineering experts from their work experience);

S22, creating a configuration file and recording the information as a JSON string;

S23, entering the information of each term in the predefined format;

S24, downloading the jieba word segmentation toolkit and writing the words of the domain dictionary into a user dictionary file in the form <word, part of speech>, wherein each synonym, abbreviation and variant of a word is stored as a separate entry; word segmentation and part-of-speech tagging are then performed with jieba.
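Steps S22 to S24 can be sketched as follows, assuming a JSON layout of our own choosing; the field names (`word`, `pos`, `synonyms`, `abbreviations`, `variants`, `description`) and the sample term are hypothetical, since the source does not give the predefined format. The sketch only produces the dictionary lines and does not call jieba itself.

```python
import json

# Hypothetical configuration for S22/S23: one JSON entry per technical term.
CONFIG_JSON = """
{
  "terms": [
    {"word": "选课系统", "pos": "n",
     "synonyms": ["选课平台"], "abbreviations": [], "variants": [],
     "description": "course selection system"}
  ]
}
"""


def user_dict_lines(config_text):
    """Expand each term, synonym, abbreviation and variant into its own
    'word pos' line, the space-separated form a jieba user dictionary accepts."""
    lines = []
    for term in json.loads(config_text)["terms"]:
        forms = [term["word"], *term["synonyms"],
                 *term["abbreviations"], *term["variants"]]
        for form in forms:
            lines.append(f"{form} {term['pos']}")
    return lines


lines = user_dict_lines(CONFIG_JSON)
# With jieba installed, these lines could be written to a file and loaded with
# jieba.load_userdict(path) before segmenting with jieba.posseg.cut(sentence).
```

Storing every synonym and variant as its own line mirrors the requirement in S24 that each form is a separate entry.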

Further, as an optional implementation, the semantic analysis includes the following steps:

S31, taking the word segmentation and part-of-speech tagging results of each sentence obtained in step S1 as the input of the following steps;

S32, obtaining the agent and patient relations in the sentence using the dependency syntax analysis tool of HanLP.

For the agent relation (AGT): the agent can generally serve as the user among the primitives.

For the patient relation (PAT): the patient is generally the object being operated on, such as a particular system or piece of software.

For the root node (ROOT): the root node generally serves as the operation to be performed by the user.

For each sentence pattern, its parsing pattern is defined, i.e., the modeling elements that can be extracted from that pattern.

For example, for the sentence "Students can select courses through the course selection system", the semantic analysis result is as follows:

student: AGT
can
through
course selection system
select: ROOT
course: PAT

Further, as an optional implementation, the step of formulating the UML element extraction rules for sentences includes:

S41, analyzing, from the way requirements are expressed, which components of a sentence may serve as user primitives and operation primitives. The analysis result must not depend on specific words (for example, it must not contain "execute … operation"); it must be composed only of the relations or roles in the semantic analysis result.

S42, writing a corresponding extraction rule for each possible sentence-pattern combination and its primitive extraction pattern.

For example: {AGT: user, ROOT + PAT: operation}
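To make the example rule concrete, the following sketch applies it to the course-selection parse result given earlier. The encoding of the rule as a Python dictionary and the function `apply_rule` are illustrative assumptions; words whose role is not listed in the text are given `None`.

```python
# Parse result of the earlier example as (word, role) pairs.
PARSE = [
    ("student", "AGT"),
    ("can", None),
    ("through", None),
    ("course selection system", None),
    ("select", "ROOT"),
    ("course", "PAT"),
]

# The rule {AGT: user, ROOT + PAT: operation}, encoded (hypothetically) as
# target element -> roles whose words are joined in order.
RULE = {"user": ["AGT"], "operation": ["ROOT", "PAT"]}


def apply_rule(parse, rule):
    """Return {element: text}, or None if the sentence lacks a required role."""
    by_role = {role: word for word, role in parse if role is not None}
    result = {}
    for element, roles in rule.items():
        if any(r not in by_role for r in roles):
            return None
        result[element] = " ".join(by_role[r] for r in roles)
    return result


extracted = apply_rule(PARSE, RULE)
# extracted == {"user": "student", "operation": "select course"}
```

Because the rule refers only to roles and never to specific words, it satisfies the constraint stated in S41.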

Further, as an optional implementation, the step of extracting primitives includes:

S51, traversing the generalized primitive extraction rules obtained in step S4 against the dependency analysis result of each sentence obtained in step S3, matching the sentence to the corresponding sentence-pattern rule, and extracting candidate primitives according to that rule. The matching criterion is that the sentence contains the elements of the rule; if so, the sentence is considered to match the rule. If a sentence matches multiple rules, a matching degree F(s, R) is computed for each of them:

Suppose that sentence s contains n elements (s_1, s_2, …, s_n), that C_R denotes the number of elements in rule R, and that F(s, R) denotes the matching degree between sentence s and rule R. The rule with the highest matching degree is selected for the next operation.

S52, extracting the primitives from the sentence according to the extraction pattern of the rule.

S53, storing the result in the form usecase(actor, function), where actor denotes the user in the use case diagram and function denotes an operation that the user can perform.
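The rule selection in S51 and the record of S53 might be sketched as below. The text does not reproduce the expression for F(s, R), so `coverage` is only a stand-in (C_R divided by n) chosen for illustration, and it is passed in as a parameter so that another scoring function can replace it. `UseCase`, `matches`, `select_rule` and the rule names are hypothetical.

```python
from collections import namedtuple

# Record in the two-field form described in S53.
UseCase = namedtuple("UseCase", ["actor", "function"])


def matches(sentence_roles, rule_roles):
    """Matching criterion of S51: the sentence contains every element of the rule."""
    return rule_roles <= sentence_roles


def coverage(sentence_roles, rule_roles):
    """Stand-in for F(s, R): C_R / n, the share of the sentence's n elements
    that rule R accounts for. An assumption, not the formula of the source."""
    return len(rule_roles) / len(sentence_roles)


def select_rule(sentence_roles, rules, score=coverage):
    """Return the name of the matching rule with the highest score, or None."""
    matched = [name for name, roles in rules.items()
               if matches(sentence_roles, roles)]
    if not matched:
        return None
    return max(matched, key=lambda name: score(sentence_roles, rules[name]))


RULES = {  # hypothetical rule set
    "agent_root": frozenset({"AGT", "ROOT"}),
    "agent_root_patient": frozenset({"AGT", "ROOT", "PAT"}),
}
sentence = frozenset({"AGT", "ROOT", "PAT"})
best = select_rule(sentence, RULES)
record = UseCase(actor="student", function="select course")
```

Under this stand-in score, the sentence matches both rules and the three-element rule is preferred because it accounts for more of the sentence.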

Further, as an optional implementation, storing in the form of a use case diagram includes:

S61, identifying the UML primitives according to the rules, from the matching result obtained in step S5;

S62, according to the subsystem entries obtained in step S5, consolidating all use cases and all users under each subsystem.

S63, checking whether the boundary names under the same subsystem are identical. If they are not, first judging whether they differ only in wording; if the difference is not merely one of wording, selecting the boundary name that occurs most often and issuing a warning that the document may contain an erroneous expression.

S64, according to a predefined format, storing key-value pairs with the system name of each subsystem as the key and the corresponding use case diagram elements as the value, converted into a JSON string.
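A minimal sketch of S62 to S64 follows, assuming the extracted results arrive as (boundary name, actor, function) triples. The JSON layout with `actors` and `usecases` lists and the helper names are assumptions, since the predefined format is not given. The wording-only comparison of S63 is not modeled; only the majority choice and the warning are shown.

```python
import json
import warnings
from collections import Counter, defaultdict


def resolve_boundary(names):
    """S63-style check: if the names differ, keep the most frequent one and warn."""
    counts = Counter(names)
    winner, _ = counts.most_common(1)[0]
    if len(counts) > 1:
        warnings.warn(
            f"inconsistent boundary names {sorted(counts)}; using {winner!r}"
        )
    return winner


def build_store(records):
    """Group actors and use cases by subsystem name and return a JSON string."""
    store = defaultdict(lambda: {"actors": [], "usecases": []})
    for boundary, actor, function in records:
        entry = store[boundary]
        if actor not in entry["actors"]:
            entry["actors"].append(actor)
        if function not in entry["usecases"]:
            entry["usecases"].append(function)
    return json.dumps(store, ensure_ascii=False, sort_keys=True)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    boundary = resolve_boundary(
        ["course selection system", "course selection system", "course system"]
    )

stored = build_store([
    (boundary, "student", "select course"),
    (boundary, "student", "drop course"),
])
```

Keying the stored object by subsystem name keeps all users and use cases of one subsystem together, as S64 describes.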

In summary, the present embodiment operates on requirement documents in the field of software engineering. When requirement analysts analyze requirement documents manually, the results may be inconsistent because each person understands the text differently. The present embodiment realizes automatic requirement analysis modeling and can automatically derive the corresponding UML use case diagram from a requirement document, thereby improving the accuracy of requirement analysis modeling and the efficiency of software development. Traditional automatic analysis modeling methods require that the requirement text be written in a fixed sentence pattern and structure, which is not flexible enough; they also require a large number of primitive extraction rules, which must be rewritten whenever the sentence pattern changes. By contrast, the automatic analysis modeling method based on text cognition provided by the present embodiment can analyze text of any writing style and model it automatically.

The embodiment also provides an automatic analysis modeling system based on text cognition, which comprises:

a sentence screening module, used for acquiring a document, screening the sentences in the document, and obtaining the sentences related to primitive extraction;

The automatic analysis modeling system based on text cognition can execute the automatic analysis modeling method based on text cognition provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.

The embodiment further provides an automatic analysis modeling device based on text cognition, which includes:

at least one processor;

at least one memory for storing at least one program;

The automatic analysis modeling device based on text cognition can execute the automatic analysis modeling method based on text cognition provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.

The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.

The embodiment also provides a storage medium storing instructions or programs capable of executing the automatic analysis modeling method based on text cognition provided by the method embodiment of the invention. When the instructions or programs are run, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects of the method.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An automatic analysis modeling method based on text cognition is characterized by comprising the following steps:

constructing a domain dictionary according to a predefined rule;

formulating a primitive extraction rule based on semantic analysis;

2. The automatic analysis modeling method based on text cognition according to claim 1, wherein screening the sentences in the document to obtain the sentences related to primitive extraction comprises:

acquiring a training set for training a model;

3. The automatic analysis modeling method based on text cognition according to claim 1, wherein the building of the domain dictionary according to the predefined rule comprises:

4. The method as claimed in claim 1, wherein the semantic analysis of the sentence related to primitive extraction by using the dependency parsing tool to obtain the semantic analysis result comprises:

5. The method according to claim 1, wherein the formulating semantic analysis-based primitive extraction rules comprises:

6. The method according to claim 1, wherein extracting primitives from the semantic analysis result according to the formulated extraction rules comprises:

storing the extracted primitives in the form usecase(actor, function);

7. The automatic analysis modeling method based on text cognition according to claim 1, wherein storing the extracted primitives in the form of a use case diagram comprises:

8. An automatic analysis modeling system based on text cognition, comprising:

9. An automatic analysis modeling device based on text cognition, characterized by comprising:

at least one processor;

at least one memory for storing at least one program;

wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the automatic analysis modeling method based on text cognition according to any one of claims 1-7.

10. A storage medium having stored therein a program executable by a processor, wherein the program executable by the processor is adapted to perform the method of any one of claims 1-7 when executed by the processor.