CN113158654B - Domain model extraction method and device and readable storage medium - Google Patents

Publication number
CN113158654B
Authority
CN
China
Prior art keywords
phrases
concepts
dependency
relationship
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011301741.5A
Other languages
Chinese (zh)
Other versions
CN113158654A (en)
Inventor
杜佳诺
连小利
张莉
赵子岩
张航
樊志强
李华莹
刘必欣
张捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
CETC 15 Research Institute
Research Institute of War of PLA Academy of Military Science
Original Assignee
Beihang University
CETC 15 Research Institute
Research Institute of War of PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, CETC 15 Research Institute, Research Institute of War of PLA Academy of Military Science
Priority to CN202011301741.5A
Publication of CN113158654A
Application granted
Publication of CN113158654B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a domain model extraction method, a domain model extraction device and a readable storage medium. The method comprises: performing syntactic analysis on a requirement document to determine the dependency relationships among tokens; determining semantic relationships among concepts according to the dependency relationships among the tokens; and determining a corresponding domain model according to the semantic relationships among the concepts. By deriving the domain model from these dependency and semantic relationships, the method improves the accuracy of domain model extraction.

Description

Domain model extraction method and device and readable storage medium
Technical Field
The invention relates to the technical field of natural language recognition, and in particular to a method and a device for extracting a domain model and a readable storage medium.
Background
The domain model is a visual representation of the important concepts in a domain and the relationships among them, and is used during the analysis phase of software development to analyze how to meet the functional requirements of the system. The domain model may be represented using UML class diagrams, use case diagrams, ontologies, etc., as desired. The domain model is mainly composed of concepts, attributes and relationships. Concepts represent entities or events in the real world; the attributes of a concept are the logical data contained in the entity that the concept represents; and the relationships among concepts represent the semantic connections or interactive behaviors that exist between the entities they represent. Common relationships include association relationships, aggregation relationships, inheritance relationships and the like.
The domain model provides structured knowledge about the underlying terms that make up the domain. Also, the design of systems, particularly in model-based development environments, is often organized around domain models. Correctly identifying the concepts and the relationships among them makes it possible to analyze the system architecture during software development, reduces development difficulty and code redundancy, and helps resolve inconsistency and incompleteness in developers' analysis of the requirements. When a developer builds a domain model, the developer needs to check the requirement document repeatedly, to ensure that the domain model is consistent with the requirements and that it contains all concepts and relationships related to the requirements. For large applications, building a domain model manually is a very difficult task.
Disclosure of Invention
The embodiment of the invention provides a method and a device for extracting a domain model and a readable storage medium, which are used for improving the accuracy of extracting the domain model.
In a first aspect, an embodiment of the present invention provides a domain model extraction method, including:
performing syntactic analysis on the requirement document, and determining the dependency relationships among the tokens;
determining semantic relationships among concepts according to the dependency relationships among the tokens;
and determining a corresponding domain model according to the semantic relation between the concepts.
Optionally, performing syntactic analysis on the requirement document includes:
segmenting the requirement document into tokens;
performing part-of-speech tagging on the tokens, and determining the corresponding token types according to the part-of-speech tagging results;
determining the dependency relationships among the tokens based on the token types.
Optionally, after determining the corresponding token types according to the part-of-speech tagging results, the method further includes:
cleaning the tokens;
extracting the stems of the tokens in the cleaning result;
and reducing the stems to their base forms.
Optionally, determining semantic relationships among concepts according to the dependency relationships among the tokens includes:
traversing the noun phrases among the tokens, and determining dependency relationships between phrases and words and between phrases;
extracting semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases.
Optionally, traversing the noun phrases among the tokens and deriving the dependency relationships between phrases and words and between phrases includes:
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls within the current noun phrase, not deriving the dependency relationship;
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls outside the current noun phrase, deriving the dependency relationship.
Optionally, deriving the dependency relationship includes: if the target word belongs to a noun phrase other than the current noun phrase, deriving a dependency relationship between phrases; otherwise, deriving a dependency relationship between the phrase and the word.
Optionally, extracting semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases includes:
extracting association relationships among concepts from the dependency relationships between phrases and words and between phrases, according to the source nodes corresponding to different syntactic structures; and
matching the dependency relationships between phrases and words and between phrases against preset word structures, and identifying aggregation relationships, cardinality relationships and attribute relationships among concepts.
Optionally, determining a corresponding domain model according to the association relationships among the concepts includes:
traversing the boundary concepts in the association relationships among the concepts;
correcting the association relationships of those boundary concepts that match a preset field;
wherein a boundary concept is a concept that has a semantic relationship with exactly one other concept.
In a second aspect, an embodiment of the present invention provides a domain model extraction apparatus, including:
the analysis unit is used for performing syntactic analysis on the requirement document and determining the dependency relationships among the tokens;
the relation determining unit is used for determining semantic relationships among concepts according to the dependency relationships among the tokens;
and the domain model determining unit is used for determining a corresponding domain model according to the semantic relation between the concepts.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the foregoing domain model extraction method.
The embodiment of the invention determines the dependency relationships among the tokens in the requirement document, determines semantic relationships among concepts according to the dependency relationships among the tokens, and determines the corresponding domain model according to the semantic relationships among the concepts, thereby improving the accuracy of domain model extraction and achieving a positive technical effect.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a first embodiment of the present invention;
FIG. 2 is a flowchart of a syntax analysis according to a first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus according to a second embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
A first embodiment of the present invention provides a domain model extraction method, as shown in fig. 1, including the following specific steps:
S101, performing syntactic analysis on the requirement document, and determining the dependency relationships among the tokens;
S102, determining semantic relationships among concepts according to the dependency relationships among the tokens;
S103, determining a corresponding domain model according to the semantic relationships among the concepts.
The embodiment of the invention determines the dependency relationships among the tokens in the requirement document, determines semantic relationships among concepts according to the dependency relationships among the tokens, and determines a corresponding domain model according to the semantic relationships among the concepts, thereby improving the accuracy of domain model extraction.
Optionally, performing syntactic analysis on the requirement document includes:
segmenting the requirement document into tokens;
performing part-of-speech tagging on the tokens, and determining the corresponding token types according to the part-of-speech tagging results;
determining the dependency relationships among the tokens based on the token types.
Specifically, in this embodiment, the syntactic analysis preprocesses the requirement statements, and includes sentence splitting, word segmentation, part-of-speech tagging, phrase structure analysis, and dependency parsing. The main flow of parsing the input requirement document is shown in fig. 2 and includes the following steps:
Sentence splitting: the input text is divided into individual sentences.
Word segmentation: the input sentence is divided into individual tokens. A token may be a word, a number, a punctuation mark, or a space.
Part-of-speech tagging: the tokens produced by the tokenizer are tagged with their parts of speech, such as noun (NN), verb (VB), adjective (JJ), preposition (IN), article (DT), conjunction (CC) and the like.
Phrase structure analysis: the type of each structural unit in the sentence is inferred, such as noun phrase (NP), verb phrase (VP), prepositional phrase (PP), verb (VB), article (DT), preposition (IN), etc.
Dependency parsing: the grammatical relations among the individual words in the sentence are obtained and expressed as dependency relationships. The input of dependency parsing is a sentence, and the output is a directed acyclic graph composed of relational triples of the form <word, dependency category, word>. Following the Universal Dependencies framework, the dependency categories used in this embodiment mainly include: nominal subject (nsubj), passive nominal subject (nsubjpass), direct object (dobj), adjectival modifier (amod), nominal modifier (nmod), clausal modifier of noun (acl), relative clause modifier (acl:relcl), and the like.
The nominal modifier (nmod) represents the prepositional phrase structure in the sentence. The clausal modifier of noun (acl) represents a complement structure in the infinitive or participle form of a verb; the relative clause modifier (acl:relcl) represents a relative clause modification structure.
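The first two preprocessing steps and the triple representation can be sketched as follows. This is a minimal illustration, not the parser used in the embodiment: the regular expressions, the `Dep` type, the example text, and the hand-annotated tags and triples are all assumptions introduced here, and unlike the description above the tokenizer discards spaces.

```python
import re
from typing import List, NamedTuple


class Dep(NamedTuple):
    """One relational triple <word, dependency category, word>."""
    source: str
    category: str
    target: str


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    # A token is a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "The system stores orders. Each order contains items."
sentences = split_sentences(text)
tokens = tokenize(sentences[0])

# Hand-annotated tags and dependencies for the first sentence, standing in
# for the output of a real part-of-speech tagger and dependency parser.
pos_tags = {"The": "DT", "system": "NN", "stores": "VBZ", "orders": "NNS"}
parse = [
    Dep("stores", "nsubj", "system"),
    Dep("stores", "dobj", "orders"),
]
```

In practice the tagging and parsing steps would come from an NLP toolkit; the later steps only need the resulting triples.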
Optionally, after determining the corresponding token types according to the part-of-speech tagging results, the method further includes:
cleaning the tokens;
extracting the stems of the tokens in the cleaning result;
and reducing the stems to their base forms.
In this embodiment, after the tokens are obtained, the syntactic analysis further includes removing stop words: stop words are words that occur frequently in text and carry no specific meaning, such as "a", "the", "any", etc.
Stemming and lemmatization: the plural forms of nouns, the participle forms of verbs, the inflected forms of adjectives and adverbs and the like are converted into the base forms of these words.
Atomic noun phrases and verbs are then extracted, in preparation for the subsequent extraction of the concepts and relationships of the domain model.
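A small sketch of the cleaning step, assuming a fixed stop-word set and a lookup table in place of a real stemmer or lemmatizer. `STOP_WORDS`, `LEMMAS` and `clean` are hypothetical names, and dropping punctuation is an added assumption rather than something the description specifies.

```python
from typing import Dict, List, Set

STOP_WORDS: Set[str] = {"a", "an", "the", "any"}

# Hypothetical lookup table standing in for a stemmer/lemmatizer.
LEMMAS: Dict[str, str] = {"orders": "order", "stored": "store", "contains": "contain"}


def clean(tokens: List[str]) -> List[str]:
    # Drop stop words and punctuation, then map each word to its base form.
    kept = [t.lower() for t in tokens if t.isalnum() and t.lower() not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in kept]


cleaned = clean(["The", "system", "stored", "any", "orders", "."])
```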
Optionally, determining semantic relationships among concepts according to the dependency relationships among the tokens includes:
traversing the noun phrases among the tokens, and determining dependency relationships between phrases and words and between phrases;
extracting semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases.
Optionally, traversing the noun phrases among the tokens and deriving the dependency relationships between phrases and words and between phrases includes:
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls within the current noun phrase, not deriving the dependency relationship;
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls outside the current noun phrase, deriving the dependency relationship.
Optionally, deriving the dependency relationship includes: if the target word belongs to a noun phrase other than the current noun phrase, deriving a dependency relationship between phrases; otherwise, deriving a dependency relationship between the phrase and the word.
Specifically, in this embodiment, based on the word segmentation obtained by the foregoing syntax analysis, the dependency relationship between the words obtained by the syntax analysis is further derived to obtain the phrase-level dependency relationship. Phrase-level dependencies may be represented as a relational triple < phrase, dependency type, phrase > or < phrase, dependency type, word >.
Pseudocode of the dependency derivation algorithm employed in the present embodiment is shown in table 1.
Table 1 Dependency derivation algorithm
[Figure BDA0002787052390000071: pseudocode image, not reproduced in this text]
In this embodiment, the dependency derivation algorithm inputs all words, noun phrases, and dependencies among words obtained by parsing, and outputs dependencies among phrases and between phrases and words, and the specific process includes:
All noun phrases NP in the requirement document are examined:
For each word token1 in the noun phrase NP: if the target node token2 of a dependency dep(token1, token2) that has this word as its source node still falls within the noun phrase, the dependency is not derived.
If the target node of the dependency falls outside the noun phrase, the dependency dep is derived:
if token2 belongs to another noun phrase NP2, the derived dependency is dep(NP, NP2); otherwise, the derived dependency is dep(NP, token2).
Thereby determining the dependency relationships between phrases and words and between phrases.
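The procedure described above can be sketched as follows. Table 1 itself is not available in this text, so this is a reconstruction from the prose only; the function name, the tuple-based data layout, and the example phrase are assumptions, and each word is assumed to belong to at most one noun phrase.

```python
from typing import Dict, List, Tuple, Union

Phrase = Tuple[str, ...]
Dep = Tuple[str, str, str]  # (source word, dependency type, target word)
PhraseDep = Tuple[Phrase, str, Union[Phrase, str]]


def derive_phrase_deps(noun_phrases: List[Phrase], deps: List[Dep]) -> List[PhraseDep]:
    # Map every word to the noun phrase that contains it.
    owner: Dict[str, Phrase] = {w: np for np in noun_phrases for w in np}
    derived: List[PhraseDep] = []
    for np in noun_phrases:
        for source, dep_type, target in deps:
            if source not in np:
                continue  # only dependencies starting inside the current phrase
            if target in np:
                continue  # target stays inside the phrase: not derived
            # Target in another phrase gives dep(NP, NP2); otherwise dep(NP, token2).
            derived.append((np, dep_type, owner.get(target, target)))
    return derived


# "the ticket system of the railway station installed ..."
noun_phrases = [("ticket", "system"), ("railway", "station")]
deps = [
    ("system", "compound", "ticket"),
    ("system", "nmod", "station"),
    ("station", "compound", "railway"),
    ("system", "acl", "installed"),
]
derived = derive_phrase_deps(noun_phrases, deps)
```

Here the two `compound` dependencies stay inside their phrases and are dropped, `nmod` becomes a phrase-to-phrase dependency, and `acl` becomes a phrase-to-word dependency.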
Optionally, extracting semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases includes:
extracting association relationships among concepts from the dependency relationships between phrases and words and between phrases, according to the source nodes corresponding to different syntactic structures; and
matching the dependency relationships between phrases and words and between phrases against preset word structures, and identifying aggregation relationships, cardinality relationships and attribute relationships among concepts.
Specifically, in this embodiment, the semantic relationships among the concepts include association relationships, aggregation relationships, cardinality relationships, and attribute relationships. Association relationships comprise direct relationships and indirect relationships. A direct relationship is one in which two concepts are directly connected by a verb or verb phrase (including its participle or infinitive form) or by a preposition. An indirect relationship is the transitive extension of direct relationships: if there is a direct relationship between concept A and concept B and a direct relationship between concept B and concept C, there is an indirect relationship between concept A and concept C.
Based on this embodiment, extracting association relationships among concepts from the dependency relationships between phrases and words and between phrases, according to the source nodes corresponding to different syntactic structures, includes: first identifying the direct relationships among the concepts, and then deriving the indirect relationships among the concepts from the direct relationships, thereby obtaining all association relationships.
Specifically, extracting association relationships among concepts from the dependency relationships between phrases and words and between phrases, according to the source nodes corresponding to different syntactic structures, includes:
For a direct relationship expressed by a subject-predicate-object structure, the subject is used as the source concept of the relationship, the object is used as the target concept of the relationship, and the predicate is used as the content of the relationship.
For a subject-predicate-object relationship inside a relative clause, the noun phrase referred to by the clause's subject "that" or "which" is located via the acl:relcl dependency and used as the source concept of the relationship, and the object and the predicate of the clause are used as the target concept and the content of the relationship, respectively.
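The two subject-predicate-object rules above can be sketched as follows, assuming phrase-level triples in which `nsubj` and `dobj` are headed by the verb and `acl:relcl` links the modified noun phrase to the verb of the relative clause (the usual Universal Dependencies direction). The function name and the example sentence are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str, str]       # (source, dependency type, target)
Relation = Tuple[str, str, str]  # (source concept, content, target concept)

RELATIVE_PRONOUNS = {"that", "which"}


def extract_svo(deps: List[Dep]) -> List[Relation]:
    subject: Dict[str, str] = {}
    direct_object: Dict[str, str] = {}
    antecedent: Dict[str, str] = {}  # relative-clause verb -> modified noun phrase
    for source, dep_type, target in deps:
        if dep_type == "nsubj":
            subject[source] = target
        elif dep_type == "dobj":
            direct_object[source] = target
        elif dep_type == "acl:relcl":
            antecedent[target] = source
    relations: List[Relation] = []
    for verb, subj in subject.items():
        if verb not in direct_object:
            continue
        # Inside a relative clause, replace "that"/"which" by the noun phrase it refers to.
        if subj in RELATIVE_PRONOUNS and verb in antecedent:
            subj = antecedent[verb]
        relations.append((subj, verb, direct_object[verb]))
    return relations


# "The clerk checks the ticket that contains a seat number."
deps = [
    ("check", "nsubj", "clerk"),
    ("check", "dobj", "ticket"),
    ("ticket", "acl:relcl", "contain"),
    ("contain", "nsubj", "that"),
    ("contain", "dobj", "seat number"),
]
relations = extract_svo(deps)
```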
For direct relationships expressed by prepositional phrase structures, the nominal modifier (nmod) dependency is used for extraction, and the pseudocode of the extraction algorithm is shown in table 2.
Table 2 Pseudocode of the prepositional phrase extraction algorithm
[Figure BDA0002787052390000091: pseudocode image, not reproduced in this text]
Taking a set of all atomic noun phrases and verbs as input, checking whether each noun phrase or verb is a source node of an nmod dependency.
If the source node of the nmod dependency is a noun phrase, the noun phrase of the nmod dependency source node is used as the source concept of the relation, the noun phrase of the target node is used as the target concept of the relation, and a preposition is used as the content of the relation.
If the source node of the nmod dependency is a verb, the direct object of the verb is used as the source concept of the relation, the noun phrase of the nmod dependency target node is used as the target concept of the relation, and the preposition is used as the content of the relation.
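A sketch of the two nmod cases described above. Table 2 is not available in this text, so the code follows the prose only. It assumes the preposition is attached to the target noun phrase by a `case` dependency, as in Universal Dependencies; that detail, the function name, and the example data are assumptions not stated in the description.

```python
from typing import Dict, List, Set, Tuple

Dep = Tuple[str, str, str]       # (source, dependency type, target)
Relation = Tuple[str, str, str]  # (source concept, content, target concept)


def extract_prepositional(deps: List[Dep], verbs: Set[str]) -> List[Relation]:
    direct_object: Dict[str, str] = {s: t for s, d, t in deps if d == "dobj"}
    preposition: Dict[str, str] = {s: t for s, d, t in deps if d == "case"}
    relations: List[Relation] = []
    for source, dep_type, target in deps:
        if dep_type != "nmod" or target not in preposition:
            continue
        if source in verbs:
            # Verb source: the verb's direct object becomes the source concept.
            if source not in direct_object:
                continue
            concept = direct_object[source]
        else:
            # Noun phrase source: the phrase itself is the source concept.
            concept = source
        relations.append((concept, preposition[target], target))
    return relations


# "The system sends the receipt to the customer." / "the seat of the train"
deps = [
    ("send", "nsubj", "system"),
    ("send", "dobj", "receipt"),
    ("send", "nmod", "customer"),
    ("customer", "case", "to"),
    ("seat", "nmod", "train"),
    ("train", "case", "of"),
]
relations = extract_prepositional(deps, verbs={"send"})
```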
For direct relationships expressed by verbal complement structures, the clausal modifier of noun (acl) dependency is used for extraction, and the pseudocode of the extraction algorithm is shown in table 3.
Table 3 Pseudocode of the verbal complement structure extraction algorithm
[Figure BDA0002787052390000101: pseudocode image, not reproduced in this text]
Using the collection of all atomic noun phrases as input, check if each noun phrase is the source node of an acl dependency.
If the noun phrase is the source node of an acl dependency and the target node of that dependency is a transitive verb or verb phrase, the noun phrase at the source node is used as the source concept of the relationship, the object that follows the verb or verb phrase at the target node is used as the target concept of the relationship, and the verb or verb phrase is used as the content of the relationship.
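The acl rule can be sketched in the same style. Table 3 is not available in this text; the sketch treats a verb as transitive when it has a `dobj` dependent, which is a simplifying assumption, and the function name and example are hypothetical.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str, str]       # (source, dependency type, target)
Relation = Tuple[str, str, str]  # (source concept, content, target concept)


def extract_acl(deps: List[Dep]) -> List[Relation]:
    direct_object: Dict[str, str] = {s: t for s, d, t in deps if d == "dobj"}
    relations: List[Relation] = []
    for noun_phrase, dep_type, verb in deps:
        # Keep only acl dependencies whose target verb takes a direct object.
        if dep_type == "acl" and verb in direct_object:
            relations.append((noun_phrase, verb, direct_object[verb]))
    return relations


# "the button to start the engine"
deps = [("button", "acl", "start"), ("start", "dobj", "engine")]
relations = extract_acl(deps)
```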
And deducing indirect relations among the concepts according to the extracted direct relations, thereby obtaining all association relations among the concepts.
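Deriving indirect relationships from direct ones can be sketched as a single composition step, following the definition given earlier (A to B and B to C imply an indirect A to C). Whether longer chains also count as indirect relationships is not stated, so this sketch does not extend beyond two hops; the function name is hypothetical.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str, str]  # (source concept, content, target concept)


def indirect_relations(direct: List[Relation]) -> List[Tuple[str, str]]:
    pairs: Set[Tuple[str, str]] = {(s, t) for s, _, t in direct}
    found = {
        (a, c)
        for a, b in pairs
        for b2, c in pairs
        # Compose A->B with B->C; skip self-pairs and pairs already direct.
        if b == b2 and a != c and (a, c) not in pairs
    }
    return sorted(found)


direct = [
    ("clerk", "check", "ticket"),
    ("ticket", "contain", "seat number"),
]
indirect = indirect_relations(direct)
```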
And matching the phrases and the dependency relations between the words and between the phrases according to a preset word structure, and identifying an aggregation relation, a cardinal number relation and an attribute relation between concepts.
Embodiments of identifying aggregation relationships may include:
Word structures such as "contain", "include", "type of", and the possessive forms of nouns express aggregation relationships. Taking the statement "A contains B" or "A's B" as an example, an aggregation relationship whose source concept is B and whose target concept is A can be extracted.
Embodiments of identifying cardinality relationships may include:
Indefinite articles, numerals, and the singular and plural forms of nouns in a word structure express cardinality relationships.
If both the source concept and the target concept of an association relationship are singular, the relationship is a one-to-one relationship.
If both the source concept and the target concept of an association relationship are plural, the relationship is a many-to-many relationship.
If the source concept of an associative relationship is singular and the target concept is plural, the relationship is a one-to-many relationship.
If the source concept of an associative relationship is plural and the target concept is singular, the relationship is a many-to-one relationship.
If a source concept or a target concept of an associative relationship is preceded by an explicit numerical modification, the number represents a cardinality relationship.
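The singular/plural rules above amount to a small lookup, sketched here on Penn Treebank noun tags (`NNS` and `NNPS` mark plurals). The function names are hypothetical, and the case of an explicit numerical modifier is left out.

```python
PLURAL_TAGS = {"NNS", "NNPS"}


def side(tag: str) -> str:
    # A plural noun tag maps to "many", any other noun tag to "one".
    return "many" if tag in PLURAL_TAGS else "one"


def cardinality(source_tag: str, target_tag: str) -> str:
    return f"{side(source_tag)}-to-{side(target_tag)}"


# "A customer places orders": singular source, plural target.
result = cardinality("NN", "NNS")
```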
Embodiments of identifying attribute relationships may include:
Word structures of the form "identified by", "retrieved by", etc. may express attributes. Taking the statement "A is identified by B" as an example, B can be extracted as an attribute of the concept A.
Adjectives that modify a concept express attributes of the concept, and appear in natural language either as attributives or in a subject-linking verb-predicative structure. An attributive expresses an attribute of the noun phrase it modifies, and a predicative expresses an attribute of the subject.
An intransitive verb with an adverbial or complement modifier expresses an attribute. Taking the statement "The train arrives in the morning at 10 am." as an example, it can be inferred from the intransitive verb "arrives" together with the following complement that the concept "train" should have an attribute "arrival time".
After the semantic relation among the concepts is obtained, the aggregation relation, the cardinal number relation and the attribute relation can be further distinguished, so that the identification accuracy of the three is improved.
Optionally, determining a corresponding domain model according to the association relationships among the concepts includes:
traversing the boundary concepts in the association relationships among the concepts;
correcting the association relationships of those boundary concepts that match a preset field;
wherein a boundary concept is a concept that has a semantic relationship with exactly one other concept.
In this embodiment, a concept is called a boundary concept if exactly one other concept has a relationship with it. For the semantic relationships obtained among the concepts, all association relationships that involve a boundary concept are checked, and if the content of an association relationship matches structures such as "include in", "including", "containing" and the like, the association relationship is corrected into an aggregation relationship or an attribute. For example, all boundary concepts of the domain model and the association relationships connecting them may be checked on the basis of an existing domain model extraction result. If the content of an association relationship matches a synonym of the patterns that express aggregation, such as "contain", "include", etc., the association relationship is corrected into an aggregation relationship or an attribute.
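The boundary-concept check can be sketched as follows. The word list, the function name, and the decision to relabel matches only as aggregation (rather than choosing between aggregation and attribute) are simplifying assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source concept, content, target concept)

# Hypothetical list of contents that signal aggregation.
AGGREGATION_WORDS = {"contain", "include", "include in"}


def correct_boundary_relations(relations: List[Relation]):
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for source, _, target in relations:
        neighbours[source].add(target)
        neighbours[target].add(source)
    # A boundary concept is related to exactly one other concept.
    boundary = {c for c, others in neighbours.items() if len(others) == 1}
    corrected = []
    for source, content, target in relations:
        on_boundary = source in boundary or target in boundary
        kind = "aggregation" if on_boundary and content in AGGREGATION_WORDS else "association"
        corrected.append((source, content, target, kind))
    return corrected, boundary


relations = [
    ("clerk", "check", "ticket"),
    ("ticket", "contain", "seat number"),
    ("clerk", "serve", "customer"),
    ("customer", "buy", "ticket"),
]
corrected, boundary = correct_boundary_relations(relations)
```

In this example only "seat number" is a boundary concept, so only the "contain" relationship that touches it is relabelled.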
Compared with the extraction results of domain modeling experts, the method can extract 95% of the relationships in the requirement document.
In conclusion, the method expands the extraction rules for domain models, introduces several new dependency relationships and grammatical structures for domain model extraction, and can extract the information expressed by prepositional phrase structures and complement structures more comprehensively and accurately. The method also introduces the notion of a boundary concept in the domain model and provides a method for checking the association relationships that involve boundary concepts, so that the accuracy of identifying association relationships, aggregation relationships and attributes can be improved.
Example two
A second embodiment of the present invention provides a domain model extraction apparatus, as shown in fig. 3, including:
the analysis unit is used for performing syntactic analysis on the requirement document and determining the dependency relationships among the tokens;
the relation determining unit is used for determining semantic relationships among concepts according to the dependency relationships among the tokens;
and the domain model determining unit is used for determining a corresponding domain model according to the semantic relationships among the concepts.
The embodiment of the invention determines the dependency relationships among the tokens in the requirement document, determines semantic relationships among concepts according to the dependency relationships among the tokens, and determines a corresponding domain model according to the semantic relationships among the concepts, thereby improving the accuracy of domain model extraction.
Example three
A third embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the domain model extraction method of the first embodiment.
In an alternative embodiment, the computer program when executed by a processor implements:
performing syntactic analysis on the requirement document, and determining the dependency relationships among the tokens;
determining semantic relationships among concepts according to the dependency relationships among the tokens;
and determining a corresponding domain model according to the semantic relation between the concepts.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A domain model extraction method, characterized by comprising the following steps:
performing syntactic analysis on a requirement document and determining the dependency relationships among the word segments;
determining the semantic relationships among concepts according to the dependency relationships among the word segments;
determining a corresponding domain model according to the semantic relationships among the concepts;
wherein determining the semantic relationships among concepts according to the dependency relationships among the word segments comprises:
traversing the noun phrases in the word segments, and determining the dependency relationships between phrases and words and between phrases;
extracting the semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases;
wherein traversing the noun phrases in the word segments and deriving the dependency relationships between phrases and words and between phrases comprises:
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls inside the current noun phrase, not performing derivation for the current noun phrase;
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls outside the current noun phrase, performing derivation for the current noun phrase;
wherein performing derivation for the current noun phrase comprises: if the derived word is a source-node word in a noun phrase other than the current noun phrase, deriving a dependency relationship between phrases; otherwise, deriving a dependency relationship between a phrase and a word;
wherein extracting the semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases comprises:
extracting the association relationships among concepts according to the dependency relationships between phrases and words and between phrases and according to the source nodes corresponding to different syntactic structures; and
matching the dependency relationships between phrases and words and between phrases against preset word structures, and identifying the aggregation relationships, cardinality relationships and attribute relationships among concepts;
wherein determining a corresponding domain model according to the association relationships comprises:
traversing the boundary concepts in the association relationships among the concepts;
correcting the association relationships of those boundary concepts that match a preset field;
wherein a boundary concept is a concept that has a semantic relationship with only one other concept.
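The derivation rule and the boundary-concept definition in claim 1 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names, the tuple-based data layout and the example edges are hypothetical, each word is assumed to occur in at most one noun phrase, and "the derived word is a source-node word in another noun phrase" is read here simply as "the target word belongs to another noun phrase". The correction of boundary concepts against a preset field is not sketched.

```python
def derive_dependencies(noun_phrases, deps):
    """Lift word-level dependencies (source, label, target) to phrase level.

    noun_phrases: list of tuples of words.
    deps: list of (source_word, label, target_word) triples.
    Returns (phrase-to-phrase dependencies, phrase-to-word dependencies).
    """
    owner = {word: phrase for phrase in noun_phrases for word in phrase}
    phrase_phrase, phrase_word = [], []
    for phrase in noun_phrases:
        for source, label, target in deps:
            if source not in phrase:
                continue  # only edges whose source node lies in this phrase
            if target in phrase:
                continue  # target falls inside the phrase: nothing is derived
            other = owner.get(target)
            if other is not None:
                # Target belongs to another noun phrase: phrase-to-phrase edge.
                phrase_phrase.append((phrase, label, other))
            else:
                # Target is a free-standing word: phrase-to-word edge.
                phrase_word.append((phrase, label, target))
    return phrase_phrase, phrase_word


def boundary_concepts(relations):
    """Return the concepts that are related to exactly one other concept."""
    neighbours = {}
    for a, _, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {concept for concept, others in neighbours.items() if len(others) == 1}


# Hand-written example edges for "control system of the flight platform deployed ..."
phrases = [("control", "system"), ("flight", "platform")]
deps = [
    ("system", "compound", "control"),   # stays inside the first phrase
    ("system", "nmod", "platform"),      # crosses into the second phrase
    ("platform", "compound", "flight"),  # stays inside the second phrase
    ("system", "acl", "deployed"),       # points at a word outside any phrase
]
pp, pw = derive_dependencies(phrases, deps)

relations = [
    ("operator", "sends", "command"),
    ("operator", "uses", "console"),
    ("console", "shows", "status"),
]
edges = boundary_concepts(relations)
```

In this example `command` and `status` are boundary concepts because each is linked to a single other concept, which makes them the candidates whose associations would be checked and corrected.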
2. The domain model extraction method of claim 1, wherein performing syntactic analysis on the requirement document comprises:
segmenting the requirement document to obtain the corresponding word segments;
performing part-of-speech tagging on the word segments, and determining the corresponding word segment types according to the part-of-speech tagging results;
determining the dependency relationships among the word segments based on the word segment types.
3. The domain model extraction method of claim 2, further comprising, after determining the corresponding word segment types according to the part-of-speech tagging results:
cleaning the word segments;
extracting the stems of the word segments in the cleaning result;
and restoring the stems.
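A toy sketch of the cleaning, stemming and restoring steps of claim 3 is given below. It is an assumption-laden illustration rather than the patented procedure: the stop-word list, the suffix list and the stem-to-base-form table are all hypothetical, and a real system would normally rely on an established stemmer and lemmatizer instead.

```python
STOPWORDS = {"the", "a", "an", "of", "to", "and"}  # hypothetical stop-word list
SUFFIXES = ("ing", "es", "ed", "s")                 # hypothetical, tried in this order
BASE_FORMS = {"messag": "message"}                  # hypothetical stem -> base form


def clean(tokens):
    """Lower-case the tokens and drop punctuation and stop words."""
    return [t.lower() for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def restore(stemmed):
    """Map a stem back to a dictionary form when one is known."""
    return BASE_FORMS.get(stemmed, stemmed)


tokens = ["The", "operator", "sends", "messages", "!"]
normalized = [restore(stem(t)) for t in clean(tokens)]
```

Normalizing in this way lets inflected variants such as "sends" and "send" map to the same concept name before relations are extracted.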
4. A domain model extraction device, comprising:
an analysis unit, configured to perform syntactic analysis on a requirement document and determine the dependency relationships among the word segments;
a relationship determining unit, configured to determine the semantic relationships among concepts according to the dependency relationships among the word segments;
a domain model determining unit, configured to determine a corresponding domain model according to the semantic relationships among the concepts;
wherein determining the semantic relationships among concepts according to the dependency relationships among the word segments comprises:
traversing the noun phrases in the word segments, and determining the dependency relationships between phrases and words and between phrases;
extracting the semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases;
wherein traversing the noun phrases in the word segments and deriving the dependency relationships between phrases and words and between phrases comprises:
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls inside the current noun phrase, not performing derivation for the current noun phrase;
if the target node of a dependency relationship whose source node is a word in the current noun phrase falls outside the current noun phrase, performing derivation for the current noun phrase;
wherein performing derivation for the current noun phrase comprises: if the derived word is a source-node word in a noun phrase other than the current noun phrase, deriving a dependency relationship between phrases; otherwise, deriving a dependency relationship between a phrase and a word;
wherein extracting the semantic relationships among concepts according to the dependency relationships between phrases and words and between phrases comprises:
extracting the association relationships among concepts according to the dependency relationships between phrases and words and between phrases and according to the source nodes corresponding to different syntactic structures; and
matching the dependency relationships between phrases and words and between phrases against preset word structures, and identifying the aggregation relationships, cardinality relationships and attribute relationships among concepts;
wherein determining a corresponding domain model according to the association relationships comprises:
traversing the boundary concepts in the association relationships among the concepts;
correcting the association relationships of those boundary concepts that match a preset field;
wherein a boundary concept is a concept that has a semantic relationship with only one other concept.
5. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN202011301741.5A 2020-11-19 2020-11-19 Domain model extraction method and device and readable storage medium Active CN113158654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011301741.5A CN113158654B (en) 2020-11-19 2020-11-19 Domain model extraction method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN113158654A CN113158654A (en) 2021-07-23
CN113158654B true CN113158654B (en) 2022-04-29

Family

ID=76882341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011301741.5A Active CN113158654B (en) 2020-11-19 2020-11-19 Domain model extraction method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN113158654B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6076088A (en) * 1996-02-09 2000-06-13 Paik; Woojin Information extraction system and method using concept relation concept (CRC) triples
CN107885528A (en) * 2017-11-17 2018-04-06 东南大学 A kind of architecture mode modeling method based on body
CN109255017A (en) * 2018-08-23 2019-01-22 北京所问数据科技有限公司 A kind of real-time text viewpoint abstracting method based on syntax tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7899666B2 (en) * 2007-05-04 2011-03-01 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
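"Mining Requirements Knowledge from"; Lian Xiaoli et al.; 2016 IEEE 24th International Requirements Engineering Conference (RE); 2016-12-05; full text *
"A sentiment classification method for simple sentences based on sentiment dependency tuples"; Zhou Wen et al.; Journal of Chinese Information Processing; 2017-05-15 (No. 03); full text *
"Research on construction methods for a domain ontology of petroleum exploration and development"; Wen Bilong et al.; Computer Engineering and Applications; 2009-12-01 (No. 34); full text *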

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant