CN108717405B - The complementing method of the default subject of staircase design specification based on mind map - Google Patents

Publication number
CN108717405B
Authority
CN
China
Prior art keywords
sentence
subject
mind map
staircase
design specification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810349079.7A
Other languages
Chinese (zh)
Other versions
CN108717405A (en)
Inventor
朱磊
陈晨
姚全珠
黑新宏
赵钦
陈毅
杨明松
王一川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN201810349079.7A
Publication of CN108717405A
Application granted
Publication of CN108717405B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a mind-map-based method for completing omitted subjects in staircase design specifications. Using the inheritance relationships between ontology entities in the mind map, the method determines and fills in the missing subject of syntactically incomplete clauses in a compound sentence, and extracts subject-predicate-object triples from pivotal sentences and serial-verb sentences. This facilitates natural language processing and, ultimately, the construction of a knowledge graph of staircase building codes for automatic drawing review. The invention avoids uncertain factors that may arise during drawing review, has a low false detection rate, is simple to operate, saves manpower, and greatly improves the completion efficiency of engineering projects in the construction industry.

Description

Mind-map-based method for completing omitted subjects in staircase design specifications

Technical field

The invention belongs to the technical field of computer natural language processing, and in particular relates to a mind-map-based method for completing omitted subjects in staircase design specifications.

Background

With the development of computer science and technology, many engineering fields have adopted the concept of the knowledge graph. A knowledge graph describes real-world entities and the relationships between them, offers a way to analyze problems from the perspective of relationships, and supports the joint construction, querying, sharing and reuse of knowledge. Fields that currently use knowledge graphs include medicine, the military and education. Informatization in the construction industry, however, is still in its infancy. A construction project is first planned and designed by a design institute; the resulting models and drawings are then reviewed, handed to the construction contractor for building, and finally put into use. Accurate drawings and models greatly reduce the rework and idle labor caused by design changes during construction. Owing to the current lack of technical means, however, most projects in China are reviewed manually by experts after design, who compare the drawings with the design codes of the construction industry. This leads to a high false detection rate, a high missed detection rate, weak inspection and many uncertain factors.

Summary of the invention

The purpose of the invention is to provide a mind-map-based method for completing omitted subjects in staircase design specifications, which solves the problem that some sentences in staircase design specifications lack a complete subject and are therefore difficult to interpret.

By completing the text of the specification, the method facilitates natural language processing and, ultimately, the construction of a knowledge graph of staircase building codes for automatic drawing review. This helps to solve the high false detection rate of manual drawing review in the prior art.

The technical solution adopted by the invention is a mind-map-based method for completing omitted subjects in staircase design specifications, comprising the following steps:

Step 1: Extract the provisions on staircase design from the residential building design code as the corpus to be processed, segment the original text into words with a dictionary-based forward maximum matching algorithm, and tag the segmented words with parts of speech using a statistical method, yielding the preprocessed text.

Step 2: Following the format in which the IFC standard describes staircases, organize the ontology entities related to staircase design and the relationships between them into a mind map, and build the corresponding index tree.

Step 3: Parse the preprocessed text with a context-free grammar. When a pivotal sentence or serial-verb sentence is encountered, determine the sentence constituents by looking up the ontology entity related to the object in the index tree. Build the syntax tree accordingly and check whether the syntactic structure of each sentence is complete. Finally, filter out the sentences whose subject is missing.

Step 4: Search the index tree for the object entity of each syntactically incomplete sentence and find its parent node together with the unique path to the root node. The parent node is the omitted subject of the sentence, and all other nodes on the path are attributive modifiers of the subject. The omitted subject is then added to the original sentence, and the staircase design specification with complete subject, predicate and object is output.

In step 1, word segmentation uses the dictionary-based forward maximum matching algorithm, and part-of-speech tagging uses a method based on the hidden Markov model.

In step 3, during parsing, the constituent that a word forms in the sentence is determined from its part of speech and the position at which it occurs in the sentence.

In step 3, the syntax tree is constructed as follows. First define a context-free grammar G = {N, Σ, X, S}, where N is a set of labels for non-leaf nodes; Σ is a set of labels for leaf nodes, that is, the words that make up the sentence; X is a set of syntactic rules, namely the productions of N, each of which can be written as X → Y₁ Y₂ … Yₙ with X ∈ N and Yᵢ ∈ (N ∪ Σ); at least one production in X must have S as its left-hand side α. S is the start label of the syntax tree.

Using a bottom-up approach, start from the string to be analyzed and match it against the right-hand sides of the productions in X; each successful match is replaced by the production's left-hand side, until S appears and the syntax tree is complete.

In step 4, the parent node is found with a traversal algorithm.

The beneficial effects of the invention are as follows:

Using the inheritance relationships between ontology entities in the mind map, the method determines and fills in the missing subject of syntactically incomplete clauses in a compound sentence, and extracts subject-predicate-object triples from pivotal sentences and serial-verb sentences. This facilitates natural language processing and knowledge graph construction, and ultimately supports automatic drawing review. The invention avoids the uncertain factors that may arise in manual drawing review, has a low false detection rate, is simple to operate, saves manpower, and greatly improves the completion efficiency of engineering projects in the construction industry.

Description of drawings

Figure 1 is the main flow of the mind-map-based method for completing omitted subjects in staircase design specifications;

Figure 2 is the mind map model defined with reference to the IFC standard.

Detailed description

The invention is described in detail below with reference to the drawings and specific embodiments.

The mind-map-based method of the invention for completing omitted subjects in staircase design specifications comprises the following steps:

Step 1: Extract the provisions on staircase design from the residential building design code as the corpus to be processed, segment the original text into words with a dictionary-based forward maximum matching algorithm, and tag the segmented word sequence with parts of speech using a method based on the hidden Markov model, yielding the preprocessed text.

Step 2: Following the format in which the IFC standard describes staircases, organize the ontology entities related to staircase design and the relationships between them into a mind map, and build the corresponding index tree.

Step 3: Parse the preprocessed text with a context-free grammar. When a pivotal sentence or serial-verb sentence is encountered, determine the sentence constituents by looking up the ontology entity related to the object in the index tree. Build the syntax tree accordingly and check whether the syntactic structure of each sentence is complete. Finally, filter out the sentences whose subject is missing.

Step 4: Search the index tree for the object entity of each syntactically incomplete sentence and find its parent node together with the unique path to the root node. The parent node is the omitted subject of the sentence, and all other nodes on the path are attributive modifiers of the subject. The omitted subject is then added to the original sentence, and the staircase design specification with complete subject, predicate and object is output.

In step 1, word segmentation uses the dictionary-based forward maximum matching algorithm, and part-of-speech tagging uses a method based on the hidden Markov model.

In step 3, during parsing, the constituent that a word forms in the sentence is determined from its part of speech and the position at which it occurs in the sentence.

In step 3, the syntax tree is constructed as follows. First define a context-free grammar G = {N, Σ, X, S}, where N is a set of labels for non-leaf nodes; Σ is a set of labels for leaf nodes, that is, the words that make up the sentence; X is a set of syntactic rules, namely the productions of N, each of which can be written as X → Y₁ Y₂ … Yₙ with X ∈ N and Yᵢ ∈ (N ∪ Σ); at least one production in X must have S as its left-hand side α. S is the start label of the syntax tree.

Using a bottom-up approach, start from the string to be analyzed and match it against the right-hand sides of the productions in X; each successful match is replaced by the production's left-hand side, until S appears and the syntax tree is complete.

In step 4, the parent node is found with a traversal algorithm.

Chinese text is made up of words with no explicit delimiters between them, so the first processing step is to segment the input corpus into words. This patent uses dictionary-based forward maximum matching:

First take a string s1 from the text to be segmented, compile the ontology entities of the staircase specification into a dictionary, and build a hash table from it. Given the maximum word length maxlen defined by the dictionary, take from the left end of s1 a substring w no longer than maxlen, and look w up in the hash table. If w is a word, output it as a word. If it is not, remove one character from the end of w and check again, iterating until w is empty or s1 is empty.
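The matching loop described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `forward_max_match`, the small `LEXICON` set, and the fallback of emitting a single unmatched character as its own token are assumptions made for the example.

```python
def forward_max_match(text, lexicon, maxlen=None):
    """Greedy left-to-right segmentation against a dictionary held in a hash set."""
    if maxlen is None:
        # Longest dictionary entry bounds the candidate length.
        maxlen = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then drop characters from its tail.
        for size in range(min(maxlen, len(text) - i), 0, -1):
            w = text[i:i + size]
            # Assumption: a single character with no dictionary match is emitted as-is.
            if w in lexicon or size == 1:
                tokens.append(w)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary built from terms in the sample provision.
LEXICON = {"住宅", "楼梯", "梯段", "净宽度", "不应", "小于", "1.1m"}
print(forward_max_match("住宅楼梯梯段净宽度不应小于1.1m", LEXICON))
# ['住宅', '楼梯', '梯段', '净宽度', '不应', '小于', '1.1m']
```

A Python `set` plays the role of the hash table here, so each membership test is a constant-time lookup on average.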

To determine the syntactic structure of a sentence, its constituents must be identified first. In Chinese, the constituent that a word forms is determined by its part of speech and the position at which it occurs in the sentence. Part-of-speech tagging uses a method based on the hidden Markov model:

The method has three modules: Initialise, Induction, and Back tracing the best tagging.

In the Initialise step, the probability of each part of speech occurring at the start of a sentence in the corpus is computed and multiplied by the probability of that part of speech emitting the word, which gives the word's initial score. In the Induction step, the Viterbi algorithm computes the score for each pair of adjacent words: the previous score of a part of speech, multiplied by the transition probability between the two parts of speech, multiplied by the probability of the new part of speech emitting the word. The part of speech with the highest score is recorded in the Backpointer. Finally, the Back tracing the best tagging step traces back from the end to the beginning to obtain the part-of-speech sequence str2.
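The three modules can be illustrated with a compact Viterbi decoder. This is a sketch under stated assumptions: the function name `viterbi`, the tag set, and every probability below are invented toy values for demonstration, not parameters from the patent.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a simple HMM."""
    # Initialise: sentence-start probability times emission probability.
    score = [{t: start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    # Induction: extend the best-scoring path to each tag, one word at a time.
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            prev, best = max(
                ((p, score[i - 1][p] * trans_p[p].get(t, 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            score[i][t] = best * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev  # backpointer to the best previous tag
    # Back tracing the best tagging: follow backpointers from the last word.
    last = max(tags, key=lambda t: score[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = back[i][last]
        path.append(last)
    return path[::-1]


# Toy model: all numbers are illustrative assumptions.
TAGS = ["noun", "adv", "v", "num"]
START = {"noun": 0.7, "adv": 0.2, "v": 0.1}
TRANS = {
    "noun": {"noun": 0.4, "adv": 0.4, "v": 0.2},
    "adv": {"v": 0.8, "noun": 0.2},
    "v": {"num": 0.5, "noun": 0.5},
    "num": {},
}
EMIT = {
    "noun": {"梯段": 1.0},
    "adv": {"不应": 1.0},
    "v": {"小于": 1.0},
    "num": {"1m": 1.0},
}
tagged = viterbi(["梯段", "不应", "小于", "1m"], TAGS, START, TRANS, EMIT)
print(tagged)  # ['noun', 'adv', 'v', 'num']
```

In practice the products are usually replaced by sums of log-probabilities to avoid numerical underflow on long sentences.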

Parts of speech alone cannot tell whether the structure of a sentence is complete; a syntax tree must be built to decide. The syntax tree is constructed as follows:

First define a context-free grammar as follows:

(a) N is a set of labels for non-leaf nodes, such as sentence, noun phrase and verb phrase.

(b) Σ is a set of labels for leaf nodes, that is, the words that make up the sentence.

(c) X is a set of syntactic rules, namely the productions of N; each rule can be written as X → Y₁ Y₂ … Yₙ with X ∈ N and Yᵢ ∈ (N ∪ Σ).

(d) S is the start label of the syntax tree.

For a given sentence, this four-tuple is defined by the context-free grammar, and a syntax tree is obtained by bottom-up derivation. In a pivotal sentence or serial-verb sentence, two or more nouns may precede the predicate, so the subject cannot be identified directly. In that case the mind map is consulted to find the ontology entity that has an inheritance relationship with the object, which determines which of the nouns is the subject; the other nouns serve only as attributives that restrict the scope of the subject. Sentences with an incomplete syntactic structure can then be readily identified from the syntax tree and extracted.

After the compound sentences with missing subjects have been extracted, the object of each clause is identified, and its position in the mind map determines the parent node it inherits from, which fills in the missing subject. The process takes the object entity as input and outputs the entity from which the object directly inherits as the subject, together with an attributive phrase formed from all nodes on the unique path from the subject to the root node, which restricts the domain of the subject.

Figure 1 shows the overall subject-completion flow. Alongside word segmentation and part-of-speech tagging, the relationships between ontology entities in the residential staircase specification are built into a mind map model with reference to the IFC standard. With the help of this model, syntax trees are built for the corpus and the sentences with incomplete syntactic structure are filtered out. The inheritance relationship of the object entity in the mind map is then queried to find its parent node, which becomes the subject, and the unique path from that node to the root node gives the modifiers that restrict the scope of the provision.

To apply dictionary-based forward maximum matching, provision 4.1.2 of the Residential Design Code (《住宅设计规范》) is taken as the sample: “住宅楼梯梯段净宽度不应小于1.1m,一边设有栏杆时,不应小于1m。” (The clear width of a residential stair flight shall not be less than 1.1 m; where a railing is provided on one side, it shall not be less than 1 m.)

A string to be segmented is taken from the original text, so that s1 = “住宅楼梯梯段净宽度不应小于1.1m,一边设有栏杆时,不应小于1m。” From the constructed dictionary, maxlen = 10, and s2 is initialized to empty. The dictionary is built into a hash table.

From the left end of s1, take the substring w = “住宅楼梯梯段净宽度不”, whose length does not exceed maxlen. w is not empty, so check whether it is a word in the hash table; no match is found. Remove one character from the right end of w, giving w = “住宅楼梯梯段净宽度”, and continue iterating. When w has shrunk to “住宅”, the lookup succeeds and 住宅 is appended to s2, giving s2 = “住宅/” and leaving s1 = “楼梯梯段净宽度不应小于1.1m,一边设有栏杆时,不应小于1m。” Iteration continues until s1 is empty, at which point the output is s2 = “住宅/楼梯/梯段/净宽度/不应/小于/1.1m/,/一边/设有/栏杆/时/,/不应/小于/1m/。”

After segmentation, each word is tagged with its part of speech, following Algorithm 2. Part of the corpus is first tagged manually and the parameters are trained with the Viterbi algorithm; the remaining corpus is then tagged automatically by machine learning. Out-of-vocabulary words are smoothed, and correctly tagged sentences are added to the training set to train more reliable parameters. The final output is {noun:住宅}{noun:楼梯}{noun:梯段}{noun:净宽度}{adv:不应}{v:小于}{num:1.1m},{adv:一边}{v:设有}{noun:栏杆}{adv:时},{adv:不应}{v:小于}{num:1m}.

The relationships between ontology entities in the residential staircase design specification follow the IFC standard; the mind map model defined in this format is shown in Figure 2.

A syntax tree is built from the preprocessed output, where S denotes a sentence; NP, VP and PP are noun, verb and prepositional phrases (phrase level); A, V and P are adverb, verb and preposition; Noun is a noun and Num is a numeral. The context-free grammar productions are defined as follows:

1) S → NP VP

2) NP → NP | Noun Noun | ε

3) VP → A V N | A V N A

4) A → Adv

5) V → Verb

6) N → Noun | Num

7) Noun → 住宅 (residence) | 楼梯 (staircase) | 梯段 (flight) | 净宽度 (clear width) | 栏杆 (railing)

8) Adv → 不应 (shall not)

9) Num → 1.1m | 1m

10) Verb → 小于 (be less than) | 一边 (on one side) | 时 (when) | 设有 (be provided with)

A production X: α → β must satisfy the following conditions:

1) α may be any label of a leaf or non-leaf node, but not ε;

2) β may be any label of a leaf or non-leaf node, and may be ε;

3) at least one production in X must have S as its α.

Using a bottom-up, reduction-based approach, start from the string to be analyzed and match it against the right-hand sides of the context-free grammar rules; each successful match is replaced by the corresponding left-hand side, until S is reached.
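The reduction procedure can be sketched as an exhaustive bottom-up search over part-of-speech sequences. Several points here are assumptions rather than the patent's method: the part-of-speech tags stand in for the terminals, the rule NP → NP NP is inferred from the nested noun phrases in the recorded trees, and NP → ε is modelled by retrying with an empty NP placed in front of the clause. The names `RULES`, `reduces_to_s` and `subject_missing` are hypothetical.

```python
# Productions as (left-hand side, right-hand side) pairs over POS-level symbols.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("NP", "NP")),        # assumption, inferred from the nested NP trees
    ("NP", ("Noun", "Noun")),
    ("VP", ("A", "V", "N", "A")),
    ("VP", ("A", "V", "N")),
    ("A", ("Adv",)),
    ("V", ("Verb",)),
    ("N", ("Noun",)),
    ("N", ("Num",)),
]


def reduces_to_s(symbols, seen=None):
    """Try every reduction order; True if the sequence can be reduced to S."""
    seen = set() if seen is None else seen
    symbols = tuple(symbols)
    if symbols == ("S",):
        return True
    if symbols in seen:
        return False  # already explored without success
    seen.add(symbols)
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == rhs:
                # Replace the matched right-hand side by its left-hand side.
                reduced = symbols[:i] + (lhs,) + symbols[i + n:]
                if reduces_to_s(reduced, seen):
                    return True
    return False


def subject_missing(pos_tags):
    """True if the clause only parses once an empty subject NP is supplied."""
    if reduces_to_s(pos_tags):
        return False
    return reduces_to_s(("NP",) + tuple(pos_tags))


clause_1 = ["Noun", "Noun", "Noun", "Noun", "Adv", "Verb", "Num"]  # 住宅 楼梯 梯段 净宽度 不应 小于 1.1m
clause_2 = ["Adv", "Verb", "Noun", "Adv"]                           # 一边 设有 栏杆 时
clause_3 = ["Adv", "Verb", "Num"]                                   # 不应 小于 1m
print(subject_missing(clause_1), subject_missing(clause_2), subject_missing(clause_3))
# False True True
```

The search terminates because every rule either shortens the sequence or rewrites a tag into a category that is never rewritten back; a chart parser such as CYK would be the usual choice for longer inputs.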

The constructed syntax trees are recorded in the following format:

a) [S[NP[NP[Noun 住宅][Noun 楼梯]][NP[Noun 梯段][Noun 净宽度]]][VP[A[Adv 不应]][V[Verb 小于]][N[Num 1.1m]]]]

b) [S[NP[ε]][VP[A[Adv 一边]][V[Verb 设有]][N[Noun 栏杆]][A[Adv 时]]]]

c) [S[NP[ε]][VP[A[Adv 不应]][V[Verb 小于]][N[Num 1m]]]]

To analyze the syntactic structure, take the second clause as an example. Centered on the predicate, the subject and adverbial are sought before it and the object and complement after it. The analysis shows that the adverb “一边” (on one side) is an adverbial, the verb “设有” (is provided with) is the predicate, and the noun “栏杆” (railing) is the object; the final adverb “时” (when) indicates that this clause acts as an adverbial in the whole sentence. The clause is nevertheless incomplete: without the context, it is impossible to tell what is “provided with a railing”.

The syntactically incomplete clauses found are s1 = “一边设有栏杆时” and s2 = “不应小于1m”.

Take s1 as an example of finding the subject. First look up the node “栏杆” in the B index tree, then find the path from this node to the root: the parent of “栏杆” (railing) is “梯段” (flight), the parent of “梯段” is “楼梯” (staircase), the parent of “楼梯” is “住宅” (residence), and the parent of “住宅” is “建筑” (building). The missing subject of the clause is therefore 梯段, and s1 is completed as “梯段的一边设有栏杆时” (when a railing is provided on one side of the flight). The path to the root gives the attributives that modify the subject, namely “建筑的住宅的楼梯的” “梯段一边设有栏杆时”. Since attributives are not core constituents of the sentence structure, they need not be added.
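The lookup for “栏杆” can be sketched as a depth-first traversal of a nested dictionary. The structure `TREE` below reproduces only the chain named in the text (建筑, 住宅, 楼梯, 梯段, 栏杆); the contents of Figure 2 are not reproduced, and the function names are hypothetical.

```python
# Fragment of the index tree as nested dicts: each key is a node, each value its children.
TREE = {"建筑": {"住宅": {"楼梯": {"梯段": {"栏杆": {}}}}}}


def find_path(tree, target, trail=()):
    """Depth-first traversal; returns the root-to-target path, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = find_path(children, target, here)
        if found:
            return found
    return None


def omitted_subject(tree, obj):
    """Parent of the object node is the subject; the remaining ancestors are modifiers."""
    path = find_path(tree, obj)
    if path is None or len(path) < 2:
        return None, []
    return path[-2], list(path[:-2])


subject, modifiers = omitted_subject(TREE, "栏杆")
print(subject, modifiers)  # 梯段 ['建筑', '住宅', '楼梯']
print(subject + "的" + "一边设有栏杆时")  # 梯段的一边设有栏杆时
```

Because every node in a tree has exactly one parent, the root-to-node path is unique, which is what lets the parent serve as an unambiguous subject.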

Besides filling in missing subjects, the method can also identify the subject in provisions whose constituents are ambiguous. Take the first clause of the sample, “住宅楼梯梯段净宽度不应小于1.1m”. When its syntax tree is built, no subject is found to be missing, yet extracting a triple from this provision is harder than from a plain subject-predicate sentence: without semantic analysis it is impossible to tell what exactly shall not be less than 1.1 m, and even with semantic analysis it is not easy. The subject must therefore be determined first when analyzing the clause. The mind map is consulted to determine which of 住宅, 楼梯 and 梯段 has the attribute 净宽度 (clear width); traversing the index tree shows that 梯段 is the subject, while 住宅 and 楼梯 are attributives that restrict its scope.
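Choosing the subject among several candidate nouns can be sketched as a parent lookup. The `PARENT` map is an assumed fragment of the index tree consistent with the relationships stated in the text, and `split_subject` is a hypothetical helper name.

```python
# Child -> parent links for the fragment of the index tree used in the example.
PARENT = {
    "住宅": "建筑",
    "楼梯": "住宅",
    "梯段": "楼梯",
    "净宽度": "梯段",
    "栏杆": "梯段",
}


def split_subject(nouns, attribute, parent):
    """Pick the noun that directly owns `attribute`; the other nouns become attributives."""
    owner = parent.get(attribute)
    if owner not in nouns:
        return None, list(nouns)  # no candidate owns the attribute
    return owner, [n for n in nouns if n != owner]


subject, attributives = split_subject(["住宅", "楼梯", "梯段"], "净宽度", PARENT)
print(subject, attributives)  # 梯段 ['住宅', '楼梯']
```

With the subject fixed, the triple for the first clause can be read off as (梯段, 净宽度 不应小于, 1.1m), with 住宅 and 楼梯 kept as scope modifiers.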

Finally, all clauses are completed as: “住宅的楼梯的梯段的净宽度不应小于1.1m,梯段的一边设有栏杆时,梯段的净宽度不应小于1m。” (The clear width of the flight of the staircase of the residence shall not be less than 1.1 m; where a railing is provided on one side of the flight, the clear width of the flight shall not be less than 1 m.)

The method of the invention thus successfully completes the text of a staircase design specification whose subjects were omitted. By completing the text of the specification, the method facilitates natural language processing and, ultimately, the construction of a knowledge graph of staircase building codes for automatic drawing review. It avoids the uncertain factors that may arise in manual drawing review, has a low false detection rate, is simple to operate, and saves manpower while greatly improving the completion efficiency of engineering projects in the construction industry.

Claims (4)

1. A method for completing the default (omitted) subject of staircase design specifications based on a mind map, comprising the following steps:
Step 1: obtaining the specifications relating to staircase design from the residential building design standard as the corpus to be processed, segmenting the original text with a dictionary-based forward maximum matching algorithm, and tagging the segmented words with parts of speech using a method based on a hidden Markov model, to obtain the preprocessed text;
Step 2: with reference to the description format for staircase design specifications in the Industry Foundation Classes (IFC) standard, organizing the ontologies related to staircase design and the relationships between them into a mind map, and constructing the corresponding index tree;
Step 3: parsing the preprocessed text with a context-free grammar; when a pivotal sentence or a serial-verb sentence is encountered, determining the sentence constituents by looking up, in the index tree, the ontologies related to the object; constructing a syntax tree on this basis and analyzing whether the syntactic structure of each sentence is complete; and finally filtering out the corpus sentences whose subject is omitted;
Step 4: searching the index tree for the parent node of the object ontology of each sentence with incomplete structure, and for the unique path from that object ontology to the root node, wherein the parent node is the omitted subject and all nodes on the path other than the parent node are modifying attributes of the subject; and, after the omitted subject is added to the original sentence, outputting the staircase design specification with complete subject, predicate and object.
2. The method for completing the default subject of staircase design specifications based on a mind map according to claim 1, characterized in that, in step 3, after the text is parsed with the context-free grammar, the constituents of a sentence are determined from the part of speech of each word and the position at which the word occurs in the sentence.
3. The method for completing the default subject of staircase design specifications based on a mind map according to claim 1, characterized in that the syntax tree in step 3 is constructed as follows: first, a context-free grammar G = {N, Σ, X, S} is defined, where N is a set of labels for non-leaf nodes; Σ is a set of labels for leaf nodes, i.e. the words that make up the sentence; X is a set of grammar rules, i.e. the productions of N, each rule being written X → Y1Y2…Yn, X ∈ N, Yi ∈ (N ∪ Σ); in at least one production in X, α is served by S, where α is any label of a leaf or non-leaf node; and S is the label with which the syntax tree starts;
using a bottom-up method, starting from the string to be analyzed, the string is matched against the symbols on the right-hand side of the arrow in the context-free grammar rules X; after a successful match, the matched symbols are replaced by the left-hand symbol; this is repeated until S appears, at which point construction of the syntax tree is complete.
4. The method for completing the default subject of staircase design specifications based on a mind map according to claim 1, characterized in that, in step 4, a traversal algorithm is used to search for the parent node.
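Step 1 of claim 1 names dictionary-based forward maximum matching for word segmentation. A minimal sketch of that general technique follows; the tiny dictionary, the function name `forward_max_match`, and the fallback of emitting unknown characters one at a time are illustrative assumptions, not details given in the patent.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` left to right, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    if max_len is None:
        max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical domain dictionary covering the sample clause from the description.
DICTIONARY = {"住宅", "楼梯", "梯段", "净宽度", "不应", "小于"}
print(forward_max_match("住宅楼梯梯段净宽度不应小于1.1m", DICTIONARY))
```

In this sketch the numeric part "1.1m" falls back to single characters; a practical tokenizer would usually add a rule for numbers and units before part-of-speech tagging.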

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810349079.7A CN108717405B (en) 2018-04-18 2018-04-18 The complementing method of the default subject of staircase design specification based on mind map

Publications (2)

Publication Number Publication Date
CN108717405A CN108717405A (en) 2018-10-30
CN108717405B true CN108717405B (en) 2019-08-16

Family

ID=63899153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810349079.7A Active CN108717405B (en) 2018-04-18 2018-04-18 The complementing method of the default subject of staircase design specification based on mind map

Country Status (1)

Country Link
CN (1) CN108717405B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287304B (en) * 2019-06-30 2021-11-16 联想(北京)有限公司 Question and answer information processing method and device and computer equipment
CN111091006B (en) * 2019-12-20 2023-08-29 北京百度网讯科技有限公司 Method, device, equipment and medium for establishing entity intention system
CN111027287B (en) * 2019-12-24 2023-08-29 深圳集智数字科技有限公司 Method and related device for converting computer executable script
CN111708882B (en) * 2020-05-29 2022-09-30 西安理工大学 A Completion Method for Missing Chinese Text Information Based on Transformer
CN113158311B (en) * 2021-04-23 2024-09-27 山东建筑大学 Large-scale rural residential building generation design method oriented to three-dimensional shape rule reasoning
CN113987199B (en) * 2021-10-19 2023-02-21 清华大学 A BIM intelligent drawing review method, system and medium for standard automatic interpretation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021229A (en) * 2016-05-19 2016-10-12 苏州大学 Chinese event co-reference resolution method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9323771B2 (en) * 2013-04-24 2016-04-26 Dell Products, Lp Efficient rename in a lock-coupled traversal of B+tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
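ITQAN: a Mobile Based Assistant for Mastering Quran Memorization; Entesar Almosallam et al.; 《2015 Fifth International Conference on e-Learning》; 2015-10-18; pp. 349-352
自然语言处理中基于短语结构的语法分析方法 [Phrase-structure-based syntactic analysis methods in natural language processing]; 杨国基 et al.; 《微处理机》 [Microprocessors]; 2009-12-31 (No. 6); pp. 74-77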

Also Published As

Publication number Publication date
CN108717405A (en) 2018-10-30

Similar Documents

Publication Publication Date Title
CN108717405B (en) The complementing method of the default subject of staircase design specification based on mind map
US11989519B2 (en) Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system
CN109408642B (en) A method for extracting domain entity attribute relationship based on distance supervision
CN106776711B (en) Chinese medical knowledge map construction method based on deep learning
WO2023051399A1 (en) Generative event extraction method based on ontology guidance
CN113312922B (en) Improved chapter-level triple information extraction method
CN103942347B (en) A kind of segmenting method based on various dimensions synthesis dictionary
CN108920447B (en) A Domain-Oriented Chinese Event Extraction Method
CN101329666A (en) Chinese Syntax Automatic Analysis Method Based on Corpus and Tree Structure Pattern Matching
CN108681574A (en) A kind of non-true class quiz answers selection method and system based on text snippet
CN110991180A (en) A command recognition method based on keywords and Word2Vec
CN107145584A (en) A kind of resume analytic method based on n gram models
CN105138864A (en) Protein interaction relationship data base construction method based on biomedical science literature
CN110119510A (en) A kind of Relation extraction method and device based on transmitting dependence and structural auxiliary word
CN113792542A (en) An Intent Understanding Method Integrating Syntactic Analysis and Semantic Role Pruning
CN110175585A (en) It is a kind of letter answer correct system and method automatically
CN109783806A (en) A kind of text matching technique using semantic analytic structure
CN115017335A (en) Knowledge graph construction method and system
Kaur et al. A detailed analysis of core NLP for information extraction
Bladier et al. German and French neural supertagging experiments for LTAG parsing
de Carvalho et al. Extracting semantic information from patent claims using phrasal structure annotations
CN111259159A (en) Data mining method, device and computer readable storage medium
Xu et al. Recognizing Chinese elementary discourse unit on comma
CN112328811A (en) Word spectrum clustering intelligent generation method based on same type of phrases
CN114647418A (en) Software code recommendation method for tree serialization embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant