CN112651226B - Knowledge analysis system and method based on dependency syntax tree - Google Patents


Info

Publication number
CN112651226B
CN112651226B (application CN202010997505.5A)
Authority
CN
China
Prior art keywords
words
knowledge
dependency
word
dependency syntax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010997505.5A
Other languages
Chinese (zh)
Other versions
CN112651226A (en)
Inventor
裴正奇
王树徽
朱斌斌
刘潇
段必超
于秋鑫
余志炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Heidun Technology Co ltd
Original Assignee
Shenzhen Qianhai Heidun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Heidun Technology Co ltd filed Critical Shenzhen Qianhai Heidun Technology Co ltd
Priority to CN202010997505.5A priority Critical patent/CN112651226B/en
Publication of CN112651226A publication Critical patent/CN112651226A/en
Application granted granted Critical
Publication of CN112651226B publication Critical patent/CN112651226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/31: Indexing; Data structures therefor; Storage structures
    • G06F 16/313: Selection or weighting of terms for indexing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/237: Lexical tools
    • G06F 40/247: Thesauruses; Synonyms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/02: Knowledge representation; Symbolic representation
    • G06N 5/022: Knowledge engineering; Knowledge acquisition

Abstract

The invention provides a knowledge analysis system and method based on a dependency syntax tree. The knowledge parsing system based on the dependency syntax tree comprises a knowledge base module and an analysis module. The knowledge analysis method based on the dependency syntax tree gives knowledge points in the Chinese context a clear definition so they can be parsed accurately. The knowledge base can be maintained dynamically in real time; it is transparent and controllable, so unreasonable parts can be located and fixed directly, unlike a traditional deep learning model, which behaves like an unanalyzable black box. Knowledge parsing scenarios are no longer limited by the diversity and complexity of Chinese grammar and syntax; as long as the knowledge base resources are high-quality and comprehensive, the application requirements of the scenario can be met to the greatest extent.

Description

Knowledge analysis system and method based on dependency syntax tree
Technical Field
The invention relates to the field of natural language processing, and in particular to a knowledge analysis system and method based on a dependency syntax tree.
Background
Dependency syntax analysis is an important component of natural language processing. Dependency syntax reflects the internal logical rules of natural language; it is a syntactic theory that transcends individual languages and exists in every language family. The concept of "dependency syntax" was first proposed by the Indian grammarian Panini in the 4th century BC, originally intended for the classified study of grammar, syntax, semantics and morphology. The book "Éléments de syntaxe structurale" ("Elements of Structural Syntax"), published by the French linguist Lucien Tesnière in 1959, has long been regarded as the theoretical foundation of modern dependency syntax. In 1970, Robinson proposed four principles of dependency syntax that laid its theoretical and structural foundation: (1) the pure-node condition: the dependency tree contains only terminal (leaf) nodes, i.e. words; (2) the single-parent condition: every non-root node in the dependency tree has one and only one parent node; (3) the single-root condition: a complete dependency tree contains only one root node, on which all other nodes depend; (4) the mutual-exclusion condition: the sibling relation and the parent-child dominance relation in the dependency tree are mutually exclusive, that is, if there is a dominating/dominated relation between two nodes, there can be no sibling relation between them.
Dependency syntax analysis builds a formal mathematical model and designs efficient algorithms so that a computer can analyze a sentence, converting it from a word-sequence form into a syntax-tree form, thereby capturing the internal structure of the sentence and the dependency relations between words and revealing its syntactic structure. The theory holds that the core verb of a sentence is the central component that governs the other components, while itself being governed by no other component; every governed component is subordinate to its governor through some dependency relation. For a computer, dependency syntax analysis means analyzing, for the word sequence of a given input sentence, the collocation relation between the words and the structure of the whole sentence, and obtaining a dependency syntax analysis tree. The dependency parse tree is the representation of the dependency analysis result. At present, mainstream dependency syntax research focuses on data-driven methods, i.e. iterative learning over a training data set to obtain a dependency parser. There are two mainstream approaches: transition-based dependency parsing (Transition-based Dependency Parsing) and graph-based dependency parsing (Graph-based Dependency Parsing). The former models the generation of the dependency parse tree as an action sequence, converting the parsing problem into the problem of finding the optimal action sequence; the latter converts the parsing problem into the problem of finding the maximum spanning tree in a fully connected directed graph.
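As a concrete illustration of the word-sequence-to-tree conversion described above, a dependency parse can be held as a head-index array. This is a minimal hypothetical sketch; the English sentence, head indices and relation labels are invented for illustration and are not taken from the patent:

```python
# A dependency parse stored as, for each word, the index of its head
# (parent) and the relation label; index 0 is a virtual ROOT.
# Hypothetical parse of "The cat chased the mouse".
words = ["ROOT", "The", "cat", "chased", "the", "mouse"]
heads = [None, 2, 3, 0, 5, 3]   # head index of each word
rels  = [None, "ATT", "SBV", "HED", "ATT", "VOB"]

def children(i):
    """All words whose head is word i."""
    return [j for j in range(1, len(words)) if heads[j] == i]

# The core verb "chased" governs the subject and the object,
# but is itself governed only by the virtual root.
root = children(0)[0]
print(words[root])                          # chased
print([words[j] for j in children(root)])   # ['cat', 'mouse']
print(rels[2], "links", words[2], "to", words[heads[2]])
```

Each governed word is subordinate to exactly one governor, which is what makes a flat head array a complete encoding of the tree.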
However, the dependency syntax analysis method in the prior art has the following problems:
(1) Excessive reliance on the "proximity principle". By observation, linguists summarized the existence of a "proximity" principle in language organization: when organizing language, people tend to place modifiers close to the central component. However, natural language does not always follow this principle. For example, in the recognition of long-distance dependencies, the "proximity principle" essentially implies a higher probability and priority for short-distance dependencies than for long-distance ones; and in parallel (coordinate) structures, the components usually occupy the same position in the semantic hierarchy, and can even be interchanged without affecting the semantic relation. Both cases reduce the accuracy of the analysis.
(2) Analysis and judgment through dependency syntax depend heavily on large, high-quality text corpora. The biggest task in building a corpus is alignment: the higher the alignment efficiency, the higher the accuracy and the greater the usefulness. Existing corpora have several problems. Their overall development is unbalanced, mainly reflected in the large gap between the amounts of written and spoken data, because the collection and sampling of spoken data are complicated and tedious. The accuracy of a corpus also cannot be guaranteed: a large corpus contains many sentences that need revision, fundamentally because an effective self-checking method is lacking. These problems all reflect the urgent need for flexible and accurate corpus construction.
Disclosure of Invention
In order to solve the above problems in the prior art, the technical solution proposed by the present application is as follows:
according to one aspect of the present invention, a dependency syntax tree based knowledge parsing system is disclosed, comprising: the knowledge base module and the analysis module; wherein knowledge base module includes:
the word segmentation module is used for carrying out word segmentation processing on the natural language sentence according to the pre-trained dependency syntactic model and indicating the syntactic dependency relationship among all the components;
the dependency syntax tree generation module collects the sentences covering the target knowledge points, obtains the dependency syntax trees of all the sentences by using the dependency syntax model, and labels the core words;
the simplified processing module, which retains the core words in the dependency syntax tree obtained by the dependency syntax tree generation module and simplifies the redundant words and their peripheral structures;
the calculation module is used for calculating and obtaining adjacent characteristics of each core word and storing the adjacent characteristics corresponding to the core words of each knowledge point to form a knowledge base;
wherein the analysis module includes:
the syntax tree processing module is used for processing the text input by the user through the dependency syntax tree to obtain a corresponding word segmentation result;
and the adjacent feature comparison module, which compares the obtained adjacent features of each word with the adjacent features in the knowledge base; if the matching degree is greater than a first threshold, it judges whether the word corresponding to the matched adjacent feature in the knowledge base is similar to the core word; if so, it outputs an analysis result; if not, it prompts the word corresponding to that adjacent feature in the knowledge base as the correction.
According to an aspect of the present invention, a knowledge parsing method based on a dependency syntax tree is also disclosed, which includes the following steps:
step S1, performing word segmentation processing on the natural language sentence according to the pre-trained dependency syntax model and indicating the syntax dependency relationship among the components;
step S2, summarizing sentences covering the target knowledge points, obtaining dependency syntax trees of all the sentences by using a dependency syntax model, and labeling core words;
step S3, reserving the core words in the dependency syntax tree obtained in step S2, and simplifying processing of redundant words and peripheral structures thereof;
step S4, calculating to obtain adjacent features of each core word, and storing the adjacent features corresponding to the core words of each knowledge point to form a knowledge base;
step S5, processing the text input by the user through the dependency syntax tree to obtain the corresponding word segmentation result;
step S6, comparing the obtained adjacent features of each word with the adjacent features in the knowledge base; if the matching degree is greater than a first threshold, judging whether the word corresponding to the matched adjacent feature in the knowledge base is similar to the core word; if so, outputting an analysis result; if not, prompting the word corresponding to that adjacent feature in the knowledge base as the correction.
Compared with the prior art, the invention has the following beneficial effects:
1. the knowledge points in the Chinese context can be clearly defined for accurate resolution.
2. Knowledge points can be stored efficiently and unambiguously; that is, knowledge points are no longer stored in isolation and without distinction, but are stored specifically with respect to a particular context and particular words, thereby increasing the accuracy of knowledge point retrieval.
3. A knowledge tree (the adjacent features) describing a knowledge point in a particular context undergoes a series of screening processes, tailored according to the linguistic features of individual dependency relations (e.g., COO, ATT).
4. Knowledge points in the Chinese context can be accurately analyzed. For example, if a user inputs "Maotai liquor uses rice to make its distiller's yeast", the analysis system can correct the knowledge in the user's text according to the knowledge points for the contexts of "Maotai liquor" and "distiller's yeast" pre-stored in the knowledge base, informing the user that "rice" should be corrected to "wheat".
5. The knowledge base can be maintained dynamically in real time; it is transparent and controllable, so unreasonable parts can be located and fixed directly, unlike a traditional deep learning model, which behaves like an unanalyzable black box.
6. Knowledge parsing scenarios are no longer limited by the diversity and complexity of Chinese grammar and syntax; as long as the knowledge base resources are high-quality and comprehensive, the application requirements of the scenario can be met to the greatest extent.
Drawings
FIG. 1 is a flow chart of building a dynamic structured knowledge base according to the present invention;
FIG. 2 is a flow chart of computing adjacent features according to the technical solution of the present invention;
FIG. 3 is a schematic diagram of obtaining an analysis result according to the technical solution of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and the detailed description.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many other ways than those described herein, and those skilled in the art can make similar adaptations without departing from the spirit of the present application; the application is therefore not limited to the specific implementations disclosed below.
Fig. 1 is a flow chart of building a dynamic structured knowledge base according to the technical solution of the present invention. Knowledge points in the Chinese context can be clearly defined for accurate parsing, and can be stored efficiently and unambiguously: knowledge points are no longer stored in isolation, but specifically with respect to a particular context and particular words. Specifically, the knowledge parsing system based on the dependency syntax tree of the present invention includes a knowledge base module and an analysis module, wherein the knowledge base module includes:
the word segmentation module is used for carrying out word segmentation processing on the natural language sentence according to the pre-trained dependency syntactic model and indicating the syntactic dependency relationship among all the components;
the dependency syntax tree generation module collects the sentences covering the target knowledge points, obtains the dependency syntax trees of all the sentences by using the dependency syntax model, and labels the core words;
the simplified processing module, which retains the core words in the dependency syntax tree obtained by the dependency syntax tree generation module and simplifies the redundant words and their peripheral structures;
and the calculation module is used for calculating the adjacent characteristics of each core word and storing the adjacent characteristics corresponding to the core words of each knowledge point to form a knowledge base.
The analysis module includes:
the syntax tree processing module is used for processing the text input by the user through the dependency syntax tree to obtain a corresponding word segmentation result;
and the adjacent feature comparison module, which compares the obtained adjacent features of each word with the adjacent features in the knowledge base; if the matching degree is greater than a first threshold, it judges whether the word corresponding to the matched adjacent feature in the knowledge base is similar to the core word; if so, it outputs an analysis result; if not, it prompts the word corresponding to that adjacent feature in the knowledge base as the correction.
According to one aspect of the invention, a knowledge parsing method based on a dependency syntax tree is disclosed, which comprises the following steps:
step S1, performing word segmentation processing on the natural language sentence according to the pre-trained dependency syntax model and indicating the syntax dependency relationship among the components;
step S2, summarizing sentences covering the target knowledge points, obtaining dependency syntax trees of all the sentences by using a dependency syntax model, and labeling core words;
step S3, reserving the core words in the dependency syntax tree obtained in step S2, and simplifying processing of redundant words and peripheral structures thereof;
and step S4, calculating to obtain the adjacent characteristics of each core word, and storing the adjacent characteristics corresponding to the core words of each knowledge point to form a knowledge base.
In step S1, the dependency relation between words is directional. Each sentence has exactly one root word; any word other than the root word has one and only one parent node, and may have any number of child nodes.
In step S3, the simplification process includes: if two redundant words have a dependency relation, they are combined into one new redundant word; if the dependency relation of two words is a parallel relation, their parent node and child nodes are shared.
In step S1, a pre-trained dependency syntax model must first be prepared. The model can perform word segmentation on natural language sentences and mark the syntactic dependency relations among the components. The details are as follows:
given a sentence of n characters, S ═ S1S2S3…SnAfter the dependency syntax tree processing, the sentence S has a structure S of m words, i.e., W1W2W3…WmAnd obtain dependency syntactic relations between words, e.g., R (W)i,Wj) SBV, for WiAnd WjThere is a SBV (main predicate) relationship between them. WjIs WiParent node of WiIs WjThe child node of (1).
Specifically, in step S1, the dependency relation between words is directional, i.e., R(Wi, Wj) ≠ R(Wj, Wi). Every sentence has exactly one root word Wroot. For any word Wi other than the root word Wroot, there is one and only one word Wj such that the relation R(Wi, Wj) exists; i.e., Wi has exactly one parent node. For a word Wj, there may be multiple words (e.g., W1, W2, W3) standing in relations such as R(W1, Wj), R(W2, Wj), R(W3, Wj) with it; i.e., Wj may have multiple child nodes.
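The structural constraints just stated (exactly one root, exactly one parent per non-root word, every word ultimately reaching the root) can be checked mechanically. A minimal hypothetical sketch, using a plain head-index array rather than the patent's own data format:

```python
def is_valid_dependency_tree(heads):
    """heads[i] is the parent index of word i+1 (words are 1-based),
    with 0 denoting the virtual root.  Checks: exactly one root word,
    a valid parent index for every word, and no cycles (every word's
    upward walk eventually reaches the root)."""
    n = len(heads)
    roots = [i for i, h in enumerate(heads) if h == 0]
    if len(roots) != 1:                 # single-root condition
        return False
    for i, h in enumerate(heads):
        if not (0 <= h <= n):           # parent index out of range
            return False
        seen, node = set(), i + 1
        while node != 0:                # climb toward the root
            if node in seen:            # revisiting a node means a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_dependency_tree([2, 0, 2]))   # True: word 2 is the root
print(is_valid_dependency_tree([2, 1, 2]))   # False: no root word
```

The single-parent condition is enforced by the representation itself: a head array can record only one parent per word.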
Specifically, in step S2, the sentences covering the target knowledge point are collected, the dependency syntax trees of all the sentences are obtained using the dependency syntax model, and the core words are labeled. For example, for the sentence "China's Maotai liquor uses high-quality sorghum as its raw material", we can mark the core words that constitute the knowledge point: Maotai, liquor, sorghum and raw material; the non-core words may also be called "redundant words".
Specifically, in step S3, a series of screening and simplification processes are performed on the obtained dependency syntax tree: the core words are retained, the redundant words and their peripheral structures are simplified, and a dependency syntax structure for each knowledge point, a normalized knowledge tree, is formed and stored for later use. The simplification means include:
if two redundant words xi,xjThere is a dependency relationship, and R (x)i,xj) Where ATT stands for "centered relationship", e.g., "red" and "apple" are centered relationship, then x may be expressedi,xjAnd the words are combined into a new redundant word, thereby achieving the aim of simplification.
If two words Wi, Wj have the dependency relation R(Wi, Wj) = COO (where COO stands for the coordinate, parallel, relation), then the parent node and the child nodes of Wi may be shared with Wj.
The above dependencies may also be any of the types of relationships shown in the dependency table in the appendix.
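The simplification rules above can be sketched as operations on a parent-pointer tree. A hypothetical illustration (the node names, relation labels and the toy tree are invented, not taken from the patent's figures):

```python
# Each node: name -> (parent, relation-to-parent). None marks the root.
tree = {
    "red":   ("apple", "ATT"),   # redundant modifier in an ATT relation
    "apple": ("eat",   "VOB"),
    "eat":   (None,    "HED"),
}

def merge_att(tree, child, parent):
    """Rule 1: merge two redundant words joined by ATT into one node.
    The merged node inherits the parent word's own parent and relation;
    children of either word are re-pointed to the merged node."""
    merged = child + "+" + parent
    gp, rel = tree[parent]
    new = {}
    for node, (p, r) in tree.items():
        if node in (child, parent):
            continue
        new[node] = (merged if p in (child, parent) else p, r)
    new[merged] = (gp, rel)
    return new

simplified = merge_att(tree, "red", "apple")
print(simplified)
# {'eat': (None, 'HED'), 'red+apple': ('eat', 'VOB')}
```

The COO rule is analogous but shares rather than merges: the parent pointer and the children of one coordinate word are duplicated onto the other.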
Fig. 2 is a flow chart of calculating the adjacent features according to the technical solution of the present invention. Specifically, in step S4, the adjacent feature of each core word is calculated; for an arbitrary word Wi, its adjacent feature Fi represents the relationships of Wi with the other words:
Fi = {gi1, gi2, …, gim}
where gij denotes, in the normalized knowledge tree, the path from the node where Wi is located to the node where Wj is located. The path can be encoded into a high-dimensional vector by a neural network model, or expressed as a specific functional relationship, so that the structure (the dependency relations between the nodes) and the content (the core words on the path) of two different paths can be compared. To simplify processing, only the core words may be considered when calculating the adjacent feature Fi, ignoring the redundant words. The adjacent feature of a core word Wi in a knowledge tree is denoted Fi.
In the knowledge tree of a specific knowledge sentence S(x), the adjacent feature of a specific core word Wi is denoted Fi(x). The adjacent features of each core word of each knowledge point are stored to form the knowledge base; the structure of a storage unit is:
Fi(x) → Wi
Strictly speaking, Wi and each core word in Fi(x) can be expressed in the form of a high-dimensional vector or as a set of similar words, so that replacement by near-synonyms can be handled.
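The path gij underlying an adjacent feature can be computed by climbing from each node to the root and splicing the two chains at their lowest common ancestor. A minimal hypothetical sketch (the parent-pointer encoding, and the toy tree using the G_/t_ naming convention of the embodiments, are illustrative assumptions):

```python
def path(tree, a, b):
    """tree: node -> (parent, relation).  Returns the g_ij path from a
    to b as steps (src, (direction, relation), dst), where 'f' means
    forward (child to parent) and 'b' backward (parent to child)."""
    def ancestors(x):
        chain = [x]
        while tree[x][0] is not None:
            x = tree[x][0]
            chain.append(x)
        return chain
    up_a, up_b = ancestors(a), ancestors(b)
    lca = next(n for n in up_a if n in up_b)   # lowest common ancestor
    steps = []
    for n in up_a[:up_a.index(lca)]:           # climb from a to the LCA
        steps.append((n, ("f", tree[n][1]), tree[n][0]))
    prev = lca
    for n in up_b[:up_b.index(lca)][::-1]:     # descend from LCA to b
        steps.append((prev, ("b", tree[n][1]), n))
        prev = n
    return steps

tree = {"G_0": ("t_1", "SBV"), "t_1": (None, "HED"),
        "G_14": ("t_1", "VOB"), "G_13": ("G_14", "ATT")}
print(path(tree, "G_0", "G_13"))
# [('G_0', ('f', 'SBV'), 't_1'), ('t_1', ('b', 'VOB'), 'G_14'),
#  ('G_14', ('b', 'ATT'), 'G_13')]
```

The adjacent feature Fi would then be the collection of such paths from Wi's node to every other (core) node.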
After the knowledge base is established, the knowledge parsing system based on the dependency syntax tree of the present invention can be used to parse the user's input. Fig. 3 is a schematic diagram of obtaining the analysis result according to the technical solution of the present invention; the process includes the following steps:
step S5, processing the text input by the user through the dependency syntax tree to obtain the corresponding word segmentation result;
step S6, comparing the obtained adjacent features of each word with the adjacent features in the knowledge base; if the matching degree is greater than a first threshold, judging whether the word corresponding to the matched adjacent feature in the knowledge base is similar to the core word; if so, outputting an analysis result; if not, prompting the word corresponding to that adjacent feature in the knowledge base as the correction.
Specifically, in step S5, given the text S(U) = S1S2S3…Sn input by the user, its dependency syntax tree is obtained after processing, and the corresponding word segmentation result is S(U) = W1W2W3…Wm.
Specifically, in step S6, the adjacent feature of each word is obtained:
Fi(U) = {gi1(U), gi2(U), …, gim(U)}
i.e., for a core word Wi(U) in the user text, its adjacent feature Fi(U) in the user input is acquired.
Specifically, in step S6, each adjacent feature Fa, Fb, Fc, … in the knowledge base is compared with Fi(U), and the adjacent feature with the highest matching degree (say Fj) is taken. If the matching degree is higher than a certain threshold (e.g., the first threshold), the word Wj corresponding to that adjacent feature in the knowledge base is obtained; Wj should then be highly similar to the core word Wi(U). If it is not sufficiently similar, the core word Wi(U) in the user's text does not fit the knowledge base and should be marked and corrected, thereby realizing a series of knowledge parsing operations such as knowledge auditing and correction.
The matching degree may be calculated by comparing the semantic proximity of two words: for example, by comparing the word vectors of the two words, or by defining a synonym table in advance and checking whether the two words are synonyms according to the table.
Preferably, different weights may be assigned to the core word and the other words in the adjacent feature to calculate a total score, which is output as the analysis result. For example, core words are assigned a first weight and redundant words a second weight. If an adjacent feature is judged similar, its output value is 1; otherwise it is 0. Each output value is multiplied by the corresponding weight, and the overall score is counted as the similarity result. Because the weights differ in this embodiment of the invention, the score is higher when the core words are more similar, improving the parsing precision of the system.
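The weighted overall score just described can be sketched as follows. The concrete weights and the 0/1 similarity outputs are hypothetical values chosen for illustration:

```python
def weighted_score(matches, is_core, w_core=2.0, w_redundant=1.0):
    """matches[i]: 1 if word i's adjacent feature was judged similar,
    else 0.  is_core[i]: whether word i is a core word.  Core words
    carry a larger weight, so agreement on core words dominates the
    similarity result.  Returns a score normalized to [0, 1]."""
    total = sum(m * (w_core if c else w_redundant)
                for m, c in zip(matches, is_core))
    best = sum(w_core if c else w_redundant for c in is_core)
    return total / best if best else 0.0

# three words: two core, one redundant; both core words match
print(weighted_score([1, 1, 0], [True, True, False]))   # 0.8
```

Matching only the redundant word would score 0.2 under the same weights, showing how core-word agreement is prioritized.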
Embodiment one:
According to the first embodiment of the invention, intelligent error correction and intelligent filling can be realized by constructing the dynamic structured knowledge base in advance and then processing the user's input through the analysis algorithm module.
Constructing a dynamic structured knowledge base:
Assume the knowledge sentence "Einstein proposed the special theory of relativity in the miracle year 1905 and expounded the principle of the photoelectric effect". After dependency syntax tree processing, the following can be obtained:
[dependency syntax tree diagram shown as a figure in the original]
Assuming the knowledge we are interested in is "Einstein proposed the principle of the photoelectric effect in 1905", we label as core words: Einstein, 1905, photoelectric, effect and principle. After a series of screening and simplification processes, a normalized knowledge tree is obtained:
[normalized knowledge tree diagram shown as a figure in the original]
Here, variables beginning with "G_" represent core words, variables beginning with "t_" represent redundant words, and the specific vocabulary is:
{'G_0': ['Einstein'], 't_1': […], 'G_2': ['1905'], 't_4': […], 'G_12': ['photoelectric'], 'G_13': ['effect'], 'G_14': ['principle']}
For convenience of illustration, the words are not represented by high-dimensional vectors, but by a set of similar words.
The adjacent feature of the word "Einstein" is in fact a summary of the paths from the node where "Einstein" is located (i.e., "G_0") to the other words. For example, the adjacent feature of "Einstein" is denoted F0(x):
[adjacent-feature listing shown as a figure in the original]
Here "f" and "b" represent forward (from child node to parent node) and backward (from parent node to child node), respectively. For example, the path from "G_0" to "G_13" can be looked up under the index "G_13", i.e., the path from "G_0" to "G_13" is
[['G_0', ['f','SBV'], 't_1'],
 ['t_1', ['b','VOB'], 'G_14'],
 ['G_14', ['b','ATT'], 'G_13']]
This represents that going from "G_0" to "G_13" requires first going forward to the redundant node "t_1" (the dependency relation on this step being SBV), then going backward from "t_1" to the core node "G_14" (the relation being VOB), and finally going backward to "G_13" (the relation being ATT). Judging whether two paths are consistent requires comparing not only whether the dependency relations between the nodes of the two paths are consistent, but also whether the contents of the nodes are consistent or sufficiently similar.
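The consistency check just described, comparing direction and dependency relation at every step and requiring node contents to be equal or sufficiently similar, can be sketched as follows. This is a hypothetical illustration: the synonym set stands in for the high-dimensional-vector similarity of the actual scheme:

```python
def paths_match(p, q, synonyms=None):
    """p, q: lists of steps (src, (direction, relation), dst).
    Structure (direction + relation) must match exactly; node contents
    must be equal or listed as synonyms."""
    synonyms = synonyms or set()
    def similar(a, b):
        return a == b or (a, b) in synonyms or (b, a) in synonyms
    if len(p) != len(q):
        return False
    for (s1, edge1, d1), (s2, edge2, d2) in zip(p, q):
        if edge1 != edge2:                       # direction + relation
            return False
        if not (similar(s1, s2) and similar(d1, d2)):
            return False
    return True

p = [("Einstein", ("f", "SBV"), "proposed")]
q = [("Einstein", ("f", "SBV"), "put_forward")]
print(paths_match(p, q, synonyms={("proposed", "put_forward")}))  # True
```

A mismatch in either the step direction or the relation label rejects the pair immediately, regardless of node content.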
Repeating the above steps, massive knowledge sentences are collected; each knowledge sentence may correspond to more than one knowledge point. A knowledge tree is generated for each knowledge point according to the above steps, and the adjacent feature Fi of each node Wi in the knowledge tree is obtained. Each adjacent feature is then stored as an index.
An intelligent analysis process:
Assume the user inputs "In 1995, the German physicist Einstein proved the principle of the photoelectric effect". The resulting dependency syntax tree is:
[dependency syntax tree diagram shown as a figure in the original]
The adjacent features of the respective words are obtained. Each adjacent feature stored in the knowledge base is traversed to judge whether the adjacent feature of any word completely matches some pre-stored adjacent feature. Finally, the adjacent feature of the word "1995" is found to completely match an adjacent feature in the knowledge base whose corresponding node is G_2, which holds the word "1905". Therefore, the word "1995" should be made consistent with the content of the G_2 node in the knowledge base; that is, the word "1995" need only be replaced by "1905" to ensure that the sentence input by the user does not conflict with the knowledge base.
To prevent mis-correction: if the adjacent feature of the word "1995" also completely matches another adjacent feature in the knowledge base, and the word "1995" itself matches the node corresponding to that adjacent feature, then the conflict described previously is invalidated.
For example, if the user inputs "Maotai liquor uses rice to make its distiller's yeast", the parsing system can correct the user's text against the knowledge points pre-stored in the knowledge base for contexts such as "Maotai liquor" and "distiller's yeast", and inform the user that "rice" should be corrected to "wheat".
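The matching-and-correction flow of this example can be sketched as follows (a hypothetical simplification: the real system compares path-based adjacent features, while here a feature is reduced to a word's labeled links, and the index maps each stored feature directly to its knowledge-base word; all names are illustrative):

```python
def adjacent_feature(tree, word):
    # Feature of a word: sorted labeled links to its neighbours in the tree.
    feats = []
    for head, rel, dep in tree:
        if head == word:
            feats.append((rel, "child"))
        elif dep == word:
            feats.append((rel, "parent"))
    return tuple(sorted(feats))

def check_against_knowledge(input_tree, index):
    # A correction is proposed when a word's feature matches a stored feature
    # but the word differs; if the word itself matches the stored node, the
    # conflict is invalidated and no correction is made.
    corrections = {}
    words = {w for h, _, d in input_tree for w in (h, d)}
    for w in words:
        stored = index.get(adjacent_feature(input_tree, w))
        if stored is not None and stored != w:
            corrections[w] = stored
    return corrections

user_tree = [("proved", "SBV", "Einstein"),
             ("proved", "VOB", "principle"),
             ("proved", "ADV", "1995")]
kb_index = {(("ADV", "parent"),): "1905"}   # stored feature of the G_2 node
print(check_against_knowledge(user_tree, kb_index))  # -> {'1995': '1905'}
```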
Example two:
Intelligent knowledge filling can also be achieved with the parsing system of the present invention. If the user inputs a sentence of the form "In x, the German physicist Einstein proved the principle of the photoelectric effect", the system only needs to retrieve x from the knowledge base, realizing the product effect of "knowledge filling". Likewise, if the user enters "Einstein won the x Nobel Prize in y", the system informs the user that x is "1921" and y is "physics". The retrieval process is the same as in Example one.
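The knowledge-filling effect described above can be sketched as follows (hypothetical code; the simplified word-level adjacent feature and the feature-to-word index layout are illustrative assumptions, not the patent's implementation):

```python
def fill_slots(template_tree, index, slots=("x",)):
    # For each placeholder, look up the knowledge-base word whose stored
    # adjacent feature matches the placeholder's feature in the template.
    def adjacent_feature(tree, word):
        feats = []
        for head, rel, dep in tree:
            if head == word:
                feats.append((rel, "child"))
            elif dep == word:
                feats.append((rel, "parent"))
        return tuple(sorted(feats))

    return {s: index.get(adjacent_feature(template_tree, s)) for s in slots}

# "In x, the German physicist Einstein proved the principle of the
# photoelectric effect" -- "x" sits where the time adverbial would be.
template = [("proved", "SBV", "Einstein"),
            ("proved", "VOB", "principle"),
            ("proved", "ADV", "x")]
kb_index = {(("ADV", "parent"),): "1905"}
print(fill_slots(template, kb_index))  # -> {'x': '1905'}
```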
Although the present application has been described with reference to preferred embodiments, these are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application shall be determined by the appended claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Appendix, dependency relationship table:
Relation type          Tag  Description             Example
Subject-verb           SBV  subject-verb            I send her a bunch of flowers (I <-- send)
Verb-object            VOB  direct object           I send her a bunch of flowers (send --> flowers)
Indirect object        IOB  indirect-object         I send her a bunch of flowers (send --> her)
Fronted object         FOB  fronting-object         He reads every book (book <-- reads)
Double                 DBL  double                  He invites me to dinner (invites --> me)
Attribute              ATT  attribute               red apple (red <-- apple)
Adverbial              ADV  adverbial               very beautiful (very <-- beautiful)
Complement             CMP  complement              has finished the homework (do --> finish)
Coordinate             COO  coordinate              mountains and seas (mountains --> seas)
Preposition-object     POB  preposition-object      in the trade area (in --> inside)
Left adjunct           LAD  left adjunct            mountains and seas (and <-- seas)
Right adjunct          RAD  right adjunct           children (child --> plural suffix)
Independent structure  IS   independent structure   two clauses that are structurally independent of each other
Punctuation            WP   punctuation             (punctuation marks)
Head                   HED  head                    the core of the whole sentence
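As an illustration (not part of the patent), a sentence can be encoded with the tag set above as (head, relation, dependent) arcs, with HED linking a virtual ROOT to the sentence core:

```python
# Subset of the relation table above, used for readable printing.
RELATIONS = {
    "HED": "head", "SBV": "subject-verb", "IOB": "indirect-object",
    "VOB": "verb-object", "ATT": "attribute",
}

# "I send her a bunch of flowers" as labeled dependency arcs.
arcs = [
    ("ROOT", "HED", "send"),
    ("send", "SBV", "I"),
    ("send", "IOB", "her"),
    ("send", "VOB", "flowers"),
    ("flowers", "ATT", "a bunch of"),
]

for head, rel, dep in arcs:
    print(f"{head} --{rel} ({RELATIONS[rel]})--> {dep}")
```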

Claims (10)

1. A dependency syntax tree based knowledge parsing system, comprising: the knowledge base module and the analysis module; wherein, knowledge base module includes:
the word segmentation module is used for carrying out word segmentation processing on the natural language sentence according to the pre-trained dependency syntactic model and indicating the syntactic dependency relationship among all the components;
the dependency syntax tree generation module collects the sentences covering the target knowledge points, obtains the dependency syntax trees of all the sentences by using the dependency syntax model, and labels the core words;
the simplified processing module is used for retaining the core words in the dependency syntax tree obtained by the dependency syntax tree generating module and simplifying the redundant words and their peripheral structures;
the calculation module is used for calculating the adjacent characteristics of each core word and storing the adjacent characteristics corresponding to the core words of each knowledge point to form a knowledge base;
wherein, analysis module includes:
the syntax tree processing module is used for processing the text input by the user through the dependency syntax tree to obtain a corresponding word segmentation result;
and the adjacent feature comparison module is used for comparing the obtained adjacent features of each word with each adjacent feature in the knowledge base; if the matching degree is greater than a first threshold value, judging whether the word corresponding to that adjacent feature in the knowledge base is similar to the core word obtained by the adjacent feature acquisition module; if so, outputting an analysis result; if not, prompting with the word corresponding to the adjacent feature in the knowledge base.
2. The dependency syntax tree-based knowledge parsing system of claim 1 wherein: in the word segmentation module, the dependency syntactic relation among the words is directional.
3. The dependency syntax tree-based knowledge parsing system of claim 1 wherein: in the word segmentation module, each sentence has at least one root source word, and for any word except the root source word, there is only one parent node and at least one child node.
4. The dependency syntax tree-based knowledge parsing system of claim 1 wherein: in the simplified processing module, if two redundant words have a dependency relationship between them, they are combined into a new redundant word; if the dependency relationship between two words is a parallel relationship, the two words share each other's parent and child nodes.
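The two simplification rules of claim 4 can be sketched as follows (a hypothetical minimal implementation; the arc-list representation and helper names are illustrative assumptions, not claim language):

```python
def merge_redundant(tree, redundant):
    # Rule 1: if two redundant words are linked by a dependency, merge them
    # into a single new redundant word (here joined with "_").
    for head, rel, dep in list(tree):
        if head in redundant and dep in redundant:
            merged = head + "_" + dep
            tree = [(merged if h in (head, dep) else h, r,
                     merged if d in (head, dep) else d)
                    for h, r, d in tree if (h, r, d) != (head, rel, dep)]
            redundant = (redundant - {head, dep}) | {merged}
    return tree, redundant

def share_coordinate(tree):
    # Rule 2: a word in a COO (parallel) relation inherits its partner's
    # parent and child links, so the two words share them.
    extra = []
    for head, rel, dep in tree:
        if rel == "COO":
            for h, r, d in tree:
                if h == head and r != "COO":
                    extra.append((dep, r, d))   # inherit the partner's children
                if d == head:
                    extra.append((h, r, dep))   # inherit the partner's parent
    return tree + extra
```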
5. A knowledge parsing method based on a dependency syntax tree is characterized by comprising the following steps:
step S1, performing word segmentation processing on the natural language sentence according to the pre-trained dependency syntax model and indicating the syntax dependency relationship among the components;
step S2, summarizing sentences covering the target knowledge points, obtaining dependency syntax trees of all the sentences by using a dependency syntax model, and labeling core words;
step S3, reserving the core words in the dependency syntax tree obtained in step S2, and simplifying processing of redundant words and peripheral structures thereof;
step S4, calculating to obtain adjacent features of each core word, and storing the adjacent features corresponding to the core words of each knowledge point to form a knowledge base;
step S5, processing the text input by the user through the dependency syntax tree to obtain the corresponding word segmentation result;
step S6, comparing the obtained adjacent features of each word with each adjacent feature in the knowledge base, if the matching degree is larger than a first threshold value, judging whether the word corresponding to the adjacent features in the knowledge base is approximate to the adjacent features of the core word, if so, outputting an analysis result, and if not, prompting the word corresponding to the adjacent features in the knowledge base.
6. The dependency syntax tree-based knowledge parsing method of claim 5, wherein: in step S1, the dependency syntax relationship between words is directional.
7. The dependency syntax tree-based knowledge parsing method of claim 5, wherein: in step S1, each sentence has at least one root source word, and for any word except the root source word, there is only one parent node and at least one child node.
8. The dependency syntax tree-based knowledge parsing method of claim 5, wherein: in step S3, the simplification process includes: if the two redundant words have dependency relationship, combining the two redundant words into a new redundant word; if the dependency relationship of the two words is a parallel relationship, the parent node and the child node of the two words are shared.
9. An intelligent learning content push system, comprising: memory, processor and computer program stored on said memory and executable on said processor, characterized in that the processor performs the method according to any of the claims 5-8.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 5-8.
CN202010997505.5A 2020-09-21 2020-09-21 Knowledge analysis system and method based on dependency syntax tree Active CN112651226B (en)


Publications (2)

Publication Number Publication Date
CN112651226A CN112651226A (en) 2021-04-13
CN112651226B true CN112651226B (en) 2022-03-29

Family

ID=75347072


Country Status (1)

Country Link
CN (1) CN112651226B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282762B (en) * 2021-05-27 2023-06-02 深圳数联天下智能科技有限公司 Knowledge graph construction method, knowledge graph construction device, electronic equipment and storage medium
CN115270786B (en) * 2022-09-27 2022-12-27 炫我信息技术(北京)有限公司 Method, device and equipment for identifying question intention and readable storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN1628298A (en) * 2002-05-28 2005-06-15 弗拉迪米尔·叶夫根尼耶维奇·涅博利辛 Method for synthesising self-learning system for knowledge acquistition for retrieval systems
CN105528349A (en) * 2014-09-29 2016-04-27 华为技术有限公司 Method and apparatus for analyzing question based on knowledge base
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb
CN109522418A (en) * 2018-11-08 2019-03-26 杭州费尔斯通科技有限公司 A kind of automanual knowledge mapping construction method
CN109815230A (en) * 2018-12-23 2019-05-28 国网浙江省电力有限公司 A kind of full-service data center Data Audit method of knowledge based map
CN111177394A (en) * 2020-01-03 2020-05-19 浙江大学 Knowledge map relation data classification method based on syntactic attention neural network
CN111194401A (en) * 2017-10-10 2020-05-22 国际商业机器公司 Abstraction and portability of intent recognition
CN111597351A (en) * 2020-05-14 2020-08-28 上海德拓信息技术股份有限公司 Visual document map construction method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system
US10325215B2 (en) * 2016-04-08 2019-06-18 Pearson Education, Inc. System and method for automatic content aggregation generation


Non-Patent Citations (2)

Title
Yannis Haralambous et al., "Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection", Eprint Arxiv, Dec. 2014, pp. 1-10. *
Li Zhenghua et al., "Research on Data-Driven Dependency Parsing Methods" (数据驱动的依存句法分析方法研究), Intelligent Computer and Applications, Oct. 2013, Vol. 3, No. 5, pp. 1-4. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant