CN115859955A - Natural language understanding method based on concept network - Google Patents

Info

Publication number: CN115859955A (legal status: Pending)
Application number: CN202210484567.5A
Inventor: 任浙东 (Ren Zhedong)
Assignee: Hangzhou Gewo Intelligent Technology Co., Ltd.
PCT publication: WO2023155914A1 (PCT/CN2023/077271)
Original language: Chinese (zh)
Classification: Machine Translation (AREA)
Abstract

The invention relates to the field of artificial intelligence and provides a natural language understanding method based on a concept network. The method carries out the following steps on natural language, in order, based on the concept network: sentence segmentation, concept-word segmentation, Concept Definition (DEF) dependency analysis, and Concept Definition (DEF) operation. The method's chief significance is that it explains end to end how natural language is converted into machine-operable Concept Definitions (DEFs), so that chapters/paragraphs and single-turn/multi-turn dialogue can be understood accurately. A DEF-operation method based on the GTL template-engine technology of the GScript language is also provided, which can accurately generate and manipulate UI interfaces, data, and programming-language code, or generate a reply DEF.

Description

Natural language understanding method based on concept network
Technical Field
The invention relates to the field of artificial intelligence, and particularly provides a natural language understanding method based on a concept network.
Background
With the third wave of the artificial intelligence (AI) tide, AI has entered real industry application scenarios in many fields, such as speech recognition, machine vision, and data mining. As one of the important development directions, natural language processing technology is also developing and being applied rapidly. Enabling machines to understand natural language remains one of the hardest problems in artificial intelligence.
The theoretical basis of concept networks traces back to Conceptual Dependency, proposed by the American artificial intelligence researcher R. C. Schank in 1973. Conceptual Dependency theory (CD theory for short) is a theory of automatic natural language processing. Schank believed that some conceptual basis exists in the human brain, and that language understanding is the process of mapping sentences onto that conceptual basis.
Disclosure of Invention
In order to solve the above problems, the present invention provides a natural language understanding method based on a concept network.
The natural language understanding method includes the following steps: based on the concept network, the following steps are carried out on the natural language in order: sentence segmentation, concept-word segmentation, Concept Definition (DEF) dependency analysis, and Concept Definition (DEF) operation;
features of the natural language understanding process:
(1) A concept-based network implementation;
(2) The process is: sentence segmentation, concept-word segmentation, Concept Definition (DEF) dependency analysis, and Concept Definition (DEF) operation. The purpose: to understand chapters/paragraphs and single-turn/multi-turn dialogue accurately, and even to generate and manipulate UI interfaces (i.e., human-machine interfaces), data, and programming-language code, or to generate a reply DEF.
(3) The context session runs through the entire analysis and operation process.
The specific process comprises the following steps:
(1) Sentence segmentation
Whether for a chapter/paragraph or a single-turn/multi-turn dialogue, the sentence is the most basic unit of language use, so segmenting sentences is a very important step. There are two main ways to implement sentence segmentation: first, rules based on sentence-delimiter punctuation; second, delimiter recognition based on trained models. If the input is known to be a single sentence, the sentence-segmentation step can be skipped;
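The first, rule-based approach can be sketched as follows (the delimiter set and the handling of trailing closing quotes are illustrative assumptions, not the patent's actual rule set):

```python
import re

# Sentence-final delimiters for Chinese/mixed text; the delimiter set and the
# optional trailing closing quote are illustrative assumptions.
_SENT_END = re.compile(r'([。！？!?；;]+[”’"\']?)')

def split_sentences(text: str) -> list[str]:
    """Split text into sentences on delimiter punctuation, keeping delimiters."""
    parts = _SENT_END.split(text)
    # re.split with a capturing group alternates content and delimiter;
    # glue each delimiter back onto the chunk it terminates.
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()
    if tail:                      # input with no final delimiter: keep the remainder
        sentences.append(tail)
    return sentences
```

A trained model would replace the regex with a per-character boundary classifier, but the surrounding pipeline stays the same.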
(2) Concept-word segmentation
Concept-word segmentation mainly comprises three steps: word segmentation, part-of-speech tagging, and concept recognition.
A. Word segmentation
In Latin-script languages, represented by English, spaces serve as natural separators between words. Chinese, however, is written in characters with no explicit marks between words, and words carry more meaning than single characters. Chinese word segmentation is therefore the foundation and key of Chinese information processing.
There are many Chinese word-segmentation methods: lexicon-based (dictionary-based) methods such as forward minimum matching, reverse minimum matching, forward maximum matching, reverse maximum matching, and minimum segmentation; and statistics-based methods such as the Hidden Markov Model (HMM), bigram, Conditional Random Field (CRF), and bidirectional-LSTM seq2seq. But technical difficulties such as ambiguity resolution and new-word recognition still remain.
The word-segmentation method described here combines several approaches: a character-based sequence-labeling model, a word-based n-gram model, and a lexicon. The process is as follows:
I. Dynamically add words corresponding directly or indirectly to concepts in the concept network to the dictionary. If a word has components or is a compound word, ignore the components, extract the word itself, and then add it to the dictionary;
II. Segment the sentence with a character-based sequence-labeling model, such as a maximum-entropy hidden Markov model, a conditional random field (CRF), or a bidirectional-LSTM seq2seq. This result can also serve as the final segmentation result;
III. Segment the sentence with a dictionary-based full-segmentation method;
IV. Merge the words produced by the two methods, keeping only one of any identical words at the same position;
V. Eliminate ambiguity with algorithms such as bigram, shortest path (fewest words), reverse maximum matching, forward maximum matching, reverse minimum matching, and forward minimum matching, to obtain the final segmentation result.
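One of the lexicon-based algorithms named above, reverse maximum matching, can be sketched as follows (the toy dictionary is hypothetical; in the method the dictionary would be the one built dynamically from the concept network in step I):

```python
def reverse_max_match(sentence: str, dictionary: set[str], max_len: int = 5) -> list[str]:
    """Dictionary-based reverse maximum matching: scan from the right end,
    greedily taking the longest dictionary word that ends at the cursor."""
    result = []
    pos = len(sentence)
    while pos > 0:
        for length in range(min(max_len, pos), 0, -1):
            candidate = sentence[pos - length:pos]
            if length == 1 or candidate in dictionary:
                result.append(candidate)   # single characters fall through as-is
                pos -= length
                break
    result.reverse()                        # we collected words right-to-left
    return result
```

On the classic example 研究生命的起源 ("study the origin of life"), scanning from the right correctly avoids the spurious word 研究生 ("graduate student") that forward matching would produce.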
VI. Different algorithms may produce different segmentation results. The algorithms can therefore each be assigned a priority and traversed in priority order, as follows:
i. Produce a segmentation result with the highest-priority algorithm;
ii. Run the subsequent semantic-understanding processing on that result. If it succeeds, stop; otherwise select the highest-priority algorithm among those not yet tried and segment again. If that segmentation result has already been processed, continue to the next algorithm by priority; otherwise repeat step ii;
iii. Continue until semantic-understanding processing succeeds, or all algorithms have been traversed.
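The priority-ordered traversal in steps i–iii can be sketched as follows (the segmenter list and the downstream `understand` callback are placeholders for the algorithms and the semantic-understanding stage described above):

```python
def segment_with_fallback(sentence, segmenters, understand):
    """Try segmentation algorithms in priority order until the downstream
    semantic-understanding step succeeds, or all algorithms are exhausted.

    segmenters: list of (priority, segment_fn) pairs; higher priority tried first.
    understand: callback returning a truthy result on success, falsy on failure.
    """
    tried = set()
    for priority, segment in sorted(segmenters, key=lambda p: -p[0]):
        words = tuple(segment(sentence))
        if words in tried:          # identical result already processed: skip it
            continue
        tried.add(words)
        result = understand(words)
        if result:                  # semantic understanding succeeded: stop
            return list(words), result
    return None, None               # all algorithms traversed without success
```

The `tried` set implements the rule that a segmentation result which has already been processed is not processed again.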
B. Part-of-speech tagging
Part of speech describes the role a word plays in context, and part-of-speech tagging identifies a word's part of speech to determine that role. In general, tagging builds on word segmentation: each word in the segmentation result is labeled with its correct part of speech, i.e., determined to be a noun, verb, adjective, or other part of speech.
Words corresponding directly or indirectly to concepts in the concept network can specify parts of speech. If no word with a specified part of speech exists, the tagging step can be skipped; otherwise, part-of-speech tagging plays a key role in the subsequent concept-recognition step.
There are also many part-of-speech tagging methods, most based on statistical models such as Hidden Markov Models (HMMs).
C. Concept recognition
Concepts are identified for each word in the segmentation result based on the concept network. A word may correspond to one or more word concepts, or to a wordless concept. Words for which no concept is identified are marked as unknown-word concepts. The segmentation result can therefore be called a concept-word sequence.
I. Word concepts are identified by word matching. If the word has components or is a compound word, the components are ignored, the word itself is extracted, and matching is performed. A recognized compound-word concept is also typically tagged with the sentence positions of the words associated with it.
II. Wordless concepts are identified by their concept recognizers. The identified concept typically also contains a converter that implements its Concept Definition (DEF).
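A minimal sketch of the recognizer/converter pairing for a wordless concept (the number-concept example, the `NUMBER` ID, and the dict-shaped DEF are hypothetical illustrations, not the patent's actual representation):

```python
import re

class NumberConceptRecognizer:
    """Recognizer for a hypothetical wordless 'number' concept: matches
    words that are numeric literals rather than dictionary entries."""
    concept_id = "NUMBER"
    _pattern = re.compile(r'^-?\d+(\.\d+)?$')

    def recognize(self, word: str) -> bool:
        return bool(self._pattern.match(word))

    def to_def(self, word: str) -> dict:
        # The converter carried by the recognized concept produces its DEF.
        return {"concept": self.concept_id, "value": float(word)}

def tag_concepts(words, recognizers, word_concepts):
    """Mark each word with its word concept, a recognized wordless concept,
    or the unknown-word concept."""
    tagged = []
    for w in words:
        if w in word_concepts:                 # direct word-concept match
            tagged.append((w, word_concepts[w]))
        else:
            for r in recognizers:              # try each concept recognizer
                if r.recognize(w):
                    tagged.append((w, r.concept_id))
                    break
            else:
                tagged.append((w, "UNKNOWN"))  # unknown-word concept
    return tagged
```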
(3) Concept Definition (DEF) dependency analysis
Concept Definition (DEF) dependency analysis is implemented mainly in two ways:
(1) Dependency parsing based on Arc-Standard transition actions and a classifier, with Concept Definition (DEF) analysis performed at each action transition during parsing;
(2) Training a model from concept-word sequences to dependency trees based on a pre-trained language-representation model such as BERT. The model outputs the dependency tree for a concept-word sequence, and Concept Definition (DEF) analysis is then performed on that output tree.
(1) Arc-Standard transition actions
Transition-based dependency parsing represents the parsing process as a series of states (State) from an initial state to a terminal state. A state consists of a Stack storing partially processed words, a Buffer storing words not yet processed, and the set of dependency arcs built so far.
In the initial state, the stack contains only the Root node, and all words of the sentence are in the buffer. A state is turned into a new state by a transition Action; there are three actions: Shift, Left-Reduce, and Right-Reduce. Shift pushes the first word of the buffer onto the stack; Left-Reduce creates a left-pointing dependency arc between the top two words of the stack and pops the second word from the top; Right-Reduce creates a right-pointing dependency arc between the top two words of the stack and pops the top word.
After a series of transition actions, the terminal state is reached: the stack contains only the root node and the buffer is empty. At this point a complete dependency tree has been formed and the dependency parse of the sentence is finished.
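The Shift / Left-Reduce / Right-Reduce transitions above can be sketched as follows (the action sequence is supplied by hand here; in the method it would come from the classifier described next):

```python
def arc_standard_parse(words, actions):
    """Apply a sequence of arc-standard transitions and return the dependency
    arcs as (head, dependent) pairs. 'ROOT' is the artificial root node."""
    stack = ["ROOT"]
    buffer = list(words)
    arcs = []
    for action in actions:
        if action == "SHIFT":            # push first buffer word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT":           # second-from-top depends on top; pop it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT":          # top depends on second-from-top; pop it
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    # terminal state: stack holds only ROOT, buffer empty, tree complete
    assert stack == ["ROOT"] and not buffer
    return arcs
```

For 我 爱 她 ("I love her"), the sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT yields the arcs 爱→我, 爱→她, ROOT→爱.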
(2) Classifier-based parsing
Transition-based dependency parsing is driven by a classifier: its input is a state and its output is the most likely action in that state.
A traditional classifier requires manually defining a series of features and various combined feature templates, which is extremely complex, demands rich expertise in syntactic parsing, and yields low accuracy.
A neural-network-based classifier is completely different: it only needs some scattered primitive features fed directly into the neural network model. How the features are combined is determined not by hand-written feature templates but automatically by the hidden layers of the model.
(3) Concept Definition (DEF) analysis
I. Definition of Concept Definition (DEF)
A Concept Definition (DEF) is a specific definition or implementation of a concept; it may also be called a concept instance.
Concept Definitions (DEFs) are produced by DEF converters, which are configured and defined in the concept network. A DEF converter generates one DEF from one concept; multiple concepts generate multiple DEFs.
DEFs divide into simple DEFs and composite DEFs. Simple DEFs include the general DEF, the DEF with attachments, and the DEF with attachments and qualifiers; composite DEFs mainly include two types: the linking/introducing DEF and the subject-predicate-object DEF.
i. Simple DEFs
● The general DEF is the most basic, with no attachments or qualifiers; the generated DEF may not support merging any other DEF.
● The DEF with attachments carries a series of left attachments, right attachments, and punctuation attachments; the generated DEF should support merging other DEFs, which are marked as attachments when merged.
● The DEF with attachments and qualifiers carries a series of qualifiers in addition to the attachments above; the generated DEF should also support merging other DEFs, which are marked as attachments or qualifiers when merged.
ii. Subject-predicate-object DEF
The subject-predicate-object DEF contains a subject, an action, an object, a double object or object complement, and a series of supplementary DEFs; the generated DEF typically expresses an action, behavior, or method. It should support merging the subject DEF, the object DEF, the double-object or complement DEF, and a series of supplementary DEFs.
iii. Linking and introducing DEF
The linking/introducing DEF contains a linking or introducing start DEF and a series of linking or introducing end DEFs, and is processed differently according to its use.
● Expressing connection
It connects the DEFs before and after it; the generated DEF is typically a combination group. It should support merging the start DEF and the series of end DEFs, with a combination type between the start and each end. The combination type depends on the semantics.
● Expressing introduction
It introduces a preceding or following DEF; its start DEF is often the introducing preposition, and its end DEF is the head word of the introduced part.
The generated DEF is the one produced by the start DEF. It should support merging the series of end DEFs, with an introduction type between each end. The introduction type likewise depends on the semantics.
The role of such a DEF can generally be judged from the concepts used for linking and introducing. If the start DEF belongs to one of the introducing concepts, the DEF is judged to express introduction; otherwise it expresses connection. If it expresses introduction, the semantics of the concept give the introduction type. If it expresses connection, it is further determined whether one of the linking concepts is stored in the start DEF or an end DEF; if so, the semantics of that concept give the combination type.
A Concept Definition (DEF) has mandatory and optional dependency conditions, which may be both basic and dynamic. If a linguistic expression is incomplete, its intended meaning cannot be understood; likewise, some DEFs require the presence of the DEFs that depend on them, and these are called mandatory dependency conditions. Conversely, if a DEF's intent can be understood without certain dependent DEFs, those are called optional dependency conditions.
There are two main ways to define a DEF's basic mandatory and optional dependency conditions:
i. Through the word that corresponds directly or indirectly to the concept: the mandatory conditions of the concept's DEF are promised together with the word. If that part of the DEF is missing, the semantics are incomplete;
ii. By having the DEF converter generate the DEF together with its mandatory and optional dependency conditions.
There are two main ways to define a DEF's dynamic mandatory and optional dependency conditions:
i. The DEF converter generates a DEF that can define these conditions during the merge process; the definitions applied during merging can also be realized by technical means such as configuration;
ii. Determining during Concept Definition (DEF) operation whether a mandatory dependent DEF is missing.
II. DEF analysis process
The analysis of Concept Definitions (DEFs) is mainly the process of merging DEFs. It comprises concept-connection analysis, dependency-condition analysis, context analysis, and so on.
The dependencies between child and parent nodes in the dependency tree can be handled with the same transition-action idea: a left child depending on its right parent corresponds to Left-Reduce; a right child depending on its left parent corresponds to Right-Reduce.
Concept Definition (DEF) analysis is performed as the actions transition. A state additionally contains a DEF stack (Composition Stack), which stores the DEFs defined so far. The terminal state reached after a series of transition actions contains exactly one DEF in its DEF stack. The process: i. when the action is Shift, the first word in the buffer is pushed onto the stack while a general DEF is generated for the word and pushed onto the DEF stack; ii. when the action is Left-Reduce, the second DEF from the top of the DEF stack becomes dependent on the top DEF and is popped; iii. when the action is Right-Reduce, the top DEF becomes dependent on the second DEF from the top and is popped.
How one DEF depends on another is determined during the dependency step, and a DEF may be converted into another type of DEF in the process. Whether a DEF dependency is valid can be decided during DEF analysis; when it is judged invalid, the current transition action is invalid, and the current parsing thread (including the original thread) is judged invalid early and forced to end.
When the transition action is Shift, a DEF is generated from the word, so:
i. when the word corresponds to one or more word concepts, one or more DEFs are defined by the DEF converters of those word concepts;
ii. when the word corresponds to one or more wordless concepts, one or more DEFs are defined by the DEF converters contained in the concepts identified by their concept recognizers;
iii. when the word is marked as an unknown-word concept, no DEF exists and the word is ignored during analysis.
Thus a word may yield no DEF, exactly one DEF, or more than one. Multiple DEFs express linguistic ambiguity, while the absence of a DEF indicates that the current sentence cannot be understood.
DEFs are not constant during analysis: under dependency analysis in the context session, they may change from one to zero (i.e., absent), from many to one, or even from many to zero.
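The DEF stack running in parallel with the parse can be sketched as follows (representing a DEF as a dict with a `merged` list is a deliberate simplification of the converter-produced DEFs described above):

```python
def parse_with_def_stack(words, actions, make_def):
    """Run arc-standard transitions while maintaining a parallel DEF stack.
    On Shift, a general DEF is generated and pushed; on a reduce, the popped
    DEF is merged into the DEF it depends on. make_def builds a DEF per word."""
    stack, buffer = ["ROOT"], list(words)
    def_stack = []
    for action in actions:
        if action == "SHIFT":
            word = buffer.pop(0)
            stack.append(word)
            def_stack.append(make_def(word))
        elif action == "LEFT":            # second DEF depends on the top DEF
            dep = def_stack.pop(-2)
            stack.pop(-2)
            def_stack[-1]["merged"].append(dep)
        elif action == "RIGHT":           # top DEF depends on the second DEF
            dep = def_stack.pop()
            stack.pop()
            if stack[-1] == "ROOT":       # attached to the root: parse is done
                def_stack.append(dep)     # this DEF is the final result
            else:
                def_stack[-1]["merged"].append(dep)
    # the terminal state leaves exactly one DEF on the DEF stack
    assert len(def_stack) == 1
    return def_stack[0]

def general_def(word):
    """Minimal stand-in for a converter-produced general DEF."""
    return {"word": word, "merged": []}
```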
(4) Concept Definition (DEF) operation
A Concept Definition (DEF) is the specific definition or implementation of a concept and is an object generated while the program runs; it can therefore be operated on directly.
Preconditions for DEF operation:
A. the DEF is valid and all its mandatory dependency conditions are satisfied;
B. a run method under the context run session has been implemented.
A new DEF-operation method is set forth here that can accurately generate and manipulate UI interfaces, data, and programming-language code, or generate a reply DEF. The method is based on the GTL template-engine technology of the GScript language. The technique parses, or compiles and executes, a GTL text file to produce GScript code; the GScript engine then, according to the type of the GTL text file, calls different instruction-function libraries to parse, or compile and execute, the code and generate other programming languages. The generated programs can be executed by their corresponding executors or tools; for example, HTML can be executed by a browser.
The GTL template engine calls GTL text files of different formats for execution in different operation modes. The engine supports the <%! %> tag syntax, with which other GTL text files can be included, even nested. This solves a key practical problem: because there are many operation modes and sub-modes, the workload of developing GTL text files would otherwise multiply; reuse through inclusion greatly reduces development effort.
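The inclusion mechanism can be sketched generically as follows (the GTL engine itself is not public, so the file map, the tag grammar, and the recursion limit are all assumptions; this only illustrates nested template inclusion in the style of the <%! %> tags described above):

```python
import re

# Hypothetical include tag modeled on the <%! ... %> syntax described above.
_INCLUDE = re.compile(r'<%!\s*(\S+)\s*%>')

def expand_includes(name, files, depth=0):
    """Recursively expand include tags, so a shared fragment is written once
    and reused across run modes. 'files' maps template names to their text."""
    if depth > 16:                 # guard against include cycles
        raise RecursionError("include nesting too deep (cycle?)")
    text = files[name]
    return _INCLUDE.sub(lambda m: expand_includes(m.group(1), files, depth + 1), text)
```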
The GTL template engine selects a GTL text file to execute:
a. If a GTL text file has been specified for the DEF, that file is selected for execution;
b. Otherwise, proceed as follows:
b.1. DEF expressing an action, behavior, or method
Select a GTL text file to execute according to the concept ID of the action.
b.2. Combination (grouping) DEF
Traverse the members of the combination and combine their processing results according to the current operation mode and the combination type.
b.3. Introducing DEF
Such a DEF is normally depended on by other DEFs and is therefore skipped rather than processed, but it is pushed into the context analysis-session queue and marked as a DEF to be merged.
b.4. Other DEFs
If the concept of the current DEF belongs to concept ID = object, the default action is a query (concept ID = query); that is, the DEF is converted into one expressing an action, behavior, or method and then processed.
c. If no corresponding GTL text file can be found for execution, processing fails; that is, the sentence cannot be understood.
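The selection rules a–c can be sketched as a dispatch function (the DEF field names, the `"SKIP"` sentinel, and the `query` default follow the steps above, but the representation itself is an illustrative assumption):

```python
def select_gtl_file(def_obj, gtl_files):
    """Select a GTL text file for a DEF per rules a-c above. Returns a file
    name, 'SKIP' for an introducing DEF, a list of per-member selections for
    a grouping DEF, or None when the DEF cannot be processed."""
    if def_obj.get("gtl_file"):                       # a. explicitly specified
        return def_obj["gtl_file"]
    kind = def_obj.get("kind")
    if kind == "action":                              # b.1 select by action concept ID
        return gtl_files.get(def_obj["concept_id"])
    if kind == "grouping":                            # b.2 traverse the members
        return [select_gtl_file(m, gtl_files) for m in def_obj["members"]]
    if kind == "introducing":                         # b.3 skipped, merged later
        return "SKIP"
    if def_obj.get("is_object"):                      # b.4 default to a query action
        return select_gtl_file({"kind": "action", "concept_id": "query"}, gtl_files)
    return None                                       # c. cannot be understood
```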
The GTL template engine allows external cache data to be passed in, so a DEF can be passed in as cache data at execution time; the DEFs it depends on (i.e., has merged) can then be processed during execution.
Context session
The context session runs through the entire analysis and operation process. Both the analysis session and the run session are generated by the context session. The analysis session acts on the analysis step; the run session depends on the analysis session and acts on the operation step. When a DEF runs successfully, the data it contains (e.g., events, objects, and relationships between objects in the run session) is merged into the context session before the analysis and run sessions complete, to support analysis and operation for the surrounding context or the next dialogue turn.
The invention is based on the definition of the following conceptual network:
a system for defining object concepts and semantic relationships between concepts for use in an artificial intelligence concept network includes concepts and concept links, where a concept can form one or more semantic relationships with multiple concepts, and where the semantic relationships between concepts can be multi-tiered.
(1) Definition of concept network: concepts and concept connections.
A concept has a unique ID, which may take any form: a character string, a number, a word vector, etc. Concepts divide into three categories: WORD concepts, wordless concepts (NON WORD), and UNKNOWN concepts.
Word concepts include general word concepts, word concepts with relational components, compound-word concepts, and word-set concepts. A word concept is characterized by natural-language words corresponding directly to the concept, and by a converter implementing the concept's DEF definition.
Wordless concepts include those without an ontology concept and those with an ontology concept. A wordless concept is characterized by having no natural-language word corresponding directly to it; however, a concept recognizer is usually present, so natural-language words can correspond to it indirectly.
Unknown concepts include unknown-word concepts and unknown word-set concepts. An unknown concept is characterized by the inability to generate a concept definition DEF for it; such concepts are ignored during natural-language processing.
A concept connection, i.e., a chain of semantic relations between concepts, describes the multi-layer semantic relationships between concepts. Its characteristics:
a/ A concept may form one or more semantic relationships with multiple concepts, and the semantic relationships may be multi-layered. Of course, not all concepts can form semantic relationships with one another.
b/ Some concept connections are not built in or generated when the concept network is constructed, but form gradually as natural language is understood.
c/ Features are present. These features represent affirmation, negation, possibility, scope, probability, degree, frequency, time, mood, etc., reflecting the relationships in the concept connection.
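The definition above can be sketched as a small data structure (the field names and the feature dict are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    concept_id: str                      # unique ID: string, number, vector, ...
    category: str                        # "WORD", "NON_WORD", or "UNKNOWN"
    words: list = field(default_factory=list)

@dataclass
class ConceptConnection:
    source: str
    target: str
    relation: str                        # e.g. "SPECIES", "ATTRIBUTE", "VALUE"
    features: dict = field(default_factory=dict)  # negation, degree, time, ...

class ConceptNetwork:
    def __init__(self):
        self.concepts, self.connections = {}, []

    def add_concept(self, c):
        self.concepts[c.concept_id] = c

    def connect(self, source, target, relation, **features):
        # a concept may hold many connections; some are added only gradually
        # as natural language is understood, not at construction time
        self.connections.append(ConceptConnection(source, target, relation, features))

    def relations_of(self, concept_id):
        return [(c.relation, c.target) for c in self.connections if c.source == concept_id]
```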
In addition, the invention also provides a generation method of the concept network.
The concept network can be generated through basic definition, generation, extended definition, dynamic formation, and similar means.
The basic definition defines the basic concepts and concept connections. In implementation, configuration files in XML and JSON formats are mainly used to define the concepts and concept connections, which are then generated by parsing the configuration files.
The generation method converts data structures, programming languages, semantic networks/dictionaries/knowledge bases, and the like into concepts and concept connections. The main program performing the conversion is the concept generation engine.
Data structure
A. Data structures include the row-column form and the key-value form. The row-column form is often represented and stored by Excel, relational databases, and the like: two-dimensional data with rows, columns, and a predefined data model. The key-value form often uses a non-relational database to store data and requires no predefined structure.
B. The generation method comprises the following steps:
(1) The concept generation engine treats each table as a concept, taking the table's name, or a word referring to the table, as the concept's word. Meanwhile, the basic-definition concept ID = object and the concepts generated from these tables form a SPECIES (genus-species) relationship.
(2) The engine treats each field of a table as a concept, taking the field's name, or a word referring to the field, as the concept's word. Meanwhile, the concept generated from the owning table and the concept generated from the field form an ATTRIBUTE relationship.
(3) The engine may treat each field type as a concept. These are generally wordless concepts, and respective concept recognizers identify whether a word matches the concept. Meanwhile, concepts generated from fields having a given field type and the concept generated from that field type form a VALUE relationship.
(4) Of course, wordless concepts may be customized for fields whose values have particular constraints or specificity, such as dictionary data.
(5) In addition, some fields have foreign-key relationships with other tables; the concepts generated from these fields and the concepts generated from their associated master tables form VALUE relationships.
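Steps (1)–(3) above can be sketched as follows (the schema format and the flat tuple representation are illustrative simplifications; the fixed `object` root concept follows the description):

```python
def generate_from_schema(schema):
    """Convert a row-column schema into concepts and connections.
    schema: {table_name: {field_name: field_type}}.
    Returns (concepts, connections) with SPECIES / ATTRIBUTE / VALUE links."""
    concepts = {"object"}                 # base-definition root concept
    connections = []
    for table, fields in schema.items():
        concepts.add(table)               # (1) each table becomes a concept
        connections.append(("object", "SPECIES", table))
        for fname, ftype in fields.items():
            concepts.add(fname)           # (2) each field becomes a concept
            connections.append((table, "ATTRIBUTE", fname))
            concepts.add(ftype)           # (3) each field type: a wordless concept
            connections.append((fname, "VALUE", ftype))
    return concepts, connections
```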
C. Realization through object-data-management technology. This technology uses an object-template file in XML format to define object structures and establish a mapping to database tables. The concept generation engine can generate concepts and concept connections from the object-template XML. The generation method comprises the following steps:
(1) The engine can treat each object template (or object) as a concept, taking the object's name, or a word referring to the object, as the concept's word. Meanwhile, the basic-definition concept ID = object and the concepts generated from these objects form a SPECIES relationship.
(2) The engine can treat each attribute defined by the template as a concept, taking the attribute's field name, or a word referring to the attribute, as the concept's word. Meanwhile, the concept generated from the owning object and the concept generated from the attribute form an ATTRIBUTE relationship.
(3) The engine may treat each attribute data type as a concept. These are generally wordless concepts, and respective concept recognizers recognize whether a word matches. Meanwhile, concepts generated from attributes having a given data type and the concept generated from that type form a VALUE relationship. Of course, the concept can also be customized for attribute values through configuration. An attribute of an object type forms a VALUE relationship with the object specified in the type. An attribute of a compound type forms a PART relationship with the members (sub-attributes) specified in the type. An attribute of a structure type forms an ATTRIBUTE relationship with the members (sub-attributes) specified in the type.
Programming language
A. The essence of a programming language is its grammar rules; the concept generation engine generates concepts and concept connections by parsing and converting those grammar rules.
B. The formatted scripting language GScript is also a programming language. The GScript language has a parser that can convert code into a syntax tree, and can also convert a syntax tree back into code. Another important feature of the GScript language is that it can call different instruction function libraries to parse or compile code so as to generate other programming languages. The current version of the GScript language supports generating these languages: Java, JavaScript, HTML, CSS, XML, JSON, and plain text. The concept generation engine generates concepts and concept connections according to the GScript grammar rules, the Java grammar rules, and the grammar rules of the target programming language. The generation method is as follows:
(1) The GScript language grammar consists of three parts: instruction functions, operators, and operation functions. The concept generation engine preferably defines, by configuration, a series of concepts and concept connections for the instruction functions, operators, and operation functions of the GScript grammar. The configuration method may use the same technical means as the base definition.
(2) The Java language syntax contains classes, attributes, methods, data types, enumerations, annotations, etc. The concept generation engine preferably uses Java annotations to configure and define concepts and concept connections; the configuration method may use the same technical means as the base definition. Because Java annotations are a native language feature, concepts and concept connections can be defined during code development itself.
I. The concept generation engine may treat each class as a concept, with the class name, or a word referring to the class, as the word of the concept. Multiple classes are allowed to point to the same concept; for example, an interface and its implementation class will typically represent the same concept.
Meanwhile, the base-defined concept ID = object and the concepts generated from these classes form a SPECIES (genus-species) relationship.
II. A Java enumeration, while itself also a class, requires different processing: it defines two concepts. One is a word-free concept with an ontology concept, serving as a value of the other concept; the other is the ontology concept of that word-free concept.
III. The engine processes the referencing situation of each field of the class separately:
i. if a field is configured, or annotated, to define a concept, then that concept forms an ATTRIBUTE relationship with the concept of the field;
ii. if the data type of the field points to a class (or Java enumeration), and that class, or its component type (Component Type), is a class for which a concept has already been defined, then that concept forms an ATTRIBUTE relationship with the concept of the field's data type;
iii. otherwise, the engine defines a concept for the field, taking the field name, or a word referring to the field, as the word of the concept; that concept then forms an ATTRIBUTE relationship with the concept generated from the field.
IV. The engine also processes different data types separately:
i. basic data types and their corresponding object (wrapper) types;
ii. collection types, mainly arrays and classes implementing java.util.Collection;
iii. other types.
V. The engine defines concept connections for class fields. If a field is not configured, or annotated, to define a concept connection:
i. if the field is a fixed constant, define a concept, taking the constant value, or a word referring to the constant value, as the word of the concept; the concept of the field and the generated concept then form a VALUE relationship;
ii. if the concept ID of the field is not the same as the concept ID of its data type, a VALUE relationship is formed between the two.
VI. The engine defines concept connections for class methods. If a method is not configured, or annotated, to define concepts and concept connections, then:
i. if the method name conforms to the JavaBean specification (i.e., getter and setter methods), it does not need to be defined, because the base definition typically already defines SV (subject-verb) relations between concept ID = get, ID = set and concept ID = object;
ii. otherwise, define a concept, taking the method name, or a word referring to the method, as the word of the concept; that concept and the generated concept then form an SV (subject-verb) relation.
VII. The engine handles class inheritance (extends): if the concept ID of the parent class is not the same as that of the subclass, the concept of the parent class and the concept of the subclass form a SPECIES (genus-species) relationship.
(3) The concepts and concept connections of the target programming language are generated in accordance with the section corresponding to the GScript "instruction function - extended instruction".
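As an illustration of item (2), the class-to-concept rules can be sketched with reflection. The @Concept annotation below is hypothetical (the text states that annotations are used for configuration but does not define their form), and the triple strings are illustrative only:

```java
import java.lang.annotation.*;
import java.lang.reflect.*;
import java.util.*;

// Hypothetical @Concept annotation plus a minimal sketch of rules I, III
// and VI: deriving concept connections from a class via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface Concept {
    String id();
    String[] words() default {};
}

@Concept(id = "var", words = {"variable"})
class Var {
    @Concept(id = "var.name", words = {"name"})
    String name;
    String value;                                  // rule III.iii: concept from field name
    public String getName() { return name; }       // JavaBean getter: skipped (rule VI.i)
    public void setName(String n) { name = n; }    // JavaBean setter: skipped
    public void reset() { value = null; }          // other method: SV relation (rule VI.ii)
}

public class ClassToConcepts {
    public static List<String> generate(Class<?> cls) {
        Concept ann = cls.getAnnotation(Concept.class);
        String id = ann != null ? ann.id() : cls.getSimpleName().toLowerCase();
        List<String> out = new ArrayList<>();
        out.add("object SPECIES " + id);                       // rule I
        for (Field f : cls.getDeclaredFields()) {
            Concept fa = f.getAnnotation(Concept.class);
            String fid = fa != null ? fa.id() : id + "." + f.getName();
            out.add(id + " ATTRIBUTE " + fid);                 // rule III
        }
        for (Method m : cls.getDeclaredMethods()) {
            String n = m.getName();
            if (n.startsWith("get") || n.startsWith("set")) continue; // rule VI.i
            out.add(id + " SV " + n);                          // rule VI.ii
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(generate(Var.class));
    }
}
```

Running the engine over the annotated Var class yields a SPECIES triple for the class, ATTRIBUTE triples for its fields, and an SV triple only for the non-getter/setter method.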
Semantic network/dictionary/knowledge base
The concept network is itself a semantic network, so this technique can interconvert with existing mainstream semantic networks, dictionaries, and knowledge bases.
Extended definition. Extended definitions further define concepts and concept connections; the implementation may use the same technical means as the base definition.
Dynamic formation. Concepts and concept connections can also be formed dynamically, for example, connections formed between concepts during natural language understanding, or words manually added to concepts and concept connections manually added.
Use of word vectors. Words corresponding directly or indirectly to concepts may be replaced with word vectors, and a concept ID may directly use a word vector.
The invention provides a brand-new natural language understanding method based on a concept network. It comprehensively explains how to convert natural language into machine-operable concept definitions (DEF), so that chapters/paragraphs and single-round/multi-round conversations can be accurately understood. A DEF operation processing method based on the GTL template engine technology of the GScript language is also provided, which can accurately generate and manipulate UI interfaces, data, and programming language code, or generate a reply DEF. The method and system described herein provide a methodology, and even a novel solution, for machine understanding of natural language.
Drawings
FIG. 1 is a process diagram of the concept-network-based natural language understanding method
FIG. 2 is a word segmentation diagram
FIG. 3 dependency structure (CoNLL or dependency tree) output by the Stanford NLP Chinese model for the sentence "create a variable named a"
FIG. 4 dependency structure (CoNLL or dependency tree) output by the SyntaxNet Chinese model for the sentence "create a variable named a"
FIG. 5 dependency structure (CoNLL or dependency tree) output by the HIT LTP LanguageCloud for the sentence "create a variable named a"
FIG. 6 dependency structure (CoNLL or dependency tree) output by the Stanford NLP Chinese model for the sentence "list the news whose publication time is yesterday"
FIG. 7 dependency structure (CoNLL or dependency tree) output by the SyntaxNet Chinese model for the sentence "list the news whose publication time is yesterday"
FIG. 8 dependency structure (CoNLL or dependency tree) output by the HIT LTP LanguageCloud for the sentence "list the news whose publication time is yesterday"
FIG. 9 dependency structure (CoNLL or dependency tree) output by the Stanford NLP Chinese model for the sentence "book an air ticket from Nanjing to Hangzhou after 10 am"
FIG. 10 dependency structure (CoNLL or dependency tree) output by the SyntaxNet Chinese model for the sentence "book an air ticket from Nanjing to Hangzhou after 10 am"
FIG. 11 dependency structure (CoNLL or dependency tree) output by the HIT LTP LanguageCloud for the sentence "book an air ticket from Nanjing to Hangzhou after 10 am"
FIG. 12 concept connections for the sentence "create a variable named a"
FIG. 13 Chinese dependency parsing results for the sentence "create a variable named a"
FIG. 14 example of DEF condition configuration and of dependency and optional dependency definitions
FIG. 15 example of concept selection between a dependent word and its head word
FIG. 16 processing example where no concept overlap exists between a dependent word and its head word
FIG. 17 DEF dependency analysis result for the sentence "create a variable named a"
FIG. 18 concept definitions (DEF) require different processing methods in different application scenarios
FIG. 19 DEF analysis and run conclusions for the two sentences "create a variable named a" and "create two variables named a"
FIG. 20 process diagram of the DEF operation implemented with the GTL template engine technology of the GScript language
FIG. 21 development-mode GTL execution file run result for the sentence "list departments"
FIG. 22 development-mode GTL execution file run result for the sentence "view the department whose name is administrative group"
FIG. 23 development-mode GTL execution file run result for the sentence "create a variable with name a and value b"
Detailed Description
Embodiments of the invention are described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, natural language understanding in the present invention generally has the following features:
■ it is implemented on the basis of a concept network;
■ it comprises steps of sentence segmentation, concept word segmentation, concept definition (DEF) dependency analysis, and concept definition (DEF) operation, so that chapters/paragraphs and single-round/multi-round conversations can be accurately understood, UI interfaces (i.e., human-machine interaction interfaces), data, and programming language code can be generated and manipulated, and reply DEFs can be generated. A reply DEF is generated according to the intent to be expressed and may be a statement produced by the natural language generation techniques described herein;
■ The context session runs through the whole analysis and operation process;
A concept network (Concept Network) is a system that defines concepts of things and the semantic relationships between concepts. It is essentially a semantic network and provides the semantic basis for the understanding and generation of natural language described herein. The term "natural language word" (or "word") in the following text broadly refers to a character, word, or phrase of natural language.
1.1. Definition
The concept network comprises two parts: concepts and concept connections.
1.1.1. Concept
Each concept has a unique ID identifying it within the concept network. The concept ID may take any form, such as a character string, a number, or a word vector. Concepts are divided into three categories: word concepts (WORD), word-free concepts (NON WORD), and unknown concepts (UNKNOWN).
1.1.1.1. Word concept
A word concept is one to which a natural language word corresponds directly, and for which a converter exists that implements the concept's definition (DEF). Such a concept is also referred to as a direct concept.
A concept definition (DEF) refers to a specific definition or implementation of a concept, and may also be called a concept instance.
A natural language word may correspond to one or more concepts. Some words have multiple parts of speech, so a word under a particular part of speech may also be assigned to correspond to one or more concepts.
For example, concept ID = unit, to which the word "unit" may correspond. In the sentence "what is the unit of the quantity?", "unit" is this concept.
The word concepts in the above examples do not have any relational components and are referred to as general word concepts.
The word concept also includes:
word concepts with relational components
Such concepts have relational components, i.e., the words on one or both sides of the concept word serve as relational components of the concept.
For example, concept ID = selt may have the word-with-components "{0} or {1}". In the phrase "apple or watermelon", the words "apple" and "watermelon" on both sides of "or" are the relational components of the concept, representing a choice;
meanwhile, the semantic relations to be expressed in natural language can be realized indirectly through these relational components.
Compound word concept
Such concepts are made up of a combination of words; at least the text between the words forms the relational components of the concept.
For example, concept ID = eban may have the compound word "rather {1} nor {2}". In a sentence such as "I would rather be tired to death than trouble you", the parts following "rather" and "nor" are the relational components of the concept, representing a preference: the former is chosen and the latter rejected.
Word set concept
Such a concept is one to which multiple words, words with components, or compound words correspond.
For example, concept ID = caru may have the corresponding compound words "because {1}, {2}", "since {1}, {2}", "{2} because {1}", "{1}, so {2}", and so on, all expressing cause and effect (the case of reason and purpose, i.e., "so as to", will be described later). For example, concept ID = create may have the corresponding words "create", "new", "build", "construct", "initialize", "establish", etc., representing a creation action.
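Word-with-components and compound-word patterns such as "because {1}, {2}" can be matched mechanically against a sentence to recover the relational components. A minimal sketch, assuming a regex-based matcher (the text does not specify the actual matching mechanism; only the "{n}" slot syntax follows its examples):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: compile a concept word pattern into a regex whose capture groups
// correspond to the relational-component slots "{n}".
public class CompoundWord {
    public static Pattern compile(String wordPattern) {
        StringBuilder regex = new StringBuilder();
        Matcher slots = Pattern.compile("\\{\\d+}").matcher(wordPattern);
        int last = 0;
        while (slots.find()) {
            // Literal text between slots is quoted verbatim.
            regex.append(Pattern.quote(wordPattern.substring(last, slots.start())));
            regex.append("(.+)"); // capture the relational component
            last = slots.end();
        }
        regex.append(Pattern.quote(wordPattern.substring(last)));
        return Pattern.compile(regex.toString());
    }

    // Return the relational components, or null if the pattern does not match.
    public static String[] match(String wordPattern, String sentence) {
        Matcher m = compile(wordPattern).matcher(sentence);
        if (!m.find()) return null;
        String[] parts = new String[m.groupCount()];
        for (int i = 0; i < parts.length; i++) parts[i] = m.group(i + 1);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            match("because {1}, {2}", "because it rained, the match was cancelled")));
    }
}
```

Matching "because {1}, {2}" against "because it rained, the match was cancelled" yields the two components "it rained" and "the match was cancelled".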
1.1.1.2. Word-free concepts
A word-free concept is one to which no natural language word corresponds directly, but for which a concept recognizer typically exists, so that natural language words can correspond to the concept indirectly. Such a concept is also referred to as an indirect concept.
For example, concept ID = @time is used to refer to time and dates, e.g., the words "20150712", "tomorrow", "a few days ago", "a certain day"; a concept recognizer is needed to recognize whether a word is a time concept.
A word-free concept may have an associated word concept that describes the class of things it covers; this word concept is called its ontology concept. Word-free concepts are therefore divided into word-free concepts without an ontology concept and word-free concepts with an ontology concept. For example:
Word-free concept with ontology concept
Such a concept generally has a specific word concept whose words are used to describe the class of concepts it covers.
For example, concept ID = action describes the concept relationship role "action"; its ontology concept is the word concept ID = &action, whose corresponding words may be "action", "behavior", "operation".
For example, concept ID = attribute describes the concept relationship role "attribute"; its ontology concept is the word concept ID = &attribute, whose corresponding word may be "attribute".
For example, concept ID = part describes the concept relationship role "part"; its ontology concept is the word concept ID = &part, whose corresponding word may be "part".
For example, concept ID = value describes the concept relationship role "value"; its ontology concept is the word concept ID = &value, whose corresponding word may be "value".
For example, concept ID = @string describes words in the character-string category; its ontology concept is the word concept ID = string, whose corresponding word may be "character string".
For example, concept ID = @number describes words in the number category; its ontology concept is the word concept ID = number, whose corresponding word may be "number".
For example, concept ID = @time describes words in the time category; its ontology concept is the word concept ID = time, whose corresponding words may be "time" and "date".
1.1.1.3. Unknown concepts (UNKNOWN)
An unknown concept is one for which a concept definition (DEF) cannot be generated; it will be ignored during natural language processing.
Unknown concepts are further divided into:
Unknown word concept
A single natural language word corresponds to such a concept.
Unknown word set concept
Multiple natural language words correspond to such a concept.
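The three concept categories above can be summarized in a small data type. The field and method names here are illustrative assumptions; the text only fixes the unique ID and the WORD / NON WORD / UNKNOWN classification:

```java
import java.util.List;

// The three concept categories of section 1.1.1 as a small data type.
public class ConceptNode {
    public enum Kind { WORD, NON_WORD, UNKNOWN }

    public final String id;          // unique ID: string, number, or word vector
    public final Kind kind;
    public final List<String> words; // empty for word-free concepts

    public ConceptNode(String id, Kind kind, List<String> words) {
        this.id = id;
        this.kind = kind;
        this.words = words;
    }

    // A word concept is "direct": a natural-language word maps straight to it.
    public boolean isDirect() { return kind == Kind.WORD; }

    public static void main(String[] args) {
        ConceptNode unit = new ConceptNode("unit", Kind.WORD, List.of("unit"));
        ConceptNode time = new ConceptNode("@time", Kind.NON_WORD, List.of());
        System.out.println(unit.isDirect() + " " + time.isDirect()); // true false
    }
}
```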
1.1.2. Concept connection
1.1.2.1. Basic
A concept connection is a semantic relationship chain between concepts, describing multi-level semantic relationships between them. For example, the following semantic relationships:
■ Genus-species (SPECIES)
The genus-species relationship, which also covers the hypernym-hyponym relationship: a more specific word is called a hyponym of a more general word, and the more general word a hypernym of the more specific one. This relationship also exists between actions.
For example, thing - event | object; for example, object - news | user | department; for example, red - scarlet | vermilion | carmine | crimson.
■ Part (PART)
The whole-part relationship.
For example, quantity - number | unit; for example, time - year | month | day | hour | minute | second | millisecond; for example, human - head | body.
■ Attribute (ATTRIBUTE)
Host-attribute, host-possession, host-characteristic, material-finished product, and similar relationships. The host may also be an action.
For example, variable (var) - name | value; for example, news - title | text content | attachments | author; for example, character - good or bad (goodbad); for example, format - schema (pattern).
■ VALUE (VALUE)
Attribute-value, entity-value relationships.
For example, quantity - @number; for example, title - @string; for example, color - @color.
■ Subject-verb (SV)
The action-source - action relationship.
For example, I - create | modify | delete | query | publish; for example, @number - add | subtract (sub).
■ Verb-object (VO)
The action - action-object relationship.
For example, create - variable (var); for example, publish - news.
■ Correlation (R)
Correlation comprises five types of sub-relationships: synonymy, antonymy, inclusion, incompatibility, and general relatedness. Inclusion means that the semantics of one concept fall within the semantics of another; incompatibility means that the semantics of one concept exclude those of another; general relatedness means that two concepts are related but the specific relationship between them is unclear.
For example, cold - hot; for example, good - bad.
A concept may form one or more inherent semantic relationships with multiple concepts, and the semantic relationships between concepts may be multi-level.
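A minimal sketch of such a connection store, with a transitive query over the SPECIES relation to answer multi-level questions; the triple representation and method names are illustrative assumptions, not the patent's API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Concept connections as labelled edges: relation -> (source -> targets).
public class ConceptGraph {
    private final Map<String, Map<String, Set<String>>> edges = new HashMap<>();

    public void connect(String from, String relation, String to) {
        edges.computeIfAbsent(relation, r -> new HashMap<>())
             .computeIfAbsent(from, f -> new HashSet<>())
             .add(to);
    }

    // Is `child` a direct or indirect SPECIES of `ancestor`?
    public boolean isSpeciesOf(String child, String ancestor) {
        Map<String, Set<String>> species = edges.getOrDefault("SPECIES", Map.of());
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(ancestor);
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            for (String c : species.getOrDefault(cur, Set.of())) {
                if (c.equals(child)) return true;
                if (seen.add(c)) stack.push(c); // guard against cycles
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ConceptGraph g = new ConceptGraph();
        g.connect("thing", "SPECIES", "object");  // thing - event | object
        g.connect("object", "SPECIES", "news");   // object - news | user | department
        g.connect("news", "ATTRIBUTE", "news.title");
        System.out.println(g.isSpeciesOf("news", "thing")); // true: two levels down
    }
}
```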
1.1.2.2. Dynamic formation
In the process of understanding natural language, a concept can be connected with multiple concepts step by step, forming new concept connections.
For example, the statement "I have a bag" expresses that an ATTRIBUTE (host-possession) connection is formed between "I" and "bag", but before the statement this connection may not have existed between the two concepts.
1.1.2.3. Features
Features represent qualities reflected in a concept connection: affirmation, negation, possibility, range, probability, degree, frequency, time, tone, and the like.
For example, the statement "I do not have a bag" expresses an ATTRIBUTE (host-possession) connection between "I" and "bag", but the feature of the connection is negation; the connection expressed by "I may not have a bag" carries, besides negation, possibility; the connection expressed by "I am 80 percent likely not to have a bag" carries, besides negation and possibility, probability.
1.2. Construction method
The concept network can be constructed through base definition, generation, extended definition, dynamic formation, and so on.
1.2.1. Base definition
The base definition defines the underlying concepts and concept connections. In the implementation, configuration files in XML and JSON formats are mainly chosen to define concepts and concept connections, which are then generated by parsing the configuration files; configuration files in other formats are not excluded.
For example, concept configuration files concepts/base1.json and concepts/action1.json, and concept connection configuration files connections/base1.json and connections/action1.json.
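The schema of these configuration files is not specified in the text; a minimal illustrative fragment (all field names here are assumptions) might look like:

```json
{
  "concepts": [
    { "id": "create", "words": ["create", "new", "build", "construct"] },
    { "id": "@time", "recognizer": "TimeRecognizer", "ontology": "time" }
  ],
  "connections": [
    { "from": "object", "relation": "SPECIES", "to": "news" },
    { "from": "news", "relation": "ATTRIBUTE", "to": "news.title" }
  ]
}
```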
1.2.2. Generating
Generation converts data structures, programming languages, semantic networks/dictionaries/knowledge bases, and the like into concepts and concept connections. The main program performing this conversion is the concept generation engine.
1.2.2.1. Data structure
■ Structured data
Structured data in row-column form can be represented and stored using Excel, relational databases, and the like. It appears as two-dimensional data with rows, columns, and a predefined data model. In structured data storage, a table is the basic unit for processing data and the foundation for Excel, relational databases, applications, and so on. A "column" of a table is often called a "field". Each field consists of data items of the same data type, divided by some boundary. In an "address book" database, for instance, "name" and "contact number" are attributes common to all rows of the table, so these columns are called the "name" field and the "contact number" field. The data stored at the intersection of a row and a column is called a "value" and is the most basic storage unit. The field type defines the type of data the value can store.
For example, Excel field types (defined in the cell format) include: number, currency, date, time, text, etc.; Microsoft Access field types include: Text, Byte, Integer, Long, Single, Double, Date/Time, etc.; MySQL field types include: CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, INT, BIGINT, FLOAT, DOUBLE, DATE, DATETIME, etc.;
in a relational database, a primary key may be defined for a table, meaning that each record in the table is uniquely identified by that key. A foreign key represents the link between two relations. The table whose primary key is used as a foreign key in another relation is called the primary table, and the table holding that foreign key is called a sub-table of the primary table.
● The relational model is a two-dimensional table model; a relational database is a data organization composed of two-dimensional tables and the links between them. Concepts commonly used in the relational model:
relation - can be understood as a two-dimensional table; each relation has a relation name, i.e., the table name;
tuple - a row in a two-dimensional table, often called a record in a database;
attribute - a column in a two-dimensional table, often called a field in a database;
domain - the value range of an attribute, i.e., the value constraint of a column in the database;
key - a set of attributes that uniquely identifies a tuple, often called the primary key in a database, consisting of one or more columns;
relation schema - the description of a relation, in the format: relation name (attribute 1, attribute 2, ..., attribute N); it becomes the table structure in the database;
■ Non-relational database
A non-relational database stores data in the form of key-value pairs and is likewise a collection of structured storage methods. It does not require a predefined structure, and therefore easily accommodates variations in data type and structure. The commonly used non-relational databases are mainly:
BigTable - BigTable is a key-value mapping, but uses several relational database terms such as table, row, and column;
HBase - HBase stores data in the form of tables. A table consists of rows and columns, and the columns are divided into several column families. A column family is a collection of columns and contains multiple columns. The Row Key is the row key, the ID of each row; this field is created automatically. Each column in the table belongs to some column family. Column families are part of the table schema and must be defined before the table can be used; column names are prefixed by the column family. The storage unit determined by a row and a column in HBase is called a cell. Each cell holds multiple versions of the same data, indexed by a 64-bit integer timestamp. The data in a cell has no type and is stored entirely in byte form.
MongoDB - A document is the basic unit of data in MongoDB, resembling a row in a relational database. A document is a set of key-value pairs; its values can be strings, integers, arrays, documents, and other types, while its keys are strings marked by double quotation marks, equivalent to column names. A collection is a set of documents in MongoDB, similar to a data table in a relational database, and is identified by a unique name. Documents with different key-value structures may be stored in the same collection. Multiple documents form a collection, and multiple collections form a database.
1.2.2.1.1. Generation method
The concept generation engine takes each table as a concept, with the table name, or a word referring to the table, as the word of the concept.
For example, table user generates concept ID = user, which may have the word "user", etc.; table news generates concept ID = news, which may have the words "news" and "information".
Meanwhile, the base-defined concept ID = object and the concepts generated from these tables form a SPECIES (genus-species) relationship.
For example, concept ID = object forms a SPECIES relationship with concept ID = user and concept ID = news from the above example.
The engine takes each field of a table as a concept, with the field name, or a word referring to the field, as the word of the concept.
For example, the field name of table user generates concept ID = user.name, which may have the word "name", etc.; the field sex of table user generates concept ID = user.sex, which may have the words "sex", "gender", etc.; the field title of table news generates concept ID = news.title, which may have the words "title", "main title", etc.; the field creator of table news generates concept ID = news.creator, which may have the word "creator", etc.;
meanwhile, the concepts generated from the owning table and the concepts generated from the fields form an ATTRIBUTE relationship.
For example, concept ID = user forms an ATTRIBUTE relationship with concept ID = user.name and concept ID = user.sex; concept ID = news forms an ATTRIBUTE relationship with concept ID = news.title;
the engine may treat each field type as a concept. These concepts are generally word-free concepts, and a corresponding concept recognizer is used to determine whether a word matches the concept.
For example, a text field type generates the word-free concept ID = @string, whose ontology concept is ID = string; a number field type generates the word-free concept ID = @number, whose ontology concept is ID = number; a time/date field type generates the word-free concept ID = @time, whose ontology concept is ID = time;
meanwhile, the concepts generated from fields of the corresponding field types and the concepts generated from these field types form a VALUE relationship.
For example, if the field name of table user and the field title of table news are both of text field type, then concept ID = user.name and concept ID = news.title each form a VALUE relationship with concept ID = @string.
Of course, word-free concepts may also be customized for fields whose values have certain constraints or specific forms, such as dictionary data.
For example, the field sex of table user takes the dictionary values male and female; a custom word-free concept ID = @sex can be defined, and its concept recognizer can be customized to recognize words such as "male" and "female". Meanwhile, concept ID = user.sex and concept ID = @sex form a VALUE relationship;
in addition, some fields form foreign key relationships with other tables; the concepts generated from these fields and the concepts generated from their associated primary tables form VALUE relationships.
For example, the field creator of table news forms a foreign key relationship with table user, so concept ID = news.creator and concept ID = user form a VALUE relationship.
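The table-to-concept rules of this section can be sketched as follows. The TableDef/Column model is a stand-in for metadata that would normally come from the database (e.g., via JDBC's DatabaseMetaData); the triple strings are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of section 1.2.2.1.1: deriving concepts and connections from a
// table definition.
public class SchemaToConcepts {
    public record Column(String name, String type, String foreignTable) {}
    public record TableDef(String name, List<Column> columns) {}

    public static List<String> generate(TableDef t) {
        List<String> out = new ArrayList<>();
        out.add("object SPECIES " + t.name());               // table: SPECIES of object
        for (Column c : t.columns()) {
            String cid = t.name() + "." + c.name();
            out.add(t.name() + " ATTRIBUTE " + cid);         // field: ATTRIBUTE of table concept
            if (c.foreignTable() != null)
                out.add(cid + " VALUE " + c.foreignTable()); // foreign key: VALUE with primary table
            else
                out.add(cid + " VALUE @" + c.type());        // field type: VALUE with word-free concept
        }
        return out;
    }

    public static void main(String[] args) {
        TableDef news = new TableDef("news", List.of(
            new Column("title", "string", null),
            new Column("creator", "long", "user")));
        System.out.println(generate(news));
    }
}
```

For the news table of the example above, this produces the SPECIES triple for the table, ATTRIBUTE triples for title and creator, a VALUE triple linking news.title to @string, and a VALUE triple linking news.creator to the primary table user.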
1.2.2.1.2. Object data management techniques
Existing mainstream object-relational mapping frameworks, such as Hibernate, provide an easy-to-use framework for mapping an object-oriented domain model to a traditional relational database. With such a framework it is very easy to extend and implement a concept generation engine according to the generation method described above.
The object data management technique described in this section is also an object-relational mapping framework in the Java language, which performs very lightweight object encapsulation over JDBC. It treats tables as objects and table fields as object attributes, and expresses attribute composition, primary/foreign keys between tables, primary table - sub-table relationships, and the like through value types. It uses XML-format object template files to define object structures and establish the mapping to database tables.
Commonly used non-relational databases such as HBase and MongoDB provide JDBC drivers, so they can likewise be accessed via JDBC. The technique encapsulates these non-relational database access APIs as objects and normalizes MongoDB so that documents in a collection and embedded documents use a consistent key-value pair form;
the object template file supports multiple languages such as Chinese and English. It defines the words referring to the table, the field names of attributes, the words referring to attributes, and the data types of attributes. Object template files may be managed dynamically at runtime; for example, a user may define a new object template, or create and modify attributes, through a data management interface, with the definitions stored in an XML object template file.
The data types defined in the object data management technique are divided into the simple type SIMPLE, the compound type COMPLEX, and the structure type STRUCTURE.
■ Simple type SIMPLE
Includes the string type string, integer int, short integer short, double floating-point double, single floating-point float, long integer long, Boolean boolean, the text type htext (with plain string, HTML, BBCode, XML, and JSON subtypes), the date/time type htime (with date and time subtypes), the object template type base, number hierarchy types (integer, long, and string), the object type, and so on. The object type represents an attribute (field) that forms a foreign key relationship with the current table or another table.
For example, the attribute parent (superior) in an object template file is of object type and points to the table of the current object.
■ Compound type COMPLEX
This type indicates that an attribute is composed of several sub-attributes; for example, a price may be composed of a number and a unit. That is, a PART relationship is formed between the attribute and its sub-attributes.
For MongoDB, this can be realized with the embedded document type;
for example, the definition of an attribute (field) price, indicating that the price is composed of a number and a unit;
■ STRUCTURE type STRUCTURE
The type indicates that an attribute contains multiple rows of records, i.e., a master table/detail table relationship. For example, a role has multiple authorized functions; this authorized-functions attribute (functions) belongs to the role and is of structure type. Such an attribute is likewise composed of multiple sub-attributes, and an ATTRIBUTE relationship is formed between the attribute and its sub-attributes.
√ In MongoDB this can be realized with an array type whose elements are embedded documents, representing the relationship between a document and its embedded documents;
√ For example, the definition of the attribute (field) authorized functions (functions);
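As an illustrative sketch only (not the patent's implementation), a COMPLEX attribute and a STRUCTURE attribute can be modeled as MongoDB-style embedded documents, here with plain Java maps; the field names price, value, unit, name, and functions are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: COMPLEX and STRUCTURE attributes as embedded documents. */
public class EmbeddedDocSketch {

    /** COMPLEX: one attribute composed of sub-attributes (PART relation). */
    public static Map<String, Object> price(double value, String unit) {
        Map<String, Object> price = new LinkedHashMap<>();
        price.put("value", value);   // sub-attribute 1
        price.put("unit", unit);     // sub-attribute 2
        return price;
    }

    /** STRUCTURE: one attribute holding multiple rows (master/detail relation). */
    public static Map<String, Object> roleWithFunctions(String name, String... functionNames) {
        List<Map<String, Object>> functions = new ArrayList<>();
        for (String fn : functionNames) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("function", fn); // each row is itself an embedded document
            functions.add(row);
        }
        Map<String, Object> role = new LinkedHashMap<>();
        role.put("name", name);
        role.put("functions", functions); // array of embedded documents
        return role;
    }

    public static void main(String[] args) {
        System.out.println(price(9.9, "CNY"));
        System.out.println(roleWithFunctions("admin", "create", "delete"));
    }
}
```

In MongoDB terms, price corresponds to a single embedded document and functions to an array of embedded documents, matching the two realizations noted above.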
The object data management technology not only implements object/attribute definition and mapping to database tables (including the mapping of object attribute data types to SQL/NoSQL database data types), but also provides an object-oriented data query/management mechanism, so that Java programmers can manipulate the database using object-oriented programming thinking.
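The mapping of attribute data types to SQL column types mentioned above might be sketched as a simple lookup table; the specific SQL types chosen here are assumptions for illustration, not the mapping actually used by the object data management technology.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: mapping object-template data types to SQL column types. */
public class TypeMappingSketch {
    private static final Map<String, String> SQL_TYPES = new LinkedHashMap<>();
    static {
        // Hypothetical mappings chosen for illustration only.
        SQL_TYPES.put("string", "VARCHAR(255)");
        SQL_TYPES.put("int", "INTEGER");
        SQL_TYPES.put("long", "BIGINT");
        SQL_TYPES.put("double", "DOUBLE");
        SQL_TYPES.put("boolean", "BOOLEAN");
        SQL_TYPES.put("htext", "TEXT");
        SQL_TYPES.put("htime", "TIMESTAMP");
    }

    /** Returns the SQL column type for an attribute data type, with a default. */
    public static String sqlType(String attributeType) {
        return SQL_TYPES.getOrDefault(attributeType, "VARCHAR(255)");
    }

    public static void main(String[] args) {
        System.out.println("long -> " + sqlType("long"));
    }
}
```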
The concept generation engine may generate concepts and concept connections from the object template XML.
The concept generation engine can treat each object template (or object) as a concept, with the object's name or the word referring to the object as the word of the concept. Meanwhile, the concept ID=object from the basic definition and the concepts generated from these objects form a SPECIES (genus-species) relationship.
The engine can treat each attribute defined by the template as a concept, with the attribute's field name or the word referring to the attribute as the word of the concept. Meanwhile, the concept generated from the owning object and the concept generated from the attribute form an ATTRIBUTE relationship.
The engine may treat each attribute data type as a concept. These concepts are generally word-free; a corresponding concept recognizer determines whether a word matches the concept. Meanwhile, the concepts generated from attributes of a given data type and the concepts generated from these attribute types form a VALUE relationship. Of course, the concept can also be custom-configured for the attribute value. For an attribute of object type, the attribute forms a VALUE relationship with the object specified in the type. For an attribute of compound type, the attribute forms a PART relationship with the members (sub-attributes) specified in the type. For an attribute of structure type, the attribute forms an ATTRIBUTE relationship with the members (sub-attributes) specified in the type.
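The three generation rules above (SPECIES for objects, ATTRIBUTE for attributes, VALUE for attribute data types) can be sketched as follows; the API shape, class names, and relation-string format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: generating concepts and connections from an object template. */
public class TemplateConceptSketch {

    /** A concept connection: from-concept, relation label, to-concept. */
    public static class Connection {
        public final String from, relation, to;
        public Connection(String from, String relation, String to) {
            this.from = from; this.relation = relation; this.to = to;
        }
        @Override public String toString() { return from + " " + relation + " " + to; }
    }

    /**
     * For an object template named objectName with the given attributes and
     * attribute data types:
     *  - ID=object and the object concept form a SPECIES relation;
     *  - the object concept and each attribute concept form an ATTRIBUTE relation;
     *  - each attribute concept and its data-type concept form a VALUE relation.
     */
    public static List<Connection> generate(String objectName,
                                            String[] attributes, String[] types) {
        List<Connection> out = new ArrayList<>();
        out.add(new Connection("object", "SPECIES", objectName));
        for (int i = 0; i < attributes.length; i++) {
            out.add(new Connection(objectName, "ATTRIBUTE", attributes[i]));
            out.add(new Connection(attributes[i], "VALUE", types[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        generate("role", new String[]{"name"}, new String[]{"@string"})
                .forEach(System.out::println);
    }
}
```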
For example, the correspondence between data types and concepts:
[Table: data types and corresponding concepts]
For example, an object template XML and its concept connections. The concepts ID=@phone and ID=@fax are custom word-free concepts; they are used to recognize telephone numbers and fax numbers, respectively.
1.2.2.2. Programming language
A programming language is a set of grammatical rules that define a computer program: a standardized communication technique used to instruct computers. A computer language allows programmers to define precisely the data that the computer needs to use and the actions that should be taken under different circumstances.
Programming languages can generally convert code into syntax trees and syntax trees into code through a parser, as shown in the following table:
[Table: common programming languages and their parsers for code ↔ syntax tree conversion]
In addition, a template language is also a programming language; it has its own grammar rules and a template engine that lets the computer compile and execute it, and even generate other languages, e.g., FreeMarker and Velocity, which can generate HTML, SQL, PostScript, XML, RTF, Java source code, etc., from templates.
The formatted script language GScript described herein is also a programming language; the current version is written in Java. It has dual engines, supporting both interpreted and compiled execution. Under compiled execution, the code is compiled into Java bytecode, greatly improving runtime efficiency. It can also act as a template engine, like FreeMarker and Velocity, generating text output from templates. The GScript language also has a parser that can convert code into a syntax tree and, conversely, convert the syntax tree back into code. Another important feature of GScript is that it can call different instruction function libraries to parse or compile code so as to generate other programming languages.
1.2.2.2.1. Generation method
The essence of a programming language is its grammar rules, and the concept generation engine generates concepts and concept connections by analyzing and converting these rules. The following sections focus on the generation method for the GScript language. The methods for most other programming languages, such as Java, HTML, JS, CSS, JSON, and XML, are substantially the same and are not described in detail.
1.2.2.2.2.GScript language
1.2.2.2.2.1.GScript grammar rule
The GScript grammar is composed of three parts: instruction functions, operators, and operation functions.
■ Instruction function
The instruction function is of the form:
<function name>(<function parameter 1>, <function parameter 2>, ...) {
<function body>
};
Function parameters: the parameters follow the function name, wrapped in ( ) and separated by commas. Not all functions have parameters; if there are none, the ( ) can be omitted. Function body: the function body is wrapped in { } and may contain multiple instruction functions. Not all functions have a body; if there is none, the { } can be omitted. Function terminator: a semicolon (;) is used as the terminator.
The instruction function library comprises basic instructions and extended instructions. Basic instruction functions include code blocks (script), variable definition and assignment (var, assign), condition control (if, else), loops (do, while, for), traversal (loop, list), loop and process interrupts (break, continue, return), function definition and call (function, call), throwing and catching exceptions (throw, try, catch, finally), external script loading (include), null execution (void), printing (print, printf, println), debugging (info, debug, trace, warn, error), sorting (sort), marking (mark), and so on. The extended instructions are related to the target program language and can be dynamically extended in use.
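The instruction-function form above — a name, an optional parenthesized parameter list, an optional braced body of nested instruction functions, and a semicolon terminator — can be sketched as a minimal recursive-descent parser that builds an instruction function tree and regenerates code from it. This is an illustrative reconstruction under the stated grammar (identifiers only, no string literals), not the GScript parser itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: parsing name(p1, p2) { nested; }; into an instruction function tree. */
public class InstructionParserSketch {

    public static class Node {
        public final String name;
        public final List<String> params = new ArrayList<>();
        public final List<Node> body = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final String src;
    private int pos;
    private InstructionParserSketch(String src) { this.src = src; }

    public static List<Node> parse(String code) {
        InstructionParserSketch p = new InstructionParserSketch(code);
        List<Node> tree = new ArrayList<>();
        p.ws();
        while (p.pos < code.length()) { tree.add(p.function()); p.ws(); }
        return tree;
    }

    private Node function() {
        Node n = new Node(ident());
        ws();
        if (peek('(')) {                 // optional parameter list, comma-separated
            pos++; ws();
            while (!peek(')')) {
                n.params.add(ident());
                ws();
                if (peek(',')) { pos++; ws(); }
            }
            pos++; ws();
        }
        if (peek('{')) {                 // optional body of nested instruction functions
            pos++; ws();
            while (!peek('}')) { n.body.add(function()); ws(); }
            pos++; ws();
        }
        if (peek(';')) pos++;            // ';' terminates the instruction function
        return n;
    }

    private String ident() {
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) pos++;
        return src.substring(start, pos);
    }

    private void ws() { while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; }

    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }

    /** The reverse direction: generate code from the instruction function tree. */
    public static String generate(Node n) {
        StringBuilder sb = new StringBuilder(n.name);
        if (!n.params.isEmpty()) sb.append('(').append(String.join(", ", n.params)).append(')');
        if (!n.body.isEmpty()) {
            sb.append(" { ");
            for (Node c : n.body) sb.append(generate(c)).append(' ');
            sb.append('}');
        }
        return sb.append(';').toString();
    }
}
```

The parse/generate pair mirrors the round-trip capability claimed for the GScript parser: a uniform tree that can be analyzed and then re-emitted as code.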
■ Operators
GScript supports the following operators, with the following precedence:
[Table: GScript operators and their precedence]
By function, the operators are divided into arithmetic operators, relational operators, logical operators, bitwise operators, the ternary expression operator, type conversion operators, the NULL assignment operator, value operators, the escape operator, comment operators, character or string operators, and hold operators.
■ Operation functions
The operation functions mainly comprise mathematical, string, HTML, date, array, object, value, JSON, and debugging operation functions, plus extended operation functions.
The GScript language has many advanced features, such as type inference, constant optimization, and function parameterization. The current version of GScript is written in Java and interoperates with Java essentially seamlessly: Java constants, classes, and attributes can be accessed, and methods called, through the value operators and value operation functions. The current version of GScript already supports the following computer languages, as shown in the following table:
supported computer language | instruction function library
java | Basic
javascript | Basic, javascript extension
html | Basic, html extension
css | Basic, css extension
xml | Basic, xml extension
json | Basic, json extension
text | Basic, text extension
Of course, these computer languages can be directly converted into the GScript language through the GScript parser in cooperation with the corresponding parsers, so that the concepts and the concept connection can be indirectly converted and generated through the GScript language.
The GScript parser parses the GScript language into an instruction function tree, which is GScript's syntax tree. Conversely, the parser can generate the corresponding GScript code from an instruction function tree. Using GScript therefore has a great advantage: it can produce a uniform syntax tree for different computer languages, which is convenient for analysis and processing.
1.2.2.2.2.2.Java syntax rules
The Java language is an object-oriented programming language. Object orientation is characterized by encapsulation, inheritance, and polymorphism. (1) Encapsulation: the attributes and operations of an object are combined into an independent object; access modifiers prevent external objects from manipulating the object's attributes directly, allowing only use of the services the object provides. (2) Inheritance, i.e., extension: a subclass inherits almost all attributes and behaviors of its parent class, but inheritance is single only, so a subclass can have only one direct parent class; inheritance is transitive, and a subclass can be treated as its parent class. (3) Polymorphism: realized by method overloading and method overriding.
A class is an abstraction of objects, and an object is an instance of a class. An object has two major elements, attributes and behavior: attributes are the static description of the object, while behavior embodies its functions and actions. A class is used by instantiating it.
Attributes and behavior constitute the members of a class. A class's attributes can be of basic data types or reference types; methods embody the behavior; by convention, Get and Set methods provide operations on the attributes. A class may serve as an attribute of another class, i.e., an attribute may be of a reference type.
Inheritance of a class is an extension, and its keyword is extends. Inheritance criteria: always let subclasses perform all behaviors the superclass can perform; ensure that subclasses contain all information of the superclass; add members to subclasses to define their characteristic behavior; migrate common traits to the superclass; allow different subclasses of the same superclass to perform the same behavior but with different implementations, via method overriding.
An interface is a collection of abstract behaviors; multiple inheritance is realized through interfaces. A class should be designed so that it encapsulates the attributes and behavior of the object it represents; functionality external to the object can be implemented through interfaces.
Enumeration is a means of defining a finite number of possible values, and the use of enumeration can reduce the chance of program error and can improve the readability and maintainability of the code. Enumeration in Java is not a collection of simple constants, but an object, the nature of which remains a class.
1.2.2.2.2.3. Concept generation engine
The concept generation engine generates concepts and concept connections according to the GScript grammar rules, the Java grammar rules, and the grammar rules of the target program language.
1.2.2.2.2.3.1.GScript
The concept generation engine is preferably configured to define a series of concepts and concept connections for the instruction functions, operators, and operation functions of the GScript grammar. The configuration may use the same technical means as the basic definition.
■ Instruction function
Basic instructions
[Table: concepts and concept connections for the basic instruction functions]
Extended instructions
●TEXT
[Table: concepts and concept connections for the TEXT extension instructions]
●JSON
[Table: concepts and concept connections for the JSON extension instructions]
●XML
[Table: concepts and concept connections for the XML extension instructions]
●HTML
[Table: concepts and concept connections for the HTML extension instructions]
●CSS
[Table: concepts and concept connections for the CSS extension instructions]
●JS(JAVASCRIPT)
[Table: concepts and concept connections for the JS (JavaScript) extension instructions]
■ Operators
[Table: concepts and concept connections for the operators]
■ Operation functions
[Table: concepts and concept connections for the operation functions]
1.2.2.2.2.3.2.Java
The GScript language interoperates seamlessly with Java constants, objects, and the like through the value operators and value operation functions. Local and global variables defined by the var and assign instruction functions are likewise stored in a storage area as Java objects, and variables are obtained through the value operator @ and the value operation functions @ and fromContext. Imported third-party Jar packages can also be accessed through a plug-in mechanism.
Preferably, concepts and concept connections are configured and defined in association with Java annotations. The configuration may use the same technical means as the basic definition. Because Java annotations are a language feature of Java, concepts and concept connections can be defined during code development.
A Java annotation is meta-information attached to code; it is parsed and used by tools at compile time and runtime and serves explanatory and configuration purposes.
For example, concept configuration concepts/c.o.g.p.c.i.actioncontext.json, connectivities/c.o.g.p.c.i.actioncontext.json, java notes com.onegid.grid.platform.nlp.4. Im.connected.connected.network.association.class con, com.oneegrid.grid.platform.nlp.4. Im.connected.network.association.class con, and com.oneegrid.grid.platform.nlp.4. Im.connected.network.association.class con.
The concept generation engine may treat each class as a concept, with the class name or the word referring to the class as the word of the concept. Multiple classes are allowed to point to the same concept; for example, an interface and its implementation class will typically represent the same concept.
√ The concept ID of a class often uses a small trick: a package-name index table shortens the concept ID, with the index c.o.g.p.c.i representing the package name com.onegid.grid.platform.context4.
Meanwhile, the concept ID=object from the basic definition and the concepts generated from these classes form a SPECIES (genus-species) relationship.
For example, the concept ID=object and the concepts ID=c.o.g.p.c.i.actioncontext and ID=c.o.g.p.c.i.applicationcontext in the example above form SPECIES relationships;
A Java enumeration, while itself a class, requires different processing: the engine defines two concepts for it. One is a noun concept that has an ontological concept and serves as a value of the other concept; the other is the ontological concept of that noun concept.
The engine processes each field of a class differently according to how it is referenced:
(1) if a field is configured, or annotated, to define a concept, then that concept and the concept of the field form an ATTRIBUTE relationship;
(2) if the data type of the field points to a class (or Java enumeration), and the component type (Component Type) of that class is a class for which a concept has been defined, then the concept of that class and the concept of the field's data type form an ATTRIBUTE relationship;
for example, the class com.
(3) otherwise, the engine defines a concept for the field, with the field name or the word referring to the field as the word of the concept; this concept and the generated concept of the field then form an ATTRIBUTE relationship.
For example, the class com.
The engine also processes different data types separately:
(1) basic data types and some corresponding object types;
(2) collection types, mainly including arrays and classes implementing java.
(3) other types, including java.
The engine defines concept connections for class fields. If a field's concept connection is not configured or defined by annotation:
if the field is a fixed constant, a concept is defined, with the constant value, or a word referring to the constant value, as the word of the concept; the concept of the field and the generated concept then form a VALUE relationship.
If the concept ID of a field is inconsistent with the concept ID of its data type, a VALUE relationship is formed between the two.
For example, in the example above, the concept ID=c.o.g.p.c.i.b.requestpage.uri and the concept ID=@string form a VALUE relationship; likewise, the concept ID=c.o.g.p.c.i.b.requestpage.titles forms a VALUE relationship with the concept ID=map.
The engine defines concepts for class methods. If concepts and concept connections are not configured or defined by annotation for a method, then:
if the method name conforms to the JavaBean specification (i.e., getter and setter methods), no definition is needed, because the concepts ID=get and ID=set are often defined in the basic definition to form SV (subject-predicate) relations with the concept ID=object.
Otherwise, a concept is defined, with the method name, or a word referring to the method, as the word of the concept; this concept then forms an SV (subject-predicate) relation with the generated concept.
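The field and method rules above can be sketched with Java reflection; the Person class, the relation-string format, and the package-prefix-free concept IDs are hypothetical simplifications, not the engine's actual output format.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch: deriving concepts and connections from a Java class by reflection. */
public class ClassConceptSketch {

    /** Hypothetical example class; not from the patent. */
    public static class Person {
        public String name;
        public int age;
        public String getName() { return name; }
        public void setName(String n) { this.name = n; }
        public void greet() {}
    }

    public static List<String> connections(Class<?> cls) {
        List<String> out = new ArrayList<>();
        String concept = cls.getSimpleName().toLowerCase();
        out.add("object SPECIES " + concept);                    // class concept
        for (Field f : cls.getDeclaredFields()) {
            out.add(concept + " ATTRIBUTE " + f.getName());      // field concept
            // field concept and data-type concept form a VALUE relation
            out.add(f.getName() + " VALUE @" + f.getType().getSimpleName().toLowerCase());
        }
        for (Method m : cls.getDeclaredMethods()) {
            String n = m.getName();
            // JavaBean getters/setters need no definition: get/set concepts exist in the basic definition
            if (n.startsWith("get") || n.startsWith("set")) continue;
            out.add(n + " SV " + concept);                       // method concept
        }
        return out;
    }

    public static void main(String[] args) {
        connections(Person.class).forEach(System.out::println);
    }
}
```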
The engine defines inheritance of classes (extends): if the concept ID of the parent class is not consistent with that of the subclass, a SPECIES relationship is formed between the concept of the parent class and the concept of the subclass.

1.2.2.2.2.3.3. Html
The HTML grammar mainly comprises: an HTML document is defined by HTML elements; an HTML element starts with a start tag and ends with an end tag; the content of an element is the content between the start tag and the end tag; some HTML elements have empty content; empty elements are closed in the start tag (ending at the end of the start tag); HTML tags can have attributes, which always appear as name/value pairs; an HTML document can have an external style, introduced by the link tag; an internal style, defined by the style tag; and inline styles, defined mainly by the style attribute; HTML element events exist as attributes, i.e., event attributes allow events to trigger actions in the browser. Concepts and concept connections are generated as in the preceding section GScript "instruction functions - extended instructions - HTML".

1.2.2.2.2.3.4. Css
The CSS syntax mainly includes: a CSS rule consists of two main parts, a selector and one or more declarations: selector {declaration1; declaration2; ...; declarationN}; each declaration consists of a property and a value: selector {property: value}; the property and the value are separated by a colon; styles can be defined by contextual relationships according to an element's location; the id selector specifies a style for the HTML element marked with a specific id and is written with "#"; the class selector is written with a dot; styles may also be set for HTML elements that possess specified attributes, not just the class and id attributes. Concepts and concept connections are generated as in the preceding section GScript "instruction functions - extended instructions - CSS".
1.2.2.2.2.3.5.Xml
XML is the extensible markup language, designed to transmit and store data. XML grammar rules are simple and logical. The XML syntax mainly includes: an XML document must have a root element; XML tags are not predefined and must be defined by the author; all XML elements must have closing tags; XML tags are case sensitive; like HTML elements, XML elements can have attributes (name/value pairs); XML attribute values must be quoted. Concepts and concept connections are generated as in the preceding section GScript "instruction functions - extended instructions - XML".
1.2.2.2.2.3.6.Json
JSON (JavaScript Object Notation) is a syntax for storing and exchanging text information, similar to XML; JSON is smaller, faster, and easier to parse than XML. The JSON syntax is a subset of the JavaScript syntax and mainly includes: JSON name/value pairs, where a name/value pair consists of a field name (in double quotes), a colon, and a value; a JSON value may be a number (integer or floating point), a string (in double quotes), a logical value (true or false), an array (in square brackets), an object (in curly brackets), or null; a JSON object is written in curly brackets and contains multiple name/value pairs; a JSON array is written in square brackets and may contain multiple objects. Concepts and concept connections are generated as in the preceding section GScript "instruction functions - extended instructions - JSON".

1.2.2.2.2.3.7. Js (Javascript)
The GScript language interoperates seamlessly with JavaScript browser objects, functions, and the like through the value operators and value operation functions. Meanwhile, variable mappings are defined with the var and assign instruction functions and JavaScript variables; function mappings are defined with the function instruction function and JavaScript functions; and variables are obtained, and mappings established, through the value operator @ and the value operation functions @ and fromContext, among others.
For example, execute GScript Js (Javascript) code to generate Javascript code.
Through configuration, the GScript language can introduce browser built-in objects and functions, and external scripts such as the jQuery framework, into GScript, so that these objects and functions can be accessed within GScript.
For example a default GScript configuration.
JavaScript is a lightweight programming language that can be executed by all modern browsers. Like Java, JavaScript has variables, data types, objects, functions, operators, comparison and logical operators, if...else statements, and so on. A JavaScript object has properties and methods: properties are values associated with the object, and methods are actions that can be performed on the object. The generation of some concepts and concept connections is described in the preceding sections GScript "instruction functions - extended instructions - JS (JAVASCRIPT)" and "operation functions - extended operation functions - JS (JAVASCRIPT)".
1.2.2.3. Semantic network/dictionary/knowledge base
The current mainstream semantic networks/dictionaries/knowledge bases mainly include ConceptNet, FrameNet, WordNet, etc. The concept network described herein is also a semantic network, so interconversion with the existing mainstream semantic networks/dictionaries/knowledge bases is technically possible.
1.2.3. Extension definition
The concepts and concept connections generated by the above methods may have certain defects, or may not fully meet the needs of semantic analysis and reasoning, so the definitions can be further extended. The implementation may use the same technical means as the basic definition.
In addition, resources such as a synonym dictionary or the Word2Vec tool can help extend the words of concepts and the correlations among concepts.
√ The synonym dictionary is a book published by an East China university press, collecting synonyms and antonyms for more than 1,200 vocabulary entries.
√ Word2Vec is a tool for converting words into vector form. Processing of text content can thus be reduced to vector operations in a vector space, and similarity in the vector space can be computed to represent the semantic similarity of text.
1.2.4. Dynamic formation
As described in the "dynamic formation" section of concept connections, connections, i.e., relationships, are dynamically formed between concepts during natural language understanding. These concept connections can be dynamically added to the concept network. A concept connection added to the concept network usually carries only one characteristic, AFFIRMATION. The purpose is that concept connections in the concept network not only meet the needs of natural language understanding but can also adapt to different contexts.
√ For example, the sentence "I do not have a bag" expresses that an ATTRIBUTE (possession) connection is formed between "I" and "bag", but in the current context the connection is characterized as NEGATION. The connection added to the concept network, however, is characterized as AFFIRMATION.
In addition, manual operations during natural language understanding, such as adding words to concepts or adding concept connections, depend on the actual application.
1.2.5. Word vector
A word vector is a mathematical representation of the words in a language: as the name implies, a word is represented as a vector. There are two main representations:
■ One-hot representation. A vocabulary is created and each word is numbered sequentially. In practical applications, sparse coding is generally adopted, mainly storing the word's number. One of the biggest problems of this representation is that it cannot capture similarity between words: even for a pair of similar words, no relationship can be seen from their word vectors. This representation is also prone to the curse of dimensionality, especially in Deep Learning-related applications.
■ Distributed representation. The basic idea is to map each word, through training, into a K-dimensional real-valued vector (K is generally a hyperparameter of the model) and to judge semantic similarity between words through the distance between their vectors (such as cosine similarity or Euclidean distance). Word2Vec uses this word vector representation.
Traditionally, Natural Language Processing (NLP) systems encode words as strings. This approach is arbitrary and provides no useful information about possible relationships between words. Word vectors are an alternative in the NLP domain: they map words or phrases into real-valued vectors, reducing features from a vocabulary-sized high-dimensional space to a relatively low-dimensional space. The most popular word vector model at present is Word2Vec, proposed by Mikolov et al. in 2013.
There are two main ways in which word vectors can be combined with the concept network:
■ Words that correspond directly or indirectly to concepts are replaced with word vectors. Humans cannot supply complete and accurate words for every concept; capturing similarity between words through word vectors helps extend the words of concepts or establish correlations among concepts.
■ Concept IDs directly use word vectors.
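The similarity judgment used by the distributed representation can be sketched with cosine similarity; the toy 2-dimensional vectors in main are hypothetical, not trained Word2Vec output.

```java
/** Sketch: semantic similarity between word vectors via cosine similarity. */
public class CosineSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Hypothetical toy vectors for three words.
        double[] king = {0.8, 0.3}, queen = {0.7, 0.4}, apple = {0.1, 0.9};
        System.out.printf("king~queen %.3f, king~apple %.3f%n",
                cosine(king, queen), cosine(king, apple));
    }
}
```

A higher cosine value indicates greater semantic similarity, which is what makes word vectors useful for extending concept words or linking related concepts.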
2.1. Natural language understanding implementation process
2.1.1. Cutting sentence
■ Discourse/paragraph
A chapter is usually made up of one or more paragraphs, and a paragraph is an aggregate of sentences. A paragraph is customarily called a "natural segment" and is marked by a line break and an indented opening.
■ Natural language dialogue
That is, computers and people interact through human language (natural language). Natural language dialogue can be divided into single-turn dialogue, which is the basis, and multi-turn dialogue. The user holds a single-turn/multi-turn conversation with the dialogue system in a natural language such as English or Chinese, and the system helps the user complete tasks.
Whether for a chapter/paragraph or a single-turn/multi-turn dialogue, the sentence is the most basic unit of language operation, so segmenting sentences is a very necessary step. There are two main methods for implementing sentence segmentation:
■ Based on symbol rules. For example, English and Chinese have clear sentence-final symbols, but it must also be considered that some words may contain these symbols.
√ For example, the sentence-final punctuation marks in English are ".", "!", and "?", and in Chinese "。", "！", "？", and "…". English paired symbols ("( )", quotation marks) and Chinese paired symbols ("" '' 《 》 and the like) can form clauses, and sentence-end symbols contained within such clauses should be ignored. The dot in an English abbreviation does not serve as a sentence-end marker. English also requires decimal recognition: if the characters on both sides of a dot are digits, the dot is a decimal point; if either side is a non-digit character, the dot is sentence-final punctuation or part of an abbreviation.
Open-source projects such as Apache OpenNLP and Stanford NLP all provide sentence segmentation tools.
■ Based on training/recognition, such as the Apache OpenNLP Sentence Detector, which detects whether a punctuation mark in a sentence marks the end of the sentence.
Of course, if a natural sentence is known to be a single sentence, it does not need further segmentation, and the sentence segmentation step can be skipped.
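The symbol rules above, including the decimal-point test, can be sketched as follows; abbreviation and paired-symbol handling are deliberately simplified, and a production splitter (e.g., the OpenNLP Sentence Detector) would also use training.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: symbol-rule sentence segmentation with decimal-point handling. */
public class SentenceSplitSketch {

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            cur.append(c);
            // English and Chinese sentence-final marks.
            boolean end = c == '!' || c == '?' || c == '。' || c == '！' || c == '？';
            if (c == '.') {
                boolean leftDigit = i > 0 && Character.isDigit(text.charAt(i - 1));
                boolean rightDigit = i + 1 < text.length() && Character.isDigit(text.charAt(i + 1));
                end = !(leftDigit && rightDigit);  // a dot between digits is a decimal point
            }
            if (end) {
                String s = cur.toString().trim();
                if (!s.isEmpty()) sentences.add(s);
                cur.setLength(0);
            }
        }
        String tail = cur.toString().trim();
        if (!tail.isEmpty()) sentences.add(tail);  // unterminated trailing sentence
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("Pi is 3.14 exactly. Is it? Yes!"));
    }
}
```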
2.1.2. Segmentation concept word
Segmenting concept words mainly comprises three steps: word segmentation, part-of-speech tagging, and concept recognition.
2.1.2.1. Word segmentation
In Latin-alphabet languages, represented by English, spaces serve as natural separators between words. Chinese uses characters as the basic writing unit; there is no obvious mark between words, and words carry more meaning than single characters. Chinese word segmentation is therefore the basis and key of Chinese information processing.
There are many methods for Chinese word segmentation, such as lexicon- (dictionary-) based segmentation, including forward minimum matching, reverse minimum matching, forward maximum matching, reverse maximum matching, and minimum segmentation, and statistics-based segmentation, such as the Hidden Markov Model (HMM), Bigram, Conditional Random Fields (CRF), and seq2seq with bidirectional LSTM. Technical difficulties such as ambiguity recognition and new-word recognition still exist, however.
The word segmentation described herein can select one, or a combination, of methods from character-based sequence labeling models, word-based n-gram models, word lexicons, etc., as shown in fig. 2. Word-based n-gram segmentation performs better on dictionary words, while character-based sequence labeling segmentation is more robust to words outside the dictionary. The segmentation process combining multiple methods is as follows:
1) Words corresponding directly or indirectly to concepts in the concept network are dynamically added to the dictionary. If a word or compound word contains relation components, the relation components are ignored, the words are extracted, and the words are then added to the dictionary;
√ For example, concept ID=create may have corresponding words such as "define", "create", "establish", "build", "construct", and "initialize"; these words are dynamically added to the dictionary. For example, the concept ID=selt may have a word with a relation component, "{0} or {1}"; the word "or" is dynamically added to the dictionary. For example, concept ID=eban may have a compound word "rather {1} nor {2}"; the words "rather" and "neither" are then dynamically added to the dictionary;
2) Segment the sentence with a character-based sequence labeling model, such as a maximum-entropy hidden Markov model, a conditional random field (CRF), or a seq2seq model over a bidirectional LSTM. This result can serve as the final word segmentation result;
Many word segmenters have already been implemented, for example:
[Table: example word segmenters]
3) Segment the sentence with a dictionary-based full segmentation method, for example:
[Table: full segmentation example]
4) Merge the words produced by the two methods; during merging, keep only one copy of identical words at the same position;
5) Then disambiguate using algorithms such as bigram, shortest path (fewest words), reverse maximum matching, forward maximum matching, reverse minimum matching, and forward minimum matching to obtain the final word segmentation result.
6) Different algorithms may produce different segmentation results. The algorithms can therefore be assigned priorities and traversed in priority order, as follows:
i. First segment with the highest-priority algorithm;
ii. Run the subsequent semantic understanding processing on that segmentation result. If it succeeds, stop; otherwise select the highest-priority algorithm among those not yet tried and segment again, then repeat this step;
iii. Continue until semantic understanding succeeds or all algorithms have been traversed.
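The combined segmentation idea above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it performs dictionary-based full segmentation and then applies the shortest-path (fewest-words) disambiguation mentioned in step 5); the toy lexicon is an assumption, and a real system would also merge in a character-based sequence labeling result before disambiguating.

```python
def full_segment(sentence, lexicon):
    """Dictionary-based full segmentation: every word found at every position."""
    words = []
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, n + 1):
            piece = sentence[i:j]
            if j - i == 1 or piece in lexicon:   # single characters always allowed
                words.append((i, j, piece))
    return words

def fewest_words_path(sentence, lexicon):
    """Shortest-path (fewest-words) disambiguation over the full segmentation."""
    n = len(sentence)
    best = [None] * (n + 1)          # best[i] = (word_count, segmentation up to i)
    best[0] = (0, [])
    for i, j, piece in sorted(full_segment(sentence, lexicon)):
        if best[i] is not None:
            cand = (best[i][0] + 1, best[i][1] + [piece])
            if best[j] is None or cand[0] < best[j][0]:
                best[j] = cand
    return best[n][1]
```

For example, `fewest_words_path("abcd", {"ab", "cd", "abc"})` keeps the two-word path "ab" + "cd" rather than four single characters.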
2.1.2.2. Part-of-speech tagging
Part of speech describes the role a word plays in context, and part-of-speech tagging identifies that role. In general, part-of-speech tagging is performed on top of the word segmentation result: each word is labeled with its correct part of speech, i.e., determined to be a noun, verb, adjective, or another part of speech.
The words that directly or indirectly correspond to concepts in the concept network may specify a part of speech. If no word specifies a part of speech, the part-of-speech tagging step can be skipped; otherwise, part-of-speech tagging plays a key role in the subsequent concept identification step.
There are also many parts-of-speech tagging methods, most of which are based on statistical models, such as Hidden Markov Models (HMMs).
Many part-of-speech taggers have already been implemented, for example:
[Table: example part-of-speech taggers]
2.1.2.3. Concept identification
Concepts are identified for each word in the word segmentation result based on the concept network. A word may correspond to one or more word concepts or wordless concepts, and words for which no concept is identified are labeled as unknown-word concepts. The word segmentation result can therefore be called a concept word sequence.
For word concepts, recognition is done by matching words. If a word or compound word carries relation components, those components are ignored, the word parts are extracted, and matching is performed on them. A recognized compound-word concept is typically also tagged with the positions in the sentence of the words associated with it.
For example, the word concept ID = news may have the words "news", "information", etc.; in the sentence "Which news are most recent?", the word "news" corresponds to this concept. The word concept ID = selt may have the word "{0} or {1}" with a relation component; in the sentence "apple or watermelon", the word "or" corresponds to this concept. The word concept ID = eban may have the compound word "rather {1} nor {2}"; in the sentence "I would rather tire myself than tire you", the words "rather" and "neither" correspond to this concept, and each is also tagged with the sentence position of the word associated with it ("5" and "1" respectively).
For a wordless concept, identification is performed by its concept recognizer. The identified concept typically also contains a converter that produces the concept Definition (DEF).
For example, the wordless concept ID = @time refers to times and dates, e.g., the words "20150712", "tomorrow", "a few days ago", "a certain day". The ontology concept of this concept is the word concept ID = time, whose corresponding words may be "time" and "date". For the sentence "tomorrow's date is 20150712", the concept recognizer of @time can recognize that the words "tomorrow" and "20150712" correspond to it; the word "date" can of course also be recognized as corresponding to concept ID = time.
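The concept identification step can be sketched as a lookup from words to concept IDs, a minimal illustration only: the word-to-concept table below is hypothetical (built here by hand rather than from a real concept network), and a real system would also run the recognizers of wordless concepts such as @time.

```python
# Hypothetical word -> concept-ID table; entries mirror the examples above
# (news/information -> news, "or" from "{0} or {1}" -> selt).
CONCEPT_WORDS = {
    "news": ["news"],
    "information": ["news"],
    "or": ["selt"],
    "create": ["create"],
    "define": ["create"],
}

def identify_concepts(words):
    """Turn a word sequence into a concept word sequence; unmatched words
    are tagged as unknown-word concepts."""
    return [(w, CONCEPT_WORDS.get(w, ["<unknown>"])) for w in words]
```

A word may map to several concept IDs at once; the ambiguity is kept here and resolved later during DEF dependency analysis.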
2.1.3. Concept Definition (DEF) dependency analysis
The concept Definition (DEF) dependency analysis taught herein is implemented mainly by two methods:
(1) Perform dependency parsing based on Arc-Standard transition actions and a classifier, carrying out concept Definition (DEF) analysis as each action is taken during parsing;
(2) Train a dependency-structure-tree model directly on an existing pre-trained language representation model such as BERT, then perform concept Definition (DEF) analysis over the resulting dependency structure tree.
2.1.3.1. Dependency parsing based on Arc-Standard transition actions and classifiers
Dependency parsing expresses the syntactic structure of a sentence through the dependency relations between its words, such as subject-verb, verb-object, and modifier relations, and provides a basis for accurate understanding of natural language.
Mainstream dependency parsing methods fall into two categories, graph-based and transition-based. Graph-based methods view dependency parsing as finding the maximum spanning tree in a fully connected directed graph, where an edge represents the likelihood that some syntactic relation holds between two words. Transition-based methods construct the dependency tree through a series of transition actions such as shift and reduce, and learning aims to find the optimal action sequence. Compared with graph-based methods, transition-based methods have lower algorithmic complexity and higher parsing efficiency while, thanks to richer features, achieving comparable accuracy; in recent years many deep learning techniques have also been applied to the transition-based approach.
What is taught herein is dependency parsing based on a classifier and Arc-Standard transition actions, with the difference that concept Definition (DEF) analysis is performed as each action is taken during parsing.
2.1.3.1.1. Arc-Standard transition actions
Transition-based dependency parsing represents the parsing process as a sequence of states from an initial state to a terminal state. A State consists of a Stack storing words already processed, a Buffer storing words not yet processed, and the set of dependency arcs built so far.
In the initial state the stack contains only the Root node and all words of the sentence are in the buffer. A transition Action turns one state into a new state; there are three actions: Shift, Left-Reduce, and Right-Reduce. Shift pushes the first word in the buffer onto the stack; Left-Reduce creates a left-pointing dependency arc between the top two words on the stack and pops the second word from the top; Right-Reduce creates a right-pointing dependency arc between the top two words on the stack and pops the top word.
After a series of transition actions the terminal state is reached: the stack contains only the root node and the buffer is empty. At that point a complete dependency tree has been formed and the dependency parsing of the sentence is finished.
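The Arc-Standard transition system just described can be sketched compactly. This is an illustration under stated assumptions: the action sequence is hand-picked for the example sentence, standing in for the classifier that would normally choose each action.

```python
def step(state, action):
    """Apply one Arc-Standard transition to a (stack, buffer, arcs) state."""
    stack, buf, arcs = state
    if action == "SHIFT":                # push the next buffer word onto the stack
        return stack + [buf[0]], buf[1:], arcs
    s2, s1 = stack[-2], stack[-1]
    if action == "LEFT":                 # left-pointing arc: s2 depends on s1; pop s2
        return stack[:-2] + [s1], buf, arcs + [(s1, s2)]
    if action == "RIGHT":                # right-pointing arc: s1 depends on s2; pop s1
        return stack[:-1], buf, arcs + [(s2, s1)]

def parse(words, actions):
    state = (["ROOT"], list(words), [])
    for a in actions:
        state = step(state, a)
    return state
```

For "I eat apples", the sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT ends with only ROOT on the stack, an empty buffer, and the arcs (eat, I), (eat, apples), (ROOT, eat), i.e., a complete dependency tree.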
There are many open-source projects that implement transition-based dependency parsing, such as Stanford NLP, SyntaxNet, and HIT LTP, mentioned in the tables above.
2.1.3.1.2. The classifier
Transition-based dependency parsing also requires a classifier, whose input is a state and whose output is the most likely action in that state. A traditional classifier requires manually defining a series of features and various combined feature templates; this is extremely complex, demands deep knowledge of syntactic parsing, and yields low accuracy.
A neural-network-based classifier is completely different: it only needs some scattered atomic features fed directly into the neural network model. How the features combine is not determined by hand-written feature templates but is extracted automatically by the hidden layers of the model.
For example, Google's open-source neural network SyntaxNet: on sentences randomly sampled from English newswire (the Penn Treebank) as a standard benchmark, its parser Parsey McParseface recovers the dependency relations between words with an accuracy of 94%.
Another example is HIT LTP. nndepparser is the training suite for LTP's neural-network dependency parsing models; a user can train an LTP dependency parsing model with nndepparser. It supports both training a dependency parsing model from manually annotated dependency data and invoking a trained model to parse sentences.
2.1.3.1.3. Concept network transfer action compensation
Despite the high accuracy of neural networks, syntactic parsing remains very difficult; the main problem is that human language is highly ambiguous. It is not uncommon for a medium-length sentence of 20 to 30 words to have hundreds, thousands, or even tens of thousands of possible syntactic structures. A natural language parser must search all of these alternatives and find the most reasonable structure for the given context.
■ Example dependency parses are as follows (the examples come from each project's own parsing interface or platform):
[Tables: example dependency parses]
* The Stanford NLP and SyntaxNet dependency relations are documented at: http://universaldependencies.org/docs/u/dep/index.html
* The Stanford NLP Chinese model: http://nlp.stanford.edu/software/stanford-chinese-corenlp-2016-10-31-models.jar
* The SyntaxNet Chinese model: http://download.tensorflow.org/models/parsey_universal/Chinese.zip
* The HIT LTP dependency relations: http://www.ltp-cloud.com/intro/#sdp_how
* The HIT LTP language cloud: http://www.ltp-group
It is also proposed herein that a concept-network transfer-action compensation method can be introduced into the dependency parsing process. The idea is to supplement common-sense knowledge during transition-based dependency parsing so that the most reasonable syntactic dependencies can be found in a given context. The process is as follows:
1) Convert the concept connections in the concept network into possible transfer actions between concepts. An example of the conversion method is as follows:
[Table: conversion of concept connections into transfer actions]
■ VALUE relationship. The left-side concept of the relationship automatically forms an SV (subject-verb) relation with the relationship action word concept, and the relationship action word concept automatically forms a VO (verb-object) relation with the right-side concept. Accordingly, referring to the table above, the left-side concept and the relationship action word concept are converted into a possible transfer action between the two concepts according to the SV relation, and the relationship action word concept and the right-side concept are converted into a possible transfer action according to the VO relation;
■ Concept connections of multiple layers likewise generate possible transfer actions between concepts, for example the two-layer concept connection ATTRIBUTE-VALUE:
[Table: two-layer concept connection conversion]
■ SPECIES (genus-species) relationship. The right-side concept of the relationship is a subordinate (subclass) of the left-side concept and inherits the relations of the left-side concept; it therefore also inherits the transfer actions associated with the left-side concept.
2) Before the transfer actions begin, determine in advance, for the concept word sequence (i.e., the word segmentation result after concept identification), the transfer actions that each word may form with the words after it.
For example:
Figure SMS_32
/>
Figure SMS_33
3) During the transfer actions, first use the classifier to determine the most likely action in the current state, then check whether the possible transfer actions between concepts admit any other possible new transfer action for this state. If so, fork a new child thread from the current state to handle the new action transfer. The specific approach:
[Figure: forking a child thread on an alternative transfer action]
■ Mark the stack-top word as processed when done, to avoid repeated processing. If the stack-top word has already been processed, do not re-check it against the possible transfer actions between concepts;
■ The new thread and the original thread work independently, i.e., their states are independent; the purpose of using threads is to improve parsing performance through parallel processing;
■ The score assigned to the action taken in this state by the new thread is set equal to the score of the most likely action determined by the classifier.
4) After a series of transfer actions, multiple threads have been spawned, and each thread eventually reaches the terminal state: the stack of the original thread and of each new thread contains only the root node, and the buffer is empty.
■ Any thread, including the original one, can be judged invalid early during the transfer actions and forced to end, which helps the parsing results converge;
5) Multiple parsing results, i.e., dependency trees, may now have been generated; the best one is selected by an algorithm;
■ An example of such an algorithm: for each left/right-reduce transfer, record the score the action obtained (i.e., the classifier's score for the most likely action), the dependency arc distance, and whether a transfer action between concepts was hit. Take the action score as best when largest (after fitting by the Sigmoid function it lies in the range 0-1; closer to 1 is better), the arc distance as best when smallest (1 divided by the distance also lies in 0-1; closer to 1 is better), and a hit on a transfer action between concepts as 1 (otherwise 0). Then compute the cosine similarity between the vector formed by these scores and the all-ones vector of the same length in the vector space; the closer the cosine is to 1, the better the result.
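The ranking rule above can be sketched directly. This is a minimal illustration assuming each candidate parse already comes with its score vector (classifier score, 1/arc-distance, and concept-transfer hit, each in 0-1); the candidate whose vector is closest by cosine to the all-ones vector wins.

```python
import math

def cosine_to_ones(scores):
    """Cosine similarity between a score vector and the all-ones vector."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0:
        return 0.0
    return sum(scores) / (norm * math.sqrt(len(scores)))

def best_parse(candidates):
    """candidates: list of (parse, score_vector); returns the best parse."""
    return max(candidates, key=lambda c: cosine_to_ones(c[1]))[0]
```

A perfect candidate, all of whose scores are 1, reaches cosine similarity exactly 1.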
[Figure: example sentence parse]
The dependency parsing result of the Chinese sentence in the above example is shown in fig. 13.
2.1.3.2. Dependency parsing based on pre-trained language representation models such as BERT
Pre-trained language representation models such as BERT have made significant breakthroughs on multiple natural language understanding tasks. Using these pre-trained models, a model mapping a concept word sequence to a dependency structure tree can be trained. The model outputs a dependency structure tree from the concept word sequence, and concept Definition (DEF) analysis is then performed over that tree.
The dependency structure tree expresses the interdependencies between the words in a sentence; each child node depends on its parent. For example, for the sentence "my cat likes to eat cat food", the corresponding dependency structure tree is: "my" depends on "cat", "cat food" depends on "eat", and "cat" and "eat" both depend on "likes".
Some characters or words in a sentence may have no dependency with any other word, because in spoken language some characters or words often carry no semantic content; these can be ignored in the DEF analysis process described below.
Multiple dependency trees may coexist in one sentence. When there are multiple trees, the root of one tree can be taken as the center, and the roots of the other trees depend on it. For example, the sentence "I think you love me" contains two dependency trees, "I think" centered on "think" and "you love me" centered on "love", with "love" depending on "think". The triple-relation processing method can thus still be applied effectively.
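A dependency structure tree can be represented simply as a child-to-head map. The sketch below encodes the "my cat likes to eat cat food" example from this section and recovers the root; the representation is an illustrative assumption, not the patent's data structure.

```python
def find_root(heads):
    """heads: {child: head}; the root is the head that is nobody's child."""
    children = set(heads)
    roots = {h for h in heads.values() if h not in children}
    return roots.pop()

# Child -> head map for "my cat likes to eat cat food":
# "my" -> "cat", "cat food" -> "eat", "cat" and "eat" -> "likes".
TREE = {"my": "cat", "cat": "likes", "cat food": "eat", "eat": "likes"}
```

Here `find_root(TREE)` yields "likes", the center on which the other words ultimately depend.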
2.1.3.3. Concept Definition (DEF) analysis
2.1.3.3.1. Concept Definition (DEF) definition
Concept Definition (DEF) refers to a specific definition or implementation of a concept, which may also be referred to as a concept instance.
Concept Definitions (DEF) are produced by DEF converters, which are configured and defined in the concept network. A DEF converter generates one DEF from one concept; multiple concepts yield multiple DEFs.
DEF can be classified into simple DEF and complex DEF. Simple DEF includes ordinary DEF, DEF with appendages, and DEF with appendages and qualifiers; complex DEF mainly includes connection and introduction DEF and subject-predicate-object DEF.
DEF is defined by the following classifications:
■ Simple DEF
An ordinary DEF is the most basic: it has no appendages or qualifiers, and the generated DEF need not support merging any other DEF. A DEF with appendages carries a series of left appendages, right appendages, and punctuation appendages; the generated DEF should support merging other DEFs, and a merged DEF is marked as an appendage. A DEF with appendages and qualifiers additionally carries a series of qualifiers; the generated DEF should likewise support merging other DEFs, and a merged DEF is marked as either an appendage or a qualifier.
Since linguistic expression is arbitrary, even an ordinary DEF may in practice support merging other DEFs; for an ordinary DEF this merging step can simply be omitted.
The approach is: when a DEF is generated, if it has appendages, the appended DEFs are merged onto it after generation and marked as appendages; if it has appendages and qualifiers, both the appended and the qualifying DEFs are merged onto it after generation, and each merged DEF is marked as an appendage or a qualifier accordingly.
■ Subject-predicate-object DEF
A subject-predicate-object DEF contains a subject, an action, an object, a double object or object complement, and a series of supplementary DEFs; the generated DEF typically expresses an action, behavior, or method. It should support merging the subject DEF, the object DEF, and the double-object or object-complement DEF, as well as a series of supplementary DEFs.
■ Connection and introduction DEF
A connection and introduction DEF comprises a connection or introduction start DEF and a series of connection or introduction end DEFs, which are treated differently depending on use:
Expressing connection
A connection joins the two DEFs before and after it; the generated DEF is typically a combination (Grouping). The DEF should support merging the start DEF and a series of end DEFs, with a combination type between the start and each end. This combination type depends on the semantics.
Expressing introduction
An introduction elicits a preceding or following DEF; its start DEF is usually the introducing preposition, and its end DEF is the head word of the introduced part. The generated DEF is the DEF that the introduction elicits. The DEF should support merging a series of end DEFs, with an introduction type between the start and each end; this introduction type likewise depends on the semantics.
Whether the DEF expresses connection or introduction can generally be judged from the concepts used for connection and introduction. If the start DEF of the DEF is one of the introduction concepts, it expresses introduction; otherwise it expresses connection. If it expresses introduction, the semantics of that concept is the introduction type. If it expresses connection, it is further determined whether one of the connection concepts is present in the start DEF or an end DEF; if so, the semantics of that concept is the combination type.
The concepts that can be used for connection and introduction are:
[Table: connection and introduction concepts]
In summary, the general method for configuring and defining DEF converters in the concept network is:
■ DEF converters typically generate DEFs that support merging appended and qualifying DEFs;
■ For the right-side concept of an SV (subject-verb) relation and the left-side concept of a VO (verb-object) relation, the DEF converter should generate a DEF that expresses an action, behavior, or method;
■ For concepts usable for connection and introduction, the DEF converter should generate a DEF that can be marked as connection or introduction, together with the combination type for expressing connection or the introduction type for expressing introduction. A DEF expressing introduction should also support merging the series of DEFs it elicits.
Concept Definition (DEF) essential dependency and optional dependency conditions. If a linguistic expression is incomplete, its intended meaning cannot be understood. Accordingly, some DEFs require certain DEFs to depend on them; this is called an essential dependency condition. Conversely, if the intended meaning can be understood without such dependent DEFs, those DEFs are called optional dependency conditions.
There are mainly two methods to define the essential and optional dependency conditions for DEF:
■ Through words with accompanying relation DEFs that correspond directly or indirectly to concepts. The accompanying relation DEF stipulates an essential requirement of the concept Definition (DEF); if that part of the DEF is missing, the expression is semantically incomplete;
For example, concept ID = caru may have the word "because {1}" with an accompanying DEF, where {1} is the essential dependent DEF of the concept. In the phrase "because (I) am ill", the part to the right of "because" is the relation DEF of the concept and indicates the cause. The bare statement "because" lacks the essential relation DEF, so its semantics are incomplete.
■ Generating, by a DEF converter, a DEF with an essential dependency and an optional dependency condition;
the DEF dependency-on-demand and dependency-on-optional conditions are defined when the DEF is generated, and are referred to asBasic principle of To-depend and optional dependence conditions. There is also a limited dependency-on-demand and dependency-on-demand condition, which is associated with the statement itself, also known asDynamic dependency-on-demand and dependency-on-optional conditions
For example, concept ID = var may have the word "variable", with no essential or optional dependency conditions set. (1) The statement "create variable" is semantically incomplete; (2) the statement "create variable whose name is c" is semantically complete; (3) the statement "create variable whose name is c and whose value is 100" is even more complete. "Create" generates a DEF expressing a creation action, while "variable" generates a DEF that is merged onto it (i.e., the object DEF). Here "name is c" is an essential dependent DEF of that DEF, while "value is 100" is an optional dependent DEF.
Of course, the essential and optional dependent DEFs may already have been expressed in earlier statements, in which case the current conditions may already be satisfied. The actual analysis must therefore be carried out in combination with the context of the conversation.
There are mainly two methods to define the dynamic dependency-essential and dependency-optional conditions:
■ The DEF converter that generates the DEF can define these conditions during the merging process. The definitions applied during merging can also be supplied by technical means such as configuration;
for example, "DEF created" in the above example, the definition DEF of the DEF that sets "variable" when merging DEF of "variable" shall contain name ← → be ← → @ var. Name, while value ← → be ← → @ var. Value is the optional definition DEF;
■ A determination is made during concept Definition (DEF) operations as to whether an essential, dependent DEF is missing.
If a DEF lacks its essential dependent DEF during statement understanding, this does not mean the parse is invalid: the required dependent DEF can be supplemented through context or across multiple dialog turns.
In the example above, although the single-turn statement "create variable" is semantically unclear, the required essential dependent DEF can be supplemented by a following statement such as "the variable name is c", making the semantic understanding complete.
The configuration and definition of essential and optional dependency conditions are typically applied in the DEF merging process for subject-predicate-object DEFs, using configuration files in XML format.
An example is shown in fig. 14, where a number of conditions are defined in a configuration file. In a condition element, the attribute verb indicates the action, object the object, subject the subject, double the double object or object complement, and if-constrained whether the condition applies as a qualifying DEF. Further, the condition attribute or element require represents the essential DEFs, the element spec represents a qualifying DEF, and the element item represents one of its items.
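A configuration of this shape might look as follows. This is a hypothetical sketch assembled from the element and attribute names described above (condition, verb, object, subject, double, if-constrained, require, spec, item); the nesting and values are illustrative assumptions, not the contents of the actual fig. 14 file.

```xml
<!-- Hypothetical sketch; structure and values are illustrative only. -->
<conditions>
  <condition verb="create" object="var" if-constrained="true">
    <!-- essential qualifying DEF: the variable must be named -->
    <require>name &lt;-&gt; be &lt;-&gt; @var.name</require>
    <!-- optional qualifying DEFs -->
    <spec>
      <item>value &lt;-&gt; be &lt;-&gt; @var.value</item>
    </spec>
  </condition>
</conditions>
```

This mirrors the "create variable" example above, where "name is c" is essential and "value is 100" is optional.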
2.1.3.3.2.DEF analytical procedure
The analysis process of concept Definition (DEF) is mainly a process of merging DEFs. It comprises concept connection analysis, dependency condition analysis, context analysis, and so on.
The dependencies between child and parent nodes in the dependency tree can be handled with the same transition-action idea: a left child depending on a right parent can be treated as Left-Reduce, and a right child depending on a left parent as Right-Reduce.
Concept Definition (DEF) analysis is carried out as the actions are transferred. A state additionally contains a DEF stack (Composition Stack), used to store the generated DEFs. After the series of transition actions, the terminal state contains exactly one DEF in its DEF stack. The analysis process is as follows:
■ When the transfer action is Shift, the first word in the buffer is pushed onto the stack; at the same time a DEF is generated for the word and pushed onto the DEF stack;
■ When the transfer action is Left-Reduce, the second DEF from the top of the DEF stack becomes dependent on the top DEF and is popped; when the transfer action is Right-Reduce, the top DEF becomes dependent on the second DEF from the top and is popped.
■ How one DEF depends on another is determined by the dependency relation; a DEF may be converted into another type of DEF during this process.
■ During DEF analysis it can be determined whether a DEF dependency is valid. If it is judged invalid, the current transfer action is invalid, and the current parsing thread (including the original thread) is judged invalid early and forced to end. For example, according to the five axioms of dependency, a component may depend on only one other component; if a DEF is found to depend on multiple DEFs, the dependency can be judged invalid and the current parsing thread (including the original thread) ended early.
In the 1970s, Robinson proposed four axioms of dependency in dependency grammar; in research on Chinese information processing, Chinese scholars proposed a fifth. Together: only one component of a sentence is independent; every other component depends directly on some component; no component may depend on two or more components; if component A depends directly on component B and component C lies between them in the sentence, then C depends directly on B or on some component between A and B; the components on either side of the central component have no relation with each other.
■ When the transfer action is Shift, the DEF is generated from the word, so:
when a word corresponds to one or more word concepts, the DEF converters of those word concepts define one or more DEFs;
when a word corresponds to one or more wordless concepts, the DEF converters contained in the concepts identified by their concept recognizers define one or more DEFs;
when a word is marked as an unknown-word concept, it has no DEF and is ignored during analysis.
Thus a DEF may be absent, unique, or multiple. Multiple DEFs express linguistic ambiguity, while the absence of any DEF indicates that the current statement cannot be understood.
The number of DEFs is not constant during analysis: depending on the dependency analysis under the conversational context, it may change from one to zero (i.e., absent), from many to one, or even from many to zero.
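The early-termination check from the axioms above can be sketched as a single-head test over the arcs built so far. This is an illustrative sketch, not the patent's implementation: arcs are assumed to be (head, child) pairs, and any child with two heads violates the "no component may depend on two or more components" axiom, so the parsing thread can be cut off early.

```python
from collections import Counter

def violates_single_head(arcs):
    """arcs: list of (head, child); True if some child has more than one head,
    i.e. the dependency analysis is invalid under the five axioms."""
    counts = Counter(child for _, child in arcs)
    return any(n > 1 for n in counts.values())
```

A tree in which "cat" and "eat" both depend on "likes" is valid, while one word depending on two different heads is not.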
2.1.3.3.2.1. Concept connection analysis
The analysis is divided into the following two cases according to whether the transfer action is a possible transfer action between concepts.
■ If the transfer action is a possible transfer action between concepts, the core word and the dependent word only need to select the subordinate or the dependent word in the corresponding concept of the word
It is sufficient to equate to both concepts of the action.
The dependent word is also the central word of the words that depend on it. So one case is to select a concept between the dependent word and its own dependents, and another case is to select a concept between the dependent word and its central word; the dependent word should therefore select only the concepts on which the two cases coincide, as shown in fig. 15.
If there is no coincident concept for the dependent word, the DEF dependency may be judged invalid;
if some selected concepts do not coincide, these non-coincident concepts are removed from the selected concepts, and the analysis is propagated upward through the central word and downward through the dependent words. Concepts connected to these non-coincident concepts are removed from the concepts selected by those words, and the process then continues up and down centered on those words. By the same reasoning, the DEF dependency may also be judged invalid if a word is left with no coincident concept, as shown in fig. 16.
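The coincidence selection described above amounts to a set intersection over the candidate concepts of a dependent word. A minimal sketch, with concept names chosen purely for illustration:

```python
def coincident_concepts(toward_head, toward_dependents):
    """Keep only concepts selectable both toward the word's central
    (head) word and toward its own dependents.

    An empty result means no coincident concept exists, so the DEF
    dependency may be judged invalid.
    """
    return set(toward_head) & set(toward_dependents)

print(coincident_concepts({"object", "var"}, {"var", "quantity"}))  # {'var'}
print(coincident_concepts({"object"}, {"quantity"}))                # set()
```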
■ If the transfer action is not a possible transfer action between concepts (this step can be skipped if all concept connections in the concept network can be converted into possible transfer actions between concepts), it is necessary to determine from the concept network whether a reasonable concept connection exists between the dependent word and its core word, and between the dependent word and its own dependents, and to select concepts for these words accordingly.
If the word is marked as an unknown word concept, the connection should be considered reasonable;
if the concept of the word can connect with any other concept but this is not reflected in the concept network, the connection is considered reasonable;
the connection should also be considered reasonable if it lies within a coherent multi-layer concept connection, including inheritance derivation over the SPECIES-GENUS (parent-child) relation;
for example, the concept connection object ← SPECIES → var in this invention expresses a SPECIES-GENUS relation between object and var, so the concept connection @quantity ←→ var can be derived from the concept connection object ← ATTRIBUTE → quantity ← VALUE → @quantity.
This method can also be used to judge the rationality of connections among more than two concepts.
When the concept network is viewed as a graph, concepts are the nodes and concept connections are the edges; the problem then becomes a two-node path-search problem on the graph, so graph-theoretic algorithms such as shortest path and depth-first traversal can be used to implement it;
if no reasonable concept connection can be derived between the words, the decision should depend on the actual application scenario:
if the application targets a particular domain and scenario, the DEF dependency should generally be judged invalid; if it targets a general domain such as chat, the randomness of language and expression must be considered, so the concept connection may still be treated as reasonable.
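The graph formulation above can be sketched with a bounded breadth-first search; the depth limit restricts the check to coherent multi-layer connections. The adjacency data here is illustrative only:

```python
from collections import deque

def connection_exists(graph, start, goal, max_depth=3):
    """BFS over concept connections; a small max_depth keeps the
    search within 'coherent multi-layer' connections only."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return True
        if depth == max_depth:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return False

# Illustrative fragment of a concept network.
graph = {
    "object": ["quantity", "var"],
    "quantity": ["@quantity"],
    "var": [],
}
print(connection_exists(graph, "object", "@quantity"))  # True
print(connection_exists(graph, "var", "@quantity"))     # False
```

A shortest-path algorithm such as Dijkstra could replace BFS when connection weights matter.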
The two cases above analyze the rationality of the connection between two concepts. However, DEFs other than the ordinary DEF all involve the rationality of connections among several concepts. Consider primarily the defined DEF and the subject-predicate-object DEF.
■ Defined DEF
First judge the connection rationality inside the defined DEF;
if the defined DEF is a subject-predicate DEF, judge whether the concept of the defined DEF is a relational action-word concept: (1) if yes, further judge whether the object is missing; if it is missing, judge the dependency unreasonable, otherwise judge the connection rationality of the subject or object DEF with the current DEF; (2) if no, further judge whether the subject or object DEF is missing from the defined DEF; whether both, neither, or only one is missing, judge the connection rationality of the defined DEF with the current DEF;
if the defined DEF is a connection or intermediary DEF: (1) if the defined DEF expresses a connection, judge the respective connection rationality, with the current DEF, of the starting DEF and the series of ending DEFs of the defined DEF; (2) if the defined DEF expresses a reference, judge the respective connection rationality, with the current DEF, of the starting DEF and the series of ending DEFs attached to the defined DEF;
if the defined DEF is any other DEF, judge the connection rationality between the defined DEF and the current DEF.
■ Subject-predicate-object DEF
If a preposition-object or object-complement DEF exists in the subject-predicate DEF, consider the connection rationality of subject, predicate (concept ID = action), object, and object complement in English, and of subject, predicate (concept ID = action), preposition, and object in Chinese;
if no preposition-object or object-complement DEF exists in the subject-predicate DEF, consider the subject-predicate-object connection rationality.
For example, for the Chinese statement "create a variable named a" in the above example, see fig. 17.
2.1.3.5.2.2. Dependency condition analysis
This step analyzes the necessary and optional dependency conditions of the concept Definition (DEF). The XML-formatted configuration file described herein configures and defines these conditions, so it is straightforward to determine whether the sentence satisfies them.
For example, consider the Chinese statement "create a variable named a" from the above example.
If the necessary and optional dependency conditions are not met, a record is made in the context analysis session so that natural language generation techniques can inform the user of the missing conditions, and so that DEFs matching these conditions can be received in the next round of statements.
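The XML-configured condition check might look like the following sketch. The element and attribute names (`def`, `condition`, `role`, `required`) are assumptions for illustration; the patent states only that an XML configuration file defines the conditions:

```python
import xml.etree.ElementTree as ET

# Hypothetical dependency-condition configuration for a "create" DEF.
CONFIG = """
<def concept="create">
  <condition role="object" required="true"/>
  <condition role="quantity" required="false"/>
</def>
"""

def missing_required(config_xml, present_roles):
    """Return the required roles that the analyzed sentence lacks."""
    root = ET.fromstring(config_xml)
    return [c.get("role") for c in root.findall("condition")
            if c.get("required") == "true" and c.get("role") not in present_roles]

# A "create" with no object fails its necessary condition; the missing
# role would be reported to the user via natural language generation.
print(missing_required(CONFIG, {"quantity"}))  # ['object']
print(missing_required(CONFIG, {"object"}))    # []
```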
2.1.3.5.2.3. Context analysis
Valid concept Definitions (DEF) are stored in the context session using a "first-in, last-out" queue. An upper limit may be set on the queue length; beyond this limit, the earliest-enqueued DEF is removed.
When the last entry in the queue is a DEF to be merged, that DEF is removed from the queue and merged into the DEF currently being analyzed. A DEF to be merged is typically a table-mediated DEF, pushed into the queue during DEF operation.
When the queue contains dependency conditions to be matched, the DEF currently being analyzed must determine whether these dependency conditions are satisfied; if so: (1) the current DEF is superimposed onto the DEF to which these dependency conditions belong, and the DEF in the queue can be run once all of its necessary dependency conditions to be matched are satisfied; (2) the current DEF is no longer pushed into the queue, while the superimposed DEF is moved to the front; (3) if the current DEF itself has dependency conditions to be matched, the superimposed DEF inherits them.
The criterion for judging satisfaction: treat the current DEF as the DEF indicated by the to-be-matched dependency condition of the DEF being judged, then determine whether that condition is satisfied according to the dependency condition analysis in the previous section. Superimposing means attaching the current DEF as the DEF indicated by the matched dependency condition.
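The "first-in, last-out" context queue described above can be sketched with a bounded deque. The class and field names are assumptions, not the patent's implementation:

```python
from collections import deque

class ContextDefQueue:
    """Bounded DEF queue for a context session."""
    def __init__(self, max_size=10):
        # With maxlen set, the earliest-enqueued entry falls off the
        # front once the upper limit is exceeded.
        self.items = deque(maxlen=max_size)

    def push(self, def_obj):
        self.items.append(def_obj)

    def pop_if_to_merge(self):
        """Remove and return the newest entry when it is marked 'to
        merge', so it can be merged into the DEF being analyzed."""
        if self.items and self.items[-1].get("to_merge"):
            return self.items.pop()
        return None

q = ContextDefQueue(max_size=2)
q.push({"id": 1})
q.push({"id": 2})
q.push({"id": 3})                   # limit 2: {"id": 1} is dropped
print([d["id"] for d in q.items])   # [2, 3]
```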
2.1.4. Concept Definition (DEF) operation
Concept Definition (DEF) is a specific definition or implementation of a concept, is an object generated during the execution of a program, and thus can directly execute a process.
Concept Definition (DEF) requires different processing methods in different application scenarios, as shown in fig. 18.
A process for DEF-based operation that enables accurate generation and manipulation of UI interfaces, data, programming language code, and generation of reply DEF is set forth herein.
Concept Definition (DEF) dependency analysis may yield multiple valid concept Definitions (DEF), but not all of them run successfully. These DEFs are independent and do not interfere with one another in the context run session. The run results of multiple DEFs may be consistent, but may also differ, which again reflects language ambiguity. If all DEF runs fail, the statement cannot be understood under the current session, and understanding may be supplemented by the following text or by further rounds of dialog.
For example, fig. 19 shows the two statements "create a variable named a" and "create two variables named a"; although their analysis results are consistent, their run conclusions differ greatly. The DEF analysis in the right-hand box of the figure assumes that the concept connection @quantity ←→ name is reasonable. Preconditions for DEF operation:
■ The DEF is valid and all of its necessary dependency conditions are satisfied;
■ A run method under a context run session has been implemented;
2.1.4.1. GScript GTL
A new DEF-based operation method is set forth herein that enables the accurate generation and manipulation of UI interfaces, data, and programming-language code, or the generation of a reply DEF. The method is based on the GTL template engine technology of the GScript language. This technology parses or compiles and executes a GTL text file to generate GScript language code; the GScript engine then, according to the type of the GTL text file, calls different instruction function libraries to parse or compile and execute that code and generate other programming languages. The generated programming languages can be executed by their corresponding executors or tools; for example, HTML can be executed by a browser.
The GScript language based GTL template engine technology implements DEF operations, as shown in FIG. 20.
The run mode is typically set in a context run session. These operation modes include a human-machine interaction mode (hci), a development mode (dev), a hybrid mode (dev + hci), and the like.
This document mainly describes the human-machine interaction mode and the development mode. The human-machine interaction mode is divided into UI and NoUI sub-modes according to whether a UI interface exists: the UI sub-mode is commonly used on devices with screens, such as computers and mobile phones, while the NoUI sub-mode is commonly used on screenless devices such as smart speakers. The development mode divides its sub-modes according to the computer languages supported by the GScript language; the current version of the GScript language already supports java, js (javascript), html, css, xml, json, and text. The GTL template engine calls GTL text files of different formats for execution in different run modes.
The human-machine interaction modes, development modes, and GTL text file formats correspond to one another.
The GTL template engine supports the <%! %> tag syntax. With this tag syntax, other GTL text files can be included, even nested, which solves a key practical problem: because there are many run modes and sub-modes, the workload of developing GTL text files would be multiplied many times over; however, much of that content is repetitive, so file inclusion greatly reduces the development workload.
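Nested inclusion in the spirit of the <%! %> tag can be sketched as below. The tag format and the resolution logic are assumptions for illustration; the real GTL engine is not described in this level of detail:

```python
import re

TEMPLATES = {  # stands in for GTL text files on disk
    "base.gtl": "header\n<%! body.gtl %>\nfooter",
    "body.gtl": "body with <%! inner.gtl %>",
    "inner.gtl": "shared fragment",
}

TAG = re.compile(r"<%!\s*(\S+)\s*%>")

def expand(name, templates):
    """Recursively replace include tags with the named template's
    fully expanded text, so shared fragments are written only once."""
    text = templates[name]
    return TAG.sub(lambda m: expand(m.group(1), templates), text)

print(expand("base.gtl", TEMPLATES))
# header
# body with shared fragment
# footer
```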
The GTL template engine may select a GTL text file to execute as follows.
■ If a GTL text file has been specified for the DEF, that file is selected for execution;
■ Otherwise, the following steps are carried out:
expression of DEF of actions, behaviors or methods
Selecting a GTL text file to execute according to the concept ID of the action.
For example, concept ID = create represents a create action: hci_json/create.gtl_json is automatically selected in the human-machine interaction mode, and dev_json/create.gtl_json is automatically selected in the development sub-mode json.
Combined (Grouping) DEF
Traverse the combined members and merge their processing results according to the current run mode and the combination type.
For example, in the statement "create one variable named a and create another variable named b", "create one variable named a" and "create another variable named b" form a combined DEF; they are traversed and processed in sequence, and the GScript code they generate is merged.
Table-mediated DEF
This DEF is typically attached to other DEFs and is therefore not processed directly, but it must be pushed into the context analysis session queue and marked as a DEF to be merged.
Other DEFs
If the concept of the current DEF is subordinate to concept ID = object, a table-lookup action is taken by default (concept ID = query); that is, the DEF is converted into one expressing an action, behavior, or method and then processed. For example, the sentence "news" is converted into "query news" for processing;
■ If no corresponding GTL text file can be found for execution, processing cannot proceed; that is, the statement cannot be understood.
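The selection rules above can be sketched as a small dispatcher. The directory naming follows the hci_json/create.gtl_json example from the text; everything else in the data model is an assumption:

```python
def select_gtl_file(def_obj, run_mode="hci_json"):
    """Return the GTL text file to execute for a DEF, or None if none
    applies (meaning the statement cannot be understood)."""
    if def_obj.get("gtl_file"):                  # explicitly specified
        return def_obj["gtl_file"]
    concept_id = def_obj.get("concept_id")
    if def_obj.get("kind") == "table_mediated":  # merged later, not run
        return None
    if def_obj.get("kind") == "object":          # default action: query
        concept_id = "query"
    if concept_id:
        return f"{run_mode}/{concept_id}.gtl_json"
    return None

print(select_gtl_file({"concept_id": "create"}))                  # hci_json/create.gtl_json
print(select_gtl_file({"kind": "object", "concept_id": "news"}))  # hci_json/query.gtl_json
```

A combined DEF would instead iterate this function over its members and merge the generated code.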
The GTL template engine allows external cache data to be passed in, so a DEF can be passed in as cache data at execution time; in this way, the DEFs attached to it (i.e., those to be merged) can be processed during execution.
2.1.4.2. Context operations
If a necessary dependency condition is found to be unsatisfied during the DEF run, a record is made in the context run session so that natural language generation techniques can inform the user of the missing conditions, and so that components matching those conditions can be received in the next round of statements.
If the DEF runs successfully, proceed as follows at the end of the run session:
■ Push the DEF into the DEF queue of the context session;
■ If the current DEF expresses an action, behavior, or method, form an event (event) and store it in the event queue of the context session; meanwhile, concept ID = event forms a SPECIES-GENUS relationship with the noun concept generated from the object;
■ Store the objects newly generated during the DEF run, and the relationships among them, into the object queue and the object-relationship queue of the context session, respectively. A set of objects is itself also an object;
■ Synchronously modify the object queue and object-relationship queue of the context session according to changes in object attributes and changes in the relationships among objects during the DEF run;
the queues of events, objects, and object relationships in the context session play a key role in the internal implementation of the DEF operation method (e.g., reference resolution) and in subsequent natural language generation.
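The end-of-run bookkeeping above can be sketched as follows. The class and field names are assumptions chosen for illustration:

```python
class ContextSession:
    """Minimal sketch of a context session's queues."""
    def __init__(self):
        self.defs, self.events = [], []
        self.objects, self.relations = [], []

    def on_run_success(self, def_obj, new_objects=(), new_relations=()):
        self.defs.append(def_obj)
        if def_obj.get("kind") == "action":
            # An action DEF forms an event (concept ID = event).
            self.events.append({"concept_id": "event", "def": def_obj})
        # Newly generated objects and relationships are queued for
        # later use (e.g., reference resolution, NL generation).
        self.objects.extend(new_objects)
        self.relations.extend(new_relations)

ctx = ContextSession()
ctx.on_run_success({"kind": "action", "concept_id": "create"},
                   new_objects=[{"name": "a", "type": "var"}])
print(len(ctx.events), len(ctx.objects))  # 1 1
```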
2.1.5. Contextual conversation
The context session runs through the entire analysis and run process. Both the analysis session and the run session are generated by the context session.
■ Analyzing sessions
The analysis session acts on the analysis stage. Its life cycle starts with sentence analysis and ends when: (1) no valid DEF is produced by analysis; (2) one DEF runs successfully; or (3) all DEFs have finished running.
■ Running sessions
The run session depends on the analysis session and acts on the run stage; its life cycle starts with DEF operation and ends when DEF operation finishes. When a DEF runs successfully, the data it contains (e.g., the events, objects, and object relationships in the run session) is merged into the context session before the analysis session and run session complete, to facilitate analysis and running of the following text or the next round of dialog.

Claims (10)

1. A natural language understanding method based on a concept network, comprising: sequentially performing the following steps on natural language: segmenting concept words, performing concept definition dependency analysis, and performing concept definition operation;
wherein:
the segmentation concept words mainly comprise three links of segmentation words, part of speech tagging and concept identification;
word segmentation selects one method, or a combination of several, from: a character-based sequence tagging model, a word-based n-gram model, and a lexicon-based approach;
part-of-speech tagging is based on a statistical model; words corresponding directly or indirectly to concepts in the concept network may specify parts of speech, and if no word has a specified part of speech, the part-of-speech tagging step is skipped;
concept identification: identify a concept for each word in the segmentation result based on the concept network, where one word may correspond to one or more word concepts or non-word concepts, and words for which no concept is identified are marked as unknown word concepts; the segmentation result can then be called a concept word sequence;
the concept definition dependency analysis is mainly achieved by the following methods:
performing dependency syntax analysis based on Arc-Standard transfer actions and a classifier, and performing concept definition analysis at each transfer action during the analysis;
performing model training from the concept word sequence to the dependency structure tree based on a pre-trained language representation model such as BERT; and concept definition operation: a concept definition is a specific definition or implementation of a concept and is an object generated while the program runs.
2. The concept-network based natural language understanding method of claim 1, wherein: segmenting sentences is further included before segmenting concept words, and the methods for segmenting sentences mainly comprise two: first, implementation based on sentence-delimiter rules; and second, implementation based on trained recognition of sentence delimiters.
3. The concept network-based natural language understanding method of claim 1, wherein: the segmentation process of the combination of the multiple methods is as follows:
1) Dynamically add words corresponding directly or indirectly to concepts in the concept network into the dictionary; if a word or combined word carries a relation component, ignore the relation component, extract the word, and then add it to the dictionary;
2) Segment the sentence using a maximum-entropy hidden Markov model, a conditional random field (CRF), a bidirectional-LSTM seq2seq model, or another character-based sequence tagging segmentation method; this segmentation result can serve as the final segmentation result;
3) Segment the sentence by a dictionary-based full segmentation method;
4) Combine the words segmented by methods 1)-3), keeping only one copy of identical words at the same position;
5) Then eliminate ambiguity using a bigram model, shortest distance, reverse maximum matching, forward maximum matching, reverse minimum matching, forward minimum matching, or another algorithm to obtain the final segmentation result.
4. The concept-network based natural language understanding method of claim 3, wherein: different algorithms in the multi-method segmentation process may produce different segmentation results; priorities are set for the algorithms, which are then traversed by priority as follows:
i. first segment with the highest-priority algorithm to obtain a segmentation result;
ii. perform subsequent semantic-understanding processing on the segmentation result; if processing succeeds, end; otherwise select the highest-priority algorithm among those not yet traversed and segment again; if that segmentation result has already been processed, continue to the next highest-priority algorithm, otherwise repeat step ii;
iii. continue until semantic-understanding processing succeeds, or all algorithms have been traversed.
5. The concept-network based natural language understanding method of claim 1, wherein: the statistical model of the part-of-speech tagging is a Hidden Markov Model (HMM).
6. The concept network-based natural language understanding method of claim 1, wherein: the concept definition is defined by a DEF converter, and the DEF converter is configured and defined in the concept network; a DEF converter generates a DEF according to its concept, so multiple concepts generate multiple DEFs; a concept definition has both basic and dynamic necessary and optional dependency conditions.
7. The concept network-based natural language understanding method of claim 1, further comprising a reply DEF, generated according to the intent to be expressed by the natural language generation techniques described herein; the context session runs through the entire analysis and run process.
8. Use of the method of claim 1 for understanding chapters/paragraphs and single-turn/multi-turn conversations.
9. Use according to claim 8, wherein: UI interfaces, data, and programming-language code can be generated and manipulated, or a reply DEF generated, based on the understanding of the chapters/paragraphs and single-turn/multi-turn conversations.
10. Use according to claim 9, wherein: the GTL template engine technology based on the GScript language generates and manipulates the UI interfaces, data, and programming-language code, or generates the reply DEF.
CN202210484567.5A 2022-02-20 2022-05-04 Natural language understanding method based on concept network Pending CN115859955A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210484567.5A CN115859955A (en) 2022-05-04 2022-05-04 Natural language understanding method based on concept network
PCT/CN2023/077271 WO2023155914A1 (en) 2022-02-20 2023-02-20 Concept network for artificial intelligence and natural language understanding and generation method thereof


Publications (1)

Publication Number Publication Date
CN115859955A true CN115859955A (en) 2023-03-28

Family

ID=85660068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210484567.5A Pending CN115859955A (en) 2022-02-20 2022-05-04 Natural language understanding method based on concept network

Country Status (1)

Country Link
CN (1) CN115859955A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination