CN106705974B - Semantic role labeling and semantic extraction method for non-restricted path natural language - Google Patents

Publication number
CN106705974B
Authority
CN
China
Prior art keywords
word
path
language
semantic
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611264509.2A
Other languages
Chinese (zh)
Other versions
CN106705974A (en)
Inventor
张珂
陈奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University filed Critical North China Electric Power University
Priority to CN201611264509.2A priority Critical patent/CN106705974B/en
Publication of CN106705974A publication Critical patent/CN106705974A/en
Application granted granted Critical
Publication of CN106705974B publication Critical patent/CN106705974B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A semantic role labeling and semantic extraction method for unrestricted path natural language. First, Chinese path natural language corpora are collected under unrestricted conditions and a Chinese path natural language corpus is established. Then, a semantic role labeling method based on language block analysis and dependency syntax analysis is used to label the path natural language corpus automatically. Finally, according to the semantic role labeling result, the path units are divided in sequence and the navigation semantic information of each path unit is extracted. Because the method divides the path units accurately and in order and extracts the path semantic information accurately, it can guide a robot through route-asking navigation.

Description

Semantic role labeling and semantic extraction method for non-restricted path natural language
Technical Field
The invention relates to a semantic role labeling and semantic extraction method of natural language for robot navigation, belonging to the technical field of data processing.
Background
Natural language processing is an important branch in the field of artificial intelligence, which aims at studying methods for communication between computers and people using natural language. Natural language processing is not a simple study of the grammatical and syntactic relationships of natural language, but rather a study of systems that can effectively implement computer communications or human-computer interactions based on natural language, which is part of computer science. The use of natural language to realize human-computer interaction means that a computer can not only understand natural language but also express ideas, intentions, and the like using natural language. The former is natural language understanding, the latter is natural language generation.
People have long sought to use natural language to interact with computers, because doing so has both important theoretical significance and obvious practical significance. First, people can operate a computer in the natural language they are used to, without spending a great deal of time learning complex computer languages; second, such systems help people further understand the mechanisms of human language ability and intelligence.
Language is an important means of human communication. If human-computer interaction can be carried out in natural language, a robot can also be controlled in natural language, and ordinary people can then control it without special training. Controlling a robot in natural language is more convenient than other methods and better matches human communication habits. Modern robot technology is developing rapidly, driven by sensor technology, computer technology and artificial intelligence; mobile robots in particular are mobile and autonomous and are widely used in service, detection, logistics and other fields. One of the core technologies of a mobile robot is navigation, particularly autonomous navigation, and autonomous navigation of a robot controlled in natural language is becoming a research focus. Researchers hope that robots will be able to complete navigation tasks under natural language control. Because navigation underlies other, more complex tasks, natural language navigation is the basis for realizing those tasks and is of great significance for the development of artificial intelligence.
In the field of path natural language processing, robot navigation based on English path natural language has seen initial development; researchers have focused on determining the robot's navigation path by studying verbs in combination with the corresponding environment. Research on Chinese path natural language is still immature and lags well behind its English counterpart. Le Xiaoqiu et al. used a method based on hierarchical finite state automata to analyze orientation relationships in natural language. Zhang Xueliang et al. proposed an urban-traffic-based path natural language processing method by studying the grammar of Chinese path natural language. Liu Yu et al. proposed an NLRP analysis method based on restricted Chinese, building on an in-depth study of the verbs in path natural language. Jiang Wenming et al. made a preliminary analysis of path natural language processing using both statistics-based and rule-based natural language processing methods. Li Xinde et al. proposed a path natural language processing method based on language block analysis: after collecting a small amount of corpus data in a specific environment, they identified several main semantic components and found a strong connection between the syntax and the semantics of Chinese path natural language. Their method applies word segmentation, named entity recognition and language block analysis to the corpus, extracts the semantic language blocks in the path natural language, and finally extracts from each semantic language block, slot by slot, the semantic information that is useful for constructing the path. However, these Chinese path natural language processing methods all have certain technical problems:
1. Traditional Chinese path natural language processing methods study only a small amount of corpus data collected in a specific environment, whereas real natural language path descriptions are highly varied, so a larger path natural language corpus collected under unrestricted conditions is needed for path natural language processing research;
2. Under unrestricted conditions, the relationship between the syntax and the semantics of path natural language is more complex, and existing semantic role labeling methods for path natural language cannot label the semantic roles accurately, which reduces the accuracy of navigation semantic information extraction;
3. Robot route-asking navigation requires accurate navigation semantic information obtained on the basis of semantic role labeling, but a path description in natural language often contains several ordered path units, so how to divide the path units and extract semantic information from each of them remains to be solved.
Disclosure of Invention
The invention aims to provide a semantic role labeling and semantic extraction method of an unrestricted path natural language aiming at the defects of the prior art so as to improve the accuracy of the robot navigation semantic information extraction.
The problems of the invention are solved by the following technical scheme:
A semantic role labeling and semantic extraction method for unrestricted path natural language comprises the steps of first collecting Chinese path natural language corpora under unrestricted conditions and establishing a Chinese path natural language corpus; then using a semantic role labeling method based on language block analysis and dependency syntax analysis to label the path natural language corpus automatically; and finally, according to the semantic role labeling result, dividing the path units in sequence and extracting the navigation semantic information of the path units.
In the above method, the semantic role labeling based on language block analysis and dependency syntax analysis comprises the following steps:
a. performing word segmentation and part-of-speech tagging on the corpus using the NLPIR Chinese word segmentation system;
b. performing language block analysis on the path natural language using Conditional Random Fields (CRF)
Firstly, defining language block categories for the path natural language corpus, comprising 7 language blocks that carry semantics and 1 boundary language block, the 7 semantic language blocks being direction conversion (DT), target finding according to reference (RT), target advancing without reference (MT), direct target finding (ST), preposition reference (PR), space conversion (SC) and target advancing according to reference (FR), and manually labeling the original corpus with the defined language block categories to obtain a language block analysis corpus; secondly, labeling each word with a language block label using the IOB2 labeling method, in which each word has only three states, namely the word begins a language block, the word is inside a language block, or the word belongs to no language block, the three states being represented by 'B-X', 'I-X' and 'O' respectively, where 'X' is the language block type; then, determining a language block analysis feature template according to the current word and its context; finally, training a CRF model on the training corpus with the customized feature template to obtain the probability distribution of the model, and obtaining, from this probability distribution and the context of a word, the probability that the word receives a given language block label, thereby realizing automatic language block labeling;
c. performing dependency syntax analysis on the path natural language using CRF
Firstly, determining a dependency syntax labeling method and performing dependency syntax labeling on the training corpus, the label of each word being defined as [+/-]dPOS, where "+" indicates that the dominant word follows the dependent word, "-" indicates that the dominant word precedes the dependent word, POS is the part of speech of the dominant word, and "d" is the number of words with that same part of speech counted from the dependent word up to and including the dominant word; then, defining a dependency syntax analysis feature template; finally, training a CRF model on the training corpus with the dependency syntax analysis feature template, thereby realizing automatic dependency syntax labeling;
d. performing semantic role labeling on the path natural language using CRF, based on the results of the language block analysis and the dependency syntax analysis
Firstly, dividing semantic roles into core semantic roles and additional semantic roles; then, labeling the training corpus with semantic roles according to the defined semantic role categories; next, using the results of the language block analysis and the dependency syntax analysis as features to establish a semantic role labeling feature template; and finally, training a CRF model on the training corpus with the semantic role labeling feature template, thereby realizing automatic semantic role labeling.
In the above method, dividing the path units and extracting the navigation semantic information of the path units comprises the following steps:
⑴ inputting a complete semantic role block together with the words it contains and their part-of-speech labels, language block labels and dependency syntax labels;
⑵ judging the language block type of each word in sequence; if it is not 'B-PT', going to step ⑶; if it is 'B-PT', the path unit extraction is finished;
⑶ judging the semantic role label type; if A0, going to step ⑷; if V, going to step ⑸; if DIR, going to step ⑹; if LOC, going to step ⑺; if A1, going to step ⑼;
⑷ judging the part of speech of the words in A0, and filling the words with part of speech n (noun) into the motion body module in sequence;
⑸ judging the part of speech of the words in V, and filling the words with part of speech v (verb) into the motion control module in sequence;
⑹ judging the part of speech of the words in DIR, and filling the words with part of speech f (direction word) into the direction control module;
⑺ judging the language block type of the words in LOC; if PR, filling the words into the motion reference module; if SC, going to step ⑻;
⑻ judging the number of words with part of speech n (noun) in SC; if the number of nouns is 1, going to step ⑽; if the number of nouns is greater than 1, going to step ⑾;
⑼ judging the number of words with part of speech n (noun) in A1; if the number of nouns is 1, going to step ⑽; if the number of nouns is greater than 1, going to step ⑾;
⑽ judging the dependency relationship of the noun; if the noun depends on a verb (v), filling it into the motion target module; if it depends on a non-verb, appending the word it depends on to form a new noun, repeating until the new noun depends on a verb, and then filling the new noun into the motion target module;
⑾ judging the dependency relationship among the nouns; if they are in a parallel relationship, treating them as a single noun and going to step ⑽; if one noun a depends on another noun b, filling noun a into the motion reference module and processing noun b according to step ⑽;
⑿ judging whether the motion control module and the motion target module are both non-empty; if not, going to step ⑴; if so, the path unit extraction is finished.
The invention labels the semantic roles of path natural language with a semantic role labeling method based on language block analysis and dependency syntax analysis, divides the path units according to the labeling result, and finally extracts the semantic information of each path unit. The method divides the path units accurately and in order and extracts the path semantic information accurately, and can therefore guide a robot through route-asking navigation.
Drawings
FIG. 1 is a flow chart of semantic role labeling;
FIG. 2 is a flow chart of path unit division and semantic extraction.
Detailed Description
The invention will be further explained with reference to the drawings.
The invention provides a Chinese-oriented path natural language processing method, which comprises the following 3 steps:
1. Collecting Chinese path natural language corpora under unrestricted conditions and establishing a Chinese path natural language corpus.
2. Automatically labeling the path natural language corpus with a semantic role labeling method based on language block analysis and dependency syntax analysis.
3. Dividing the path units in sequence according to the semantic role labeling result and extracting the navigation semantic information of the path units.
Step 1 collects Chinese path natural language corpora under unrestricted conditions and establishes a Chinese path natural language corpus. To collect sufficient unrestricted path natural language data, the invention uses Webots for Nao to build 10 3D simulation environments, records 10 videos of these environments, provides a top view of each environment, and defines a navigation task for the robot in each environment (that is, an initial position and a target position of the robot). Without being told any further task details, 100 volunteers of different ages and professions watch the videos and top views, freely choose a path that completes the navigation task, and describe that path in natural language. To keep the corpus general, the 100 volunteers range in age from 12 to 60, have education levels from primary school to college, and come from all over the country. The volunteers choose their descriptive sentences freely, without restriction, and 1000 path descriptions are finally collected to form the Chinese path natural language corpus.
Step 2 is semantic role labeling based on language block analysis and dependency syntax analysis; the specific flow is shown in FIG. 1.
(1) Word segmentation and part-of-speech tagging are performed on the corpus using the NLPIR Chinese word segmentation system.
(2) Conditional Random Fields (CRF) are used to perform language block analysis on the path natural language. First, the language block categories of the path natural language corpus are defined; the invention defines 7 language blocks that carry semantics and 1 boundary language block, as shown in Table 1. The original corpus is manually labeled with these language block categories to obtain the language block analysis corpus. Second, each word is given a language block label using the IOB2 labeling method, in which each word has only three possible states: it begins a language block, it is inside a language block, or it belongs to no language block. These states are written "B-X", "I-X" and "O" respectively, where "X" is the language block type. Then, a language block analysis feature template is determined from the current word and its context; the template defined by the invention is shown in Table 2. Finally, a CRF model is trained on the training corpus with this customized feature template, which yields the optimal parameter vector $\Lambda = (\lambda_1, \ldots, \lambda_K)$. Once parameter estimation of the CRF model is complete, the probability distribution of the model is known, and the probability that a word receives a given language block label can be obtained from this distribution and the context of the word, which realizes automatic language block labeling.
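TABLE 1 Language block definitions
[Table 1 is an image in the original publication (Figure GDA0002302730720000051).]
TABLE 2 Feature templates for language block analysis
[Table 2 is an image in the original publication (Figure GDA0002302730720000061).]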
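The IOB2 labeling described in step (2) can be illustrated with a short sketch. This is a minimal illustration rather than the patented implementation: the function name `to_iob2` and the `(label, words)` chunk representation are assumptions made here, and the sample chunks reuse block types (DT, MT, PT) that appear in the example sentence of Experiment four.

```python
from typing import List, Tuple

# A chunk is (label, words); the label "O" marks words outside any language block.
Chunk = Tuple[str, List[str]]


def to_iob2(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten labeled chunks into one IOB2 tag per word."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tag = "O"              # word belongs to no language block
            elif i == 0:
                tag = "B-" + label     # word begins a language block
            else:
                tag = "I-" + label     # word is inside a language block
            tagged.append((word, tag))
    return tagged


# Hypothetical fragment: "go left / walk to chair side / , / re"
example = [
    ("DT", ["go/p", "left/f"]),
    ("MT", ["walk/v", "to/v", "chair/n", "side/f"]),
    ("PT", [",/wd"]),
    ("O", ["re/d"]),
]
iob2 = to_iob2(example)
```

Each `(word, tag)` pair produced this way is the kind of per-word label sequence a CRF sequence labeler is trained on.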
(3) Dependency syntax analysis is performed on the path natural language using CRF. First, a dependency syntax labeling method is determined and the training corpus is labeled accordingly. The label of each word is defined as [+/-]dPOS, where "+" indicates that the dominant word follows the dependent word, "-" indicates that the dominant word precedes the dependent word, POS is the part of speech of the dominant word, and "d" is the number of words with that same part of speech counted from the dependent word up to and including the dominant word (for example, "-2V" points to the second verb before the current word). Then, the invention defines a dependency syntax analysis feature template, as shown in Table 3. Finally, a CRF model is trained on the training corpus with these features, which realizes automatic dependency syntax labeling.
TABLE 3 Feature templates for dependency syntax analysis
[Table 3 is an image in the original publication (Figure GDA0002302730720000071).]
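The [+/-]dPOS label of step (3) turns head selection into a per-word tagging problem. The sketch below shows one way to encode and decode such labels. It is an assumption-laden illustration: the function names `encode_head` and `decode_head` are hypothetical, part-of-speech tags are compared by exact match (so "vi" does not count as "v"), and the root word and the relation suffix (such as "/SBV") are outside its scope.

```python
import re
from typing import List

LABEL_RE = re.compile(r"([+-])(\d+)(\w+)")


def encode_head(pos: List[str], dep: int, head: int) -> str:
    """Build the [+/-]dPOS label for word `dep` whose dominant word is `head`."""
    if dep == head:
        raise ValueError("a word cannot depend on itself")
    step = 1 if head > dep else -1
    # Count words sharing the head's part of speech, from the dependent
    # up to and including the head.
    d = sum(1 for i in range(dep + step, head + step, step) if pos[i] == pos[head])
    sign = "+" if step == 1 else "-"
    return f"{sign}{d}{pos[head].upper()}"


def decode_head(pos: List[str], dep: int, label: str) -> int:
    """Recover the index of the dominant word from a [+/-]dPOS label."""
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"malformed label: {label!r}")
    step = 1 if match.group(1) == "+" else -1
    remaining, tag = int(match.group(2)), match.group(3).lower()
    i = dep + step
    while 0 <= i < len(pos):
        if pos[i] == tag:
            remaining -= 1
            if remaining == 0:
                return i
        i += step
    raise ValueError("no dominant word found for label " + label)


# Parts of speech for: robot forward walk to bed , go left walk
pos_tags = ["n", "vi", "v", "v", "n", "wd", "p", "f", "v"]
label_robot = encode_head(pos_tags, 0, 2)   # robot -> first walk
label_walk2 = encode_head(pos_tags, 8, 2)   # second walk -> first walk
```

With these tags, the second "walk" skips "to/v" and reaches the first "walk/v" as the second verb to its left, which matches the "-2V" label in the dependency result of Experiment four.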
(4) Semantic role labeling is performed on the path natural language using CRF, based on the results of the language block analysis and the dependency syntax analysis. First, the invention divides semantic roles into core semantic roles and additional semantic roles. Core semantic roles (core arguments) are written Arg + number (abbreviated A + number): Arg0 generally denotes the agent of the action, Arg1 generally denotes the patient of the action, and Arg2-Arg4 have different meanings depending on the predicate verb. V is also a core semantic role and denotes the predicate. ArgM-x denotes an adjunct component, where x indicates the function of the adjunct; the adjunct categories are shown in Table 4. Then, the training corpus is labeled with semantic roles according to the defined semantic role categories. Next, the results of the language block analysis and the dependency syntax analysis are used as features to establish a semantic role labeling feature template; the template defined by the invention is shown in Table 5. Finally, a CRF model is trained on the training corpus with these features, which realizes automatic semantic role labeling.
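TABLE 4 Adjunct components
[Table 4 is an image in the original publication (Figure GDA0002302730720000081).]
TABLE 5 Semantic role labeling feature templates based on language block analysis and dependency syntax analysis
[Table 5 is an image in the original publication (Figure GDA0002302730720000082).]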
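Step (4) feeds the language block and dependency labels into the semantic role labeler as features. Since Table 5 is not reproduced in this text, the sketch below does not reproduce the patent's template; it only shows, under assumed feature names and an assumed context window of one word, how such per-word feature dictionaries could be assembled for a CRF toolkit. The function name `srl_features` and the dictionary keys are hypothetical.

```python
from typing import Dict, List

# Each token carries the outputs of the earlier steps (all keys are assumed names).
Token = Dict[str, str]  # keys: "word", "pos", "chunk", "dep"


def srl_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Collect word, POS, chunk and dependency features around position i."""
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        for key in ("word", "pos", "chunk", "dep"):
            # Positions outside the sentence are padded.
            feats[f"{key}[{off}]"] = sent[j][key] if 0 <= j < len(sent) else "<PAD>"
    # A combined feature linking the chunk label and the dependency label.
    feats["chunk|dep[0]"] = sent[i]["chunk"] + "|" + sent[i]["dep"]
    return feats


sentence = [
    {"word": "go", "pos": "p", "chunk": "B-DT", "dep": "+1V"},
    {"word": "left", "pos": "f", "chunk": "I-DT", "dep": "-1P"},
    {"word": "walk", "pos": "v", "chunk": "B-MT", "dep": "-2V"},
]
features_first = srl_features(sentence, 0)
```

The design point carried over from the text is only that chunk and dependency labels sit alongside word and part-of-speech features; the actual feature set and window size are those of Table 5.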
Step 3 divides the path units according to the semantic role labeling result and then extracts the navigation semantic information of each path unit. A path unit consists of four parts: motion body, motion direction, motion control and motion target. The path unit division and semantic extraction flow is shown in FIG. 2.
(1) Input a complete semantic role block together with the words it contains and their part-of-speech labels, language block labels and dependency syntax labels.
(2) Judge the language block type of each word in sequence. If it is not "B-PT", go to step (3); if it is "B-PT", the path unit extraction is finished.
(3) Judge the semantic role label type. If A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9).
(4) Judge the part of speech of the words in A0, and fill the words with part of speech n (noun) into the motion body module in sequence.
(5) Judge the part of speech of the words in V, and fill the words with part of speech v (verb) into the motion control module in sequence.
(6) Judge the part of speech of the words in DIR, and fill the words with part of speech f (direction word) into the direction control module in sequence.
(7) Judge the language block type of the words in LOC. If PR, fill the words into the motion reference module; if SC, go to step (8).
(8) Judge the number of words with part of speech n (noun) in SC. If the number of nouns is 1, go to step (10); if the number of nouns is greater than 1, go to step (11).
(9) Judge the number of words with part of speech n (noun) in A1. If the number of nouns is 1, go to step (10); if the number of nouns is greater than 1, go to step (11).
(10) Judge the dependency relationship of the noun. If the noun depends on a verb (v), fill it into the motion target module; if it depends on a non-verb, append the word it depends on to form a new noun, repeat until the new noun depends on a verb, and then fill the new noun into the motion target module.
(11) Judge the dependency relationship among the nouns. If they are in a parallel relationship, treat them as a single noun and go to step (10); if one noun a depends on another noun b, fill noun a into the motion reference module and process noun b according to step (10).
(12) Judge whether the motion control module and the motion target module are both non-empty. If not, go to step (1); if so, the path unit extraction is complete.
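Steps (1) to (12) above amount to a slot-filling loop over semantic role blocks. The following sketch is one possible reading of that flow, not the patented implementation. Its assumptions are: the class names `Word` and `PathUnit`, the helper names, dependency heads stored as sentence indices, "a depends on b" interpreted as b lying on a's chain of dominant words, and LOC blocks that are not PR being handled like SC.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    pos: str               # part of speech: n, v, f, ...
    chunk: str             # IOB2 language block tag, e.g. "B-MT"
    head: Optional[int]    # index of the dominant word, None for the root
    rel: str = ""          # dependency relation, e.g. "COO" for parallel


@dataclass
class PathUnit:
    body: List[str] = field(default_factory=list)
    control: List[str] = field(default_factory=list)
    direction: List[str] = field(default_factory=list)
    reference: List[str] = field(default_factory=list)
    target: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.body or self.control or self.direction
                    or self.reference or self.target)


def chain_to_verb(sent: List[Word], i: int) -> List[int]:
    """Word i followed by its dominant words, stopping before the first verb."""
    out = [i]
    while len(out) < len(sent):          # guard against cyclic annotations
        head = sent[out[-1]].head
        if head is None or sent[head].pos == "v":
            break
        out.append(head)
    return out


def fill_nouns(sent: List[Word], idxs: List[int], unit: PathUnit) -> None:
    """Steps (8)-(11): route nouns to the motion reference or motion target."""
    nouns = [i for i in idxs if sent[i].pos == "n"]
    if not nouns:
        return
    parallel = any(sent[i].rel == "COO" and sent[i].head in nouns for i in nouns)
    dependents = [] if parallel else [
        a for a in nouns if set(chain_to_verb(sent, a)[1:]) & set(nouns)]
    unit.reference.extend(sent[a].text for a in dependents)
    span = set(range(nouns[0], nouns[-1] + 1)) if parallel else set()
    for h in nouns:
        if h not in dependents:
            span.update(chain_to_verb(sent, h))   # step (10): extend until a verb
    if span:
        unit.target.append(" ".join(sent[i].text for i in sorted(span)))


def extract_units(sent: List[Word], blocks: List[Tuple[str, List[int]]]) -> List[PathUnit]:
    units, unit = [], PathUnit()
    for role, idxs in blocks:
        role = role[3:] if role.startswith("AM-") else role
        words = [sent[i] for i in idxs]
        if words[0].chunk == "B-PT":              # step (2): boundary block
            if not unit.is_empty():
                units.append(unit)
            unit = PathUnit()
            continue
        if role == "A0":
            unit.body += [w.text for w in words if w.pos == "n"]
        elif role == "V":
            unit.control += [w.text for w in words if w.pos == "v"]
        elif role == "DIR":
            unit.direction += [w.text for w in words if w.pos == "f"]
        elif role == "LOC" and words[0].chunk.endswith("PR"):
            unit.reference += [w.text for w in words]
        elif role in ("LOC", "A1"):
            fill_nouns(sent, idxs, unit)
        if unit.control and unit.target:          # step (12): unit complete
            units.append(unit)
            unit = PathUnit()
    if not unit.is_empty():
        units.append(unit)
    return units


# Hypothetical fragment: "go left walk to chair side , re go right turn ,"
demo_sent = [
    Word("go", "p", "B-DT", 2, "ADV"), Word("left", "f", "I-DT", 0, "POB"),
    Word("walk", "v", "B-MT", None, "ROOT"), Word("to", "v", "I-MT", 2, "CMP"),
    Word("chair", "n", "I-MT", 5, "ATT"), Word("side", "f", "I-MT", 3, "POB"),
    Word(",", "wd", "B-PT", None), Word("re", "d", "O", 10, "ADV"),
    Word("go", "p", "B-DT", 10, "ADV"), Word("right", "f", "I-DT", 8, "POB"),
    Word("turn", "v", "I-DT", 2, "COO"), Word(",", "wd", "B-PT", None),
]
demo_blocks = [("AM-DIR", [0, 1]), ("V", [2, 3]), ("A1", [4, 5]), ("O", [6]),
               ("AM-ADV", [7]), ("AM-DIR", [8, 9]), ("V", [10]), ("O", [11])]
demo_units = extract_units(demo_sent, demo_blocks)
```

On this fragment the sketch yields two path units: one with direction "left", control "walk to" and target "chair side", and one with direction "right" and control "turn" but no target, closed by the boundary block.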
Because the relationship between the syntax and the semantics of path natural language is more complex under unrestricted conditions, the invention provides a path natural language semantic role labeling method based on language block analysis and dependency syntax analysis. First, language block types and a dependency syntax label definition suited to the characteristics of path natural language are proposed, a language block analysis feature template and a dependency syntax analysis feature template are designed, and CRF is used to realize high-accuracy automatic language block labeling and automatic dependency syntax labeling. Second, in order to extract the path semantic information needed for robot route-asking navigation, new semantic role categories and a semantic role labeling feature template based on language block analysis and dependency syntax analysis are designed, and CRF is used to realize high-accuracy semantic role labeling of path natural language.
Robot route-asking navigation requires accurate navigation semantic information obtained on the basis of semantic role labeling, but a path description in natural language often contains several ordered path units. The invention therefore provides a path unit division and semantic extraction method based on the semantic role labeling information, which divides the path units accurately and in order while extracting the path semantic information accurately, and can thus guide a robot through route-asking navigation.
Experimental analysis:
To verify the above advantages, experiments were carried out. The corpus is divided in a 4:1 ratio: 80% of the sentences form the training corpus and 20% form the test corpus. A model is trained on the training corpus with the defined feature templates, and the accuracy of the trained model is then measured on the test corpus.
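The 4:1 division can be sketched as follows. This is a minimal sketch under assumptions: the source does not say whether the corpus was shuffled before splitting, so the seeded shuffle, the function name `split_corpus` and the placeholder sentences are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(sentences: Sequence[str], train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then cut the corpus into training and test parts."""
    items = list(sentences)
    random.Random(seed).shuffle(items)   # assumption: random split with a fixed seed
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


# Placeholder corpus with the same size as the collected corpus (1000 descriptions).
corpus = [f"path description {i}" for i in range(1000)]
train_set, test_set = split_corpus(corpus)
```

With 1000 descriptions this gives 800 training sentences and 200 test sentences.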
Experiment one: speech block analysis experiment
Table 6 shows the analysis results of the respective blocks. The accuracy P and the recall R, F1 of the speech block analysis are very high, which shows that the partition type determined by the invention is reasonable and is suitable for path natural language processing. The higher speech block analysis accuracy rate also meets the requirement of semantic role labeling on the basis of the subsequent speech block analysis.
TABLE 6 Language block analysis results (%)
[Table 6 is an image in the original publication (Figure GDA0002302730720000101).]
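The precision P, recall R and F1 value reported for the language blocks can be computed as sketched below. This is a minimal sketch assuming chunk-level exact-match scoring over IOB2 tags; the source does not state whether scoring is per chunk or per word, and the function names `spans` and `prf` are illustrative.

```python
from typing import List, Set, Tuple


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Convert IOB2 tags into (start, end, label) chunks, end exclusive."""
    out, start, label = set(), None, ""
    for i, tag in enumerate(tags + ["O"]):        # sentinel closes the last chunk
        if start is not None and not tag.startswith("I-"):
            out.add((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return out


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F1 for one tag sequence."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)                               # chunks with matching span and label
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold_tags = ["B-DT", "I-DT", "B-MT", "I-MT", "I-MT", "B-PT"]
pred_tags = ["B-DT", "I-DT", "B-MT", "I-MT", "O", "B-PT"]
p_value, r_value, f1_value = prf(gold_tags, pred_tags)
```

In this toy case the predicted MT block is one word too short, so two of three chunks match and P, R and F1 are all 2/3.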
Experiment two: dependency syntactic analysis experiment
Table 7 shows the dependency syntax analysis results, reported as the average over the automatically assigned dependency syntax labels. Although the precision, recall and F1 value are lower than those of the language block analysis, they are still high in absolute terms, and from the point of view of semantic role labeling the dependency syntax analysis complements the language block analysis.
TABLE 7 Dependency syntax analysis results (%)
[Table 7 is an image in the original publication (Figure GDA0002302730720000102).]
Experiment three: semantic role labeling experiment based on language block analysis and dependency syntax analysis
Table 8 shows the semantic role labeling results based on language block analysis and dependency syntax analysis. The precision, recall and F1 value are all high, which demonstrates the effectiveness of the method. For the robot navigation task, a path unit mainly consists of 4 parts: the predicate, the direction word, the reference and the target. These fall within V, AM-DIR and A1, whose labeling results are all better than the average, which shows that the method of the invention is well suited to extracting the path semantic information of path natural language.
TABLE 8 Semantic role labeling results based on language block analysis and dependency syntax analysis (%)
[Table 8 is an image in the original publication (Figure GDA0002302730720000111).]
Experiment four: the method of the invention is utilized to realize the path semantic information extraction process of the example sentence.
Example sentence: the robot walks forwards to the bed, leftwards to the chair, turns right, walks to the wall, leftwards to the doorway, passes through the door, enters the living room, walks forwards to the vase, rightwards to the space between the sofa and the table, turns left, and takes the pop can on the table.
Word segmentation and part of speech tagging results:
robot/n forward/vi walk/v to/v bed/n ,/wd go/p left/f walk/v to/v chair/n side/f ,/wd re/d go/p right/f turn/v ,/wd straight/d walk/v to/v wall/n ,/wd go/p left/f walk/v to/v doorway/s ,/wd go/p right/f pass/v door/n ,/wd enter/v living room/n ,/wd forward/vi walk/v to/v vase/n side/f ,/wd go/p right/f walk/v to/v sofa/n and/cc table/n between/f ,/wd go/p left/f turn/v ,/wd take/v table/n on/f de/ude1 pop can/n ./wj
Chunk analysis results:
[robot/n forward/vi walk/v to/v bed/n] MT [,/wd] PT
[toward/p left/f] DT [walk/v to/v chair/n side/f] MT [,/wd] PT
[then/d] O [toward/p right/f turn/v] DT [,/wd] PT
[walk/v to/v wall/n] MT [,/wd] PT
[toward/p left/f] DT [walk/v to/v doorway/s] MT [,/wd] PT
[toward/p right/f] DT [pass/v door/n] SC [,/wd] PT
[enter/v living room/n] SC [,/wd] PT
[forward/vi] O [walk/v to/v vase/n side/f] MT [,/wd] PT
[toward/p right/f] DT [walk/v to/v sofa/n and/cc table/n between/f] MT [,/wd] PT
[toward/p left/f turn/v] DT [,/wd] PT
[take/v table/n on/f de/ude1 pop can/n] RT [./wj] PT
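Before CRF training, a chunked sentence of this kind is converted to per-token tags in the IOB2 scheme described in the claims ("B-X" at the start of a chunk, "I-X" inside it, "O" outside any chunk). The following is a minimal sketch of that conversion; the function name `to_iob2`, the input layout, and the English token glosses are illustrative assumptions rather than the inventors' implementation.

```python
def to_iob2(chunks):
    """Convert (chunk_type, tokens) pairs into (token, IOB2 tag) pairs.

    A chunk_type of "O" means the tokens belong to no chunk.
    """
    tagged = []
    for chunk_type, tokens in chunks:
        for k, token in enumerate(tokens):
            if chunk_type == "O":
                tag = "O"
            else:
                # First token of a chunk gets B-, the remaining tokens get I-.
                tag = ("B-" if k == 0 else "I-") + chunk_type
            tagged.append((token, tag))
    return tagged


# One clause of the example sentence, with tokens glossed in English.
example = [
    ("DT", ["toward/p", "left/f"]),
    ("MT", ["walk/v", "to/v", "chair/n", "side/f"]),
    ("PT", [",/wd"]),
    ("O", ["then/d"]),
]
for token, tag in to_iob2(example):
    print(token, tag)
```

Each resulting (token, tag) pair is one row of CRF training data, with the tag as the label to predict.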
Dependency parsing results:
[robot/n] +1V/SBV [forward/vi] +1V/ADV [walk/v] 0/ROOT [to/v] -1V/CMP [bed/n] -1V/POB [,/wd] O
[toward/p] +1V/ADV [left/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [chair/n] +1F/ATT [side/f] -1V/POB [,/wd] O
[then/d] +1V/ADV [toward/p] +1V/ADV [right/f] -1P/POB [turn/v] -2V/COO [,/wd] O
[straight/d] +1V/ADV [walk/v] -1V/COO [to/v] -1V/CMP [wall/n] -1V/POB [,/wd] O
[toward/p] +1V/ADV [left/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [doorway/s] -1V/POB [,/wd] O
[toward/p] +1V/ADV [right/f] -1P/POB [pass/v] -2V/COO [door/n] -1V/VOB [,/wd] O
[enter/v] -1V/COO [living room/n] -1V/VOB [,/wd] O
[forward/vi] +1V/ADV [walk/v] -1V/COO [to/v] -1V/CMP [vase/n] +1F/ATT [side/f] -1V/POB [,/wd] O
[toward/p] +1V/ADV [right/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [sofa/n] +1F/ATT [and/cc] +1N/RAD [table/n] -1N/COO [between/f] -1V/POB [,/wd] O
[toward/p] +1V/ADV [left/f] -1P/POB [turn/v] -2V/COO [,/wd] O
[take/v] -1V/COO [table/n] +1F/ATT [on/f] +1N/ATT [de/ude1] -1F/ATT [pop can/n] -1V/VOB [./wj] O
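Labels of the form [+/-]dPOS/REL can be turned back into head indices. The sketch below rests on two assumptions that appear consistent with the example labels above but are not confirmed by the text: that d selects the d-th word in the given direction whose part-of-speech tag equals POS, and that tags are compared exactly (so "vi" does not count as "V"). The function name `decode_heads` is hypothetical.

```python
import re

# sign, count d, and part of speech of the head, e.g. "+1V" or "-2V"
LABEL = re.compile(r"^([+-])(\d+)([A-Za-z]+)$")


def decode_heads(pos_tags, labels):
    """Return the head index of every token, or None for the root or no match."""
    heads = []
    for i, label in enumerate(labels):
        position = label.split("/")[0]          # drop the relation name, e.g. "/SBV"
        match = LABEL.match(position)
        if match is None:                        # "0" marks the root; "O" marks punctuation
            heads.append(None)
            continue
        sign, d, head_pos = match.group(1), int(match.group(2)), match.group(3).lower()
        step = 1 if sign == "+" else -1          # "+": head to the right, "-": head to the left
        j, seen, head = i + step, 0, None
        while 0 <= j < len(pos_tags):
            if pos_tags[j] == head_pos:          # exact tag match (assumption)
                seen += 1
                if seen == d:
                    head = j
                    break
            j += step
        heads.append(head)
    return heads


# First clause of the example: robot forward walk to bed
pos_tags = ["n", "vi", "v", "v", "n"]
labels = ["+1V/SBV", "+1V/ADV", "0/ROOT", "-1V/CMP", "-1V/POB"]
print(decode_heads(pos_tags, labels))
```

Under these assumptions "robot" and "forward" both attach to "walk", "to" attaches to "walk", and "bed" attaches to "to".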
Semantic role labeling result:
[robot/n] A0 [forward/vi] ADV [walk/v to/v] V [bed/n] A1 [,/wd] O
[toward/p left/f] AM-DIR [walk/v to/v] V [chair/n side/f] A1 [,/wd] O
[then/d] AM-ADV [toward/p right/f] AM-DIR [turn/v] V [,/wd] O
[straight/d] AM-ADV [walk/v to/v] V [wall/n] A1 [,/wd] O
[toward/p left/f] AM-DIR [walk/v to/v] V [doorway/s] A1 [,/wd] O
[toward/p right/f] AM-DIR [pass/v] V [door/n] A1 [,/wd] O
[enter/v] V [living room/n] A1 [,/wd] O
[forward/vi] AM-ADV [walk/v to/v] V [vase/n side/f] A1 [,/wd] O
[toward/p right/f] AM-DIR [walk/v to/v] V [sofa/n and/cc table/n between/f] A1 [,/wd] O
[toward/p left/f] AM-DIR [turn/v] V [,/wd] O
[take/v] V [table/n on/f de/ude1 pop can/n] A1 [./wj] O
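The semantic role labeler uses the chunk labels and dependency labels of the two preceding stages as features. The exact feature template is not given in this section, so the following is only a plausible sketch: the function name `srl_features`, the one-token context window, and the feature names are all assumptions made for illustration.

```python
def srl_features(sentence, i):
    """Build a feature dict for token i.

    `sentence` is a list of dicts with the keys word, pos, chunk and dep.
    """
    token = sentence[i]
    features = {
        "word": token["word"],
        "pos": token["pos"],
        "chunk": token["chunk"],   # IOB2 chunk tag from the chunk analysis
        "dep": token["dep"],       # [+/-]dPOS/REL label from the dependency analysis
    }
    # Hypothetical context window of one token on each side.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(sentence):
            features[f"pos[{offset:+d}]"] = sentence[j]["pos"]
            features[f"chunk[{offset:+d}]"] = sentence[j]["chunk"]
        else:
            features[f"pad[{offset:+d}]"] = True   # sentence boundary marker
    return features


sentence = [
    {"word": "toward", "pos": "p", "chunk": "B-DT", "dep": "+1V/ADV"},
    {"word": "left", "pos": "f", "chunk": "I-DT", "dep": "-1P/POB"},
]
print(srl_features(sentence, 0))
```

Dictionaries of this shape are the usual per-token input for CRF toolkits that accept named features.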
The following 9 path units are obtained in sequence.
1. Motion subject: robot
Direction: empty (defaults to forward)
Motion control: walk
Motion target: bed
2. Direction: left
Motion control: walk
Motion target: chair side
3. Direction: right
Motion control: walk
Motion target: wall
4. Direction: left
Motion control: walk
Motion target: doorway
5. Direction: right
Motion control: pass through
Motion target: door
6. Direction: empty
Motion control: enter
Motion target: living room
7. Direction: empty
Motion control: walk
Motion target: vase
8. Direction: right
Motion control: walk
Motion target: between the sofa and the table
9. Direction: left
Motion control: take
Motion target: the pop can on the table
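The grouping of labeled chunks into path units can be sketched as follows. This is a simplified illustration, not the claimed procedure in full: it closes a unit as soon as both the motion control and the motion target are filled, lets a later predicate replace an earlier one while the target is still empty, takes the first verb of a V chunk as the motion control, and copies the whole A1 chunk as the target without the dependency checks. The names `PathUnit` and `extract_path_units` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathUnit:
    subject: str = ""
    direction: str = ""
    control: str = ""
    target: str = ""


def extract_path_units(chunks):
    """chunks: list of (role, [(word, pos), ...]) in sentence order."""
    units, current = [], PathUnit()
    for role, words in chunks:
        if role == "A0":
            current.subject = " ".join(w for w, pos in words if pos == "n")
        elif role == "AM-DIR":
            current.direction = " ".join(w for w, pos in words if pos == "f")
        elif role == "V":
            current.control = words[0][0]          # assumption: first verb only
        elif role == "A1":
            current.target = " ".join(w for w, _ in words)
        # A unit is complete once control and target are both non-empty.
        if current.control and current.target:
            units.append(current)
            current = PathUnit()
    return units


# First three units of the example sentence, glossed in English.
walk_to = [("walk", "v"), ("to", "v")]
chunks = [
    ("A0", [("robot", "n")]), ("ADV", [("forward", "vi")]),
    ("V", walk_to), ("A1", [("bed", "n")]), ("O", [(",", "wd")]),
    ("AM-DIR", [("toward", "p"), ("left", "f")]),
    ("V", walk_to), ("A1", [("chair", "n"), ("side", "f")]), ("O", [(",", "wd")]),
    ("AM-ADV", [("then", "d")]), ("AM-DIR", [("toward", "p"), ("right", "f")]),
    ("V", [("turn", "v")]), ("O", [(",", "wd")]),
    ("AM-ADV", [("straight", "d")]), ("V", walk_to), ("A1", [("wall", "n")]),
]
for unit in extract_path_units(chunks):
    print(unit)
```

With these assumptions the third unit keeps the direction "right" from the turning clause and takes "walk" and "wall" from the clause that follows, which agrees with path unit 3 above.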

Claims (1)

1. A semantic role labeling and semantic extraction method of an unrestricted path natural language is characterized in that the method comprises the steps of firstly collecting Chinese path natural language corpora under an unrestricted condition and establishing a Chinese path natural language corpus; then, a semantic role labeling method based on language block analysis and dependency syntax analysis is used for realizing automatic labeling of path natural language corpora; finally, according to the semantic role labeling result, dividing the path units according to the path natural language sequence, and extracting the navigation semantic information of the path units;
the automatic labeling method for the path natural language corpus comprises the following steps:
a. performing word segmentation and part-of-speech tagging on the linguistic data in the linguistic data base by using an NLPIR Chinese word segmentation system;
b. performing language block analysis on the path natural language by using a conditional random field (CRF)
Firstly, carrying out language block classification on a path natural language corpus, wherein the language block classification comprises 7 language blocks containing semantics and 1 boundary language block B-PT, the 7 language blocks containing semantics are respectively direction conversion DT, a reference searching target RT, a non-reference target advancing MT, a direct searching target ST, a preposition reference PR, a space conversion SC and a reference advancing FR, and carrying out manual language block labeling on an original language corpus according to the defined language block classification to obtain a language block analysis language corpus; secondly, carrying out language block labeling on each word by adopting an IOB2 labeling method, wherein each word has only three states, namely the word is at the beginning of the language block, the word is in the interior of the language block and the word does not belong to any language block, and the three states are respectively represented by 'B-X', 'I-X' and 'O', wherein 'X' represents the type of the language block; then, determining a language block analysis characteristic template according to the current word and the context environment thereof; finally, training a CRF model according to the customized feature template and the training corpus to obtain the probability distribution of the model, and obtaining the probability value of a word given a certain language block label symbol according to the probability distribution of the model and the context environment of the word, thereby realizing the automatic labeling of the language block;
c. performing dependency parsing on the path natural language by using CRF
Firstly, determining a dependency syntax labeling method and performing dependency syntax labeling on the training corpus, the label of each word being defined as [+/-]dPOS, where "+" denotes that the dominant word follows the dependent word, "-" denotes that the dominant word precedes the dependent word, POS denotes the part of speech of the dominant word, and "d" is a number denoting the number of words that exist between the dependent word and the dominant word and that have the same part of speech as the dominant word; then, defining a dependency syntactic analysis feature template, wherein the semantic role labeling types comprise an agent A0, a patient A1, a predicate V, an adverbial mark ADV, a locative mark LOC and a direction mark DIR; finally, training a CRF model with the training corpus according to the dependency syntactic analysis feature template, thereby realizing automatic dependency syntax labeling;
d. semantic role labeling of path natural language by using CRF (conditional random field) based on language block analysis and dependency syntax analysis results
Firstly, dividing semantic roles into core semantic roles and additional semantic roles; then, performing semantic role labeling on the training corpus according to the defined semantic role categories; secondly, using the results of the language block analysis and the dependency syntax analysis as features and establishing a semantic role labeling feature template; finally, training a CRF model with the training corpus according to the semantic role labeling feature template to realize automatic semantic role labeling;
the steps of dividing the path unit and extracting the navigation semantic information of the path unit are as follows:
⑴ inputting a complete semantic role block and the words it contains, together with their part-of-speech labels, language block labels and dependency syntax labels;
⑵ judging the type of each word block in sequence, if not, turning to step ⑶, if it is, then path unit extraction is completed;
⑶ judging semantic role labeling type, if A0, going to step ⑷, if V, going to step ⑸, if DIR, going to step ⑹, if LOC, going to step ⑺, if A1, going to step ⑼;
⑷ judging the part of speech of the words in A0, and filling the words with n parts of speech into the motion main body module;
⑸ judging the part of speech of the V words, and filling the sequence of the V words into the motion control module;
⑹ judging the part of speech of the words in the DIR, and filling the words with the part of speech of f direction words into the direction control module;
⑺ judging the type of the words in LOC, if PR, filling the words into the motion reference module, if SC, going to step ⑻;
⑻ judging the number of words with part of speech n in SC, if the number of nouns is 1, going to step ⑽, if the number of nouns is more than 1, going to step ⑾;
⑼ judging the number of the words with part of speech n in A1, if the number of the nouns is 1, going to step ⑽, if the number of the nouns is more than 1, going to step ⑾;
⑽ judging the dependency relationship of the noun, if the word depends on the verb v, filling the word into the motion target module, if the word depends on a non-verb, adding the depended word into the noun until the new noun depends on the verb, and filling the new noun into the motion target module;
⑾ judging the dependency relationship among nouns, if they are parallel, taking them as a whole noun, then going to step ⑽, if one noun a depends on another noun b, filling the noun a into the motion reference module, the noun b is processed according to step ⑽;
⑿ judging whether the motion control module and the motion target module are not empty, if not, turning to step ⑴, if yes, the path unit extraction is finished.
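Steps ⑽ and ⑾ can be illustrated with a short sketch for the parallel-noun case. It assumes that head indices and relation names are already available for every token of the clause, that two nouns are "parallel" when the last noun attaches to the first with the relation COO, and that each absorbed governing word is appended after the phrase. The case where one noun depends on another (the motion reference case) is deliberately left out. The function name `merge_target` and the data layout are hypothetical.

```python
def merge_target(tokens, heads, rels, noun_idxs):
    """Build the motion target phrase for the nouns of one A1 block.

    tokens: (word, pos) pairs of the clause; heads: head index per token
    (None for the root); rels: relation name per token;
    noun_idxs: positions of the nouns inside the A1 block.
    """
    first, last = noun_idxs[0], noun_idxs[-1]
    if len(noun_idxs) > 1 and not (rels[last] == "COO" and heads[last] == first):
        raise NotImplementedError("only parallel nouns are sketched here")
    # Step 11: parallel nouns are treated as one whole noun.
    phrase = [word for word, _ in tokens[first:last + 1]]
    anchor = first
    # Step 10: absorb the governing word until the phrase depends on a verb.
    while heads[anchor] is not None and tokens[heads[anchor]][1] != "v":
        anchor = heads[anchor]
        phrase.append(tokens[anchor][0])
    return " ".join(phrase)


# "walk to chair side": chair -> side (ATT), side -> to (POB)
tokens = [("walk", "v"), ("to", "v"), ("chair", "n"), ("side", "f")]
print(merge_target(tokens, [None, 0, 3, 1], ["ROOT", "CMP", "ATT", "POB"], [2]))

# "walk to sofa and table between": table -> sofa (COO), sofa -> between (ATT)
tokens2 = [("walk", "v"), ("to", "v"), ("sofa", "n"),
           ("and", "cc"), ("table", "n"), ("between", "f")]
print(merge_target(tokens2, [None, 0, 5, 4, 2, 1],
                   ["ROOT", "CMP", "ATT", "RAD", "COO", "POB"], [2, 4]))
```

The word order of the result follows the token order of the source sentence, so the second call yields the glossed phrase rather than idiomatic English.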
CN201611264509.2A 2016-12-30 2016-12-30 Semantic role labeling and semantic extraction method for non-restricted path natural language Active CN106705974B (en)

Publications (2)

Publication Number Publication Date
CN106705974A CN106705974A (en) 2017-05-24
CN106705974B true CN106705974B (en) 2020-05-12

Family

ID=58905583

Country Status (1)

Country Link
CN (1) CN106705974B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491556A (en) * 2017-09-04 2017-12-19 湖北地信科技集团股份有限公司 Space-time total factor semantic query service system and its method
CN109522551B (en) * 2018-11-09 2024-02-20 天津新开心生活科技有限公司 Entity linking method and device, storage medium and electronic equipment
CN113139380A (en) * 2020-01-20 2021-07-20 华为技术有限公司 Corpus screening method and apparatus
EP3859587A1 (en) 2020-01-29 2021-08-04 Tata Consultancy Services Limited Robotic task planning for complex task instructions in natural language

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant