CN106705974A - Semantic role tagging and semantic extracting method of unrestricted path natural language - Google Patents
Semantic role tagging and semantic extracting method of unrestricted path natural language
- Publication number
- CN106705974A (application number CN201611264509.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- semantic
- language
- noun
- path
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Automation & Control Theory (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a semantic role tagging and semantic extraction method for unrestricted path natural language. First, Chinese path natural language data is collected under unrestricted conditions and a Chinese path natural language corpus is built. Second, the corpus is tagged automatically using a semantic role tagging method based on chunk analysis and dependency grammar analysis. Finally, according to the semantic role tagging result, the text is divided into path units in sequence and the navigational semantic information of each path unit is extracted. The method divides path units in order and extracts path semantic information accurately, and can therefore support a robot in navigating by asking for directions.
Description
Technical field
The present invention relates to a semantic role labeling and semantic extraction method for natural language used in robot navigation, and belongs to the technical field of data processing.
Background art
Natural language processing is an important branch of artificial intelligence. Its goal is to study methods by which computers and people can communicate in natural language. It does not merely study the grammar and syntax of natural language; it studies systems that can effectively carry out computer communication or human-computer interaction in natural language, and it is a part of computer science. Human-computer interaction in natural language means that the computer can not only understand natural language but also use it to express thoughts, intentions, and so on. The former is natural language understanding and the latter is natural language generation.
People have long pursued human-computer interaction in natural language, because it has both theoretical and practical significance. First, people could use computers in the natural language they are accustomed to, without spending large amounts of time learning complex computer languages. Second, it would help people further understand the mechanisms of human language ability and intelligence.
Language is an important means of human communication. If human-computer interaction can be carried out in natural language, robots can be controlled in natural language, and ordinary people can then control robots as they wish. Controlling a robot in natural language is easier than other methods and better matches human communication habits. Modern robotics is developing rapidly, driven by sensor technology, computer technology, and artificial intelligence. Mobile robots, being mobile and autonomous, are widely used in service, exploration, logistics, and other fields. One of the core technologies of mobile robots is navigation, especially autonomous navigation, and controlling a robot's autonomous navigation in natural language is becoming a research focus. Researchers hope that robots will be able to complete navigation tasks under natural language control. Because navigation is the basis of other complex tasks, navigation by natural language is the foundation for accomplishing those tasks and is of great significance to the development of artificial intelligence.
In path natural language processing, robot navigation based on English path natural language has seen initial development; researchers focus on verbs and combine them with the corresponding environment to determine the robot's navigation path. Research on Chinese path natural language, by contrast, is still immature and lags well behind its English counterpart. Le Xiaoqiu et al. analyzed orientation relations in natural language using a method based on layered finite-state automata. Zhang Xueying et al. studied the grammar of Chinese path natural language and proposed a path natural language processing method based on urban traffic. Liu Yu et al. proposed an NLRP analysis method based on restricted Chinese, building on an in-depth study of verbs in path natural language. Jiang Wenming et al. made a preliminary analysis of path natural language processing using two kinds of methods, statistical and rule-based. Li Xinde et al. proposed a path natural language processing method based on chunk analysis. After collecting a small corpus in a specific environment, they proposed several main semantic components and found a strong connection between the syntax and the semantics of Chinese path natural language. They applied word segmentation, named entity recognition, and chunk analysis to the corpus, used chunk analysis to extract the semantic chunks in the path description, and then extracted from each chunk the semantic information useful for path construction according to the corresponding slots. However, these Chinese path natural language processing methods all have certain technical problems:
1. Existing Chinese path natural language processing methods study only a small corpus collected in a specific environment, whereas real natural language path descriptions vary widely. A larger path natural language corpus collected under unrestricted conditions is therefore needed for research on path natural language processing;
2. Under unrestricted conditions, the relation between syntax and semantics in path natural language is more complex. Existing semantic role labeling methods for path natural language alone cannot label semantic roles accurately, which reduces the accuracy of navigational semantic information extraction;
3. For a robot to navigate by asking for directions, accurate navigational semantic information must be obtained on the basis of semantic role labeling. A path description, however, usually contains several ordered path units, so how to divide it into path units and extract semantic information from each one is a problem that still needs to be solved.
Summary of the invention
The object of the present invention is to address the drawbacks of the prior art by providing a semantic role labeling and semantic extraction method for unrestricted path natural language that improves the accuracy of semantic information extraction for robot navigation.
The problem is solved by the following technical solution:
A semantic role labeling and semantic extraction method for unrestricted path natural language. The method first collects Chinese path natural language data under unrestricted conditions and builds a Chinese path natural language corpus; it then labels the corpus automatically using a semantic role labeling method based on chunk analysis and dependency parsing; finally, according to the semantic role labeling result, it divides the text into path units in sequence and extracts the navigational semantic information of each path unit.
In the above method, the path natural language corpus is labeled automatically as follows:
a. Word segmentation and part-of-speech tagging are applied to the corpus using the NLPIR Chinese word segmentation system;
b. Chunk analysis of the path natural language is performed using a conditional random field (CRF)
First, chunk categories are defined for the path natural language corpus: 7 semantic chunk types and 1 boundary chunk type. The 7 semantic chunk types are direction change (DT), finding the target by reference (RT), advancing to the target without reference (MT), finding the target directly (ST), prepositional reference (PR), space transition (SC), and advancing by reference to the target (FR). The original corpus is chunked by hand according to these categories to obtain the chunk analysis corpus. Second, each word is given a chunk tag using the IOB2 scheme. Each word is in one of three states: at the beginning of a chunk, inside a chunk, or outside every chunk, written "B-X", "I-X", and "O" respectively, where "X" is the chunk type. Then the chunk analysis feature template is determined from the current word and its context. Finally, a CRF model is trained on the training corpus with the customized feature template to obtain the probability distribution of the model; this distribution and the context of a word give the probability of assigning each chunk tag to the word, which yields automatic chunk labeling;
c. Dependency parsing of the path natural language is performed using a CRF
First, the dependency labeling scheme is determined and the training corpus is labeled with dependencies. The label of each word has the form [+/-]dPOS, where "+" means the governing word (head) comes after the dependent word, "-" means the head comes before the dependent word, POS is the part of speech of the head, and "d" is the number of words between the dependent word and the head that have the same part of speech as the head. Then the dependency parsing feature template is defined. Finally, a CRF model is trained on the training corpus with this feature template, which yields automatic dependency labeling;
d. Semantic role labeling of the path natural language is performed using a CRF, based on the results of chunk analysis and dependency parsing
First, semantic roles are divided into core semantic roles and additional semantic roles. Next, the training corpus is labeled with semantic roles according to the defined role categories. Then the results of chunk analysis and dependency parsing are used as features to build the semantic role labeling feature template. Finally, a CRF model is trained on the training corpus with this feature template, which yields automatic semantic role labeling.
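The IOB2 scheme in step b can be illustrated with a short sketch. It assumes chunks are already available as (label, words) pairs; the `Chunk` type, the `to_iob2` helper, and the English glosses are hypothetical names chosen for illustration, not part of the patent.

```python
from typing import List, Tuple

# A chunk is (label, words); the label "O" marks words outside any chunk.
Chunk = Tuple[str, List[str]]

def to_iob2(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten labeled chunks into (word, tag) pairs using the IOB2 scheme."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tag = "O"                 # word belongs to no chunk
            elif i == 0:
                tag = "B-" + label        # first word of a chunk
            else:
                tag = "I-" + label        # word inside a chunk
            tagged.append((word, tag))
    return tagged

# English glosses of a fragment of a path description (illustrative only).
example = [
    ("DT", ["to", "left"]),
    ("MT", ["walk", "to", "chair", "side"]),
    ("PT", [","]),
    ("O", ["again"]),
]
tags = to_iob2(example)
```

Each (word, tag) pair is then one training position for the CRF, so the chunker reduces to ordinary sequence tagging.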
In the above method, path units are divided and their navigational semantic information is extracted as follows:
(1) Input a complete semantic role block together with the words it contains and their part-of-speech tags, chunk tags, and dependency labels;
(2) Check the chunk tag of each word in order. If it is not "B-PT", go to step (3); if it is "B-PT", extraction of the path unit is complete;
(3) Check the semantic role label. If it is A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9);
(4) Check the part of speech of the words in A0 and fill the words tagged n (noun) into the moving subject module in order;
(5) Check the part of speech of the words in V and fill the words tagged v (verb) into the motion control module in order;
(6) Check the part of speech of the words in DIR and fill the words tagged f (direction word) into the direction control module in order;
(7) Check the chunk type of the words in LOC. If it is PR, fill the words into the motion reference module; if it is SC, go to step (8);
(8) Count the words tagged n (noun) in SC. If there is one noun, go to step (10); if there is more than one, go to step (11);
(9) Count the words tagged n (noun) in A1. If there is one noun, go to step (10); if there is more than one, go to step (11);
(10) Check the dependency of the noun. If it depends on a verb (v), fill it into the moving target module. If it depends on a non-verb, append the word it depends on to the noun, and repeat until the resulting phrase depends on a verb; then fill the phrase into the moving target module;
(11) Check the dependency between the nouns. If they are coordinate, treat them as a single noun and go to step (10). If one noun a depends on another noun b, fill noun a into the motion reference module and process noun b according to step (10);
(12) Check whether the motion control module and the moving target module are both non-empty. If not, go to step (1); if so, extraction of the path unit is complete.
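Steps (1) to (7) and (12) amount to a dispatch on the semantic role of each block, with a boundary chunk closing the current unit. The sketch below is a simplified reading under stated assumptions: the names `split_path_units`, `new_unit`, and `is_complete` are hypothetical, the "AM-" prefix is stripped so that AM-DIR and DIR are treated alike, and the noun handling of steps (8) to (11) is replaced by filling nouns directly into the target slot.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str, str]          # (word, POS tag, IOB2 chunk tag)
RoleBlock = Tuple[str, List[Word]]   # (semantic role label, words)

def new_unit() -> Dict[str, List[str]]:
    return {"subject": [], "control": [], "direction": [],
            "reference": [], "target": []}

def is_complete(unit: Dict[str, List[str]]) -> bool:
    """Step (12): motion control and moving target are both filled."""
    return bool(unit["control"]) and bool(unit["target"])

def split_path_units(blocks: List[RoleBlock]) -> List[Dict[str, List[str]]]:
    units, unit = [], new_unit()
    for role, words in blocks:
        # Step (2): a boundary chunk (B-PT) closes the current path unit.
        if any(chunk == "B-PT" for _, _, chunk in words):
            if any(unit.values()):
                units.append(unit)
            unit = new_unit()
            continue
        role = role.replace("AM-", "")   # assumption: AM-DIR equals DIR
        for word, pos, chunk in words:
            if role == "A0" and pos == "n":            # step (4)
                unit["subject"].append(word)
            elif role == "V" and pos == "v":           # step (5)
                unit["control"].append(word)
            elif role == "DIR" and pos == "f":         # step (6)
                unit["direction"].append(word)
            elif role == "LOC" and chunk.endswith("-PR"):   # step (7)
                unit["reference"].append(word)
            elif role in ("A1", "LOC") and pos == "n":
                # Simplification: steps (8)-(11) are skipped here.
                unit["target"].append(word)
    if any(unit.values()):
        units.append(unit)
    return units

blocks = [
    ("A0", [("robot", "n", "B-MT")]),
    ("AM-ADV", [("forward", "vi", "I-MT")]),
    ("V", [("walk", "v", "I-MT"), ("to", "v", "I-MT")]),
    ("A1", [("bed", "n", "I-MT")]),
    ("O", [(",", "wd", "B-PT")]),
    ("AM-DIR", [("to", "p", "B-DT"), ("left", "f", "I-DT")]),
    ("V", [("turn", "v", "I-DT")]),
]
units = split_path_units(blocks)
```

A unit such as "turn left" has no target, so `is_complete` returns False for it even though the boundary chunk has closed it; how such units are handled downstream is left open here.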
The present invention labels the semantic roles of path natural language using a semantic role labeling method based on chunk analysis and dependency parsing, divides the text into path units according to the labeling result, and finally extracts the semantic information of each path unit. The method divides path units in sequence and extracts path semantic information accurately, and can therefore guide a robot in navigating by asking for directions.
Brief description of the drawings
Fig. 1 is a flow chart of semantic role labeling;
Fig. 2 is a flow chart of path unit division and semantic extraction.
Detailed description
The invention is further described below with reference to the accompanying drawings.
The present invention proposes a path natural language processing method for Chinese, comprising the following 3 steps:
1. Collect Chinese path natural language data under unrestricted conditions and build a Chinese path natural language corpus.
2. Label the corpus automatically using the semantic role labeling method based on chunk analysis and dependency parsing.
3. According to the semantic role labeling result, divide the text into path units in sequence and extract the navigational semantic information of each path unit.
Step 1 collects Chinese path natural language data under unrestricted conditions and builds a Chinese path natural language corpus.
To collect sufficient unrestricted path natural language data, the invention built 10 3D simulated environments using Webots for Nao, recorded 10 videos of the simulated environments, and provided a top view of each environment. In each environment the robot was given one navigation task, specified by its initial position and target position. Without being told the details of the task, 100 volunteers of different ages and occupations watched the videos and top views, freely chose a path that completes the navigation task, and gave the corresponding natural language path description. To keep the corpus general, the 100 volunteers ranged in age from 12 to 60 and in education from primary school to master's degree, and came from all over the country. The volunteers chose their wording freely, without restrictions, and 1000 descriptions were collected in total to form the Chinese path natural language corpus.
Step 2 is semantic role labeling based on chunk analysis and dependency parsing; the flow is shown in Fig. 1.
(1) Word segmentation and part-of-speech tagging are applied to the corpus using the NLPIR Chinese word segmentation system.
(2) Chunk analysis of the path natural language is performed using a conditional random field (CRF). First, chunk categories are defined for the path natural language corpus; the invention defines 7 semantic chunk types and 1 boundary chunk type, as shown in Table 1. The original corpus is chunked by hand according to these categories to obtain the chunk analysis corpus. Second, chunk analysis is cast as assigning a chunk tag to each word using the IOB2 scheme. Each word is in one of three states: at the beginning of a chunk, inside a chunk, or outside every chunk, written "B-X", "I-X", and "O" respectively, where "X" is the chunk type. Then the chunk analysis feature template is determined from the current word and its context; the template defined by the invention is shown in Table 2. Finally, the CRF model is trained on the training corpus with these features: given the customized feature template and the training corpus, training yields the optimal parameter vector Λ = {λ_1, …, λ_K}. Once the CRF parameters have been estimated, the probability distribution of the model is known; this distribution and the context of a word give the probability of assigning each chunk tag to the word, which yields automatic chunk labeling.
Table 1. Chunk definitions
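A feature template built from the current word and its context can be pictured as a sliding window over the tagged sentence. The sketch below is an assumption for illustration only: Table 2 is not reproduced in this text, so the window size, the feature names such as `w[-1]`, and the single POS bigram are hypothetical and are not the patent's template.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"   # placeholder for positions outside the sentence

def token_features(sent: List[Tuple[str, str]], i: int,
                   window: int = 2) -> Dict[str, str]:
    """Features for position i of a sentence of (word, POS) pairs."""
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        word, pos = sent[j] if 0 <= j < len(sent) else (PAD, PAD)
        feats[f"w[{off}]"] = word    # word at relative offset
        feats[f"p[{off}]"] = pos     # POS tag at relative offset
    # One combined feature: POS of the previous and the current word.
    prev_pos = sent[i - 1][1] if i > 0 else PAD
    feats["p[-1]|p[0]"] = prev_pos + "|" + sent[i][1]
    return feats

sent = [("to", "p"), ("left", "f"), ("walk", "v")]
f0 = token_features(sent, 0)
```

A dictionary per position like this is the usual input shape for CRF toolkits that accept feature maps; any real template would be tuned on the corpus.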
(3) Dependency parsing of the path natural language is performed using a CRF. First, the dependency labeling scheme is determined and the training corpus is labeled with dependencies. The label of each word has the form [+/-]dPOS, where "+" means the governing word (head) comes after the dependent word, "-" means the head comes before the dependent word, POS is the part of speech of the head, and "d" is the number of words between the dependent word and the head that have the same part of speech as the head. Then the invention defines the dependency parsing feature template, as shown in Table 3. Finally, a CRF model is trained on the training corpus with these features, which yields automatic dependency labeling.
Table 3. Feature template for dependency parsing
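The [+/-]dPOS label turns head selection into a per-word tag. A minimal sketch of encoding and decoding such labels follows. It assumes that the count d includes the head itself (so an adjacent head gives d = 1) and that POS tags are compared exactly (so vi is not counted as v); both assumptions are inferred from labels such as +1V and -2V in the worked example later in this document rather than stated by the patent. The function names are hypothetical, and the relation suffix (for example /SBV) and the root label are not handled.

```python
import re
from typing import List, Optional

def encode_head(pos: List[str], dep: int, head: int) -> str:
    """Label for token `dep` whose governing word is token `head`."""
    head_pos = pos[head]
    if head > dep:
        sign, between = "+", range(dep + 1, head + 1)
    else:
        sign, between = "-", range(head, dep)
    # Count words with the head's POS from the dependent up to the head.
    d = sum(1 for k in between if pos[k] == head_pos)
    return f"{sign}{d}{head_pos.upper()}"

def decode_head(pos: List[str], dep: int, label: str) -> Optional[int]:
    """Recover the head index from a label, or None if it cannot be found."""
    m = re.fullmatch(r"([+-])(\d+)(.+)", label)
    if m is None:
        return None
    step = 1 if m.group(1) == "+" else -1
    d, head_pos = int(m.group(2)), m.group(3)
    k, seen = dep + step, 0
    while 0 <= k < len(pos):
        if pos[k].upper() == head_pos:
            seen += 1
            if seen == d:
                return k
        k += step
    return None

# POS tags for: robot forward walk to bed , to left walk
pos = ["n", "vi", "v", "v", "n", "wd", "p", "f", "v"]
```

Because the label is relative to the dependent word, the same small tag set covers sentences of any length, which is what lets a sequence tagger stand in for a parser.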
(4) Semantic role labeling of the path natural language is performed using a CRF, based on the results of chunk analysis and dependency parsing. First, the invention divides semantic roles into core semantic roles and additional semantic roles. Arg followed by a number (abbreviated as A followed by the number) denotes a core semantic role (core argument): Arg0 usually denotes the agent of the action, Arg1 usually denotes the patient of the action, and Arg2 to Arg4 take different meanings depending on the predicate verb. V is also a core semantic role and denotes the predicate. ArgM-* denotes an adjunct, where * indicates the function of the adjunct; the adjunct categories are shown in Table 4. Next, the training corpus is labeled with semantic roles according to the defined role categories. Then the results of chunk analysis and dependency parsing are used as features to build the semantic role labeling feature template; the template defined by the invention is shown in Table 5. Finally, a CRF model is trained on the training corpus with these features, which yields automatic semantic role labeling.
Table 4. Adjunct categories
Table 5. Semantic role labeling feature template based on chunk analysis and dependency parsing
Step 3 divides the text into path units according to the semantic role labeling result and then extracts the navigational semantic information of each path unit. A path unit consists of four parts: moving subject, direction of motion, motion control, and moving target. The flow of path unit division and semantic extraction is shown in Fig. 2.
(1) Input a complete semantic role block together with the words it contains and their part-of-speech tags, chunk tags, and dependency labels.
(2) Check the chunk tag of each word in order. If it is not "B-PT", go to step (3); if it is "B-PT", extraction of the path unit is complete.
(3) Check the semantic role label. If it is A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9).
(4) Check the part of speech of the words in A0 and fill the words tagged n (noun) into the moving subject module in order.
(5) Check the part of speech of the words in V and fill the words tagged v (verb) into the motion control module in order.
(6) Check the part of speech of the words in DIR and fill the words tagged f (direction word) into the direction control module in order.
(7) Check the chunk type of the words in LOC. If it is PR, fill the words into the motion reference module. If it is SC, go to step (8).
(8) Count the words tagged n (noun) in SC. If there is one noun, go to step (10); if there is more than one, go to step (11).
(9) Count the words tagged n (noun) in A1. If there is one noun, go to step (10); if there is more than one, go to step (11).
(10) Check the dependency of the noun. If it depends on a verb (v), fill it into the moving target module. If it depends on a non-verb, append the word it depends on to the noun, and repeat until the resulting phrase depends on a verb; then fill the phrase into the moving target module.
(11) Check the dependency between the nouns. If they are coordinate, treat them as a single noun and go to step (10). If one noun a depends on another noun b, fill noun a into the motion reference module and process noun b according to step (10).
(12) Check whether the motion control module and the moving target module are both non-empty. If not, go to step (1); if so, extraction of the path unit is complete.
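Steps (8) to (11) decide which nouns become the moving target and which become the motion reference by following dependency heads. The sketch below is one possible reading, not the patent's implementation: the `Tok` record, `head_chain`, and `fill_from_nouns` are hypothetical names, any POS tag starting with "v" is assumed to be a verb, "depends on" is read transitively along the head chain, coordination is assumed to be marked by the relation COO on the second noun, and only the first two nouns of a block are considered.

```python
from typing import List, NamedTuple, Optional, Tuple

class Tok(NamedTuple):
    word: str
    pos: str
    head: Optional[int]   # index of the governing word; None for the root
    rel: str              # dependency relation, e.g. "ATT", "COO", "VOB"

def head_chain(tokens: List[Tok], i: int) -> List[int]:
    """Step (10): follow heads from token i until the next head is a verb."""
    chain, cur = [i], i
    while True:
        head = tokens[cur].head
        if head is None or head in chain or tokens[head].pos.startswith("v"):
            return sorted(chain)
        chain.append(head)
        cur = head

def words(tokens: List[Tok], idxs) -> List[str]:
    return [tokens[k].word for k in sorted(idxs)]

def fill_from_nouns(tokens: List[Tok],
                    nouns: List[int]) -> Tuple[List[str], List[str]]:
    """Return (reference, target) for the noun indices of an SC or A1 block."""
    if len(nouns) == 1:                        # steps (8)/(9) lead to (10)
        return [], words(tokens, head_chain(tokens, nouns[0]))
    a, b = nouns[0], nouns[1]                  # sketch: first two nouns only
    if tokens[b].head == a and tokens[b].rel == "COO":
        # Step (11), coordinate nouns: treat a..b as one noun, then step (10).
        span = set(range(a, b + 1)) | set(head_chain(tokens, a))
        return [], words(tokens, span)
    if b in head_chain(tokens, a):             # a depends on b
        return [tokens[a].word], words(tokens, head_chain(tokens, b))
    if a in head_chain(tokens, b):             # b depends on a
        return [tokens[b].word], words(tokens, head_chain(tokens, a))
    return [], []

chair = [Tok("walk", "v", None, "ROOT"), Tok("to", "v", 0, "CMP"),
         Tok("chair", "n", 3, "ATT"), Tok("side", "f", 1, "POB")]
coord = [Tok("walk", "v", None, "ROOT"), Tok("to", "v", 0, "CMP"),
         Tok("sofa", "n", 5, "ATT"), Tok("and", "cc", 4, "RAD"),
         Tok("desk", "n", 2, "COO"), Tok("between", "f", 1, "POB")]
nested = [Tok("take", "v", None, "ROOT"), Tok("desk", "n", 2, "ATT"),
          Tok("on", "f", 4, "ATT"), Tok("de", "ude1", 2, "ATT"),
          Tok("pop-can", "n", 0, "VOB")]
```

With these inputs, "chair" grows into the target phrase "chair side", the coordinate pair yields the single target "sofa and desk between", and in the nested case "desk" becomes the reference while "pop-can" becomes the target.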
Under unrestricted conditions the relation between the syntax and the semantics of path natural language is more complex. To address this, the invention proposes a semantic role labeling method for path natural language based on chunk analysis and dependency parsing. First, chunk categories and a dependency label definition suited to path natural language are given, and chunk analysis and dependency parsing feature templates are designed; CRFs are then used to achieve high-accuracy automatic chunk labeling and automatic dependency labeling. Second, to extract the path semantic information a robot needs to navigate by asking for directions, new semantic role categories and a semantic role labeling feature template based on chunk analysis and dependency parsing are designed, and a CRF is used to achieve high-accuracy semantic role labeling of path natural language.
For a robot to navigate by asking for directions, accurate navigational semantic information must be obtained on the basis of semantic role labeling, but a path description usually contains several ordered path units. The invention proposes a path unit division and semantic extraction method based on semantic role labeling information. The method divides path units in sequence and extracts path semantic information accurately, and can thereby guide a robot in navigating by asking for directions.
Experimental analysis:
To demonstrate these advantages, experiments were carried out. The corpus was divided at a ratio of 4:1, with 80% used as the training corpus and 20% as the test corpus. Models were trained on the training corpus with the defined feature templates, and the accuracy of the resulting models was then measured on the test corpus.
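The 4:1 division can be sketched as follows. The source does not say how items were assigned to the two parts, so the seeded random shuffle and the helper name `split_corpus` are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

def split_corpus(items: Sequence, train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List, List]:
    """Shuffle with a fixed seed, then cut into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # assumption: random assignment
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 1000 corpus items, represented here by their indices.
train, test = split_corpus(range(1000))
```

Fixing the seed keeps the split reproducible, so that the chunking, dependency, and semantic role models are all evaluated on the same held-out descriptions.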
Experiment 1: chunk analysis
Table 6 gives the result for each chunk type. Precision P, recall R, and F1 of chunk analysis are all high, which shows that the chunk types defined by the invention are reasonable and suitable for path natural language processing. The high accuracy of chunk analysis also meets the requirements of the subsequent semantic role labeling, which builds on chunk analysis.
Table 6. Chunk analysis results (%)
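Precision, recall, and F1 for chunking can be computed by comparing predicted and gold chunks. The sketch below assumes exact-match scoring of whole chunks recovered from IOB2 tags, in the style commonly used for chunking evaluation; the source does not state which scoring convention was used, and the names `spans` and `prf` are hypothetical.

```python
from typing import List, Set, Tuple

def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end, label) chunks from an IOB2 tag sequence."""
    out, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel flushes the end
        if tag.startswith("I-") and label == tag[2:]:
            continue                               # still inside the chunk
        if label is not None:
            out.append((start, i, label))          # close the open chunk
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]              # open a new chunk
    return set(out)

def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    g, p = spans(gold), spans(pred)
    tp = len(g & p)                                # chunks matched exactly
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = ["B-DT", "I-DT", "B-MT", "I-MT", "I-MT", "B-PT"]
pred = ["B-DT", "I-DT", "B-MT", "I-MT", "O", "B-PT"]
p, r, f1 = prf(gold, pred)
```

In this toy case two of the three chunks match exactly, so precision, recall, and F1 are each two thirds; a chunk with a wrong boundary counts as an error for both precision and recall.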
Experiment 2: dependency parsing
Table 7 gives the dependency parsing results, reported as the average over the automatically assigned dependency labels. Precision, recall, and F1 are all lower than for chunk analysis but still high in absolute terms, and from the point of view of semantic role labeling, dependency parsing complements chunk analysis.
Table 7. Dependency parsing results (%)
Experiment 3: semantic role labeling based on chunk analysis and dependency parsing
Table 8 gives the semantic role labeling results based on chunk analysis and dependency parsing. Precision, recall, and F1 are all high, which shows that the method is effective. For robot navigation tasks, a path unit consists mainly of 4 parts: predicate, direction word, reference, and target. These 4 parts fall in V, AM-DIR, and A1, and their labeling results are better than the average, which shows that the method of the invention is well suited to extracting path semantic information from path natural language.
Table 8. Semantic role labeling results based on chunk analysis and dependency parsing (%)
Experiment 4: path semantic information extraction for an example sentence using the method of the invention.
Example sentence: The robot walks forward to the bed, walks left to the side of the chair, then turns right, walks straight to the wall, walks left to the doorway, goes right through the door, enters the parlor, walks forward to the side of the vase, walks right to between the sofa and the desk, turns left, and takes the pop can on the desk.
Word segmentation and part-of-speech tagging result:
robot/n forward/vi walk/v to/v bed/n ,/wd to/p left/f walk/v to/v chair/n side/f ,/wd again/d to/p right/f turn/v ,/wd straight/d walk/v to/v wall/n ,/wd to/p left/f walk/v to/v doorway/s ,/wd to/p right/f through/v door/n ,/wd enter/v parlor/n ,/wd forward/vi walk/v to/v vase/n side/f ,/wd to/p right/f walk/v to/v sofa/n and/cc desk/n between/f ,/wd to/p left/f turn/v ,/wd take/v desk/n on/f 的/ude1 pop-can/n ./wj
Chunk parsing result:
[robot/n forward/vi walk/v to/v bed/n]MT [,/wd]PT
[to/p left/f]DT [walk/v to/v chair/n beside/f]MT [,/wd]PT
[again/d]O [to/p right/f turn/v]DT [,/wd]PT
[straight/d walk/v to/v wall/n]MT [,/wd]PT
[to/p left/f]DT [walk/v to/v doorway/s]MT [,/wd]PT
[to/p right/f]DT [through/v door/n]SC [,/wd]PT
[enter/v parlor/n]SC [,/wd]PT
[forward/vi]O [walk/v to/v vase/n beside/f]MT [,/wd]PT
[to/p right/f]DT [walk/v to/v sofa/n and/cc desk/n between/f]MT [,/wd]PT
[to/p left/f turn/v]DT [,/wd]PT
[take/v desk/n on/f de/ude1 pop-can/n]RT [./wj]PT
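The bracketed chunks above correspond to per-word IOB2 tags ("B-X" for the first word of a chunk of type X, "I-X" for the following words, "O" for words outside any chunk). A minimal sketch of that conversion is given below; the function name `chunks_to_iob2` and the input layout (a list of word lists paired with a chunk label) are assumptions made for illustration and are not part of the patent.

```python
from typing import List, Tuple

def chunks_to_iob2(chunks: List[Tuple[List[str], str]]) -> List[Tuple[str, str]]:
    """Flatten (words, chunk_label) pairs into per-word IOB2 tags.

    A chunk labeled "O" contains words that belong to no chunk.
    """
    tags = []
    for words, label in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tags.append((word, "O"))
            else:
                prefix = "B" if i == 0 else "I"  # first word opens the chunk
                tags.append((word, f"{prefix}-{label}"))
    return tags

# A few chunks taken from the example sentence (hypothetical input layout).
example = [
    (["robot/n", "forward/vi", "walk/v", "to/v", "bed/n"], "MT"),
    ([",/wd"], "PT"),
    (["to/p", "left/f"], "DT"),
    (["again/d"], "O"),
]
iob2 = chunks_to_iob2(example)
```

Under this reading, the comma after the first clause receives the tag "B-PT", which is the boundary tag tested in the path unit extraction steps of claim 3.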
Dependency parsing result:
[robot/n]+1V/SBV [forward/vi]+1V/ADV [walk/v]0/ROOT [to/v]-1V/CMP [bed/n]-1V/POB [,/wd]O
[to/p]+1V/ADV [left/f]-1P/POB [walk/v]-2V/COO [to/v]-1V/CMP [chair/n]+1F/ATT [beside/f]-1V/POB [,/wd]O
[again/d]+1V/ADV [to/p]+1V/ADV [right/f]-1P/POB [turn/v]-2V/COO [,/wd]O
[straight/d]+1V/ADV [walk/v]-1V/COO [to/v]-1V/CMP [wall/n]-1V/POB [,/wd]O
[to/p]+1V/ADV [left/f]-1P/POB [walk/v]-2V/COO [to/v]-1V/CMP [doorway/s]-1V/POB [,/wd]O
[to/p]+1V/ADV [right/f]-1P/POB [through/v]-2V/COO [door/n]-1V/VOB [,/wd]O
[enter/v]-1V/COO [parlor/n]-1V/VOB [,/wd]O
[forward/vi]+1V/ADV [walk/v]-1V/COO [to/v]-1V/CMP [vase/n]+1F/ATT [beside/f]-1V/POB [,/wd]O
[to/p]+1V/ADV [right/f]-1P/POB [walk/v]-2V/COO [to/v]-1V/CMP [sofa/n]+1F/ATT [and/cc]+1N/RAD [desk/n]-1N/COO [between/f]-1V/POB [,/wd]O
[to/p]+1V/ADV [left/f]-1P/POB [turn/v]-2V/COO [,/wd]O
[take/v]-1V/COO [desk/n]+1F/ATT [on/f]+1N/ATT [de/ude1]-1F/ATT [pop-can/n]-1V/VOB [./wj]O
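Each label in the dependency result has the form [+/-]dPOS/relation. One reading that is consistent with the example is that the governing word is the d-th word in the given direction whose part-of-speech tag equals POS, with the governing word itself included in the count. The sketch below decodes labels under that assumption; `resolve_heads`, the tuple layout and the exact tag matching (so that `vi` does not count as `V`) are illustrative choices rather than the patent's own implementation.

```python
import re
from typing import List, Optional, Tuple

# "+1V/SBV" -> direction "+", distance 1, governing POS "V", relation "SBV".
LABEL = re.compile(r"^([+-])(\d+)([A-Za-z]+)/(\w+)$")

def resolve_heads(tokens: List[Tuple[str, str, str]]) -> List[Optional[int]]:
    """Return the index of each word's governing word, or None.

    tokens holds (word, pos, label) triples; labels such as "0/ROOT" and "O"
    do not match the pattern and get no governing word.
    """
    heads: List[Optional[int]] = []
    for i, (_, _, label) in enumerate(tokens):
        m = LABEL.match(label)
        if m is None:
            heads.append(None)
            continue
        step = 1 if m.group(1) == "+" else -1   # "+": governing word follows
        d, head_pos = int(m.group(2)), m.group(3).lower()
        j, seen, head = i + step, 0, None
        while 0 <= j < len(tokens):
            if tokens[j][1] == head_pos:        # exact tag match (assumption)
                seen += 1
                if seen == d:
                    head = j
                    break
            j += step
        heads.append(head)
    return heads

# First two clauses of the example sentence.
tokens = [
    ("robot", "n", "+1V/SBV"), ("forward", "vi", "+1V/ADV"),
    ("walk", "v", "0/ROOT"), ("to", "v", "-1V/CMP"),
    ("bed", "n", "-1V/POB"), (",", "wd", "O"),
    ("to", "p", "+1V/ADV"), ("left", "f", "-1P/POB"),
    ("walk", "v", "-2V/COO"), ("to", "v", "-1V/CMP"),
    ("chair", "n", "+1F/ATT"), ("beside", "f", "-1V/POB"),
]
heads = resolve_heads(tokens)
```

With this decoding, the second "walk" (label -2V/COO) attaches to the first "walk", which is the second verb to its left, matching the coordinate relation between the two clauses.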
Semantic role labeling result:
[robot/n]A0 [forward/vi]AM-ADV [walk/v to/v]V [bed/n]A1 [,/wd]O
[to/p left/f]AM-DIR [walk/v to/v]V [chair/n beside/f]A1 [,/wd]O
[again/d]AM-ADV [to/p right/f]AM-DIR [turn/v]V [,/wd]O
[straight/d]AM-ADV [walk/v to/v]V [wall/n]A1 [,/wd]O
[to/p left/f]AM-DIR [walk/v to/v]V [doorway/s]A1 [,/wd]O
[to/p right/f]AM-DIR [through/v]V [door/n]A1 [,/wd]O
[enter/v]V [parlor/n]A1 [,/wd]O
[forward/vi]AM-ADV [walk/v to/v]V [vase/n beside/f]A1 [,/wd]O
[to/p right/f]AM-DIR [walk/v to/v]V [sofa/n and/cc desk/n between/f]A1 [,/wd]O
[to/p left/f]AM-DIR [turn/v]V [,/wd]O
[take/v]V [desk/n on/f de/ude1 pop-can/n]A1 [./wj]O
The following 9 path units are obtained in sequence.
1. Moving subject: Robot
Direction: Empty (defaults to forward)
Motion control: Walk
Moving target: Bed
2. Direction: Left
Motion control: Walk
Moving target: Chair
3. Direction: Right
Motion control: Walk
Moving target: Wall
4. Direction: Left
Motion control: Walk
Moving target: Doorway
5. Direction: Right
Motion control: Pass through
Moving target: Door
6. Direction: Empty
Motion control: Enter
Moving target: Parlor
7. Direction: Empty
Motion control: Walk
Moving target: Vase
8. Direction: Right
Motion control: Walk
Moving target: Between sofa and desk
9. Direction: Left
Motion control: Take
Moving target: Pop can on desk
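The grouping of labeled segments into path units can be sketched as follows. This is a simplified illustration, not the patent's full procedure: it works on whole role segments, ignores the part-of-speech and dependency checks, and assumes that a unit is closed at a punctuation segment only once both motion control and moving target are filled, with later values overwriting earlier ones. The names `split_path_units` and `ROLE_TO_SLOT` are hypothetical.

```python
from typing import Dict, List, Tuple

# Mapping from semantic role label to path unit slot (illustrative names).
ROLE_TO_SLOT = {"A0": "subject", "AM-DIR": "direction", "V": "control", "A1": "target"}

def split_path_units(segments: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Group (text, role) segments into path units.

    A unit is emitted at an "O" segment (punctuation) only when both the
    motion control and the moving target slots have been filled.
    """
    units: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for text, role in segments:
        slot = ROLE_TO_SLOT.get(role)
        if slot is not None:
            current[slot] = text            # a later value overwrites an earlier one
        elif role == "O" and "control" in current and "target" in current:
            units.append(current)
            current = {}
    if "control" in current and "target" in current:
        units.append(current)               # input that ends without punctuation
    return units

# First four clauses of the example sentence.
segments = [
    ("robot", "A0"), ("forward", "AM-ADV"), ("walk to", "V"), ("bed", "A1"), (",", "O"),
    ("to left", "AM-DIR"), ("walk to", "V"), ("chair beside", "A1"), (",", "O"),
    ("again", "AM-ADV"), ("to right", "AM-DIR"), ("turn", "V"), (",", "O"),
    ("straight", "AM-ADV"), ("walk to", "V"), ("wall", "A1"), (",", "O"),
]
units = split_path_units(segments)
```

Under these assumptions the clause "turn right" has no moving target, so it is merged with the following clause, which agrees with path unit 3 above (direction right, moving target wall).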
Claims (3)
1. A semantic role labeling and semantic extraction method for unrestricted path natural language, characterized in that the method first collects Chinese path natural language material under unrestricted conditions and builds a Chinese path natural language corpus; then automatically labels the path natural language corpus using a semantic role labeling method based on chunk parsing and dependency parsing; and finally, according to the semantic role labeling result, divides the path units in order and extracts the navigation semantic information of each path unit.
2. The semantic role labeling and semantic extraction method for unrestricted path natural language according to claim 1, characterized in that the path natural language corpus is automatically labeled as follows:
A. Word segmentation and part-of-speech tagging are performed on the corpus using the NLPIR Chinese word segmentation system;
B. Chunk parsing is performed on the path natural language using a conditional random field (CRF)
First, the chunk categories of the path natural language corpus are defined; the categories comprise 7 semantic chunks and 1 boundary chunk, the 7 semantic chunks being direction change (DT), finding a target according to a reference (RT), advancing toward a target without a reference (MT), directly finding a target (ST), preposed reference (PR), space change (SC) and advancing toward a target according to a reference (FR); the original corpus is manually chunk-labeled according to the defined categories to obtain the chunk parsing corpus; second, each word is chunk-labeled using the IOB2 scheme, in which a word has exactly one of three states: it begins a chunk, it is inside a chunk, or it belongs to no chunk; these states are written "B-X", "I-X" and "O" respectively, where "X" is the chunk type; then, the chunk parsing feature templates are determined from the current word and its context; finally, a CRF model is trained on the training corpus with the customized feature templates to obtain the probability distribution of the model, and from this distribution and the context of a word the probability that the word receives a given chunk label is obtained, thereby realizing automatic chunk labeling;
C. Dependency parsing is performed on the path natural language using a CRF
First, the dependency labeling scheme is determined and the training corpus is labeled with dependencies: the label of each word is defined in the form [+/-]dPOS, where "+" means the governing word comes after the dependent, "-" means the governing word comes before the dependent, POS is the part of speech of the governing word, and "d" is the number of words between the dependent and the governing word that have the same part of speech as the governing word; then, the dependency parsing feature templates are defined, the semantic role label types including the adverbial marker (ADV), the locative marker (LOC) and the direction marker (DIR); finally, a CRF model is trained on the training corpus with the dependency parsing feature templates, realizing automatic dependency labeling;
D. Semantic role labeling is performed on the path natural language using a CRF, based on the chunk parsing and dependency parsing results
First, the semantic roles are divided into core semantic roles and additional semantic roles; then, the training corpus is labeled with semantic roles according to the defined semantic role categories; then, the semantic role labeling feature templates are built with the chunk parsing and dependency parsing results as features; finally, a CRF model is trained on the training corpus with the semantic role labeling feature templates, realizing automatic semantic role labeling.
3. The semantic role labeling and semantic extraction method for unrestricted path natural language according to claim 2, characterized in that the steps of dividing path units and extracting the navigation semantic information of a path unit are as follows:
(1) Input a complete semantic role block and the words it contains, together with their part-of-speech tags, chunk tags and dependency tags;
(2) Judge the chunk type of each word in order; if it is not "B-PT", go to step (3); if it is "B-PT", the path unit extraction is complete;
(3) Judge the semantic role label type; if it is A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9);
(4) Judge the part of speech of the words in A0, and fill the words whose part of speech is n (noun) into the moving subject module in order;
(5) Judge the part of speech of the words in V, and fill the words whose part of speech is v (verb) into the motion control module in order;
(6) Judge the part of speech of the words in DIR, and fill the words whose part of speech is f (direction word) into the direction control module in order;
(7) Judge the chunk type of the words in LOC; if it is PR, fill the words into the motion reference module; if it is SC, go to step (8);
(8) Count the words in SC whose part of speech is n (noun); if there is 1 noun, go to step (10); if there is more than 1 noun, go to step (11);
(9) Count the words in A1 whose part of speech is n (noun); if there is 1 noun, go to step (10); if there is more than 1 noun, go to step (11);
(10) Judge the dependency of the noun; if the word depends on a verb (v), fill it into the moving target; if the word depends on a non-verb, add the word it depends on to the noun, repeating until the resulting term depends on a verb, and then fill the resulting term into the moving target;
(11) Judge the dependency between the nouns; if the nouns are coordinated, treat them as a single noun and return to step (10); if one noun a depends on another noun b, fill noun a into the motion reference module and process noun b according to step (10);
(12) Judge whether the motion control module and the moving target module are both non-empty; if not, go to step (1); if so, the path unit extraction is complete.
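Step (10) can be illustrated with a short sketch. It assumes that the dependency labels have already been resolved into governing word indices, and it follows the chain from a noun until a word whose governing word is a verb is reached, collecting the words on the way. The function name `expand_target` and the list-based data layout are hypothetical and serve only to illustrate the step.

```python
from typing import List, Optional

def expand_target(noun: int, words: List[str], pos: List[str],
                  heads: List[Optional[int]]) -> List[str]:
    """Collect the noun and the words it depends on until a verb governs the term."""
    phrase = [words[noun]]
    node = noun
    for _ in range(len(words)):             # bound the walk in case of malformed input
        head = heads[node]
        if head is None or pos[head] == "v":
            break                           # the term now depends on a verb (or on nothing)
        node = head
        phrase.append(words[node])
    return phrase

# "walk to chair beside": chair depends on "beside", which depends on the verb "to".
words = ["walk", "to", "chair", "beside"]
pos = ["v", "v", "n", "f"]
heads = [None, 0, 3, 1]
target = expand_target(2, words, pos, heads)
```

In this example the noun "chair" does not depend on a verb directly, so the word it depends on is added before the term is filled into the moving target; a noun such as "bed" that depends on a verb directly is returned unchanged.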
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611264509.2A CN106705974B (en) | 2016-12-30 | 2016-12-30 | Semantic role labeling and semantic extraction method for non-restricted path natural language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106705974A (en) | 2017-05-24 |
CN106705974B (en) | 2020-05-12 |
Family
ID=58905583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611264509.2A Active CN106705974B (en) | 2016-12-30 | 2016-12-30 | Semantic role labeling and semantic extraction method for non-restricted path natural language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106705974B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103514157A (en) * | 2013-10-21 | 2014-01-15 | 东南大学 | Path natural language processing method for indoor intelligent robot navigation |
CN105718442A (en) * | 2016-01-19 | 2016-06-29 | 齐鲁工业大学 | Word sense disambiguation method based on syntactic analysis |
CN106250524A (en) * | 2016-08-04 | 2016-12-21 | 浪潮软件集团有限公司 | Organization name extraction method and device based on semantic information |
Non-Patent Citations (3)
Title |
---|
KE ZHANG: ""Route Natural Language Processing Method for Robot Navigation"", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION》 * |
王浩: ""面向机器人导航的汉语路径自然语言组块分析方法研究"", 《电脑知识与技术》 * |
计峰: ""基于序列标注的中文依存句法分析方法"", 《计算机应用于软件》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491556A (en) * | 2017-09-04 | 2017-12-19 | 湖北地信科技集团股份有限公司 | Space-time total factor semantic query service system and its method |
CN109522551A (en) * | 2018-11-09 | 2019-03-26 | 天津新开心生活科技有限公司 | Entity link method, apparatus, storage medium and electronic equipment |
CN109522551B (en) * | 2018-11-09 | 2024-02-20 | 天津新开心生活科技有限公司 | Entity linking method and device, storage medium and electronic equipment |
WO2021147875A1 (en) * | 2020-01-20 | 2021-07-29 | 华为技术有限公司 | Text screening method and apparatus |
EP3859587A1 (en) | 2020-01-29 | 2021-08-04 | Tata Consultancy Services Limited | Robotic task planning for complex task instructions in natural language |
US11487577B2 (en) | 2020-01-29 | 2022-11-01 | Tata Consultancy Services Limited | Robotic task planning for complex task instructions in natural language |
Also Published As
Publication number | Publication date |
---|---|
CN106705974B (en) | 2020-05-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||