CN106705974A - Semantic role tagging and semantic extracting method of unrestricted path natural language - Google Patents



Publication number
CN106705974A
Authority
CN
China
Legal status
Granted
Application number
CN201611264509.2A
Other languages
Chinese (zh)
Other versions
CN106705974B (en)
Inventor
张珂
陈奇
Current Assignee
North China Electric Power University
Original Assignee
North China Electric Power University
Application filed by North China Electric Power University
Priority to CN201611264509.2A
Publication of CN106705974A
Application granted
Publication of CN106705974B
Legal status: Active


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis


Abstract

The invention relates to a semantic role labeling and semantic extraction method for unrestricted path natural language. First, Chinese path natural language data are collected under unrestricted conditions and a Chinese path natural language corpus is built. Second, the path natural language corpus is labeled automatically with a semantic role labeling method based on chunk parsing and dependency parsing. Finally, according to the semantic role labeling results, the text is divided into path units in sequence and the navigational semantic information of each path unit is extracted. With this method, path units can be divided accurately and in order, and path semantic information can be extracted accurately, which provides guidance for the smooth implementation of direction-asking and navigation by robots.

Description

Semantic role labeling and semantic extraction method for unrestricted path natural language
Technical field
The present invention relates to a semantic role labeling and semantic extraction method for natural language used in robot navigation, and belongs to the technical field of data processing.
Background art
Natural language processing is an important branch of artificial intelligence whose goal is to study methods by which humans and computers communicate in natural language. It does not simply study the grammar and syntax of natural language; it studies systems that can effectively carry out computer communication or human-computer interaction based on natural language, and it is part of computer science. Human-computer interaction in natural language means that the computer can not only understand natural language but also use natural language to express thoughts, intentions, and so on. The former is natural language understanding; the latter is natural language generation.
People have long pursued human-computer interaction in natural language, because it has important theoretical significance as well as clear practical value. First, people could use computers in the natural language they are accustomed to, without spending a great deal of time learning complex computer languages; second, it could help people further understand the mechanisms of human language ability and intelligence.
Language is an important means of human communication. If human-computer interaction can be carried out in natural language, robots can be controlled through natural language, and ordinary people can then control robots as they wish. Controlling a robot through natural language is easier than other methods and better matches human communication habits. Modern robotics has developed rapidly, driven by sensor technology, computer technology, and artificial intelligence; mobile robots in particular are widely used in service, exploration, logistics, and other fields because of their mobility and autonomy. One of the core technologies of mobile robots is navigation, especially autonomous navigation, and controlling robots to navigate autonomously through natural language is becoming a research hotspot. Researchers hope that in future robots can be directed by natural language to complete navigation tasks. Since navigation tasks underlie other, more complex tasks, natural language navigation is the basis for realizing other navigation tasks through natural language and is of great significance to the development of artificial intelligence.
In the field of path natural language processing, robot navigation based on English path natural language processing has seen initial development; researchers focus on studying verbs and combining them with the corresponding environment to determine the robot's navigation path. Research on Chinese path natural language, by contrast, is still immature and lags well behind English path natural language processing. Le Xiaoqiu et al. analyzed positional relations in natural language with a method based on hierarchical finite-state automata. Zhang Xueying et al. studied the grammar of Chinese path natural language and proposed a path natural language processing method based on urban transportation. Liu Yu et al., building on an in-depth study of verbs in path natural language, proposed an NLRP analysis method based on restricted Chinese. Jiang Wenming et al. carried out a preliminary analysis of path natural language processing with both statistics-based and rule-based natural language processing methods. Li Xinde et al. proposed a path natural language processing method based on chunk parsing. After collecting a small corpus under specific conditions, they proposed several main semantic components and found a strong connection between the syntax and the semantics of Chinese path natural language. They applied a series of processing steps to the corpus, such as word segmentation, noun entity recognition, and chunk parsing, and used a chunk-parsing-based method to extract the semantic chunks of path natural language. Finally, for each extracted semantic chunk, the semantic information useful for constructing a path was extracted according to its corresponding unit. However, all of these Chinese path natural language processing methods have certain technical problems:
1. Previous Chinese path natural language processing methods take only a small amount of data collected in specific environments as their research object, whereas real natural language path descriptions are highly varied. A larger path natural language corpus built under unrestricted conditions is therefore needed for path natural language processing research.
2. Under unrestricted conditions, the relationship between the syntax and the semantics of path natural language is more complex. With existing path natural language semantic role labeling methods alone, it is difficult to label semantic roles accurately, which reduces the accuracy of navigational semantic information extraction.
3. To carry out robot direction-asking navigation, accurate navigational semantic information must be obtained on the basis of semantic role labeling. However, path natural language often contains several ordered path units, so how to divide the path units and extract semantic information from each of them is a problem that currently needs to be solved.
Summary of the invention
The object of the present invention is to address the drawbacks of the prior art by providing a semantic role labeling and semantic extraction method for unrestricted path natural language, so as to improve the accuracy of semantic information extraction for robot navigation.
The present invention solves the above problems with the following technical solution:
In the semantic role labeling and semantic extraction method for unrestricted path natural language, Chinese path natural language data are first collected under unrestricted conditions and a Chinese path natural language corpus is built; the path natural language corpus is then labeled automatically with a semantic role labeling method based on chunk parsing and dependency parsing; finally, according to the semantic role labeling results, path units are divided in sequence and the navigational semantic information of each path unit is extracted.
In the above method, the path natural language corpus is labeled automatically as follows:
A. Word segmentation and part-of-speech tagging are performed on the corpus with the NLPIR Chinese word segmentation system;
B. Chunk parsing of the path natural language is performed with a conditional random field (CRF).
First, chunk categories are defined for the path natural language corpus: 7 semantic chunk types and 1 boundary chunk type. The 7 semantic chunk types are direction transformation (DT), finding a target by a reference (RT), moving toward a target without a reference (MT), finding a target directly (ST), preposed reference (PR), space conversion (SC), and moving toward a target using a reference (FR). The original corpus is chunk-labeled manually according to these categories to obtain the chunk parsing training corpus. Second, each word is chunk-labeled with the IOB2 scheme, under which a word has only three possible states: it begins a chunk, it is inside a chunk, or it belongs to no chunk, represented by "B-X", "I-X" and "O" respectively, where "X" is the chunk type. Then, chunk parsing feature templates are determined from the current word and its context. Finally, the CRF model is trained from the customized feature templates and the training corpus to obtain the model's probability distribution; from this distribution and the word's context, the probability that the word is assigned a given chunk label is computed, which realizes automatic chunk labeling;
C. Dependency parsing of the path natural language is performed with a CRF.
First, a dependency labeling scheme is determined and the training corpus is labeled with it. The label of each word has the form [+/-]dPOS, where "+" means the head (governing word) follows the dependent, "-" means the head precedes the dependent, POS is the part of speech of the head, and d counts the words having the same part of speech as the head, from the dependent up to and including the head (so +1V means the head is the first verb after the dependent). Then, the dependency parsing feature templates are defined. Finally, the CRF model is trained on the training corpus according to these feature templates, which realizes automatic dependency labeling;
D. Based on the chunk parsing and dependency parsing results, semantic role labeling of the path natural language is performed with a CRF.
First, semantic roles are divided into core semantic roles and adjunct semantic roles. Next, the training corpus is labeled with semantic roles according to the defined categories. Then, the chunk parsing and dependency parsing results are used as features to build the semantic role labeling feature templates. Finally, the CRF model is trained on the training corpus according to these templates, which realizes automatic semantic role labeling.
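The [+/-]dPOS scheme in step C turns dependency parsing into per-word tagging that a linear-chain CRF can learn. The following is a minimal Python sketch of the encoding and its inverse. It assumes the reading suggested by the worked example in Experiment 4 (the head is the d-th word, counted outward from the dependent, whose tag equals the head's tag; a root word is labeled "0"), and it omits relation suffixes such as /SBV. The names encode_head and decode_head are hypothetical.

```python
import re
from typing import List

# sign, count d, upper-cased head POS; e.g. "+1V", "-2V", "-1P"
LABEL = re.compile(r"([+-])(\d+)(\S+)")


def encode_head(pos: List[str], dep: int, head: int) -> str:
    """Encode the head of word `dep` as [+/-]dPOS; `head` is -1 for the root word."""
    if head < 0:
        return "0"
    step = 1 if head > dep else -1
    tag = pos[head]
    # d counts words tagged like the head, from the dependent outward, head included
    d = sum(1 for k in range(dep + step, head + step, step) if pos[k] == tag)
    return f"{'+' if step > 0 else '-'}{d}{tag.upper()}"


def decode_head(pos: List[str], dep: int, label: str) -> int:
    """Recover the head index from a [+/-]dPOS label; returns -1 for "0"."""
    if label == "0":
        return -1
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"malformed label {label!r}")
    step = 1 if m.group(1) == "+" else -1
    d, tag = int(m.group(2)), m.group(3).lower()
    seen, k = 0, dep + step
    while 0 <= k < len(pos):
        if pos[k] == tag:
            seen += 1
            if seen == d:
                return k
        k += step
    raise ValueError(f"label {label!r} points outside the sentence")
```

For the opening clause of the example sentence ("robot forward walk to bed , to left walk"), the sketch reproduces the labels printed in Experiment 4, such as +1V for "robot" and -2V for the second "walk". Because the labels are relative and POS-anchored, the CRF only has to learn a small label set rather than absolute head positions.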
In the above method, path units are divided and their navigational semantic information is extracted as follows:
(1) Input a complete semantic role block together with its words, part-of-speech tags, chunk labels, and dependency labels;
(2) Examine the chunk type of each word in order: if it is not "B-PT", go to step (3); if it is "B-PT", extraction of the path unit is complete;
(3) Examine the semantic role label type: if A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9);
(4) Examine the part of speech of the words in A0 and fill the words tagged n (noun) into the moving-subject module in order;
(5) Examine the part of speech of the words in V and fill the words tagged v (verb) into the motion-control module in order;
(6) Examine the part of speech of the words in DIR and fill the words tagged f (direction word) into the direction-control module in order;
(7) Examine the chunk type of the words in LOC: if PR, fill the words into the motion-reference module; if SC, go to step (8);
(8) Count the words tagged n (noun) in SC: if there is 1 noun, go to step (10); if there is more than 1, go to step (11);
(9) Count the words tagged n (noun) in A1: if there is 1 noun, go to step (10); if there is more than 1, go to step (11);
(10) Examine the dependency of the noun: if it depends on a verb (v), fill it into the moving-target module; if it depends on a non-verb, append the word it depends on to the noun and repeat until the extended noun phrase depends on a verb, then fill the extended noun phrase into the moving-target module;
(11) Examine the dependencies among the nouns: if the nouns are coordinate, treat them as one whole noun and return to step (10); if one noun a depends on another noun b, fill noun a into the motion-reference module and process noun b according to step (10);
(12) Check whether the motion-control module and the moving-target module are both non-empty: if not, go to step (1); if so, extraction of the path unit is complete.
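The steps above can be sketched in Python as follows. This is one possible reading of the flow in Fig. 2, not the patented implementation: the Token and PathUnit classes, the per-word role field (consecutive words with the same role form one span), the rule that a B-PT boundary closes a unit only when step (12) is satisfied, and the way step (11) follows non-verb heads to decide whether one noun modifies another are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    word: str
    pos: str       # part-of-speech tag, e.g. "n", "v", "f"
    chunk: str     # IOB2 chunk tag, e.g. "B-MT", "I-DT", "B-PT"
    role: str      # role of the enclosing span: "A0", "V", "DIR", "LOC", "A1" or "O"
    head: int      # index of the governing word, -1 for the root
    rel: str = ""  # dependency relation, e.g. "COO", "POB"


@dataclass
class PathUnit:
    subject: List[str] = field(default_factory=list)    # moving subject
    control: List[str] = field(default_factory=list)    # motion control (verbs)
    direction: List[str] = field(default_factory=list)  # direction control
    reference: List[str] = field(default_factory=list)  # motion reference
    target: List[str] = field(default_factory=list)     # moving target


def is_verb(tok: Token) -> bool:
    return tok.pos.startswith("v")


def words_with(toks: List[Token], span: List[int], tag: str) -> List[str]:
    return [toks[k].word for k in span if toks[k].pos == tag]


def grow_to_verb(toks: List[Token], i: int, words: List[str]) -> str:
    """Step (10): append head words until the phrase attaches to a verb."""
    h = toks[i].head
    while h >= 0 and not is_verb(toks[h]):
        words.append(toks[h].word)
        h = toks[h].head
    return " ".join(words)


def fill_nouns(toks: List[Token], span: List[int], unit: PathUnit) -> None:
    """Steps (8) to (11) for one SC or A1 span."""
    nouns = [k for k in span if toks[k].pos == "n"]
    if len(nouns) == 1:
        unit.target.append(grow_to_verb(toks, nouns[0], [toks[nouns[0]].word]))
    elif nouns and all(toks[k].rel == "COO" and toks[k].head in nouns for k in nouns[1:]):
        # coordinate nouns become one noun phrase, then step (10)
        phrase = [toks[k].word for k in range(nouns[0], nouns[-1] + 1)]
        unit.target.append(grow_to_verb(toks, nouns[0], phrase))
    else:
        for a in nouns:
            h = toks[a].head  # follow non-verb heads to see whether `a` modifies another noun
            while h >= 0 and not is_verb(toks[h]) and h not in nouns:
                h = toks[h].head
            if h in nouns:
                unit.reference.append(toks[a].word)
            else:
                unit.target.append(grow_to_verb(toks, a, [toks[a].word]))


def extract_units(toks: List[Token]) -> List[PathUnit]:
    units, unit, i = [], PathUnit(), 0
    while i < len(toks):
        if toks[i].chunk == "B-PT":              # step (2): boundary chunk
            if unit.control and unit.target:     # step (12): unit is complete
                units.append(unit)
                unit = PathUnit()
            i += 1
            continue
        role, j = toks[i].role, i
        while j < len(toks) and toks[j].role == role and toks[j].chunk != "B-PT":
            j += 1
        span = list(range(i, j))
        if role == "A0":
            unit.subject += words_with(toks, span, "n")        # step (4)
        elif role == "V":
            unit.control += words_with(toks, span, "v")        # step (5)
        elif role == "DIR":
            unit.direction += words_with(toks, span, "f")      # step (6)
        elif role == "LOC" and toks[i].chunk.endswith("-PR"):
            unit.reference += [toks[k].word for k in span]     # step (7)
        elif role in ("LOC", "A1") and (role == "A1" or toks[i].chunk.endswith("-SC")):
            fill_nouns(toks, span, unit)                       # steps (8) to (11)
        i = j
    if unit.control and unit.target:
        units.append(unit)
    return units
```

On the clause "to left walk to chair side ," (with "to left" labeled DIR, "walk to" labeled V and "chair side" labeled A1, an assumed role assignment), the sketch yields one unit with direction ["left"], control ["walk", "to"] and target ["chair side"]: "chair" depends on the direction word "side", so step (10) extends it until the phrase attaches to the verb "to".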
The present invention performs semantic role labeling of path natural language with a method based on chunk parsing and dependency parsing, divides path units according to the resulting semantic role labels, and finally extracts the semantic information of each path unit. The method divides path units accurately and in order and extracts path semantic information accurately, so it can guide the smooth implementation of robot direction-asking navigation.
Brief description of the drawings
Fig. 1 is a flow chart of semantic role labeling;
Fig. 2 is a flow chart of path unit division and semantic extraction.
Detailed description of the embodiments
The invention will be further described below in conjunction with the accompanying drawings.
The present invention proposes a path natural language processing method for Chinese, comprising the following 3 steps:
1. Collect Chinese path natural language data under unrestricted conditions and build a Chinese path natural language corpus.
2. Label the path natural language corpus automatically with a semantic role labeling method based on chunk parsing and dependency parsing.
3. According to the semantic role labeling results, divide path units in sequence and extract the navigational semantic information of each path unit.
Step 1 collects Chinese path natural language data under unrestricted conditions and builds a Chinese path natural language corpus. To collect enough unrestricted path natural language data, the present invention built 10 3D simulation environments for the Nao robot with Webots for Nao, recorded 10 videos of the simulated environments, and provided a top view of each environment; in each environment the robot was given one navigation task (its initial position and target location were specified). Without being told the details of the task, 100 volunteers of different ages and occupations watched the videos and top views, freely chose a path that would complete the navigation task, and gave the corresponding natural language path description. To ensure the generality of the corpus, the 100 volunteers ranged in age from 12 to 60, in education from primary school to master's degree, and came from all over the country. The volunteers chose their descriptions freely in the unrestricted setting, and 1000 sentences were finally collected to form the Chinese path natural language corpus.
Step 2 is semantic role labeling based on chunk parsing and dependency parsing; the detailed flow is shown in Fig. 1.
(1) Word segmentation and part-of-speech tagging are performed on the corpus with the NLPIR Chinese word segmentation system.
(2) Chunk parsing of the path natural language is performed with a conditional random field (CRF). First, chunk categories are defined for the path natural language corpus; the present invention defines 7 semantic chunk types and 1 boundary chunk type, as shown in Table 1. The original corpus is chunk-labeled manually according to the defined categories to obtain the chunk parsing training corpus. Second, chunk parsing is treated as a word-level labeling problem: each word is labeled with the IOB2 scheme, under which a word has only three possible states, namely it begins a chunk, it is inside a chunk, or it belongs to no chunk, represented by "B-X", "I-X" and "O" respectively, where "X" is the chunk type. Then, the chunk parsing feature templates are determined from the current word and its context; the templates defined by the present invention are shown in Table 2. Finally, the CRF model is trained on the training corpus with these feature sets: from the customized feature templates and the training corpus, training yields the optimal parameter vector Λ = {λ1, …, λK}. Once parameter estimation of the CRF model is complete, the model's probability distribution is available, and from it and the word's context the probability that the word is assigned a given chunk label is computed, which realizes automatic chunk labeling.
Table 1: Chunk definitions
(3) Dependency parsing of the path natural language is performed with a CRF. First, a dependency labeling scheme is determined and the training corpus is labeled with it. The label of each word has the form [+/-]dPOS, where "+" means the head follows the dependent, "-" means the head precedes the dependent, POS is the part of speech of the head, and d counts the words having the same part of speech as the head, from the dependent up to and including the head. Then, the present invention defines the dependency parsing feature templates shown in Table 3. Finally, the CRF model is trained on the training corpus with these feature sets, which realizes automatic dependency labeling.
Table 3: Feature templates for dependency parsing
(4) Based on the chunk parsing and dependency parsing results, semantic role labeling of the path natural language is performed with a CRF. First, the present invention divides semantic roles into core semantic roles and adjunct semantic roles. Arg plus a number (abbreviated as A plus a number) denotes a core semantic role (core argument): Arg0 usually denotes the agent of the action, Arg1 usually denotes the patient, and Arg2 to Arg4 have different meanings depending on the predicate verb. V is also a core semantic role and denotes the predicate. ArgM-* denotes an adjunct, where * indicates the function of the adjunct; the adjunct categories are shown in Table 4. Next, the training corpus is labeled with semantic roles according to the defined categories. Then, the chunk parsing and dependency parsing results are used as features to build the semantic role labeling feature templates; the templates defined by the present invention are shown in Table 5. Finally, the CRF model is trained on the training corpus with these feature sets, which realizes automatic semantic role labeling.
Table 4: Adjunct categories
Table 5: Semantic role labeling feature templates based on chunk parsing and dependency parsing
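Tables 2 and 5 are not reproduced here, so the following is only a sketch of how stacked features for the semantic role labeling CRF might be assembled: the chunk parser's output is converted to IOB2 tags, placed beside word, part-of-speech and dependency-label columns, and expanded through CRF++-style templates of the form %x[row,col]. The template list, the column layout, and the placeholder symbols _B-n and _B+n for positions outside the sentence (modeled on CRF++) are assumptions. In a trained CRF, each instantiated feature paired with a label corresponds to one weight λk in Λ.

```python
import re
from typing import List, Optional, Tuple


def chunks_to_iob2(chunks: List[Tuple[Optional[str], List[str]]]) -> List[str]:
    """IOB2 tags for a chunked sentence; a chunk type of None means outside any chunk."""
    tags = []
    for chunk_type, words in chunks:
        for k in range(len(words)):
            if chunk_type is None:
                tags.append("O")
            else:
                tags.append(("B-" if k == 0 else "I-") + chunk_type)
    return tags


# Hypothetical templates. %x[r,c] = column c of the word r positions away.
# Columns: 0 word, 1 POS tag, 2 chunk tag (IOB2), 3 dependency label.
TEMPLATES = [
    "U00:%x[-1,0]", "U01:%x[0,0]", "U02:%x[1,0]",
    "U10:%x[0,1]", "U11:%x[0,1]/%x[1,1]",
    "U20:%x[-1,2]", "U21:%x[0,2]", "U22:%x[0,2]/%x[1,2]",
    "U30:%x[0,3]", "U31:%x[0,1]/%x[0,3]",
]
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(rows: List[List[str]], i: int, templates: List[str] = TEMPLATES) -> List[str]:
    """Instantiate every template for the word at position i of one sentence."""
    def value(m):
        r, c = i + int(m.group(1)), int(m.group(2))
        if r < 0:
            return f"_B{r}"                    # before the sentence: _B-1, _B-2, ...
        if r >= len(rows):
            return f"_B+{r - len(rows) + 1}"   # after the sentence: _B+1, ...
        return rows[r][c]
    return [MACRO.sub(value, t) for t in templates]


words = ["robot", "forward", "walk", "to", "bed"]
pos = ["n", "vi", "v", "v", "n"]
chunk = chunks_to_iob2([("MT", words)])
dep = ["+1V/SBV", "+1V/ADV", "0/ROOT", "-1V/CMP", "-1V/POB"]
rows = [list(r) for r in zip(words, pos, chunk, dep)]
print(expand(rows, 0))
```

The stacked columns are what let the semantic role labeler see chunk boundaries and head attachments at the same time; a feature such as U31 (the word's POS joined with its dependency label) is one hypothetical way to combine the two analyses.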
Step 3 divides path units according to the semantic role labeling results and then extracts the navigational semantic information of each path unit. A path unit consists of four parts: moving subject, motion direction, motion control, and moving target. The flow of path unit division and semantic extraction is shown in Fig. 2.
(1) Input a complete semantic role block together with its words, part-of-speech tags, chunk labels, and dependency labels.
(2) Examine the chunk type of each word in order. If it is not "B-PT", go to step (3); if it is "B-PT", extraction of the path unit is complete.
(3) Examine the semantic role label type. If A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9).
(4) Examine the part of speech of the words in A0 and fill the words tagged n (noun) into the moving-subject module in order.
(5) Examine the part of speech of the words in V and fill the words tagged v (verb) into the motion-control module in order.
(6) Examine the part of speech of the words in DIR and fill the words tagged f (direction word) into the direction-control module in order.
(7) Examine the chunk type of the words in LOC. If PR, fill the words into the motion-reference module; if SC, go to step (8).
(8) Count the words tagged n (noun) in SC. If there is 1 noun, go to step (10); if there is more than 1, go to step (11).
(9) Count the words tagged n (noun) in A1. If there is 1 noun, go to step (10); if there is more than 1, go to step (11).
(10) Examine the dependency of the noun. If it depends on a verb (v), fill it into the moving-target module; if it depends on a non-verb, append the word it depends on to the noun and repeat until the extended noun phrase depends on a verb, then fill the extended noun phrase into the moving-target module.
(11) Examine the dependencies among the nouns. If the nouns are coordinate, treat them as one whole noun and go to step (10); if one noun a depends on another noun b, fill noun a into the motion-reference module and process noun b according to step (10).
(12) Check whether the motion-control module and the moving-target module are both non-empty. If not, go to step (1); if so, extraction of the path unit is complete.
Under unrestricted conditions, the relationship between the syntax and the semantics of path natural language is more complex. To address this, the present invention proposes a path natural language semantic role labeling method based on chunk parsing and dependency parsing. First, according to the characteristics of path natural language, suitable chunk categories and a method for defining dependency categories are given, and chunk parsing and dependency parsing feature templates are designed, so that CRFs achieve high-accuracy automatic chunk labeling and automatic dependency labeling. Second, to extract path semantic information for robot direction-asking navigation, new semantic role categories and semantic role labeling feature templates based on chunk parsing and dependency parsing are designed, and CRFs achieve high-accuracy semantic role labeling of path natural language.
To carry out robot direction-asking navigation, accurate navigational semantic information must be obtained on the basis of semantic role labeling, but path natural language often contains several ordered path units. The present invention proposes a path unit division and semantic extraction method based on semantic role labeling information, which divides path units accurately and in order while accurately extracting path semantic information, and thus guides the smooth implementation of robot direction-asking navigation.
Experimental analysis:
In order to illustrate above advantage, verified using experimental technique.By all language materials in corpus according to 4:1 ratio Divided, wherein 80% is training corpus, 20% is testing material.Then the feature templates according to definition train training corpus Storehouse, obtains corresponding training pattern, then the training pattern obtained with testing material library test accuracy rate.
Experiment 1: Chunk parsing
Table 6 shows the chunk parsing results for each chunk type. The precision P, recall R, and F1 value of chunk parsing are all high, which shows that the chunk types defined by the present invention are reasonable and suited to path natural language processing. The high chunk parsing accuracy also meets the requirement of the subsequent step, which performs semantic role labeling on top of chunk parsing.
Table 6: Chunk parsing results (%)
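The document reports precision P, recall R and F1 without giving the scoring rule. A common convention, assumed here rather than taken from the source, is exact-match chunk scoring: a predicted chunk is correct only if its type and both boundaries match a gold chunk, with P = correct / predicted, R = correct / gold and F1 = 2PR / (P + R). A minimal Python sketch over IOB2 tag sequences, with an optional per-type filter for results such as those in Table 6:

```python
from typing import List, Optional, Set, Tuple


def chunk_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Collect (type, start, end) chunks from one sentence of IOB2 tags."""
    spans, start, cur = set(), 0, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" closes the last chunk
        if tag.startswith("I-") and tag[2:] == cur:
            continue                        # the current chunk continues
        if cur is not None:
            spans.add((cur, start, i))
        start, cur = (i, tag[2:]) if tag != "O" else (i, None)
    return spans


def chunk_prf(gold: List[List[str]], pred: List[List[str]],
              chunk_type: Optional[str] = None) -> Tuple[float, float, float]:
    """Exact-match chunk precision, recall and F1, optionally for one chunk type."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gs, ps = chunk_spans(gold_tags), chunk_spans(pred_tags)
        if chunk_type is not None:
            gs = {s for s in gs if s[0] == chunk_type}
            ps = {s for s in ps if s[0] == chunk_type}
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Under this rule a DT chunk whose last word is missed counts as wrong in full, which is stricter than per-word tag accuracy.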
Experiment 2: Dependency parsing
Table 7 shows the dependency parsing results, reported as the average over the automatic dependency labels. Although precision, recall, and F1 value are lower than those of chunk parsing, they are still numerically high, and from the perspective of semantic role identification, dependency parsing complements chunk parsing.
Table 7: Dependency parsing results (%)
Experiment three:Tested based on chunk parsing and the semantic character labeling of interdependent syntactic analysis
Table 8 is the semantic character labeling result based on chunk parsing and interdependent syntactic analysis.The accuracy rate of the method, recall The value of rate and F1 values is all very high, illustrates the validity of the method;And for robot navigation's task, path unit master To be made up of following 4 parts:Predicate, direction word, with reference to, target, this 4 parts just respectively in V, AM-DIR, A1, and Their mark effect is even more better than average value, illustrates that the method for the present invention is highly suitable for the path language of path natural language Adopted information extraction.
Table 8 is based on the semantic character labeling result (%) of chunk parsing and interdependent syntactic analysis
Experiment 4: Extracting the path semantic information of an example sentence with the method of the invention
Example sentence: The robot walks forward to the bed, walks left to the side of the chair, then turns right, walks straight to the wall, walks left to the doorway, passes through the door to the right, enters the living room, walks forward to the side of the vase, walks right to between the sofa and the desk, turns left, and takes the pop can on the desk.
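Word segmentation and part-of-speech tagging result (tokens glossed in English; tags: n noun, vi intransitive verb, v verb, p preposition, f locative or direction word, d adverb, s place word, cc coordinating conjunction, ude1 the particle de (的), wd comma, wj full stop):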
robot/n forward/vi walk/v to/v bed/n ,/wd to/p left/f walk/v to/v chair/n side/f ,/wd again/d to/p right/f turn/v ,/wd straight/d walk/v to/v wall/n ,/wd to/p left/f walk/v to/v doorway/s ,/wd to/p right/f pass-through/v door/n ,/wd enter/v living-room/n ,/wd forward/vi walk/v to/v vase/n side/f ,/wd to/p right/f walk/v to/v sofa/n and/cc desk/n between/f ,/wd to/p left/f turn/v ,/wd take/v desk/n on/f de/ude1 pop-can/n ./wj
Chunk parsing result:
[robot/n forward/vi walk/v to/v bed/n] MT [,/wd] PT [to/p left/f] DT [walk/v to/v chair/n side/f] MT [,/wd] PT [again/d] O [to/p right/f turn/v] DT [,/wd] PT [straight/d walk/v to/v wall/n] MT [,/wd] PT [to/p left/f] DT [walk/v to/v doorway/s] MT [,/wd] PT [to/p right/f] DT [pass-through/v door/n] SC [,/wd] PT [enter/v living-room/n] SC [,/wd] PT [forward/vi] O [walk/v to/v vase/n side/f] MT [,/wd] PT [to/p right/f] DT [walk/v to/v sofa/n and/cc desk/n between/f] MT [,/wd] PT [to/p left/f turn/v] DT [,/wd] PT [take/v desk/n on/f de/ude1 pop-can/n] RT [./wj] PT
Dependency parsing result:
[robot/n] +1V/SBV [forward/vi] +1V/ADV [walk/v] 0/ROOT [to/v] -1V/CMP [bed/n] -1V/POB [,/wd] O [to/p] +1V/ADV [left/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [chair/n] +1F/ATT [side/f] -1V/POB [,/wd] O [again/d] +1V/ADV [to/p] +1V/ADV [right/f] -1P/POB [turn/v] -2V/COO [,/wd] O [straight/d] +1V/ADV [walk/v] -1V/COO [to/v] -1V/CMP [wall/n] -1V/POB [,/wd] O [to/p] +1V/ADV [left/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [doorway/s] -1V/POB [,/wd] O [to/p] +1V/ADV [right/f] -1P/POB [pass-through/v] -2V/COO [door/n] -1V/VOB [,/wd] O [enter/v] -1V/COO [living-room/n] -1V/VOB [,/wd] O [forward/vi] +1V/ADV [walk/v] -1V/COO [to/v] -1V/CMP [vase/n] +1F/ATT [side/f] -1V/POB [,/wd] O [to/p] +1V/ADV [right/f] -1P/POB [walk/v] -2V/COO [to/v] -1V/CMP [sofa/n] +1F/ATT [and/cc] +1N/RAD [desk/n] -1N/COO [between/f] -1V/POB [,/wd] O [to/p] +1V/ADV [left/f] -1P/POB [turn/v] -2V/COO [,/wd] O [take/v] -1V/COO [desk/n] +1F/ATT [on/f] +1N/ATT [de/ude1] -1F/ATT [pop-can/n] -1V/VOB [./wj] O
Semantic role labeling result:
[robot/n] A0 [forward/vi] ADV [walk/v to/v] V [bed/n] A1 [,/wd] O [to/p left/f] AM-DIR [walk/v to/v] V [chair/n side/f] A1 [,/wd] O [again/d] AM-ADV [to/p right/f] AM-DIR [turn/v] V [,/wd] O [straight/d] AM-ADV [walk/v to/v] V [wall/n] A1 [,/wd] O [to/p left/f] AM-DIR [walk/v to/v] V [doorway/s] A1 [,/wd] O [to/p right/f] AM-DIR [pass-through/v] V [door/n] A1 [,/wd] O [enter/v] V [living-room/n] A1 [,/wd] O [forward/vi] AM-ADV [walk/v to/v] V [vase/n side/f] A1 [,/wd] O [to/p right/f] AM-DIR [walk/v to/v] V [sofa/n and/cc desk/n between/f] A1 [,/wd] O [to/p left/f] AM-DIR [turn/v] V [,/wd] O [take/v] V [desk/n on/f de/ude1 pop-can/n] A1 [./wj] O
The following 9 path units are obtained in sequence:
1. Moving subject: robot
Direction: none (defaults to forward)
Motion control: walk
Moving target: bed
2. Direction: left
Motion control: walk
Moving target: chair
3. Direction: right
Motion control: walk
Moving target: wall
4. Direction: left
Motion control: walk
Moving target: doorway
5. Direction: right
Motion control: pass through
Moving target: door
6. Direction: none
Motion control: enter
Moving target: living room
7. Direction: none
Motion control: walk
Moving target: vase
8. Direction: right
Motion control: walk
Moving target: between the sofa and the desk
9. Direction: left
Motion control: take
Moving target: pop can on the desk

Claims (3)

1. A semantic role labeling and semantic extraction method for unrestricted path natural language, characterized in that the method first collects Chinese path natural language material under unrestricted conditions and builds a Chinese path natural language corpus; then automatically labels the path natural language corpus using a semantic role labeling method based on chunk parsing and dependency parsing; and finally, according to the semantic role labeling result, divides the text into path units in sequence and extracts the navigation semantic information of each path unit.
2. The semantic role labeling and semantic extraction method for unrestricted path natural language according to claim 1, characterized in that the automatic labeling of the path natural language corpus is performed as follows:
A. Word segmentation and part-of-speech tagging are performed on the corpus using the NLPIR Chinese word segmentation system;
B. Chunk parsing of the path natural language using conditional random fields (CRF)
First, the path natural language corpus is divided into chunk categories: 7 semantic chunk categories and 1 boundary chunk category. The 7 semantic chunk categories are direction transformation (DT), finding the target by a reference (RT), advancing to the target without a reference (MT), finding the target directly (ST), prepositional reference (PR), space transformation (SC) and advancing to the target by a reference (FR). The original corpus is manually annotated with chunks according to these categories to obtain the chunk parsing corpus. Second, each word is chunk-tagged with the IOB2 scheme, under which a word has only three possible states: it begins a chunk, it is inside a chunk, or it belongs to no chunk, represented by "B-X", "I-X" and "O" respectively, where "X" is the chunk type. Next, the chunk parsing feature templates are determined from the current word and its context. Finally, a CRF model is trained on the training corpus using the customized feature templates to obtain the model's probability distribution; from this distribution and a word's context, the probability that the word is assigned a given chunk tag is obtained, thereby realizing automatic chunk labeling;
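The IOB2 encoding and a context-window feature template might look like the following minimal sketch. The +/-2 window and the feature names are assumptions, since the source does not list its templates; the per-token feature dicts follow the format accepted by CRF toolkits such as sklearn-crfsuite, but no training call is shown. The tokens are a shortened sequence drawn from the Experiment 4 example.

```python
def to_iob2(chunks):
    """Convert (chunk_type, [(word, pos), ...]) pairs into words, POS tags and IOB2 tags.

    A chunk_type of None marks words that belong to no chunk (tagged "O")."""
    words, pos, tags = [], [], []
    for kind, tokens in chunks:
        for j, (word, tag) in enumerate(tokens):
            words.append(word)
            pos.append(tag)
            if kind is None:
                tags.append("O")
            else:
                tags.append(("B-" if j == 0 else "I-") + kind)
    return words, pos, tags


def chunk_features(words, pos, i, window=2):
    """Hypothetical feature template: words and POS tags in a +/-window around word i."""
    feats = {"bias": 1.0}
    for k in range(-window, window + 1):
        j = i + k
        if 0 <= j < len(words):
            feats[f"w[{k}]"] = words[j]
            feats[f"pos[{k}]"] = pos[j]
    if i > 0:
        feats["pos[-1]|pos[0]"] = pos[i - 1] + "|" + pos[i]  # POS bigram feature
    return feats


example_chunks = [
    ("MT", [("robot", "n"), ("forward", "vi"), ("walk", "v"), ("to", "v"), ("bed", "n")]),
    ("PT", [(",", "wd")]),
    (None, [("again", "d")]),
    ("DT", [("to", "p"), ("right", "f"), ("turn", "v")]),
]
words, pos, tags = to_iob2(example_chunks)
features = [chunk_features(words, pos, i) for i in range(len(words))]
```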
C. Dependency parsing of the path natural language using CRF
First, the dependency labeling scheme is determined and the training corpus is annotated with dependencies. The tag of each word is defined in the form [+/-]dPOS, where "+" indicates that the governing word comes after the dependent, "-" indicates that the governing word comes before the dependent, POS is the part of speech of the governing word, and "d" is the number of words with the same part of speech as the governing word lying between the dependent and the governing word, counting the governing word itself (so "+1V" denotes the first verb after the dependent). Next, the dependency parsing feature templates are defined; the semantic labeling types include adverbial (ADV), locative (LOC) and directional (DIR) labels. Finally, a CRF model is trained on the training corpus according to the dependency parsing feature templates, realizing automatic dependency labeling;
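Under the reading that d counts the governing word itself, which matches the labels in Experiment 4 (robot/n is labelled +1V because the next word tagged v is its governor), the [+/-]dPOS tags can be converted to and from head indices as sketched below. The sketch encodes only the position part of a tag (a relation such as /SBV would be attached separately), matches parts of speech on the exact tag so that vi is not counted as v, and uses hypothetical function names.

```python
import re


def encode_heads(pos, heads):
    """Encode governor indices as [+/-]dPOS labels; -1 marks the root, labelled "0"."""
    labels = []
    for i, h in enumerate(heads):
        if h < 0:
            labels.append("0")
            continue
        tag = pos[h]
        if h > i:  # governor follows the dependent
            d = sum(1 for j in range(i + 1, h + 1) if pos[j] == tag)
            labels.append(f"+{d}{tag.upper()}")
        else:      # governor precedes the dependent
            d = sum(1 for j in range(h, i) if pos[j] == tag)
            labels.append(f"-{d}{tag.upper()}")
    return labels


def decode_labels(pos, labels):
    """Recover governor indices by walking to the d-th word with the labelled POS."""
    heads = []
    for i, label in enumerate(labels):
        if label == "0":
            heads.append(-1)
            continue
        m = re.fullmatch(r"([+-])(\d+)(\w+)", label)
        if m is None:
            raise ValueError(f"malformed label {label!r}")
        step = 1 if m.group(1) == "+" else -1
        d, tag = int(m.group(2)), m.group(3)
        j, seen = i, 0
        while seen < d:
            j += step
            if not 0 <= j < len(pos):
                raise ValueError(f"no governor for {label!r} at position {i}")
            if pos[j].upper() == tag:
                seen += 1
        heads.append(j)
    return heads


pos = ["n", "vi", "v", "v", "n"]   # robot forward walk to bed
heads = [2, 2, -1, 2, 3]           # governor index of each word
labels = encode_heads(pos, heads)  # ['+1V', '+1V', '0', '-1V', '-1V']
```

Because the label stores a count rather than an absolute offset, the same label set is shared across sentences of different lengths, which keeps the CRF's output alphabet small.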
D. Semantic role labeling of the path natural language using CRF, based on the chunk parsing and dependency parsing results
First, semantic roles are divided into core semantic roles and adjunct semantic roles. Then, the training corpus is annotated with semantic roles according to the defined role categories. Next, the results of chunk parsing and dependency parsing are used as features to build the semantic role labeling feature templates. Finally, a CRF model is trained on the training corpus according to these feature templates, realizing automatic semantic role labeling.
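One way the chunk and dependency results could be combined into per-token features for the semantic role CRF is sketched below. The specific features, including the conjoined chunk|dep feature and the one-word window, are assumptions and not the template of the invention; the labels come from the first clause of the Experiment 4 example.

```python
def srl_features(words, pos, chunk_tags, dep_labels, i):
    """Hypothetical SRL feature template for word i: the word, its POS tag, its IOB2
    chunk tag and its dependency label, plus the same items for adjacent words."""
    feats = {
        "w": words[i],
        "pos": pos[i],
        "chunk": chunk_tags[i],
        "dep": dep_labels[i],
        "chunk|dep": chunk_tags[i] + "|" + dep_labels[i],  # conjoined feature
    }
    for k in (-1, 1):
        j = i + k
        if 0 <= j < len(words):
            feats[f"w[{k}]"] = words[j]
            feats[f"chunk[{k}]"] = chunk_tags[j]
            feats[f"dep[{k}]"] = dep_labels[j]
    return feats


words = ["robot", "forward", "walk", "to", "bed"]
pos = ["n", "vi", "v", "v", "n"]
chunk_tags = ["B-MT", "I-MT", "I-MT", "I-MT", "I-MT"]
dep_labels = ["+1V/SBV", "+1V/ADV", "0/ROOT", "-1V/CMP", "-1V/POB"]
sentence_features = [
    srl_features(words, pos, chunk_tags, dep_labels, i) for i in range(len(words))
]
```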
3. The semantic role labeling and semantic extraction method for unrestricted path natural language according to claim 2, characterized in that the steps of dividing path units and extracting the navigation semantic information of each path unit are as follows:
(1) Input a complete semantic role block together with the words it contains and their part-of-speech tags, chunk tags and dependency tags;
(2) Check the chunk type of each word in order: if it is not "B-PT", go to step (3); if it is "B-PT", extraction of the path unit is complete;
(3) Determine the semantic role label type: if A0, go to step (4); if V, go to step (5); if DIR, go to step (6); if LOC, go to step (7); if A1, go to step (9);
(4) Check the part of speech of the words in A0, and fill the words tagged n (noun) into the moving-subject module in order;
(5) Check the part of speech of the words in V, and fill the words tagged v (verb) into the motion-control module in order;
(6) Check the part of speech of the words in DIR, and fill the words tagged f (direction word) into the direction-control module in order;
(7) Check the chunk type of the words in LOC: if PR, fill the words into the motion-reference module; if SC, go to step (8);
(8) Count the words tagged n (noun) in SC: if there is 1 noun, go to step (10); if there is more than 1, go to step (11);
(9) Count the words tagged n (noun) in A1: if there is 1 noun, go to step (10); if there is more than 1, go to step (11);
(10) Determine the dependency of the noun: if the word depends on a verb (v), fill it into the moving-target module; if it does not depend on a verb, add the word it depends on to the noun, repeating until the newly added word depends on a verb, and then fill the resulting noun phrase into the moving-target module;
(11) Determine the dependencies between the nouns: if the nouns are coordinate, treat them as one overall noun and return to step (10); if one noun a depends on another noun b, fill noun a into the motion-reference module and process noun b according to step (10);
(12) Check whether the motion-control module and the moving-target module are both non-empty: if not, go to step (1); if so, extraction of the path unit is complete.
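The extraction steps can be sketched in Python under several assumptions that the claim leaves open: each token already carries its governor index and relation (decoded from the dependency tags), place words (s) are treated as nouns so that targets such as "doorway" are found, and a "B-PT" boundary triggers the step (12) check, so a unit is closed only once both motion control and target are filled (otherwise, as in "turns right, walks straight to the wall", the following clauses are merged into the same unit). Later blocks overwrite earlier slot values. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

NOUN_TAGS = ("n", "s")  # assumption: place words (s) such as "doorway" count as nouns


@dataclass
class Token:
    word: str
    pos: str    # POS tag, e.g. "n", "v", "f", "wd"
    chunk: str  # IOB2 chunk tag, e.g. "B-MT", "I-DT", "B-PT", "O"
    head: int   # index of the governing word, -1 for the root
    rel: str    # dependency relation, e.g. "POB", "ATT", "COO"


@dataclass
class PathUnit:
    subject: list = field(default_factory=list)    # moving-subject module
    direction: list = field(default_factory=list)  # direction-control module
    motion: list = field(default_factory=list)     # motion-control module
    reference: list = field(default_factory=list)  # motion-reference module
    target: list = field(default_factory=list)     # moving-target module


def noun_phrase(toks, idxs):
    """Step (10): add governing words until the last one added depends on a verb."""
    phrase, cur = set(idxs), min(idxs)
    while toks[cur].head >= 0 and toks[toks[cur].head].pos != "v":
        cur = toks[cur].head
        phrase.add(cur)
    return [toks[i].word for i in sorted(phrase)]


def depends_on(toks, a, b):
    # True if noun a reaches noun b along its head chain before meeting a verb
    cur = toks[a].head
    while cur >= 0 and toks[cur].pos != "v":
        if cur == b:
            return True
        cur = toks[cur].head
    return False


def fill_target(unit, toks, idxs):
    """Steps (8)/(9) and (11): pick the moving target (and reference) from a block."""
    nouns = [i for i in idxs if toks[i].pos in NOUN_TAGS]
    if len(nouns) == 1:
        unit.target = noun_phrase(toks, nouns)
    elif len(nouns) > 1 and all(toks[i].rel == "COO" for i in nouns[1:]):
        # coordinate nouns are treated as one overall noun (span includes "and")
        unit.target = noun_phrase(toks, list(range(nouns[0], nouns[-1] + 1)))
    elif len(nouns) > 1:
        for a in nouns:
            for b in nouns:
                if a != b and depends_on(toks, a, b):
                    unit.reference = [toks[a].word]
                    unit.target = noun_phrase(toks, [b])
                    return


def extract_path_units(toks, blocks):
    """blocks: (semantic role, [token indices]) pairs in sentence order."""
    units, unit = [], PathUnit()
    for role, idxs in blocks:
        first = toks[idxs[0]]
        if first.chunk == "B-PT":              # step (2): boundary chunk reached
            if unit.motion and unit.target:    # step (12): unit complete?
                units.append(unit)
                unit = PathUnit()
            continue
        role = role.removeprefix("AM-")        # AM-DIR -> DIR, AM-LOC -> LOC
        if role == "A0":                       # step (4)
            unit.subject = [toks[i].word for i in idxs if toks[i].pos in NOUN_TAGS]
        elif role == "V":                      # step (5)
            unit.motion = [toks[i].word for i in idxs if toks[i].pos == "v"]
        elif role == "DIR":                    # step (6)
            unit.direction = [toks[i].word for i in idxs if toks[i].pos == "f"]
        elif role == "LOC" and first.chunk[2:] == "PR":   # step (7)
            unit.reference = [toks[i].word for i in idxs]
        elif role == "A1" or (role == "LOC" and first.chunk[2:] == "SC"):
            fill_target(unit, toks, idxs)      # steps (8)/(9)
    if unit.motion and unit.target:            # close a unit left open at the end
        units.append(unit)
    return units


# Shortened example: "robot walks forward to the bed, walks left to the chair side."
toks = [
    Token("robot", "n", "B-MT", 2, "SBV"), Token("forward", "vi", "I-MT", 2, "ADV"),
    Token("walk", "v", "I-MT", -1, "ROOT"), Token("to", "v", "I-MT", 2, "CMP"),
    Token("bed", "n", "I-MT", 3, "POB"), Token(",", "wd", "B-PT", -1, "O"),
    Token("to", "p", "B-DT", 8, "ADV"), Token("left", "f", "I-DT", 6, "POB"),
    Token("walk", "v", "B-MT", 2, "COO"), Token("to", "v", "I-MT", 8, "CMP"),
    Token("chair", "n", "I-MT", 11, "ATT"), Token("side", "f", "I-MT", 9, "POB"),
    Token(".", "wj", "B-PT", -1, "O"),
]
blocks = [("A0", [0]), ("ADV", [1]), ("V", [2, 3]), ("A1", [4]), ("O", [5]),
          ("AM-DIR", [6, 7]), ("V", [8, 9]), ("A1", [10, 11]), ("O", [12])]
units = extract_path_units(toks, blocks)
```

On these two clauses the sketch yields two units: subject robot, motion "walk to", target bed; then direction left, motion "walk to", target "chair side". The `removeprefix` call needs Python 3.9 or later.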

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611264509.2A CN106705974B (en) 2016-12-30 2016-12-30 Semantic role labeling and semantic extraction method for non-restricted path natural language

Publications (2)

Publication Number Publication Date
CN106705974A true CN106705974A (en) 2017-05-24
CN106705974B CN106705974B (en) 2020-05-12

Family

ID=58905583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611264509.2A Active CN106705974B (en) 2016-12-30 2016-12-30 Semantic role labeling and semantic extraction method for non-restricted path natural language

Country Status (1)

Country Link
CN (1) CN106705974B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514157A (en) * 2013-10-21 2014-01-15 东南大学 Path natural language processing method for indoor intelligent robot navigation
CN105718442A (en) * 2016-01-19 2016-06-29 齐鲁工业大学 Word sense disambiguation method based on syntactic analysis
CN106250524A (en) * 2016-08-04 2016-12-21 浪潮软件集团有限公司 Organization name extraction method and device based on semantic information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
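KE ZHANG: "Route Natural Language Processing Method for Robot Navigation", 《PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION》 *
王浩 (Wang Hao): "面向机器人导航的汉语路径自然语言组块分析方法研究" (Research on chunk parsing of Chinese path natural language for robot navigation), 《电脑知识与技术》 (Computer Knowledge and Technology) *
计峰 (Ji Feng): "基于序列标注的中文依存句法分析方法" (Chinese dependency parsing method based on sequence labeling), 《计算机应用于软件》 (Computer Applications and Software) *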

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491556A (en) * 2017-09-04 2017-12-19 湖北地信科技集团股份有限公司 Space-time total factor semantic query service system and its method
CN109522551A (en) * 2018-11-09 2019-03-26 天津新开心生活科技有限公司 Entity link method, apparatus, storage medium and electronic equipment
CN109522551B (en) * 2018-11-09 2024-02-20 天津新开心生活科技有限公司 Entity linking method and device, storage medium and electronic equipment
WO2021147875A1 (en) * 2020-01-20 2021-07-29 华为技术有限公司 Text screening method and apparatus
EP3859587A1 (en) 2020-01-29 2021-08-04 Tata Consultancy Services Limited Robotic task planning for complex task instructions in natural language
US11487577B2 (en) 2020-01-29 2022-11-01 Tata Consultancy Services Limited Robotic task planning for complex task instructions in natural language

Also Published As

Publication number Publication date
CN106705974B (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN110263324A (en) Text handling method, model training method and device
US11908483B2 (en) Inter-channel feature extraction method, audio separation method and apparatus, and computing device
CN106705974A (en) Semantic role tagging and semantic extracting method of unrestricted path natural language
CN106407333A (en) Artificial intelligence-based spoken language query identification method and apparatus
Spranger The evolution of grounded spatial language
CN107861938A (en) A kind of POI official documents and correspondences generation method and device, electronic equipment
CN107766320A (en) A kind of Chinese pronoun resolution method for establishing model and device
CN114639139A (en) Emotional image description method and system based on reinforcement learning
CN105210085A (en) Image labeling using geodesic features
CN107704456A (en) Identify control method and identification control device
CN109918501A (en) Method, apparatus, equipment and the storage medium of news article classification
CN108737530A (en) A kind of content share method and system
CN110580516B (en) Interaction method and device based on intelligent robot
CN108664465A (en) One kind automatically generating text method and relevant apparatus
CN110969023B (en) Text similarity determination method and device
CN106339366A (en) Method and device for requirement identification based on artificial intelligence (AI)
CN113506377A (en) Teaching training method based on virtual roaming technology
CN115394321A (en) Audio emotion recognition method, device, equipment, storage medium and product
CN111399629B (en) Operation guiding method of terminal equipment, terminal equipment and storage medium
Porfirio et al. Sketching robot programs on the fly
CN114049501A (en) Image description generation method, system, medium and device fusing cluster search
CN110377691A (en) Method, apparatus, equipment and the storage medium of text classification
Tzafestas Advances in Intelligent Systems: Concepts, Tools and Applications
KR102508656B1 (en) Method, device and system for providing customized language ability test learning service based on problem analysis through artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant