CN109271502A - Method and device for classifying spatial query themes based on natural language processing

Publication number
CN109271502A
Authority
CN
China
Prior art keywords
theme
word
natural language
sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811116358.5A
Other languages
Chinese (zh)
Other versions
CN109271502B (en)
Inventor
呙维
赵雨慧
李铭
朱欣焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201811116358.5A
Publication of CN109271502A
Application granted
Publication of CN109271502B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for classifying spatial query themes based on natural language processing. In the method, the natural language input by the user is divided by delimiter words into a set of words, and the words in the set are then feature-matched and rearranged into a semantic sequence. Based on the results of theme training, the sample nearest to the input natural language is found and its theme is returned, which achieves spatial query theme classification of natural language. The technical effect is improved accuracy of theme extraction.

Description

Method and device for classifying spatial query themes based on natural language processing
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and device for classifying spatial query themes based on natural language processing.
Background
With the rapid development of the new-generation information technology industry, personal intelligent assistants have become popular applications for improving quality of life. Based on user input, a personal intelligent assistant can carry out operation instructions through natural language understanding and automated information processing. Natural language processing is a subfield of artificial intelligence. It is the theory and technology of using machines to process human language, treating language as the object of computation and studying the corresponding algorithms. Its purpose is to let humans interact with computer systems in natural language so that information can be managed more conveniently and effectively. From the late 1990s to the early 21st century, people gradually recognized that neither rule-based methods alone nor statistics-based methods alone can handle natural language successfully. Example-based and rule-based corpus techniques emerged as a result.
Existing natural language processing techniques fall mainly into two broad types, rule-based methods and probability-based methods, which are further subdivided into methods based on Bayes' principle, methods based on hidden Markov models, discourse analysis methods, neural network methods, and so on. However, as demand for information services keeps growing, the semantic understanding of natural language still suffers from problems such as difficult theme extraction and theme ambiguity.
With the continuous development of natural language understanding technology, research on place query language has also made progress and produced many significant results. This work mainly covers the lexical, syntactic, and semantic content and methods of place query language, and the study and refinement of the spatial relation semantics of place query language. Theories and methods for natural language cognition and processing have become mature, but research on natural language in the spatial domain is still scarce. In the course of implementing the invention, the applicant found that existing methods have two main problems. First, because the morphology, syntax, and semantics of natural language are flexible and complex, interpretation often produces ambiguity; most existing research applies only to geographic information systems in specific professional fields and has long been regarded as merely a supplement to other spatial query modes. Second, research on and accumulation of domain knowledge for place query language are limited, so spatial dictionaries are not summarized systematically, the induction of spatial query syntax analysis is incomplete, and existing research results on spatial semantics are few. As a result, current natural language understanding technology produces many ambiguities and errors when extracting spatial information themes.
It follows that the methods in the prior art have the technical problem of inaccurate theme extraction results.
Summary of the invention
In view of this, the present invention provides a method and device for classifying spatial query themes based on natural language processing, to solve, or at least partly solve, the technical problem of inaccurate theme extraction results in prior-art methods.
A first aspect of the invention provides a method for classifying spatial query themes based on natural language processing, comprising:
Step S1: dividing the natural language to be processed into a set of words based on preset delimiter words;
Step S2: performing feature matching between the words in the set of words and a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure;
Step S3: searching a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme; and returning the query theme contained in the sample as the classification result.
In one implementation, the preset delimiter words include action verbs, prepositions, subjects, feature words, and interrogatives.
In one implementation, the pre-constructed concept lexicon includes points of interest, service attributes, service attribute evaluations, spatial relationships, action verbs, times, persons, place interrogatives, evaluation interrogatives, and business interrogatives.
In one implementation, the preset structure is "theme-verb-point of interest-verb-item", and step S2 specifically comprises:
Step S2.1: performing feature matching between the words in the set of words and the pre-constructed concept lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence with the "theme-verb-point of interest-verb-item" structure.
In one implementation, the theme training result set is obtained by training pre-collected natural language samples with the word sequence, specifically:
obtaining training samples that contain theme information;
creating an Elasticsearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-constructed concept lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into Elasticsearch for training to obtain the theme training result set.
In one implementation, the sample nearest to the natural language to be processed is found according to a preset edit distance algorithm.
In one implementation, before the words contained in the training samples are replaced with feature words from the pre-constructed concept lexicon, the method further comprises:
judging whether a word in a training sample corresponds to at least two categories of the pre-constructed concept lexicon;
if so, not replacing that word.
Based on the same inventive concept, a second aspect of the invention provides a device for classifying spatial query themes based on natural language processing, comprising:
a language division module, configured to divide the natural language to be processed into a set of words based on preset delimiter words;
a feature matching module, configured to perform feature matching between the words in the set of words and a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module, configured to search a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme, and to return the query theme contained in the sample as the classification result.
In one implementation, the preset delimiter words include action verbs, prepositions, subjects, feature words, and interrogatives.
One or more of the above technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
In the method provided by the invention, the natural language to be processed is divided into a set of words, the words in the set are feature-matched against a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure, and the theme training result set is then searched for the sample nearest to the natural language to be processed, whose theme is returned. Splitting the natural language to be processed, matching features, and recombining the result into a semantic sequence reduces the complexity of the natural language. The theme training result set is obtained by training pre-collected natural language samples with the word sequence; inserting a large number of samples as word sequences for training gives them semantics and improves the accuracy of the theme training result set. Searching the theme training result set for the sample nearest to the language to be processed and taking that sample's theme as the classification result therefore improves classification accuracy and solves the technical problem of inaccurate theme extraction results in the prior art.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for classifying spatial query themes based on natural language processing in an embodiment of the invention;
Fig. 2 is an example diagram of a delimiter word set in one application scenario;
Fig. 3 is an example diagram of a splitting test on natural language;
Fig. 4 is an example diagram of the concept vocabulary for one application scenario;
Fig. 5 is an example diagram of test results after feature matching;
Fig. 6 is a diagram of theme training samples;
Fig. 7 is a schematic diagram of theme classification results;
Fig. 8 is a structural diagram of a device for classifying spatial query themes based on natural language processing in an embodiment of the invention.
Detailed description
The embodiments of the invention provide a method and device for classifying spatial query themes based on natural language processing, to address the technical problem of inaccurate theme extraction results in prior-art methods, thereby reducing ambiguity and improving the accuracy of theme extraction and classification.
To achieve the above technical effect, the general idea of the invention is as follows:
The natural language to be processed is divided by delimiter words into a set of words; the words in the set are then feature-matched and rearranged into a semantic sequence to obtain a word sequence. Word sequences are used for theme training on a large number of samples to obtain a theme training result set. The theme training result set is then searched for the sample nearest to the natural language to be processed, and its theme is returned as the classification result, which achieves spatial query theme classification of natural language.
To make the objectives, technical solutions, and advantages of the embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some rather than all of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Embodiment one
This embodiment provides a method for classifying spatial query themes based on natural language processing. Referring to Fig. 1, the method includes:
Step S1 is executed first: the natural language to be processed is divided into a set of words based on preset delimiter words.
The preset delimiter words are obtained by analyzing and summarizing a large number of samples. In this embodiment, the preset delimiter words include action verbs, prepositions, subjects, feature words, and interrogatives.
In a specific implementation, taking a shopping scenario as an example, the delimiter words involved are shown in Fig. 2, where L means that other words appear to the left of the current word in the sentence, R means that other words appear to its right, and LR means that other words appear on both sides. Action verbs include sell, buy, see, and so on; prepositions include have and so on; subjects include I, we, you, he, and so on; feature words include shop, store, restaurant, and so on; interrogatives include where, what, and so on.
Next, natural language from the shopping scenario is split, as shown in Fig. 3, where key marks a recognized delimiter word and unknow marks a word that was not recognized. For example, the sentence "want to drink a beverage" is split into "drink" and "beverage", where "drink" is a recognized action verb. For "what is sold on the third floor", the resulting word set includes "third floor", "is", "sell", "what", and a particle. Likewise, the sentence "I want to go to Nike to buy sports shoes" is converted by splitting into a set of words such as "I", "want", "Nike", "buy", and "sports shoes".
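The splitting in step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the delimiter set, the function name split_by_delimiters, and the Chinese example strings are assumptions chosen to mirror the examples above, and the L/R/LR position constraints of Fig. 2 are not modeled.

```python
# Hypothetical delimiter words for a shopping scenario (an assumption,
# not the actual table of Fig. 2). English glosses are in the comments.
DELIMITERS = {
    "我",    # I
    "要",    # want
    "买",    # buy
    "卖",    # sell
    "是",    # is
    "什么",  # what
    "的",    # particle
}


def split_by_delimiters(sentence, delimiters=DELIMITERS):
    """Split a sentence into (text, tag) pairs.

    Recognized delimiter words are tagged "key"; the text left between
    them is tagged "unknow", following the labels described for Fig. 3.
    """
    # Try longer delimiters first so that multi-character words win.
    longest_first = sorted(delimiters, key=len, reverse=True)
    parts = []
    buffer = ""
    i = 0
    while i < len(sentence):
        match = next((w for w in longest_first if sentence.startswith(w, i)), None)
        if match is None:
            buffer += sentence[i]
            i += 1
            continue
        if buffer:
            parts.append((buffer, "unknow"))
            buffer = ""
        parts.append((match, "key"))
        i += len(match)
    if buffer:
        parts.append((buffer, "unknow"))
    return parts


if __name__ == "__main__":
    print(split_by_delimiters("我要买球鞋"))      # "I want to buy sneakers"
    print(split_by_delimiters("三楼是卖什么的"))  # "what is sold on the third floor"
```

Text that matches no delimiter is kept as a single unknown chunk, so it can still be looked up in the concept lexicon in step S2.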
Step S2 is then executed: feature matching is performed between the words in the set of words and the pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure.
The pre-constructed concept lexicon is derived by analyzing and summarizing a large number of samples, and it contains the feature words.
In one embodiment, the pre-constructed concept lexicon includes points of interest, service attributes, service attribute evaluations, spatial relationships, action verbs, times, persons, place interrogatives, evaluation interrogatives, and business interrogatives.
Specifically, to match the category of every word the user inputs as far as possible, this embodiment builds a concept lexicon for the shopping scenario. Concept words can be classified as POI, business vocabulary, business evaluation, spatial relationship, verb, time, theme, where-question, how-question, goods query, and so on. Fig. 4 shows an example of the concept vocabulary for the shopping scenario: points of interest (POI, Point of Interest) include shop, store, restaurant, supermarket, and so on; service attributes include men's wear, women's wear, underwear, and so on; service attribute evaluations include tasty, not very tasty, good to drink, and so on. They are too numerous to list in full here.
In one embodiment, step S2 specifically includes:
Step S2.1: performing feature matching between the words in the set of words and the pre-constructed concept lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence in the "theme-verb-point of interest-verb-item" form.
Specifically, Fig. 5 shows an example of test results after feature matching, where SREL denotes the surrounding environment; GOODS denotes goods; unknow denotes unknown; CPOI denotes an important point of interest; ude denotes an auxiliary word; V denotes a verb; COMM denotes an attribute or evaluation; GOODSQRY denotes a question about goods; COMMQRY denotes a question about an attribute; LOCQRY denotes a question about location; and POI denotes a point of interest. The word sequence after feature matching is reordered according to the "theme-verb-point of interest-verb-item" structure to obtain the sorted word sequence.
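Steps S2.1 and S2.2 can be illustrated with a small sketch. The lexicon contents, the slot name SUBJECT (taken from the subject-V-V-goods pattern in the worked example later in this embodiment), and the slot-filling strategy are assumptions; the patent does not spell out how reordering is performed.

```python
# Preset structure as an ordered list of slots (assumed tag names).
TEMPLATE = ["SUBJECT", "V", "POI", "V", "GOODS"]

# Hypothetical concept lexicon: word -> feature tag.
CONCEPT_LEXICON = {
    "I": "SUBJECT",
    "want": "V",
    "buy": "V",
    "Nike": "POI",
    "sneakers": "GOODS",
}


def match_features(words, lexicon=CONCEPT_LEXICON):
    """Step S2.1: tag every word; words not in the lexicon get 'unknow'."""
    return [(word, lexicon.get(word, "unknow")) for word in words]


def reorder(tagged, template=TEMPLATE):
    """Step S2.2: fill each template slot with the first unused matching word.

    Slots with no matching word are skipped, and words whose tag fits no
    slot are dropped. Both choices are simplifications of this sketch.
    """
    remaining = list(tagged)
    ordered = []
    for slot in template:
        for index, (_, tag) in enumerate(remaining):
            if tag == slot:
                ordered.append(remaining.pop(index))
                break
    return ordered


def pattern(ordered):
    """Join the tags of an ordered word sequence, e.g. 'SUBJECT-V-V-GOODS'."""
    return "-".join(tag for _, tag in ordered)


if __name__ == "__main__":
    tagged = match_features(["sneakers", "I", "want", "buy"])
    ordered = reorder(tagged)
    print([word for word, _ in ordered], pattern(ordered))
```

Filling slots in template order means two verbs keep their original relative order, which is what the subject-V-V-goods example relies on.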
Step S3 is then executed: the theme training result set is searched for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme; the query theme contained in the sample is returned as the classification result.
Specifically, the pre-collected natural language samples can be obtained by analyzing and summarizing, manually or by equipment, a large number of samples from different environments or application scenarios, such as shopping scenarios and tourism scenarios. The theme training result set contains the categories of the various themes.
The sample nearest to the natural language to be processed can be found by comparing similarity with a preset method.
In one embodiment, the theme training result set is obtained by training pre-collected natural language samples with the word sequence, which specifically includes:
obtaining training samples that contain theme information;
creating an Elasticsearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-constructed concept lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into Elasticsearch for training to obtain the theme training result set.
Specifically, Elasticsearch is a Lucene-based search server that provides distributed, multi-user full-text search, achieving stable, reliable, and fast real-time search. The training samples are the set of themes, and each theme has been divided into word sequences with the "theme-verb-POI-verb-item" structure. Each sentence (natural language) has a specific theme, which refers to the user's real purpose. For example, "I want to buy sports shoes" means the user wants information on POIs related to sports shoes. After a sentence is divided into a word sequence, because sentences entered by users (the natural language to be processed) tend to be colloquial, the words need to be replaced with feature words from the pre-constructed concept lexicon. This standardizes the wording, makes it easier to find the nearest sample, and improves precision and accuracy.
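The record construction described above can be sketched without a search server. In this illustration an in-memory list stands in for the Elasticsearch index, and the field names, the sample data, and the function build_records are assumptions; no Elasticsearch API is shown.

```python
def build_records(samples, lexicon):
    """Build [first text, second text, theme, ID] records.

    samples: list of (words, theme_number) pairs, where words is the word
             sequence of one training sentence.
    lexicon: word -> feature tag, standing in for the concept lexicon.
    """
    records = []
    for sample_id, (words, theme) in enumerate(samples, start=1):
        # First text: the word sequence as a space-separated list of parts.
        first_text = " ".join(words)
        # Second text: words replaced by feature words where the lexicon
        # knows them; unknown words are left unchanged.
        second_text = " ".join(lexicon.get(word, word) for word in words)
        records.append(
            {
                "first_text": first_text,
                "second_text": second_text,
                "theme": theme,
                "id": sample_id,
            }
        )
    return records


if __name__ == "__main__":
    lexicon = {"I": "SUBJECT", "want": "V", "buy": "V", "shoes": "GOODS"}
    samples = [(["I", "want", "buy", "shoes"], 2)]
    for record in build_records(samples, lexicon):
        print(record)
```

In the method as described, each such record would be inserted into Elasticsearch instead of a Python list; the list only makes the data layout visible.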
In a specific implementation, referring to Fig. 6, which shows theme training samples, the applicant analyzed a large number of shopping inquiry samples and summarized five themes: 1) querying POIs that have a specified (special) business function; 2) navigating to a POI that has a specified business function; 3) querying the evaluation information of a POI; 4) querying the business function of a POI; 5) querying films. The input of the theme training stage is the training samples containing theme information, that is, the set of themes. Each theme contains multiple sentences that have already been split, each sentence is input as a sample, and the sentence structure takes a form such as (buy/V shoes/GOODS). Fig. 7 is a schematic diagram of theme classification results: "second-floor women's wear", "have lunch", "manicure", and the like are classified as querying a POI with a specific business function; "I want to drink a beverage", "I want to buy a wallet", and the like are classified as navigating to a POI with a selectable specific business function; and queries such as whether Huang Ji Huang is tasty or how Home Alone is are classified as opening the web page link of a POI.
In addition, in the method provided by the invention, the natural language to be processed contains spatial information. During training, a large number of samples containing spatial position information can therefore also be used for theme training, so that the method can be applied to theme queries on natural language containing spatial information and the accuracy of theme extraction is improved.
In one embodiment, the sample nearest to the natural language to be processed is found according to a preset edit distance algorithm.
Specifically, the preset edit distance algorithm is a method available in Elasticsearch, and it can be used to find the sample closest to the natural language to be processed.
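To make the nearest-sample lookup concrete, the sketch below uses a plain Levenshtein edit distance over word tokens and a linear scan. It is a stand-in, not Elasticsearch's internal matching; the record layout and the names edit_distance and nearest_sample are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def nearest_sample(query_tokens, records):
    """Return the record whose first text is closest to the query tokens."""
    return min(
        records,
        key=lambda record: edit_distance(query_tokens, record["first_text"].split()),
    )


if __name__ == "__main__":
    # Hypothetical training records; theme numbers follow the five themes
    # listed for the shopping scenario.
    records = [
        {"first_text": "I want buy shoes", "theme": 2},
        {"first_text": "third-floor is sell what", "theme": 4},
    ]
    best = nearest_sample(["I", "want", "buy", "sneakers"], records)
    print(best["theme"])
```

Comparing token lists rather than raw characters keeps the distance aligned with the word sequences produced in step S2; a character-level comparison would also work with the same function.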
In one embodiment, before the words contained in the training samples are replaced with feature words from the pre-constructed concept lexicon, the method further includes:
judging whether a word in a training sample corresponds to at least two categories of the pre-constructed concept lexicon;
if so, not replacing that word.
Specifically, if a word in a training sample corresponds to multiple categories of the concept lexicon, the word is not replaced, in order to avoid errors.
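The ambiguity check can be sketched as follows. The category-to-words layout of the lexicon and the example entries are assumptions made for illustration only.

```python
def categories_of(word, lexicon_by_category):
    """Return every category whose word set contains the given word."""
    return [
        category
        for category, words in lexicon_by_category.items()
        if word in words
    ]


def replace_unambiguous(words, lexicon_by_category):
    """Replace a word by its category only if exactly one category matches.

    Words that match two or more categories, or none at all, are kept
    unchanged.
    """
    replaced = []
    for word in words:
        categories = categories_of(word, lexicon_by_category)
        replaced.append(categories[0] if len(categories) == 1 else word)
    return replaced


if __name__ == "__main__":
    # Hypothetical lexicon in which "Apple" is both a store and a product.
    lexicon_by_category = {
        "POI": {"Nike", "Apple"},
        "GOODS": {"shoes", "Apple"},
        "V": {"buy"},
    }
    print(replace_unambiguous(["buy", "Apple", "shoes"], lexicon_by_category))
```

Leaving an ambiguous word untouched preserves the original token for the edit distance comparison instead of committing to a possibly wrong category.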
To illustrate the implementation of the method of the invention more clearly, a specific example is described below.
Specifically, the natural language to be processed is "I want to buy sneakers." Step S1 is executed first: the sentence is split based on the preset delimiter words, giving the word set "I", "want", "buy", "sneakers". Step S2 is then executed: the word set is feature-matched against the pre-constructed concept lexicon, giving a word sequence in the format subject-V-V-goods. Step S3 is then executed: the theme training result set is queried for the sample closest to ["I" "want" "buy" "sneakers", subject-V-V-goods], for example ["I" "want" "buy" "shoes", subject-V-V-goods, "navigate to a shopping area with a specific function", "I buy shoes"]. "Navigate to a shopping area with a specific function" is then returned as the theme, that is, the classification result.
In the method for classifying spatial query themes based on natural language processing disclosed by the invention, the natural language to be processed is divided and recombined into a semantic sequence (word sequence) with the "theme-verb-POI-verb-item" structure, and theme training is carried out on pre-collected samples using these word sequences, which yields a good theme extraction effect.
Based on the same inventive concept, the present application also provides a device corresponding to the method for classifying spatial query themes based on natural language processing in Embodiment one, as detailed in Embodiment two.
Embodiment two
This embodiment provides a device for classifying spatial query themes based on natural language processing. Referring to Fig. 8, the device includes:
a language division module 801, configured to divide the natural language to be processed into a set of words based on preset delimiter words;
a feature matching module 802, configured to perform feature matching between the words in the set of words and a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module 803, configured to search a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme, and to return the query theme contained in the sample as the classification result.
In one embodiment, the preset delimiter words include action verbs, prepositions, subjects, feature words, and interrogatives.
In one embodiment, the pre-constructed concept lexicon includes points of interest, service attributes, service attribute evaluations, spatial relationships, action verbs, times, persons, place interrogatives, evaluation interrogatives, and business interrogatives.
In one embodiment, the preset structure is "theme-verb-point of interest-verb-item", and the feature matching module 802 is specifically configured to:
perform feature matching between the words in the set of words and the pre-constructed concept lexicon to obtain feature words;
convert the feature words into a word sequence with the "theme-verb-point of interest-verb-item" structure.
In one embodiment, the theme training result set is obtained by training pre-collected natural language samples with the word sequence, specifically:
obtaining training samples that contain theme information;
creating an Elasticsearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-constructed concept lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into Elasticsearch for training to obtain the theme training result set.
In one embodiment, the sample nearest to the natural language to be processed is found according to a preset edit distance algorithm.
In one embodiment, the device provided in this embodiment further includes a judgment module, configured to perform the following before the words contained in the training samples are replaced with feature words from the pre-constructed concept lexicon:
judging whether a word in a training sample corresponds to at least two categories of the pre-constructed concept lexicon;
if so, not replacing that word.
The device described in Embodiment two of the invention is the device used to implement the method for classifying spatial query themes based on natural language processing in Embodiment one. Based on the method described in Embodiment one, a person skilled in the art can therefore understand the specific structure and variations of the device, so details are not repeated here. Any device used to implement the method of Embodiment one falls within the scope the invention seeks to protect.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from the spirit and scope of those embodiments. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (9)

1. A method for classifying spatial query themes based on natural language processing, characterized by comprising:
Step S1: dividing the natural language to be processed into a set of words based on preset partition words;
Step S2: performing feature matching between the words in the set of words and a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure;
Step S3: searching a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples using the word sequences, and each sample contains a text and a query theme; and returning the query theme contained in that sample as the classification result.
2. The method according to claim 1, characterized in that the preset partition words include: action verbs, prepositions, subjects, feature words, and interrogatives.
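As an illustration of step S1, the following is a minimal sketch of cutting an input sentence at preset partition words. The claim names only the categories of partition words, so the concrete words, the function name `split_by_partition_words`, and the use of regular-expression word boundaries (which suit space-delimited text, not Chinese) are assumptions made for this example.

```python
import re

def split_by_partition_words(text, partition_words):
    """Cut text at every partition word and keep the partition words as tokens.

    The partition words passed in are hypothetical; the claim lists only
    their categories (action verbs, prepositions, interrogatives, ...).
    """
    # Try longer entries first so a multi-word entry wins over a shorter one.
    ordered = sorted(partition_words, key=len, reverse=True)
    # The capturing group makes re.split return the matched words as well.
    pattern = r"\b(" + "|".join(re.escape(w) for w in ordered) + r")\b"
    pieces = re.split(pattern, text)
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in pieces if p.strip()]

words = split_by_partition_words(
    "where is a restaurant near Wuhan University",
    ["where", "near", "go to"],
)
```

The resulting list is the "set of words" that the later matching step would consume.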
3. The method according to claim 1, characterized in that the pre-constructed concept lexicon includes: points of interest, service attributes, service attribute evaluations, spatial relationships, action verbs, times, persons, place queries, evaluation queries, and business queries.
4. The method according to claim 1, characterized in that the preset structure is "theme-verb-point of interest-verb-article", and step S2 specifically comprises:
Step S2.1: performing feature matching between the words in the set of words and the pre-constructed concept lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence with the "theme-verb-point of interest-verb-article" structure.
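Steps S2.1 and S2.2 can be sketched as a lexicon lookup followed by a slot-filling pass over the preset structure. The claims do not say how lexicon categories map onto the five slots, so this sketch assumes, purely for illustration, that the lexicon is keyed by the slot names; the lexicon entries and the function names `match_features` and `to_word_sequence` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lexicon keyed by slot name (an assumption of this sketch).
LEXICON = {
    "theme": {"I", "we"},
    "verb": {"find", "buy", "go"},
    "point of interest": {"restaurant", "mall"},
    "article": {"coffee", "ticket"},
}
PRESET_STRUCTURE = ["theme", "verb", "point of interest", "verb", "article"]

def match_features(words, lexicon):
    """Step S2.1: keep only words found in the lexicon, tagged with a category."""
    matched = []
    for word in words:
        for category, entries in lexicon.items():
            if word in entries:
                matched.append((word, category))
                break
    return matched

def to_word_sequence(matched, structure):
    """Step S2.2: reorder matched words to follow the preset structure."""
    queues = defaultdict(deque)
    for word, category in matched:
        queues[category].append(word)
    sequence = []
    for slot in structure:
        # Each slot takes the next unused word of its category, if any.
        if queues[slot]:
            sequence.append(queues[slot].popleft())
    return sequence

matched = match_features(["coffee", "restaurant", "I", "go", "buy", "please"], LEXICON)
sequence = to_word_sequence(matched, PRESET_STRUCTURE)
```

Slots with no matching word are simply skipped in this sketch; whether the claimed method pads or drops them is not stated in the claims.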
5. The method according to claim 4, characterized in that the theme training result set is obtained by training pre-collected natural language samples using the word sequences, specifically:
obtaining training samples that contain theme information;
creating an Elasticsearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], where the first text is the word sequence, given as a space-separated list of segmented words; the theme is the theme number of the training sample; the ID is the ID of the training sample; and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with the feature words in the pre-constructed concept lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into Elasticsearch for training to obtain the theme training result set.
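The record construction described in claim 5 can be sketched as follows. This example only builds the [first text, second text, theme, ID] records as plain dictionaries and leaves out the actual Elasticsearch index creation and insertion. The field names, the function name `build_documents`, the sample data, and the assumption that each word is replaced by a single feature word from a lookup table are all hypothetical.

```python
def build_documents(samples, word_to_feature):
    """Build one record per training sample.

    samples: iterable of (sample_id, theme, words), where words is the
             word sequence of the sample.
    word_to_feature: hypothetical lookup from a word to the feature word
             that replaces it in the second text.
    """
    documents = []
    for sample_id, theme, words in samples:
        # First text: the word sequence as a space-separated string.
        first_text = " ".join(words)
        # Second text: the same sequence with lexicon words replaced.
        second_text = " ".join(word_to_feature.get(w, w) for w in words)
        documents.append({
            "first_text": first_text,
            "second_text": second_text,
            "theme": theme,
            "id": sample_id,
        })
    return documents

documents = build_documents(
    [(1, "dining", ["I", "find", "KFC"])],
    {"KFC": "POI"},
)
```

Each dictionary corresponds to one document that would then be inserted into the index.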
6. The method according to claim 5, characterized in that the sample nearest to the natural language to be processed is searched for according to a preset edit distance algorithm.
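The nearest-sample lookup of claim 6 can be illustrated with a Levenshtein distance computed over word sequences. The claim only refers to a preset edit distance algorithm, so the choice of Levenshtein distance, the in-memory sample list, and the names `edit_distance` and `nearest_theme` are assumptions of this sketch rather than the patented implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y
            ))
        previous = current
    return previous[-1]

def nearest_theme(query_words, samples):
    """Return the theme of the sample whose word sequence is closest."""
    best = min(samples, key=lambda s: edit_distance(query_words, s["words"]))
    return best["theme"]

# Hypothetical training samples with already-replaced feature words.
samples = [
    {"words": ["I", "find", "POI"], "theme": "place"},
    {"words": ["I", "buy", "article"], "theme": "business"},
]
theme = nearest_theme(["we", "find", "POI"], samples)
```

Comparing word lists instead of raw characters keeps a replaced feature word such as "POI" as a single unit of difference.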
7. The method according to claim 5, characterized in that, before the words contained in the training samples are replaced with the feature words in the pre-constructed concept lexicon, the method further comprises:
judging whether a word in the training sample corresponds to at least two categories of the pre-constructed concept lexicon;
if so, not replacing that word.
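The ambiguity check of claim 7 can be sketched by building the replacement table only from words that belong to exactly one lexicon category. The lexicon contents, the use of the category name as the replacement feature word, and the names `build_replacements` and `replace_words` are assumptions of this example.

```python
def build_replacements(lexicon):
    """Map each unambiguous word to its single category.

    Words listed under two or more categories are left out, so they are
    never replaced.
    """
    categories_of = {}
    for category, entries in lexicon.items():
        for word in entries:
            categories_of.setdefault(word, set()).add(category)
    return {
        word: next(iter(categories))
        for word, categories in categories_of.items()
        if len(categories) == 1
    }

def replace_words(words, replacements):
    """Replace unambiguous words; keep every other word unchanged."""
    return [replacements.get(w, w) for w in words]

# Hypothetical lexicon in which "park" is both a place and an action.
lexicon = {
    "point of interest": {"KFC", "park"},
    "action verb": {"park", "find"},
}
replaced = replace_words(["find", "park", "KFC"], build_replacements(lexicon))
```

Here "park" stays untouched because it falls under two categories, while "find" and "KFC" are replaced.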
8. A device for classifying spatial query themes based on natural language processing, characterized by comprising:
a language division module, configured to divide the natural language to be processed into a set of words based on preset partition words;
a feature matching module, configured to perform feature matching between the words in the set of words and a pre-constructed concept lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module, configured to search a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples using the word sequences, and each sample contains a text and a query theme; and to return the query theme contained in that sample as the classification result.
9. The device according to claim 8, characterized in that the preset partition words include: action verbs, prepositions, subjects, feature words, and interrogatives.
CN201811116358.5A 2018-09-25 2018-09-25 Method and device for classifying spatial query topics based on natural language processing Active CN109271502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811116358.5A CN109271502B (en) 2018-09-25 2018-09-25 Method and device for classifying spatial query topics based on natural language processing

Publications (2)

Publication Number Publication Date
CN109271502A true CN109271502A (en) 2019-01-25
CN109271502B CN109271502B (en) 2020-08-07

Family

ID=65197490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811116358.5A Active CN109271502B (en) 2018-09-25 2018-09-25 Method and device for classifying spatial query topics based on natural language processing

Country Status (1)

Country Link
CN (1) CN109271502B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222709A (en) * 2019-04-29 2019-09-10 上海暖哇科技有限公司 A kind of multi-tag intelligence marking method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006031228A (en) * 2004-07-14 2006-02-02 Oki Electric Ind Co Ltd Morphemic analysis device, method, and program
CN103207917A (en) * 2013-04-25 2013-07-17 百度在线网络技术(北京)有限公司 Method for marking multimedia content and method and system for generating recommended content
US20170024376A1 (en) * 2015-07-21 2017-01-26 Facebook, Inc. Data sorting for language processing such as pos tagging
CN108009228A (en) * 2017-11-27 2018-05-08 咪咕互动娱乐有限公司 A kind of method to set up of content tab, device and storage medium
CN108052547A (en) * 2017-11-27 2018-05-18 华中科技大学 Natural language question-answering method and system based on question sentence and knowledge graph structural analysis
CN108121790A (en) * 2017-12-19 2018-06-05 百度在线网络技术(北京)有限公司 Chinese character by words querying method, device, server, equipment and storage medium
CN108197298A (en) * 2018-01-23 2018-06-22 北京知行信科技有限公司 A kind of smart shopper exchange method and system based on natural language processing

Also Published As

Publication number Publication date
CN109271502B (en) 2020-08-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant