CN109271502A - Method and device for classifying spatial query themes based on natural language processing
- Publication number
- CN109271502A (application CN201811116358.5A)
- Authority
- CN
- China
- Prior art keywords
- theme
- word
- natural language
- sample
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and device for classifying spatial query themes based on natural language processing. In the method, the natural language input by a user is split into a set of words using partition words; the words in the set then undergo feature matching and are rearranged into a semantic sequence. Based on the result of theme training, the sample nearest to the input natural language is found and its theme is returned, thereby classifying the spatial query theme of the natural language. The technical effect is improved accuracy of theme extraction.
Description
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and device for classifying spatial query themes based on natural language processing.
Background
With the rapid development of the new-generation information technology industry, personal intelligent assistants have become popular applications for improving quality of life. Based on user input, a personal intelligent assistant can carry out operational instructions through natural language understanding and automated information processing. Natural language processing is a subfield of artificial intelligence: it is the theory and technology of processing human language by machine, treating language as the object of computation and studying the corresponding algorithms. Its purpose is to let humans interact with computer systems in natural language, so that information can be managed more conveniently and effectively. From the late 1990s to the early 21st century, it was gradually recognized that natural language processing cannot succeed with rule-based methods alone or with statistics-based methods alone. Example-based and rule-based corpus techniques subsequently emerged.
Existing natural language processing techniques mainly fall into two types, rule-based methods and probability-based methods, which are further subdivided into methods based on Bayes' principle, methods based on hidden Markov models, discourse analysis methods, neural network methods, and so on. However, as the demand for information services keeps growing, the semantic understanding of natural language still suffers from problems such as difficult theme extraction and theme ambiguity.
With the continuous development of natural language understanding technology, research on place query languages has also made progress and achieved many significant results. This research mainly covers the lexical, syntactic and semantic content and methods of place query languages, and the study and refinement of their spatial-relation semantics. The theories and methods of natural language cognition and processing are maturing, but research on natural language in the spatial domain remains scarce. In implementing the present invention, the applicant found that existing methods have two main problems. First, because the morphology, syntax and semantics of natural language are flexible and complex, interpretation is frequently ambiguous; most existing research is applicable only to geographic information systems in certain specialized fields, and natural language has long been regarded merely as a supplement to other spatial query modes. Second, research and accumulated domain knowledge on place query languages are limited, so spatial vocabulary has not been systematically summarized, the syntactic analysis of spatial queries has not been fully generalized, and there are few existing results on spatial semantics. Consequently, in current natural language understanding technology, the extraction of themes from spatial information yields interpretations with many ambiguities and errors.
It follows that the methods in the prior art suffer from the technical problem of inaccurate theme extraction results.
Summary of the invention
In view of this, the present invention provides a method and device for classifying spatial query themes based on natural language processing, to solve or at least partly solve the technical problem of inaccurate theme extraction results in prior-art methods.
A first aspect of the present invention provides a method for classifying spatial query themes based on natural language processing, comprising:
Step S1: splitting the natural language to be processed into a set of words based on preset partition words;
Step S2: performing feature matching between the words in the set of words and a pre-built conceptual lexicon to obtain a word sequence corresponding to a preset structure;
Step S3: searching a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme; returning the query theme contained in the sample; and using the query theme as the classification result.
In one implementation, the preset partition words include action verbs, prepositions, subjects, feature words and interrogatives.
In one implementation, the pre-built conceptual lexicon includes points of interest, service attributes, service attribute evaluations, spatial relations, action verbs, time, persons, place queries, evaluation queries and business queries.
In one implementation, the preset structure is "theme-verb-point of interest-verb-article", and step S2 specifically comprises:
Step S2.1: performing feature matching between the words in the set of words and the pre-built conceptual lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence with the "theme-verb-point of interest-verb-article" structure.
In one implementation, the theme training result set is obtained by training the pre-collected natural language samples with the word sequence, specifically by:
obtaining training samples that contain theme information;
creating an ElasticSearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-built conceptual lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into ElasticSearch for training to obtain the theme training result set.
In one implementation, the sample nearest to the natural language to be processed is found using a preset edit-distance algorithm.
In one implementation, before the words contained in the training samples are replaced with feature words from the pre-built conceptual lexicon, the method further comprises:
determining whether a word in a training sample corresponds to at least two categories of the pre-built conceptual lexicon, and if so, not replacing that word.
Based on the same inventive concept, a second aspect of the present invention provides a device for classifying spatial query themes based on natural language processing, comprising:
a language division module, configured to split the natural language to be processed into a set of words based on preset partition words;
a feature matching module, configured to perform feature matching between the words in the set of words and a pre-built conceptual lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module, configured to search a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme, to return the query theme contained in the sample, and to use the query theme as the classification result.
In one implementation, the preset partition words include action verbs, prepositions, subjects, feature words and interrogatives.
The one or more technical solutions above in the embodiments of the present application have at least the following one or more technical effects:
In the method provided by the invention, the natural language to be processed is split into a set of words, the words in the set undergo feature matching against the pre-built conceptual lexicon to obtain a word sequence corresponding to the preset structure, and the sample nearest to the natural language to be processed is then found in the theme training result set and its theme is returned. Splitting the natural language to be processed, matching features, and recombining the result into a semantic sequence reduces the complexity of the natural language. The theme training result set is obtained by training pre-collected natural language samples with word sequences; training a large number of samples in word-sequence form gives them semantics and improves the accuracy of the theme training result set. Finally, the sample nearest to the language to be processed is found in the theme training result set and its theme is used as the classification result. This improves classification accuracy and solves the prior-art technical problem of inaccurate theme extraction results.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for classifying spatial query themes based on natural language processing in an embodiment of the present invention;
Fig. 2 is an example of a partition word set in one application scenario;
Fig. 3 is an example of a partition test performed on natural language;
Fig. 4 is an example of the conceptual vocabulary for one application scenario;
Fig. 5 is an example of test results after feature matching;
Fig. 6 shows samples for theme training;
Fig. 7 is a schematic diagram of theme classification results;
Fig. 8 is a structural diagram of a device for classifying spatial query themes based on natural language processing in an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a method and device for classifying spatial query themes based on natural language processing, to address the technical problem of inaccurate theme extraction results in prior-art methods, thereby reducing ambiguity and improving the accuracy of theme extraction and classification.
To achieve the above technical effect, the general idea of the invention is as follows:
The natural language to be processed is split into a set of words using partition words; the words in the set then undergo feature matching and are rearranged into a semantic sequence to obtain a word sequence. Word sequences are used for theme training on a large number of samples to obtain a theme training result set. The sample nearest to the natural language to be processed is then found in the theme training result set, and its theme is returned as the classification result, thereby classifying the spatial query theme of the natural language.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment one
This embodiment provides a method for classifying spatial query themes based on natural language processing. Referring to Fig. 1, the method comprises:
Step S1 is executed first: the natural language to be processed is split into a set of words based on preset partition words.
The preset partition words are summarized from the analysis of a large number of samples. In this embodiment, the preset partition words include action verbs, prepositions, subjects, feature words and interrogatives.
In a specific implementation, taking a shopping scenario as an example, the partition words involved are shown in Fig. 2, where L means that there are other words to the left of the current word in the sentence, R means that there are other words to its right, and LR means that there are other words on both sides. The partition words include action verbs such as sell, buy and see; prepositions such as "have"; subjects such as I, we, you and he; feature words such as shop, store and restaurant; and interrogatives such as where and what.
Next, the natural language from the shopping scenario is split, as shown in Fig. 3, where key marks a recognized partition word and unknow marks a word that was not recognized. For example, splitting the sentence "will have a drink" yields "drink" and "beverage", where "drink" is the recognized action verb. For "what is sold on the third floor", the resulting set of words includes "third floor", "is", "sell", "what" and a trailing particle. Likewise, the sentence "I want to go to Nike to buy sports shoes" is converted by splitting into the set of words "I", "want", "Nike", "buy" and "sports shoes".
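The splitting in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes whitespace-separated English tokens (the original operates on Chinese text), and the table `PARTITION_WORDS` and the function `split_by_partition_words` are hypothetical names introduced here. Recognized partition words are labelled key and runs of unrecognized words are merged and labelled unknow, following the labels described for Fig. 3.

```python
# Hypothetical partition-word table: word -> category (illustrative only).
PARTITION_WORDS = {
    "want": "action_verb",
    "buy": "action_verb",
    "sell": "action_verb",
    "i": "subject",
    "shop": "feature_word",
    "what": "interrogative",
}


def split_by_partition_words(sentence):
    """Split a whitespace-tokenised sentence into (word, label) pairs.

    Words found in PARTITION_WORDS are labelled "key"; consecutive
    unrecognised words are merged into one chunk labelled "unknow".
    """
    result = []
    pending = []  # unrecognised words collected since the last partition word
    for token in sentence.lower().split():
        if token in PARTITION_WORDS:
            if pending:
                result.append((" ".join(pending), "unknow"))
                pending = []
            result.append((token, "key"))
        else:
            pending.append(token)
    if pending:
        result.append((" ".join(pending), "unknow"))
    return result


print(split_by_partition_words("I want buy sport shoes"))
```

Here the partition words act as cut points, so the text between two recognized words is kept together as one unrecognized chunk for later feature matching.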
Step S2 is then executed: feature matching is performed between the words in the set of words and the pre-built conceptual lexicon to obtain a word sequence corresponding to the preset structure.
The pre-built conceptual lexicon is summarized from the analysis of a large number of samples and contains the feature words.
In one embodiment, the pre-built conceptual lexicon includes points of interest, service attributes, service attribute evaluations, spatial relations, action verbs, time, persons, place queries, evaluation queries and business queries.
Specifically, to match the category of each word entered by the user as fully as possible, this embodiment builds a conceptual lexicon for the shopping scenario. Concept words can be classified as POI, business terms, business evaluations, spatial relations, verbs, time, theme, where, how, goods query, and so on. Fig. 4 shows an example conceptual vocabulary for the shopping scenario, in which points of interest (POI, Point Of Interest) include shop, store, restaurant, supermarket, etc.; service attributes include men's clothing, women's clothing, underwear, etc.; and service attribute evaluations include tasty, not very tasty, good to drink, etc. They are not listed exhaustively here.
In one embodiment, step S2 specifically comprises:
Step S2.1: performing feature matching between the words in the set of words and the pre-built conceptual lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence of the form "theme-verb-point of interest-verb-article".
Specifically, Fig. 5 shows an example of test results after feature matching, in which SREL denotes the surrounding environment; GOODS denotes commodities or goods; unknow denotes unknown; CPOI denotes an important point of interest; ude denotes an auxiliary word; V denotes a verb; COMM denotes an attribute or evaluation; GOODSQRY denotes a query about commodities; COMMQRY denotes a query about attributes; LOCQRY denotes a query about location; and POI denotes a point of interest. The word sequence obtained from feature matching is reordered according to the "theme-verb-point of interest-verb-article" structure to obtain the sorted word sequence.
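Steps S2.1 and S2.2 can be sketched as a lookup followed by a stable sort. The lexicon contents, the names `CONCEPT_LEXICON`, `TAG_RANK`, `match_features` and `to_word_sequence`, and in particular the ranking rule used for reordering are assumptions made for illustration; the source does not specify how the reordering is computed.

```python
# Hypothetical concept lexicon: word -> tag (illustrative only).
CONCEPT_LEXICON = {
    "i": "subject",
    "want": "V",
    "buy": "V",
    "nike": "POI",
    "sneakers": "goods",
}

# Assumed ranking used to arrange tags. Verbs and POIs share a rank so
# that the stable sort keeps them in their original relative order.
TAG_RANK = {"subject": 0, "V": 1, "POI": 1, "goods": 2}


def match_features(words):
    """Tag each word with its concept category, or "unknow" if absent."""
    return [(w, CONCEPT_LEXICON.get(w.lower(), "unknow")) for w in words]


def to_word_sequence(tagged):
    """Stable-sort tagged words by TAG_RANK and return (words, pattern)."""
    ordered = sorted(tagged, key=lambda pair: TAG_RANK.get(pair[1], 3))
    words = [w for w, _ in ordered]
    pattern = "-".join(tag for _, tag in ordered)
    return words, pattern


words, pattern = to_word_sequence(match_features(["sneakers", "I", "want", "buy"]))
print(words, pattern)
```

Words that are not in the lexicon keep the tag unknow and are placed at the end under this assumed ranking.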
Step S3 is then executed: the theme training result set is searched for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme; the query theme contained in the sample is returned and used as the classification result.
Specifically, the pre-collected natural language samples can be obtained by manually or automatically analyzing and summarizing a large number of samples from different environments or application scenarios, such as shopping scenarios and tourism scenarios. The theme training result set contains the categories of the various themes.
The sample nearest to the natural language to be processed can be found by comparing similarity using a preset method.
In one embodiment, the theme training result set is obtained by training the pre-collected natural language samples with the word sequence, which specifically comprises:
obtaining training samples that contain theme information;
creating an ElasticSearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-built conceptual lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into ElasticSearch for training to obtain the theme training result set.
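The record construction in the steps above can be sketched in plain Python. The field names `text1`, `text2`, `theme` and `id`, the table `FEATURE_WORDS` and the function `build_records` are hypothetical, and the actual indexing call into the search server is deliberately left out because the source does not show it.

```python
# Hypothetical feature-word table: word -> feature word (illustrative only).
FEATURE_WORDS = {
    "i": "subject",
    "want": "V",
    "buy": "V",
    "shoes": "goods",
    "sneakers": "goods",
}


def build_records(samples):
    """Turn (id, word_sequence, theme_number) samples into indexable records.

    text1 is the space-separated word sequence (the "first text");
    text2 is the same sequence with each known word replaced by its
    feature word (the "second text").
    """
    records = []
    for sample_id, words, theme in samples:
        first_text = " ".join(words)
        second_text = " ".join(FEATURE_WORDS.get(w.lower(), w) for w in words)
        records.append(
            {"text1": first_text, "text2": second_text, "theme": theme, "id": sample_id}
        )
    return records


# Theme number 2 is used here as a stand-in for "navigate to a POI with a
# specified business function".
print(build_records([(1, ["I", "buy", "shoes"], 2)]))
```

Each resulting dictionary corresponds to one [first text, second text, theme, ID] entry that would then be inserted into the index.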
Specifically, ElasticSearch is a Lucene-based search server that provides a distributed, multi-user full-text search engine, achieving stable, reliable and fast real-time search. The training samples are a set of themes, and each theme has been split into word sequences of the form "theme-verb-POI-verb-article". Each sentence (natural language) has one specific theme, which reflects the user's real intent. For example, "I want to buy sports shoes" means that the user wants information on POIs related to sports shoes. After a sentence is split into a word sequence, because the wording of user-entered sentences (the natural language to be processed) is not standardized, the words are replaced with feature words from the pre-built conceptual lexicon. This standardizes the words so that the nearest sample can be found more reliably, improving precision and accuracy.
In a specific implementation, refer to Fig. 6, which shows samples for theme training. By analyzing a large number of shopping inquiry samples, the applicant summarized five themes: 1) querying POIs with a specified (special) business function; 2) navigating to a POI with a specified business function; 3) querying evaluation information about a POI; 4) querying the business functions of a POI; 5) querying movies. The input to the theme training stage is training samples containing theme information, that is, a set of themes; each theme contains multiple sentences, each sentence is input as a sample, and each sentence has a structure of the form (buy/V shoes/GOODS). Refer to Fig. 7, which is a schematic diagram of theme classification results: "second floor women's clothing", "have lunch", "manicure" and the like are classified as querying POIs with a specific service function; "I want to drink a beverage", "I want to buy a wallet" and the like are classified as navigating to a POI with a specific service function; and evaluation-style queries mentioning, for example, "Huang Ji Huang" or "Home Alone" are classified as asking how a given POI is.
In addition, in the method provided by the invention, the natural language to be processed contains spatial information. A large number of samples containing spatial location information can therefore also be used for theme training, so that the method can be applied to theme queries on natural language containing spatial information and can improve the accuracy of theme extraction.
In one embodiment, the sample nearest to the natural language to be processed is found using a preset edit-distance algorithm.
Specifically, the preset edit-distance algorithm is a method in ElasticSearch, by which the sample closest to the natural language to be processed can be found.
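To make the nearest-sample lookup concrete, the sketch below computes a standard Levenshtein edit distance over token sequences and picks the closest sample. It is a standalone illustration in plain Python, not the search server's own implementation, and the sample structure and the function names `edit_distance` and `nearest_sample` are assumptions introduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete x
                    curr[j - 1] + 1,     # insert y
                    prev[j - 1] + cost,  # substitute x with y (or keep)
                )
            )
        prev = curr
    return prev[-1]


def nearest_sample(query_tokens, samples):
    """Return the sample whose token sequence is closest to the query."""
    return min(samples, key=lambda s: edit_distance(query_tokens, s["tokens"]))


# Hypothetical training samples (illustrative only).
SAMPLES = [
    {"tokens": ["I", "want", "buy", "shoes"],
     "theme": "navigate to a POI with a specified business function"},
    {"tokens": ["where", "is", "Nike"],
     "theme": "query POIs with a specified business function"},
]

print(nearest_sample(["I", "want", "buy", "sneakers"], SAMPLES)["theme"])
```

Comparing token sequences rather than raw characters means that one differing word counts as a single edit, which suits word sequences that have already been split.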
In one embodiment, before the words contained in the training samples are replaced with feature words from the pre-built conceptual lexicon, the method further comprises:
determining whether a word in a training sample corresponds to at least two categories of the pre-built conceptual lexicon, and if so, not replacing that word.
Specifically, if a word in a training sample corresponds to multiple categories of the conceptual lexicon, the word is not replaced, in order to avoid errors.
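The ambiguity check can be sketched as follows. The lexicon contents and the names `CATEGORY_WORDS`, `categories_of` and `replace_unambiguous` are hypothetical; the word "apple" is placed in two categories only to exercise the ambiguous case.

```python
# Hypothetical lexicon: category -> set of words (illustrative only).
CATEGORY_WORDS = {
    "POI": {"nike", "apple"},
    "goods": {"shoes", "apple"},
    "V": {"buy"},
}


def categories_of(word):
    """Return every category whose word set contains the word."""
    return sorted(c for c, ws in CATEGORY_WORDS.items() if word.lower() in ws)


def replace_unambiguous(words):
    """Replace a word by its category only when exactly one category matches.

    Words matching two or more categories, and words matching none,
    are left unchanged.
    """
    out = []
    for w in words:
        cats = categories_of(w)
        out.append(cats[0] if len(cats) == 1 else w)
    return out


print(replace_unambiguous(["buy", "apple", "shoes"]))
```

Leaving ambiguous words untouched trades some normalization for safety: a wrong category would distort the second text more than an unreplaced word does.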
To illustrate the implementation of the method of the invention more clearly, a specific example is described below.
Specifically, the natural language to be processed is "I want to buy sneakers." Step S1 is executed first and the sentence is split based on the preset partition words, giving the set of words "I", "want", "buy", "sneakers". Step S2 is then executed and the set of words undergoes feature matching against the pre-built conceptual lexicon, giving a word sequence in the format subject-V-V-goods. Step S3 is then executed and the theme training result set is queried for the sample closest to ["I" "want" "buy" "sneakers", subject-V-V-goods], for example ["I" "want" "buy" "shoes", subject-V-V-goods, "navigate to a shopping area with a specific function", "I buy shoes"]. The theme "navigate to a shopping area with a specific function" is then returned as the classification result.
In the method for classifying spatial query themes based on natural language processing disclosed by the invention, the natural language to be processed is split and recombined into a semantic sequence (word sequence) of the form "theme-verb-POI-verb-article", and these word sequences are used for theme training on pre-collected samples, which yields a good theme extraction effect.
Based on the same inventive concept, the present application also provides a device corresponding to the method for classifying spatial query themes based on natural language processing in Embodiment One; see Embodiment Two for details.
Embodiment two
This embodiment provides a device for classifying spatial query themes based on natural language processing. Referring to Fig. 8, the device comprises:
a language division module 801, configured to split the natural language to be processed into a set of words based on preset partition words;
a feature matching module 802, configured to perform feature matching between the words in the set of words and a pre-built conceptual lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module 803, configured to search a theme training result set for the sample nearest to the natural language to be processed, wherein the theme training result set is obtained by training pre-collected natural language samples with the word sequence and each sample contains text and a query theme, to return the query theme contained in the sample, and to use the query theme as the classification result.
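One way to picture how the three modules fit together is a small container that chains them. This is only a structural sketch under stated assumptions: the class `ThemeClassifierDevice`, its field names and the toy callables in the usage lines are hypothetical, and each module is reduced to a single function.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ThemeClassifierDevice:
    """Chains stand-ins for modules 801, 802 and 803 into one pipeline."""

    split: Callable[[str], List[str]]            # language division module 801
    match: Callable[[Sequence[str]], List[str]]  # feature matching module 802
    classify: Callable[[Sequence[str]], str]     # theme classification module 803

    def run(self, sentence: str) -> str:
        """Split the sentence, match features, then return the theme."""
        return self.classify(self.match(self.split(sentence)))


# Toy stand-ins for the three modules (illustrative only).
device = ThemeClassifierDevice(
    split=str.split,
    match=lambda words: [w.lower() for w in words],
    classify=lambda words: "navigate" if "buy" in words else "unknown",
)
print(device.run("I want buy shoes"))
```

Keeping each module behind a plain callable makes it possible to swap in a different splitter or matcher without changing the rest of the pipeline.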
In one embodiment, the preset partition words include action verbs, prepositions, subjects, feature words and interrogatives.
In one embodiment, the pre-built conceptual lexicon includes points of interest, service attributes, service attribute evaluations, spatial relations, action verbs, time, persons, place queries, evaluation queries and business queries.
In one embodiment, the preset structure is "theme-verb-point of interest-verb-article", and the feature matching module 802 is specifically configured to:
perform feature matching between the words in the set of words and the pre-built conceptual lexicon to obtain feature words;
convert the feature words into a word sequence with the "theme-verb-point of interest-verb-article" structure.
In one embodiment, the theme training result set is obtained by training the pre-collected natural language samples with the word sequence, specifically by:
obtaining training samples that contain theme information;
creating an ElasticSearch index and mapping for the training samples, wherein the mapping includes [first text, theme, ID, row], the first text is the word sequence in the form of a space-separated list of parts, the theme is the theme number of the training sample, the ID is the ID of the training sample, and the row is the natural language to be processed;
traversing all mappings and replacing the words contained in the training samples with feature words from the pre-built conceptual lexicon to obtain a second text;
traversing all mappings, constructing [first text, second text, theme, ID] for each training sample, and inserting it into ElasticSearch for training to obtain the theme training result set.
In one embodiment, the sample nearest to the natural language to be processed is found using a preset edit-distance algorithm.
In one embodiment, the device provided in this embodiment further includes a judgment module configured to, before the words contained in the training samples are replaced with feature words from the pre-built conceptual lexicon:
determine whether a word in a training sample corresponds to at least two categories of the pre-built conceptual lexicon, and if so, not replace that word.
The device described in Embodiment Two is the device used to implement the method for classifying spatial query themes based on natural language processing in Embodiment One. Based on the method described in Embodiment One, those skilled in the art can understand the specific structure and variations of the device, so details are not repeated here. Any device used to implement the method of Embodiment One falls within the scope of protection of the present invention.
It should be understood by those skilled in the art that, the embodiment of the present invention can provide as method, system or computer program
Product.Therefore, complete hardware embodiment, complete software embodiment or reality combining software and hardware aspects can be used in the present invention
Apply the form of example.Moreover, it wherein includes the computer of computer usable program code that the present invention, which can be used in one or more,
The computer program implemented in usable storage medium (including but not limited to magnetic disk storage, CD-ROM, optical memory etc.) produces
The form of product.
The present invention be referring to according to the method for the embodiment of the present invention, the process of equipment (system) and computer program product
Figure and/or block diagram describe.It should be understood that every one stream in flowchart and/or the block diagram can be realized by computer program instructions
The combination of process and/or box in journey and/or box and flowchart and/or the block diagram.It can provide these computer programs
Instruct the processor of general purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices to produce
A raw machine, so that being generated by the instruction that computer or the processor of other programmable data processing devices execute for real
The device for the function of being specified in present one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions, which may also be stored in, is able to guide computer or other programmable data processing devices with spy
Determine in the computer-readable memory that mode works, so that it includes referring to that instruction stored in the computer readable memory, which generates,
Enable the manufacture of device, the command device realize in one box of one or more flows of the flowchart and/or block diagram or
The function of being specified in multiple boxes.
These computer program instructions may also be loaded onto a computer or other programmable data processing
device, so that a series of operational steps is performed on the computer or other programmable device to
produce computer-implemented processing, whereby the instructions executed on the computer or other
programmable device provide steps for implementing the functions specified in one or more flows of the
flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art may make
additional changes and modifications to these embodiments once they learn of the basic inventive concept. The
appended claims are therefore intended to be construed as covering the preferred embodiments and all changes
and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the
present invention without departing from the spirit and scope of those embodiments. Thus, if these
modifications and variations fall within the scope of the claims of the present invention and their technical
equivalents, the present invention is also intended to include them.
Claims (9)
1. A method for classifying spatial query themes based on natural language processing, characterized by comprising:
Step S1: dividing the natural language to be processed into a set of words based on preset partition words;
Step S2: performing feature matching between the words in the set of words and a pre-constructed conceptual lexicon
to obtain a word sequence corresponding to a preset structure;
Step S3: searching a theme training result set for the sample nearest to the natural language to be processed,
wherein the theme training result set is obtained by training pre-collected natural language samples by means of
the word sequence, and each sample contains text and a query theme; and returning the query theme contained in
the found sample as the classification result.
2. The method of claim 1, characterized in that the preset partition words include: action verbs, prepositions,
subjects, feature words, and interrogatives.
3. The method of claim 1, characterized in that the pre-constructed conceptual lexicon includes points of
interest, service attributes, service attribute evaluations, spatial relationships, action verbs, times, persons,
place queries, evaluation queries, and business queries.
4. The method of claim 1, characterized in that the preset structure is "theme-verb-point of interest-verb-article",
and step S2 specifically comprises:
Step S2.1: performing feature matching between the words in the set of words and the pre-constructed conceptual
lexicon to obtain feature words;
Step S2.2: converting the feature words into a word sequence with the "theme-verb-point of interest-verb-article"
structure.
5. The method of claim 4, characterized in that the theme training result set is obtained by training the
pre-collected natural language samples by means of the word sequence, specifically by:
obtaining training samples that contain theme information;
creating an Elasticsearch index and mapping for the training samples, wherein the mapping includes [first text,
theme, ID, row], where the first text is the word sequence, given as a space-separated list of parts; theme is
the theme number of the training sample; ID is the ID of the training sample; and row is the natural language to
be processed;
traversing all mappings and replacing the words contained in the training samples with the feature words in the
pre-constructed conceptual lexicon to obtain a second text;
traversing all mappings and, for each training sample, constructing [first text, second text, theme, ID] and
inserting it into Elasticsearch for training, to obtain the theme training result set.
6. The method of claim 5, characterized in that the sample nearest to the natural language to be processed is
searched for according to a preset edit distance algorithm.
7. The method of claim 5, characterized in that, before the words contained in the training samples are replaced
with the feature words in the pre-constructed conceptual lexicon, the method further comprises:
determining whether a word in the training sample corresponds to at least two categories of the pre-constructed
conceptual lexicon;
if so, leaving that word unreplaced.
8. A device for classifying spatial query themes based on natural language processing, characterized by comprising:
a language division module, configured to divide the natural language to be processed into a set of words based on
preset partition words;
a feature matching module, configured to perform feature matching between the words in the set of words and a
pre-constructed conceptual lexicon to obtain a word sequence corresponding to a preset structure;
a theme classification module, configured to search a theme training result set for the sample nearest to the
natural language to be processed, wherein the theme training result set is obtained by training pre-collected
natural language samples by means of the word sequence, and each sample contains text and a query theme, and to
return the query theme contained in the found sample as the classification result.
9. The device of claim 8, characterized in that the preset partition words include: action verbs, prepositions,
subjects, feature words, and interrogatives.
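As an informal illustration of the flow recited in claims 1, 6, and 7, the following minimal Python sketch splits an input into words, replaces each word with its concept category unless the word is ambiguous, and returns the theme of the nearest training sample by edit distance. It is a sketch under stated assumptions, not the patented implementation: the lexicon entries, the sample texts and theme names, and the function names `split_words`, `to_features`, `edit_distance`, and `classify` are all hypothetical; whitespace tokenization stands in for division by preset partition words; an in-memory list stands in for the Elasticsearch index; and the reordering into the "theme-verb-point of interest-verb-article" structure is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical concept lexicon: surface word -> concept categories.
CONCEPT_LEXICON: Dict[str, List[str]] = {
    "restaurant": ["point_of_interest"],
    "hotel": ["point_of_interest"],
    "near": ["spatial_relationship"],
    "find": ["action_verb"],
    "where": ["place_query"],
    "open": ["action_verb", "service_attribute"],  # ambiguous on purpose
}

# Hypothetical training samples: (text, query theme).
TRAINING_SAMPLES: List[Tuple[str, str]] = [
    ("find restaurant near hotel", "nearby_search"),
    ("where is hotel", "location_lookup"),
]


def split_words(text: str) -> List[str]:
    """Step S1 stand-in: whitespace tokenization (an assumption)."""
    return text.lower().split()


def to_features(words: List[str]) -> List[str]:
    """Step S2 stand-in: map each word to its concept category.

    A word that belongs to two or more categories is left unchanged
    (the check in claim 7); unknown words are also left unchanged.
    """
    features = []
    for word in words:
        categories = CONCEPT_LEXICON.get(word, [])
        features.append(categories[0] if len(categories) == 1 else word)
    return features


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def classify(text: str) -> str:
    """Step S3 stand-in: return the theme of the nearest training sample."""
    query = to_features(split_words(text))
    best_theme, best_dist = "", None
    for sample_text, theme in TRAINING_SAMPLES:
        dist = edit_distance(query, to_features(split_words(sample_text)))
        if best_dist is None or dist < best_dist:
            best_theme, best_dist = theme, dist
    return best_theme
```

In this sketch, `classify("find hotel near restaurant")` matches the first sample exactly at the feature level, because both texts reduce to the same category sequence, which is the reason for comparing feature sequences rather than raw words.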
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811116358.5A CN109271502B (en) | 2018-09-25 | 2018-09-25 | Method and device for classifying spatial query topics based on natural language processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811116358.5A CN109271502B (en) | 2018-09-25 | 2018-09-25 | Method and device for classifying spatial query topics based on natural language processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109271502A (en) | 2019-01-25 |
CN109271502B (en) | 2020-08-07 |
Family
ID=65197490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811116358.5A Active CN109271502B (en) | 2018-09-25 | 2018-09-25 | Method and device for classifying spatial query topics based on natural language processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271502B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222709A (en) * | 2019-04-29 | 2019-09-10 | 上海暖哇科技有限公司 | A kind of multi-tag intelligence marking method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006031228A (en) * | 2004-07-14 | 2006-02-02 | Oki Electric Ind Co Ltd | Morphemic analysis device, method, and program |
CN103207917A (en) * | 2013-04-25 | 2013-07-17 | 百度在线网络技术(北京)有限公司 | Method for marking multimedia content and method and system for generating recommended content |
US20170024376A1 (en) * | 2015-07-21 | 2017-01-26 | Facebook, Inc. | Data sorting for language processing such as pos tagging |
CN108009228A (en) * | 2017-11-27 | 2018-05-08 | 咪咕互动娱乐有限公司 | A kind of method to set up of content tab, device and storage medium |
CN108052547A (en) * | 2017-11-27 | 2018-05-18 | 华中科技大学 | Natural language question-answering method and system based on question sentence and knowledge graph structural analysis |
CN108121790A (en) * | 2017-12-19 | 2018-06-05 | 百度在线网络技术(北京)有限公司 | Chinese character by words querying method, device, server, equipment and storage medium |
CN108197298A (en) * | 2018-01-23 | 2018-06-22 | 北京知行信科技有限公司 | A kind of smart shopper exchange method and system based on natural language processing |
Also Published As
Publication number | Publication date |
---|---|
CN109271502B (en) | 2020-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107633007B (en) | Commodity comment data tagging system and method based on hierarchical AP clustering | |
Chang et al. | Predicting aspect-based sentiment using deep learning and information visualization: The impact of COVID-19 on the airline industry | |
Kaushik et al. | A comprehensive study of text mining approach | |
KR101955318B1 (en) | The method for visualizing big data in cosmetic information providing mobile application | |
CN104484374A (en) | Method and device for creating Internet encyclopedia entry | |
Tighe et al. | Personality Trait Classification of Essays with the Application of Feature Reduction. | |
CN108959643A (en) | Generate method, apparatus, server and the storage medium of label | |
Alzyout et al. | Sentiment analysis of arabic tweets about violence against women using machine learning | |
Khedkar et al. | Customer review analytics for business intelligence | |
CN104221012A (en) | Document search device and document search method | |
Mozafari et al. | Emotion detection by using similarity techniques | |
CN106951433B (en) | Retrieval method and device | |
Cai et al. | PURA: a product-and-user oriented approach for requirement analysis from online reviews | |
Izza et al. | A qualitative review on halal tourism: NVivo approach | |
Gim et al. | A trend analysis method for IoT technologies using patent dataset with goal and approach concepts | |
Guadarrama et al. | Understanding object descriptions in robotics by open-vocabulary object retrieval and detection | |
CN109271502A (en) | A kind of classifying method and device of the space querying theme based on natural language processing | |
CN107861953B (en) | Automatic name translation system and method | |
CN113298559A (en) | Commodity applicable crowd recommendation method, system, device and storage medium | |
US20220207240A1 (en) | System and method for analyzing similarity of natural language data | |
CN112084312A (en) | Intelligent customer service system constructed based on knowledge graph | |
CN105975480A (en) | Instruction identification method and system | |
Fattoh et al. | Semantic question generation using artificial immunity | |
Kang et al. | Recognising informative Web page blocks using visual segmentation for efficient information extraction. | |
Kuyten et al. | A discourse search engine based on rhetorical structure theory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||