CN110334110A - Natural language classification method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110334110A (application CN201910449416.4A)
- Authority
- CN
- China
- Prior art keywords
- natural language
- text data
- input
- word
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
An embodiment of the invention discloses a natural language classification method, device, computer equipment and storage medium, wherein the method includes: acquiring natural language data input by a user, and converting the natural language data into corresponding text data; segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. The invention provides a natural language classification method based on a detection model, which can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a natural language classification method, device, computer equipment and storage medium.
Background technique
At present, converting a spoken natural language query into a query statement that a computer can recognize typically means converting the natural language query into one specific computer query statement. As a result, some databases cannot recognize the converted statement. For example, if the query is converted into an SQL (Structured Query Language) query statement, a relational database can recognize the SQL query statement, but a graph database cannot. Traditional natural language query conversion therefore cannot meet market demand.
Summary of the invention
In view of this, embodiments of the present invention provide a natural language classification method, device, computer equipment and storage medium, which can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
In one aspect, an embodiment of the invention provides a natural language classification method, the method comprising:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In another aspect, an embodiment of the invention provides a natural language classification device, the device comprising:
a conversion unit, configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
a segmentation unit, configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
a training unit, configured to take the words in the word segmentation result as input and train the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and
a classification unit, configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In another aspect, an embodiment of the invention also provides a computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the natural language classification method described above when executing the computer program.
In yet another aspect, an embodiment of the invention also provides a computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the natural language classification method described above.
Embodiments of the present invention provide a natural language classification method, device, computer equipment and storage medium, wherein the method includes: acquiring natural language data input by a user, and converting the natural language data into corresponding text data; segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. The invention provides a natural language classification method based on a detection model, which can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
Detailed description of the invention
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario of a natural language classification method provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 3 is another schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 4 is another schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 6 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 7 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 8 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below clearly and completely in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be understood that when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to Fig. 1 and Fig. 2. Fig. 1 is a schematic diagram of an application scenario of a natural language classification method provided by an embodiment of the present invention, and Fig. 2 is a schematic flow diagram of a natural language classification method provided by an embodiment of the present invention. The natural language classification method is applied in a server or a terminal, where the terminal can be an electronic device with a communication function, such as a smart phone, tablet computer, laptop, desktop computer, personal digital assistant or wearable device. As one application, as shown in Fig. 1, the natural language classification method is applied in a server 10, which can be a server in a distributed service platform; the server 10 executes a natural language classification instruction and feeds the execution result back to a terminal 20.
It should be noted that only one terminal 20 is illustrated in Fig. 1; in actual operation, the server 10 can also feed the execution result back to multiple terminals 20.
Please refer to Fig. 2, a schematic flow diagram of a natural language classification method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps S101 to S104.
S101: acquire natural language data input by a user, and convert the natural language data into corresponding text data.
In the embodiment of the present invention, the natural language data refers to a spoken natural language query against a database, for example the spoken query: "What is the net profit of insurance this year?". More specifically, the natural language data input by the user can be collected through a microphone in the terminal, and the collected natural language data can then be converted into corresponding text data.
Further, as shown in Fig. 3, the step of converting the natural language data into corresponding text data specifically includes steps S201 to S204:
S201: collect the natural language data input by the user using a microphone;
S202: digitize the natural language data to obtain a speech signal;
S203: extract the acoustic features of the speech signal;
S204: input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In this embodiment, when converting the natural language data into corresponding text data, the natural language data is a speech signal, and speech information is an analog signal; it is therefore necessary to process the analog speech signal, digitize it, and extract the acoustic features of the speech signal. Methods such as Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC) or the Multimedia Content Description Interface (MPEG-7) can be used to extract the acoustic features. The acoustic features can then be input into an acoustic model for decoding to obtain the text data corresponding to the speech signal. This is the process of converting the natural language data into corresponding text data.
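The S201 to S204 pipeline can be sketched as follows. This is an illustrative stand-in rather than the patent's implementation: a real system would use MFCC or LPCC extraction and a trained acoustic model, while here the feature step is approximated by a framed log power spectrum over a simulated waveform, and the decoding step is omitted.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a digitized waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_spectral_features(signal, n_coeffs=13):
    """Toy stand-in for MFCC extraction: log power spectrum per frame,
    truncated to the first n_coeffs coefficients."""
    frames = frame_signal(signal)
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    return np.log(power + 1e-10)[:, :n_coeffs]

# Simulated 1-second utterance sampled at 16 kHz (S202: digitized speech signal)
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)
feats = log_spectral_features(waveform)   # S203: acoustic features
print(feats.shape)  # (98, 13): one 13-dim feature vector per 10 ms frame
```

In the full pipeline, `feats` would be passed to the predetermined acoustic model (S204), which decodes the frame sequence into text.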
S102: segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words.
In the embodiment of the present invention, segmenting the text data comprises: segmenting the text data using a segmentation method based on a probability statistics model. For example, let C = C1C2...Cm be the Chinese character string corresponding to the text data to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The segmentation model based on probability statistics then seeks the target word string W satisfying P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)), that is, the word string W whose estimated probability under the model is maximal, and takes W as the word segmentation result of the text data. For example, for the text data "What is the net profit of insurance this year?", the word segmentation result obtained by the above segmentation model is: "this year", "insurance", "net profit", "is", "how much", "?".
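The maximum-probability criterion above can be illustrated with a short dynamic-programming sketch. The word list and probability values below are hypothetical, and English compounds stand in for the Chinese character string C; the patent itself does not specify a dictionary or how the probabilities are estimated.

```python
import math

# Hypothetical unigram log-probabilities standing in for corpus estimates of P(W).
LOGP = {w: math.log(p) for w, p in {
    "this": 0.04, "year": 0.05, "thisyear": 0.08,
    "net": 0.03, "profit": 0.03, "netprofit": 0.06,
    "of": 0.10, "insurance": 0.05,
}.items()}

def segment(chars, max_word=12):
    """Return the segmentation W maximizing the unigram score, a proxy for
    P(W|C) = MAX(P(Wa|C), ..., P(Wk|C)), via dynamic programming."""
    n = len(chars)
    best = [(-math.inf, [])] * (n + 1)   # best[i] = (score, words) for chars[:i]
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word), end):
            word = chars[start:end]
            if word in LOGP and best[start][0] > -math.inf:
                score = best[start][0] + LOGP[word]
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[n][1]

print(segment("netprofitofinsurance"))  # ['netprofit', 'of', 'insurance']
```

The longer entries win here because one word with probability 0.06 scores higher than two words with probability 0.03 each, which is exactly the MAX criterion at work.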
S103: taking the words in the word segmentation result as input, train the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word.
In the embodiment of the present invention, the preset word vector model is based on the word2vec deep learning model. In this embodiment, the specific training process is to train the word segmentation result of the text data using the word2vec deep learning model in the Gensim toolkit for Python, taking the words in the word segmentation result as input and the word vector training result as output, the word vector result including the vector representation corresponding to each word.
Further, as shown in Fig. 4, step S103 includes steps S301 to S302:
S301: input the word segmentation result of the text data into the Python toolkit Gensim;
S302: train the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim, so as to obtain the output result.
In this embodiment, the following parameters are set for the word2vec deep learning model in the Python toolkit Gensim:
After training with the word2vec model in the Python toolkit Gensim is completed, a vectors.bin file is obtained; vectors.bin contains each word of the text data and the word vector corresponding to each word. In this embodiment, the dimension of the word vectors is preset using the size parameter in the Python toolkit Gensim.
S104: input the word vector training result into the pre-trained neural network model for natural language classification, and obtain a classification result for the natural language data.
In the embodiment of the present invention, the neural network model is:

Ot = g(V·St)
St = f(U·Xt + St-1)

where Xt is the value of the input layer of the recurrent neural network, St and St-1 are the values of the hidden layer of the recurrent neural network at times t and t-1, Ot is the value of the output layer of the recurrent neural network, U is the first weight matrix (input layer to hidden layer), V is the second weight matrix (hidden layer to output layer), g(·) is a nonlinear activation function, and f(·) is the softmax function.
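The recurrence can be exercised with a small NumPy forward pass. The shapes and random weights below are arbitrary stand-ins; the sketch follows the equations as written, applying f(·) to the hidden state and g(·) to the output, with tanh chosen as the unspecified nonlinear activation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_forward(X, U, V, g=np.tanh, f=softmax):
    """Forward pass of the recurrent classifier:
       S_t = f(U X_t + S_{t-1}),  O_t = g(V S_t)."""
    S = np.zeros(U.shape[0])
    outputs = []
    for x_t in X:
        S = f(U @ x_t + S)        # hidden state update
        outputs.append(g(V @ S))  # output for this time step
    return np.array(outputs)

rng = np.random.default_rng(1)
U = rng.standard_normal((4, 6)) * 0.1   # first weight matrix: input -> hidden
V = rng.standard_normal((3, 4)) * 0.1   # second weight matrix: hidden -> output
X = rng.standard_normal((5, 6))         # five 6-dimensional word vectors
out = rnn_forward(X, U, V)
print(out.shape)  # (5, 3): one output per input word vector
```

In a conventional RNN classifier the roles are usually reversed (tanh on the hidden state, softmax on the output); swapping `g` and `f` in the call would give that variant.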
It should be noted that before step S104, the neural network model for natural language classification needs to be trained in advance. The training process is as follows: input historical word vector data into a pre-constructed screening model for part-of-speech tagging, and obtain the part-of-speech probability corresponding to each historical word vector; if the part-of-speech probability corresponding to a historical word vector is greater than or equal to a preset first probability, mark the corresponding historical word vector as a word vector of the target part of speech; if the part-of-speech probability corresponding to a word vector is greater than or equal to a preset second probability, mark the corresponding historical word vector as a word vector of the condition part of speech; if the part-of-speech probability corresponding to a word vector is greater than or equal to a preset third probability, mark the corresponding historical word vector as a word vector of the time part of speech. More specifically, in this embodiment, the screening model is constructed by model training on the historical word vectors according to the naive Bayes algorithm; the screening model is used to judge whether an input word vector is a word vector of the target part of speech, the condition part of speech or the time part of speech.
When constructing the screening model for part-of-speech tagging, multiple word vectors contained in the training set are taken as the input of the screening model, and the part of speech corresponding to each word vector is taken as the output of the screening model, so that the screening model is obtained by training. The naive Bayes model used is:

P(ck) = Nck / N,  P(tj | ck) = (Tjk + 1) / (Σtj∈V Tjk + |V|)

where Nck denotes the number of documents of class ck in the training set, N denotes the total number of word vectors in the training set, Tjk denotes the number of occurrences of term tj in class ck, and V is the term set over all classes. Using the above screening model as a classifier for the part of speech of word vectors, it can be judged whether an input word vector is a word vector of the target part of speech, the condition part of speech or the time part of speech. For example, each word vector is input into the naive Bayes model; when the probability that the data belongs to the vector class of the target part of speech is greater than or equal to 50% (i.e. the first probability is set to 50%), the data is regarded as a vector of the target part of speech; when the part-of-speech probability of a word vector for the condition part of speech class is greater than or equal to 50% (i.e. the second probability is set to 50%), the word vector is marked as a word vector of the condition part of speech; when the part-of-speech probability of a word vector for the time part of speech class is greater than or equal to 50% (i.e. the third probability is set to 50%), the word vector is marked as a word vector of the time part of speech.
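The naive Bayes screening model with the 50% thresholds can be sketched as follows. The tiny training set, token strings, and class names are hypothetical stand-ins for the historical word vector data described above; counts play the roles of Nck, Tjk and V, with Laplace smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical labelled tokens for the three part-of-speech classes in the text.
train = [
    (["net profit", "premium"], "target"),
    (["revenue", "net profit"], "target"),
    (["insurance", "life insurance"], "condition"),
    (["this year", "last year"], "time"),
    (["this year"], "time"),
]

class_count = Counter(label for _, label in train)          # N_ck per class
term_count = defaultdict(Counter)                           # T_jk per class
for tokens, label in train:
    term_count[label].update(tokens)
vocab = {t for tokens, _ in train for t in tokens}          # V

def posterior(token):
    """P(ck | tj) via Bayes' rule with Laplace smoothing, normalized over classes."""
    scores = {}
    for c in class_count:
        prior = class_count[c] / sum(class_count.values())          # N_ck / N
        likelihood = (term_count[c][token] + 1) / (
            sum(term_count[c].values()) + len(vocab))               # (T_jk+1)/(ΣT+|V|)
        scores[c] = prior * likelihood
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def label(token, threshold=0.5):
    """Tag the token with the class whose posterior clears the 50% threshold."""
    probs = posterior(token)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None

print(label("net profit"), label("this year"))  # target time
```

A token whose best posterior falls below the threshold is left untagged (`None`), matching the screening behaviour where only vectors clearing the preset probability are marked.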
The word vector result after part-of-speech tagging is taken as the input of the neural network, and the corresponding word vector classification result is taken as the output of the recurrent neural network, which is trained to obtain the neural network model. By taking the word segmentation result of the historical word vectors after part-of-speech tagging as the input of the neural network, and the corresponding word vector classification results as the output of the recurrent neural network, the first weight matrix and the second weight matrix can be obtained by training, and the neural network model thus obtained serves as the model for subsequent word vector classification. After the pre-trained neural network model is obtained, the user's word vector training result is input into the pre-trained neural network model, and the user's word vectors are classified quickly and intelligently according to the preset neural network model. For example, for the text data "What is the net profit of insurance this year?", 6-dimensional word vectors are obtained after segmentation and vector representation; after the 6-dimensional word vectors are input into the pre-trained neural network model, the output classification results are Account - net profit (Account indicates a target word), Entity - life insurance (Entity indicates a condition word) and NTR - this year (a time word).
As can be seen from the above, the embodiment of the present invention acquires natural language data input by a user and converts the natural language data into corresponding text data; segments the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; takes the words in the word segmentation result as input and trains the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and inputs the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. The invention provides a natural language classification method based on a detection model, which can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
Please refer to Fig. 5. Corresponding to the above natural language classification method, an embodiment of the present invention also proposes a natural language classification device. The device 100 includes: a conversion unit 101, a segmentation unit 102, a training unit 103 and a classification unit 104.
The conversion unit 101 is configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
the segmentation unit 102 is configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
the training unit 103 is configured to take the words in the word segmentation result as input and train the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word;
the classification unit 104 is configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
As can be seen from the above, the embodiment of the present invention acquires natural language data input by a user and converts the natural language data into corresponding text data; segments the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; takes the words in the word segmentation result as input and trains the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word; and inputs the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. The invention provides a natural language classification method based on a detection model, which can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
Please refer to Fig. 6. The conversion unit 101 comprises:
a collection unit 101a, configured to collect the natural language data input by the user using a microphone;
a processing unit 101b, configured to digitize the natural language data to obtain a speech signal;
an extraction unit 101c, configured to extract the acoustic features of the speech signal;
a generation unit 101d, configured to input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
Please refer to Fig. 7. The segmentation unit 102 comprises:
a segmentation subunit 102a, configured to segment the text data using a segmentation method based on a probability statistics model.
Please refer to Fig. 8. The training unit 103 comprises:
an input unit 103a, configured to input the word segmentation result of the text data into the Python toolkit Gensim;
a training subunit 103b, configured to train the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim, so as to obtain the output result.
The above natural language classification device corresponds one-to-one with the above natural language classification method; its specific principle and process are the same as those of the above embodiments and are not repeated here.
The above natural language classification device can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in Fig. 9.
Fig. 9 is a schematic diagram of the structure of a computer device of the present invention. The device can be a terminal or a server, where the terminal can be an electronic device with a communication function and a speech input function, such as a smart phone, tablet computer, laptop, desktop computer, personal digital assistant or wearable device, and the server can be an independent server or a server cluster composed of multiple servers. Referring to Fig. 9, the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504 and a network interface 505 connected through a system bus 501. The non-volatile storage medium 503 of the computer device 500 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed, the processor 502 can be made to execute a natural language classification method. The processor 502 of the computer device 500 is used to provide computing and control capability and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503; when the computer program is executed by the processor, the processor 502 can be made to execute a natural language classification method. The network interface 505 of the computer device 500 is used for network communication. Those skilled in the art can understand that the structure shown in Fig. 9 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
When the processor 502 executes the computer program, the following operations are implemented:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification, and obtaining a classification result for the natural language data.
In one embodiment, acquiring the natural language data input by the user and converting the natural language data into corresponding text data comprises:
collecting the natural language data input by the user using a microphone;
digitizing the natural language data to obtain a speech signal;
extracting the acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In one embodiment, segmenting the text data comprises:
segmenting the text data using a segmentation method based on a probability statistics model.
In one embodiment, training the word segmentation result of the text data using a preset word vector model to obtain a word vector training result comprises:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim, so as to obtain the output result.
In one embodiment, the neural network model is:

Ot = g(V·St)
St = f(U·Xt + St-1)

where Xt is the value of the input layer of the recurrent neural network, St and St-1 are the values of the hidden layer of the recurrent neural network at times t and t-1, Ot is the value of the output layer of the recurrent neural network, U is the first weight matrix (input layer to hidden layer), V is the second weight matrix (hidden layer to output layer), g(·) is a nonlinear activation function, and f(·) is the softmax function.
Those skilled in the art will understand that the embodiment of the computer device shown in Fig. 9 does not constitute a limitation on the specific composition of the computer device; in other embodiments, the computer device may include more or fewer components than illustrated, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device only includes a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in Fig. 9 and are not repeated here.
The present invention provides a computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to perform the following steps:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including the vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification, and obtaining a classification result for the natural language data.
In one embodiment, the acquiring of the natural language data input by the user and converting the natural language data into corresponding text data comprises:
collecting the natural language data input by the user using a microphone;
performing digitization processing on the natural language data to obtain a speech signal;
extracting acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In one embodiment, the segmenting of the text data comprises:
segmenting the text data using a word segmentation method based on a probability statistics model.
In one embodiment, the training of the word segmentation result of the text data using a preset word vector model to obtain a word vector training result comprises:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec deep learning model in the Python toolkit Gensim, so as to obtain the output result.
In one embodiment, the neural network model is:
O_t = g(V · S_t)
S_t = f(U · X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is a first weight matrix from the input layer to the hidden layer, V is a second weight matrix from the hidden layer to the output layer, g(·) is a nonlinear activation function, and f(·) is the softmax function.
The aforementioned storage medium of the present invention includes various media capable of storing program code, such as a magnetic disk, an optical disc, and a read-only memory (Read-Only Memory, ROM).
The units in all embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
The steps in the natural language classification method of the embodiments of the present invention may be reordered, merged, or deleted according to actual needs. The units in the natural language classification apparatus of the embodiments of the present invention may be merged, divided, or deleted according to actual needs.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A natural language classification method, characterized in that the method comprises:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
2. The method of claim 1, characterized in that the acquiring of the natural language data input by the user and converting the natural language data into corresponding text data comprises:
collecting the natural language data input by the user using a microphone;
performing digitization processing on the natural language data to obtain a speech signal;
extracting acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
3. The method of claim 1, characterized in that the segmenting of the text data comprises:
segmenting the text data using a word segmentation method based on a probability statistics model.
4. The method of claim 1, characterized in that the taking of the words in the word segmentation result as input and training the word segmentation result of the text data using a preset word vector model to obtain an output result including a vector representation corresponding to each word comprises:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec deep learning model in the Python toolkit Gensim, so as to obtain the output result.
5. The method of claim 1, characterized in that the neural network model is:
O_t = g(V · S_t)
S_t = f(U · X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is a first weight matrix from the input layer to the hidden layer, V is a second weight matrix from the hidden layer to the output layer, g(·) is a nonlinear activation function, and f(·) is the softmax function.
6. A natural language classification apparatus, characterized in that the apparatus comprises:
a conversion unit, configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
a word segmentation unit, configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
a training unit, configured to take the words in the word segmentation result as input and train the word segmentation result of the text data using a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
a classification unit, configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
7. The apparatus of claim 6, characterized in that the conversion unit comprises:
a collection unit, configured to collect the natural language data input by the user using a microphone;
a processing unit, configured to perform digitization processing on the natural language data to obtain a speech signal;
an extraction unit, configured to extract acoustic features of the speech signal;
a generation unit, configured to input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
8. The apparatus of claim 6, characterized in that the word segmentation unit comprises:
a word segmentation subunit, configured to segment the text data using a word segmentation method based on a probability statistics model.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the natural language classification method of any one of claims 1-5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more computer programs, the one or more computer programs being executable by one or more processors to implement the natural language classification method of any one of claims 1-5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910449416.4A CN110334110A (en) | 2019-05-28 | 2019-05-28 | Natural language classification method, device, computer equipment and storage medium |
PCT/CN2019/118236 WO2020238061A1 (en) | 2019-05-28 | 2019-11-14 | Natural language classification method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910449416.4A CN110334110A (en) | 2019-05-28 | 2019-05-28 | Natural language classification method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334110A true CN110334110A (en) | 2019-10-15 |
Family
ID=68140162
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910449416.4A Pending CN110334110A (en) | 2019-05-28 | 2019-05-28 | Natural language classification method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110334110A (en) |
WO (1) | WO2020238061A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177370A (en) * | 2019-12-03 | 2020-05-19 | 北京工商大学 | Algorithm for natural language processing |
CN111191449A (en) * | 2019-12-26 | 2020-05-22 | 航天信息股份有限公司 | Tax feedback information processing method and device |
CN111209297A (en) * | 2019-12-31 | 2020-05-29 | 深圳云天励飞技术有限公司 | Data query method and device, electronic equipment and storage medium |
CN112000803A (en) * | 2020-07-28 | 2020-11-27 | 北京小米松果电子有限公司 | Text classification method and device, electronic equipment and computer readable storage medium |
WO2020238061A1 (en) * | 2019-05-28 | 2020-12-03 | 平安科技(深圳)有限公司 | Natural language classification method and apparatus, computer device, and storage medium |
CN112350908A (en) * | 2020-11-10 | 2021-02-09 | 珠海格力电器股份有限公司 | Control method and device of intelligent household equipment |
CN113283232A (en) * | 2021-05-31 | 2021-08-20 | 支付宝(杭州)信息技术有限公司 | Method and device for automatically analyzing private information in text |
CN111209297B (en) * | 2019-12-31 | 2024-05-03 | 深圳云天励飞技术有限公司 | Data query method, device, electronic equipment and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112735376A (en) * | 2020-12-29 | 2021-04-30 | 竹间智能科技(上海)有限公司 | Self-learning platform |
CN113051875B (en) * | 2021-03-22 | 2024-02-02 | 北京百度网讯科技有限公司 | Training method of information conversion model, and text information conversion method and device |
CN113360602A (en) * | 2021-06-22 | 2021-09-07 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868184A (en) * | 2016-05-10 | 2016-08-17 | 大连理工大学 | Chinese name recognition method based on recurrent neural network |
CN106649561A (en) * | 2016-11-10 | 2017-05-10 | 复旦大学 | Intelligent question-answering system for tax consultation service |
CN107229684A (en) * | 2017-05-11 | 2017-10-03 | 合肥美的智能科技有限公司 | Statement classification method, system, electronic equipment, refrigerator and storage medium |
CN108124065A (en) * | 2017-12-05 | 2018-06-05 | 浙江鹏信信息科技股份有限公司 | A kind of method junk call content being identified with disposal |
CN109471937A (en) * | 2018-10-11 | 2019-03-15 | 平安科技(深圳)有限公司 | A kind of file classification method and terminal device based on machine learning |
CN109492157A (en) * | 2018-10-24 | 2019-03-19 | 华侨大学 | Based on RNN, the news recommended method of attention mechanism and theme characterizing method |
US20190088251A1 (en) * | 2017-09-18 | 2019-03-21 | Samsung Electronics Co., Ltd. | Speech signal recognition system and method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106503236B (en) * | 2016-10-28 | 2020-09-11 | 北京百度网讯科技有限公司 | Artificial intelligence based problem classification method and device |
CN109101481B (en) * | 2018-06-25 | 2022-07-22 | 北京奇艺世纪科技有限公司 | Named entity identification method and device and electronic equipment |
CN110334110A (en) * | 2019-05-28 | 2019-10-15 | 平安科技(深圳)有限公司 | Natural language classification method, device, computer equipment and storage medium |
2019
- 2019-05-28: CN CN201910449416.4A patent/CN110334110A/en active Pending
- 2019-11-14: WO PCT/CN2019/118236 patent/WO2020238061A1/en active Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020238061A1 (en) * | 2019-05-28 | 2020-12-03 | 平安科技(深圳)有限公司 | Natural language classification method and apparatus, computer device, and storage medium |
CN111177370A (en) * | 2019-12-03 | 2020-05-19 | 北京工商大学 | Algorithm for natural language processing |
CN111177370B (en) * | 2019-12-03 | 2023-08-11 | 北京工商大学 | Algorithm for natural language processing |
CN111191449A (en) * | 2019-12-26 | 2020-05-22 | 航天信息股份有限公司 | Tax feedback information processing method and device |
CN111209297A (en) * | 2019-12-31 | 2020-05-29 | 深圳云天励飞技术有限公司 | Data query method and device, electronic equipment and storage medium |
CN111209297B (en) * | 2019-12-31 | 2024-05-03 | 深圳云天励飞技术有限公司 | Data query method, device, electronic equipment and storage medium |
CN112000803A (en) * | 2020-07-28 | 2020-11-27 | 北京小米松果电子有限公司 | Text classification method and device, electronic equipment and computer readable storage medium |
CN112350908A (en) * | 2020-11-10 | 2021-02-09 | 珠海格力电器股份有限公司 | Control method and device of intelligent household equipment |
CN112350908B (en) * | 2020-11-10 | 2021-11-23 | 珠海格力电器股份有限公司 | Control method and device of intelligent household equipment |
CN113283232A (en) * | 2021-05-31 | 2021-08-20 | 支付宝(杭州)信息技术有限公司 | Method and device for automatically analyzing private information in text |
Also Published As
Publication number | Publication date |
---|---|
WO2020238061A1 (en) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110334110A (en) | Natural language classification method, device, computer equipment and storage medium | |
WO2020232861A1 (en) | Named entity recognition method, electronic device and storage medium | |
JP7302022B2 (en) | A text classification method, apparatus, computer readable storage medium and text classification program. | |
US11321363B2 (en) | Method and system for extracting information from graphs | |
CN104485105B (en) | A kind of electronic health record generation method and electronic medical record system | |
CN109460737A (en) | A kind of multi-modal speech-emotion recognition method based on enhanced residual error neural network | |
CN107220235A (en) | Speech recognition error correction method, device and storage medium based on artificial intelligence | |
CN111222305A (en) | Information structuring method and device | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
CN110363084A (en) | A kind of class state detection method, device, storage medium and electronics | |
CN107967250B (en) | Information processing method and device | |
CN111274797A (en) | Intention recognition method, device and equipment for terminal and storage medium | |
CN113032552B (en) | Text abstract-based policy key point extraction method and system | |
Tank et al. | Creation of speech corpus for emotion analysis in Gujarati language and its evaluation by various speech parameters. | |
CN105159927B (en) | Method and device for selecting subject term of target text and terminal | |
CN110309355A (en) | Generation method, device, equipment and the storage medium of content tab | |
CN110019556A (en) | A kind of topic news acquisition methods, device and its equipment | |
CN117313138A (en) | Social network privacy sensing system and method based on NLP | |
CN115169368B (en) | Machine reading understanding method and device based on multiple documents | |
CN110347696A (en) | Data transfer device, device, computer equipment and storage medium | |
CN110781327A (en) | Image searching method and device, terminal equipment and storage medium | |
CN113808577A (en) | Intelligent extraction method and device of voice abstract, electronic equipment and storage medium | |
CN113539234A (en) | Speech synthesis method, apparatus, system and storage medium | |
JP2000148770A (en) | Device and method for classifying question documents and record medium where program wherein same method is described is recorded | |
KR20220015129A (en) | Method and Apparatus for Providing Book Recommendation Service Based on Interactive Form |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||