CN110334110A - Natural language classification method, device, computer equipment and storage medium - Google Patents

Natural language classification method, device, computer equipment and storage medium Download PDF

Info

Publication number
CN110334110A
CN110334110A CN201910449416.4A
Authority
CN
China
Prior art keywords
natural language
text data
input
word
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910449416.4A
Other languages
Chinese (zh)
Inventor
周罡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910449416.4A priority Critical patent/CN110334110A/en
Publication of CN110334110A publication Critical patent/CN110334110A/en
Priority to PCT/CN2019/118236 priority patent/WO2020238061A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the invention discloses a natural language classification method, device, computer equipment and storage medium. The method includes: acquiring natural language data input by a user, and converting the natural language data into corresponding text data; segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word; and inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. Based on the detection model, the present invention provides a natural language classification method that can accurately classify natural language queries, offers diversified database query modes, and improves the user experience.

Description

Natural language classification method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of computer technology, and in particular to a natural language classification method, device, computer equipment and storage medium.
Background art
At present, converting a user's spoken natural language query into a query statement that a computer can recognize typically means converting the natural language query into one specific computer query statement. As a result, some databases cannot recognize the converted statement. For example, if the natural language query is converted into an SQL (Structured Query Language) query statement, a relational database can recognize the SQL query statement, but a graph database cannot. The traditional way of converting natural language queries therefore cannot meet market demands.
Summary of the invention
In view of this, the embodiments of the present invention provide a natural language classification method, device, computer equipment and storage medium that can accurately classify natural language queries, provide diversified database query modes, and improve the user experience.
In one aspect, an embodiment of the present invention provides a natural language classification method, which includes:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In another aspect, an embodiment of the present invention provides a natural language classification device, which includes:
a conversion unit configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
a word segmentation unit configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
a training unit configured to take the words in the word segmentation result as input and train the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
a classification unit configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In another aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the natural language classification method described above when executing the computer program.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the natural language classification method described above.
The embodiments of the present invention provide a natural language classification method, device, computer equipment and storage medium. The method includes: acquiring natural language data input by a user, and converting the natural language data into corresponding text data; segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word; and inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. Based on the detection model, the present invention provides a natural language classification method that can accurately classify natural language queries, offers diversified database query modes, and improves the user experience.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario of a natural language classification method provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 3 is another schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 4 is another schematic flow diagram of a natural language classification method provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 6 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 7 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 8 is another schematic block diagram of a natural language classification device provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, entireties, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, entireties, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this description of the invention is only for the purpose of describing specific embodiments and is not intended to limit the present invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic diagram of an application scenario of a natural language classification method provided by an embodiment of the present invention, and Fig. 2 is a schematic flow diagram of the natural language classification method. The natural language classification method is applied to a server or a terminal, where the terminal may be an electronic device with a communication function such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant or wearable device. As one application, as shown in Fig. 1, the natural language classification method is applied to a server 10, which may be a server in a distributed service platform; the server 10 executes the natural language classification instructions and feeds the execution result back to a terminal 20.
It should be noted that only one terminal 20 is illustrated in Fig. 1; in actual operation, the server 10 may feed the execution result back to multiple terminals 20.
Referring to Fig. 2, Fig. 2 is a schematic flow diagram of a natural language classification method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps S101 to S104.
S101: acquire natural language data input by a user, and convert the natural language data into corresponding text data.
In the embodiment of the present invention, the natural language data refers to a natural language query for a database spoken by the user, for example, the spoken query: "What is this year's insurance net profit?". More specifically, the natural language data input by the user may be collected through a microphone of the terminal, and the collected natural language data is converted into corresponding text data.
Further, as shown in Fig. 3, the step of converting the natural language data into corresponding text data specifically includes steps S201 to S204:
S201: collect the natural language data input by the user through a microphone;
S202: digitize the natural language data to obtain a speech signal;
S203: extract acoustic features of the speech signal;
S204: input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In this embodiment, when converting the natural language data into corresponding text data, since the natural language data is a speech signal and speech information is an analog signal, the analog speech signal needs to be processed and digitized, and the acoustic features of the speech signal are extracted. The acoustic features may be extracted using methods such as Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), or the Multimedia Content Description Interface (MPEG-7). The acoustic features can then be input into the acoustic model for decoding to obtain the text data corresponding to the speech signal. This is the process of converting the natural language data into corresponding text data.
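As an illustration of the feature-extraction step, the following minimal sketch computes MFCC features for a digitized speech signal. It assumes the librosa library and a hypothetical audio file name; the patent does not prescribe a specific toolkit or acoustic model.

```python
import librosa

# Load the digitized speech signal (hypothetical file path; 16 kHz is a
# common sampling rate for speech recognition front ends).
signal, sample_rate = librosa.load("query.wav", sr=16000)

# Extract Mel-frequency cepstral coefficients (MFCC), one of the acoustic
# features named in the embodiment (alternatives: LPCC, MPEG-7 descriptors).
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)

# The result has shape (n_mfcc, n_frames); each column is the acoustic
# feature vector of one frame, which would then be passed to the
# predetermined acoustic model for decoding into text.
print(mfcc.shape)
```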
S102: segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words.
In the embodiment of the present invention, segmenting the text data includes: segmenting the text data using a word segmentation method based on a probability statistics model. For example, let C = C1C2...Cm be the Chinese character string corresponding to the text data to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-statistics-based segmentation model then finds the target word string W satisfying P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W obtained by the segmentation model is the word string with the maximum estimated probability, and this word string W is taken as the word segmentation result of the text data. For example, for the text data "What is this year's insurance net profit?", the segmentation result obtained by the above segmentation model is: "this year", "insurance", "net profit", "is", "how much", "?". A minimal sketch of this maximum-probability segmentation is shown below.
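The sketch uses the jieba library, which is not named in the patent but implements the same idea: it builds a graph of candidate words over the character string C and selects the segmentation path W with the maximum probability, matching the P(W|C) maximization described above. The example sentence mirrors the patent's query and the printed output is indicative only.

```python
import jieba

# Example query in Chinese ("What is this year's insurance net profit?").
text = "今年保险的净利润是多少？"

# jieba's default (precise) mode builds a DAG of candidate words and uses
# dynamic programming to pick the maximum-probability segmentation, i.e. the
# word string W maximizing P(W|C) as described in the embodiment.
words = list(jieba.cut(text))
print(words)  # e.g. ['今年', '保险', '的', '净利润', '是', '多少', '？']
```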
S103: take the words in the word segmentation result as input, and train the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word.
In the embodiment of the present invention, the preset word vector model is based on the word2vec deep learning model. In this embodiment, the specific training process is to train the word segmentation result of the text data using the word2vec deep learning model in the Gensim toolkit for Python, taking the words in the word segmentation result as input and the word vector training result as output, the word vector result including the vector representation corresponding to each word.
Further, as shown in Fig. 4, step S103 includes steps S301 and S302:
S301: input the word segmentation result of the text data into the Python toolkit Gensim;
S302: train the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim, so as to obtain the output result.
In this embodiment, the word2vec deep learning model in the Python toolkit Gensim is used and configured with the corresponding parameter settings.
After training with the word2vec model in the Python toolkit Gensim is completed, a vectors.bin file is obtained; this file contains each word of the text data and the word vector corresponding to each word. In this embodiment, the dimension of the word vectors is preset using the size parameter in the Python toolkit Gensim.
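A minimal Gensim sketch of this training step follows. The patent configures the model's parameters but does not reproduce the exact values here, so the settings below are assumptions; also note that Gensim 3.x used the size parameter mentioned in the text, while Gensim 4.x renamed it to vector_size.

```python
from gensim.models import Word2Vec

# Word segmentation results: one tokenized query per inner list (illustrative
# tokens echoing the patent's example query).
sentences = [["今年", "保险", "净利润", "是", "多少"]]

# Train word2vec via Gensim; dimensions, window and min_count are assumptions.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Persist the trained vectors; the embodiment refers to this file as vectors.bin.
model.wv.save_word2vec_format("vectors.bin", binary=True)

# Vector representation corresponding to a word.
vector = model.wv["保险"]
print(vector.shape)  # (100,)
```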
S104: input the word vector training result into the pre-trained neural network model for natural language classification, and obtain a classification result for the natural language data.
In the embodiment of the present invention, the neural network model is:
O_t = g(V·S_t)
S_t = f(U·X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are the values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is the first weight matrix from the input layer to the hidden layer, V is the second weight matrix from the hidden layer to the output layer, g() is a nonlinear activation function, and f() is the softmax function.
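A minimal numpy sketch of one forward step of this recurrent model is shown below. The dimensions and random weights are illustrative assumptions, not values from the patent; the sketch follows the patent's assignment of f() as softmax on the hidden layer and uses tanh as the nonlinear g() on the output, which is itself an assumption.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative dimensions (assumptions): 6-dimensional word vectors, a 16-unit
# hidden layer, and 3 output classes (e.g. target / condition / time words).
input_dim, hidden_dim, output_dim = 6, 16, 3

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_dim, input_dim))   # first weight matrix: input -> hidden
V = rng.normal(size=(output_dim, hidden_dim))  # second weight matrix: hidden -> output

def step(x_t, s_prev):
    # S_t = f(U·X_t + S_{t-1}); f taken as softmax per the patent's description.
    s_t = softmax(U @ x_t + s_prev)
    # O_t = g(V·S_t); g is a nonlinear activation, tanh chosen here as an assumption.
    o_t = np.tanh(V @ s_t)
    return o_t, s_t

x_t = rng.normal(size=input_dim)   # word vector at time t
s_prev = np.zeros(hidden_dim)      # previous hidden state
o_t, s_t = step(x_t, s_prev)
print(o_t.shape, s_t.shape)
```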
It should be noted that the neural network model for natural language classification needs to be trained in advance before step S104. The training process is as follows: historical word vector data is input into a pre-built screening model for part-of-speech tagging to obtain the part-of-speech probability corresponding to each historical word vector; if the part-of-speech probability corresponding to a historical word vector is greater than or equal to a preset first probability, the corresponding historical word vector is labeled as a word vector of the target part of speech; if the part-of-speech probability corresponding to a word vector is greater than or equal to a preset second probability, the corresponding historical word vector is labeled as a word vector of the condition part of speech; if the part-of-speech probability corresponding to a word vector is greater than or equal to a preset third probability, the corresponding historical word vector is labeled as a word vector of the time part of speech. More specifically, in this embodiment the screening model is built by training a model on the historical word vectors according to the Naive Bayes algorithm; the screening model is used to determine whether an input word vector is a word vector of the target part of speech, the condition part of speech, or the time part of speech.
When building the screening model for part-of-speech tagging, the multiple word vectors contained in the training set are used as the input of the screening model, the part of speech corresponding to each word vector is used as the output of the screening model, and the screening model is obtained by training. The Naive Bayes model used estimates the class prior and the smoothed term likelihood as:
P(c_k) = N_{c_k} / N,  P(t_j | c_k) = (T_{jk} + 1) / (Σ_{t_s ∈ V} T_{sk} + |V|);
where N_{c_k} denotes the number of documents of class c_k in the training set, N denotes the total number of word vectors in the training set, T_{jk} denotes the number of times the term t_j occurs in category c_k, and V is the set of terms of all categories. Using the above screening model as a classifier for the part of speech of word vectors, it can be determined whether an input word vector is a word vector of the target part of speech, the condition part of speech, or the time part of speech. For example, each word vector is input into the Naive Bayes model: when the probability that the data belongs to the word-vector category of the target part of speech is greater than or equal to 50% (i.e. the first probability is set to 50%), the data is regarded as a vector of the target part of speech; when the probability that the word vector belongs to the word-vector category of the condition part of speech is greater than or equal to 50% (i.e. the second probability is set to 50%), the word vector is labeled as a word vector of the condition part of speech; and when the probability that the word vector belongs to the word-vector category of the time part of speech is greater than or equal to 50% (i.e. the third probability is set to 50%), the word vector is labeled as a word vector of the time part of speech.
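The sketch below illustrates these smoothed Naive Bayes estimates on toy counts; the class names, vocabulary and counts are invented for illustration and are not taken from the patent.

```python
import numpy as np

# Toy training-set statistics (invented): three part-of-speech classes and a
# small vocabulary of terms.
classes = ["target", "condition", "time"]
vocab = ["净利润", "保险", "今年", "是", "多少"]

# N_ck: number of training documents per class c_k; N: total.
N_c = np.array([40.0, 35.0, 25.0])
N = N_c.sum()

# T[k, j]: number of times term t_j occurs in class c_k.
T = np.array([
    [30.0,  2.0,  1.0,  5.0,  8.0],   # target
    [ 3.0, 28.0,  2.0,  4.0,  6.0],   # condition
    [ 1.0,  2.0, 26.0,  3.0,  2.0],   # time
])

# Priors and Laplace-smoothed likelihoods, matching the formulas above.
prior = N_c / N                                            # P(c_k) = N_ck / N
likelihood = (T + 1) / (T.sum(axis=1, keepdims=True) + len(vocab))

def classify(terms):
    # Log-space Naive Bayes score per class, normalized to probabilities.
    idx = [vocab.index(t) for t in terms]
    log_scores = np.log(prior) + np.log(likelihood[:, idx]).sum(axis=1)
    probs = np.exp(log_scores - log_scores.max())
    return dict(zip(classes, probs / probs.sum()))

# A word such as "净利润" (net profit) should score highest as a target word;
# the class whose probability exceeds the preset threshold (e.g. 50%) is kept.
print(classify(["净利润"]))
```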
The word vector results after part-of-speech tagging are used as the input of the neural network, the corresponding word vector classification results are used as the output of the recurrent neural network, and the neural network model is obtained by training. By using the part-of-speech-tagged segmentation results of the historical word vectors as the input of the neural network and the corresponding word vector classification results as the output of the recurrent neural network, the first weight matrix and the second weight matrix can be trained, and the neural network model obtained in this way serves as the model for subsequent word vector classification. After the pre-trained neural network model is obtained, the user's word vector training result is input into the pre-trained neural network model, and the user's word vectors are classified quickly and intelligently according to the preset neural network model. For example, for the text data "What is this year's insurance net profit?", six-dimensional word vectors are obtained after word segmentation and vector representation; after these word vectors are input into the pre-trained neural network model, the output classification results are Account - net profit (Account indicates a target word), Entity - life insurance (Entity indicates a condition word), and NTR - this year (indicates a time word).
As can be seen from the above, the embodiment of the present invention acquires natural language data input by a user and converts the natural language data into corresponding text data; segments the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words; takes the words in the word segmentation result as input and trains the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word; and inputs the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. Based on the detection model, the present invention provides a natural language classification method that can accurately classify natural language queries, offers diversified database query modes, and improves the user experience.
Referring to Fig. 5, corresponding to the natural language classification method described above, an embodiment of the present invention also provides a natural language classification device. The device 100 includes: a conversion unit 101, a word segmentation unit 102, a training unit 103, and a classification unit 104.
The conversion unit 101 is configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
the word segmentation unit 102 is configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
the training unit 103 is configured to take the words in the word segmentation result as input and train the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
the classification unit 104 is configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
As can be seen from the above, the embodiment of the present invention acquires natural language data input by a user and converts the natural language data into corresponding text data; segments the text data to obtain a word segmentation result including one or more words; takes the words in the word segmentation result as input and trains the word segmentation result of the text data with a preset word vector model to obtain an output result including a vector representation corresponding to each word; and inputs the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data. Based on the detection model, the present invention provides a natural language classification method that can accurately classify natural language queries, offers diversified database query modes, and improves the user experience.
Referring to Fig. 6, the conversion unit 101 includes:
a collection unit 101a configured to collect the natural language data input by the user through a microphone;
a processing unit 101b configured to digitize the natural language data to obtain a speech signal;
an extraction unit 101c configured to extract acoustic features of the speech signal;
a generation unit 101d configured to input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
Referring to Fig. 7, the word segmentation unit 102 includes:
a word segmentation subunit 102a configured to segment the text data using a word segmentation method based on a probability statistics model.
Referring to Fig. 8, the training unit 103 includes:
an input unit 103a configured to input the word segmentation result of the text data into the Python toolkit Gensim;
a training subunit 103b configured to train the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim to obtain the output result.
The natural language classification device described above corresponds one-to-one to the natural language classification method described above; its specific principles and processes are the same as those of the method in the above embodiments and are not repeated here.
The natural language classification device described above can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in Fig. 9.
Fig. 9 is a schematic diagram of the structure of a computer device of the present invention. The device may be a terminal or a server, where the terminal may be an electronic device with a communication function and a speech input function, such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant or wearable device, and the server may be an independent server or a server cluster composed of multiple servers. Referring to Fig. 9, the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504 and a network interface 505 connected through a system bus 501. The non-volatile storage medium 503 of the computer device 500 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed, the processor 502 can be caused to perform a natural language classification method. The processor 502 of the computer device 500 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503; when the computer program is executed by the processor, the processor 502 can be caused to perform a natural language classification method. The network interface 505 of the computer device 500 is used for network communication. Those skilled in the art can understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
When executing the computer program, the processor 502 implements the following operations:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In one embodiment, acquiring the natural language data input by the user and converting the natural language data into corresponding text data includes:
collecting the natural language data input by the user through a microphone;
digitizing the natural language data to obtain a speech signal;
extracting acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In one embodiment, segmenting the text data includes:
segmenting the text data using a word segmentation method based on a probability statistics model.
In one embodiment, training the word segmentation result of the text data with a preset word vector model to obtain a word vector training result includes:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim to obtain the output result.
In one embodiment, the neural network model is:
O_t = g(V·S_t)
S_t = f(U·X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are the values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is the first weight matrix from the input layer to the hidden layer, V is the second weight matrix from the hidden layer to the output layer, g() is a nonlinear activation function, and f() is the softmax function.
Those skilled in the art can understand that the embodiment of the computer device shown in Fig. 9 does not constitute a limitation on the specific composition of the computer device; in other embodiments, the computer device may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device only includes a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in Fig. 9 and are not repeated here.
The present invention provides a computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the following steps:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
In one embodiment, acquiring the natural language data input by the user and converting the natural language data into corresponding text data includes:
collecting the natural language data input by the user through a microphone;
digitizing the natural language data to obtain a speech signal;
extracting acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
In one embodiment, segmenting the text data includes:
segmenting the text data using a word segmentation method based on a probability statistics model.
In one embodiment, training the word segmentation result of the text data with a preset word vector model to obtain a word vector training result includes:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim to obtain the output result.
In one embodiment, the neural network model is:
O_t = g(V·S_t)
S_t = f(U·X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are the values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is the first weight matrix from the input layer to the hidden layer, V is the second weight matrix from the hidden layer to the output layer, g() is a nonlinear activation function, and f() is the softmax function.
The aforementioned storage medium of the present invention includes various media that can store program code, such as a magnetic disk, an optical disc, or a read-only memory (ROM).
The units in all embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
The steps in the natural language classification method of the embodiments of the present invention may be reordered, combined and deleted according to actual needs.
The units in the natural language classification device of the embodiments of the present invention may be combined, divided and deleted according to actual needs.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A natural language classification method, characterized in that the method includes:
acquiring natural language data input by a user, and converting the natural language data into corresponding text data;
segmenting the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
taking the words in the word segmentation result as input, training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
inputting the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
2. The method according to claim 1, characterized in that acquiring the natural language data input by the user and converting the natural language data into corresponding text data includes:
collecting the natural language data input by the user through a microphone;
digitizing the natural language data to obtain a speech signal;
extracting acoustic features of the speech signal;
inputting the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
3. The method according to claim 1, characterized in that segmenting the text data includes:
segmenting the text data using a word segmentation method based on a probability statistics model.
4. The method according to claim 1, characterized in that taking the words in the word segmentation result as input and training the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word, includes:
inputting the word segmentation result of the text data into the Python toolkit Gensim;
training the word segmentation result of the text data using the word2vec-based deep learning model in the Python toolkit Gensim to obtain the output result.
5. The method according to claim 1, characterized in that the neural network model is:
O_t = g(V·S_t)
S_t = f(U·X_t + S_{t-1});
where X_t is the value of the input layer of the recurrent neural network, S_t and S_{t-1} are the values of the hidden layer of the recurrent neural network, O_t is the value of the output layer of the recurrent neural network, U is the first weight matrix from the input layer to the hidden layer, V is the second weight matrix from the hidden layer to the output layer, g() is a nonlinear activation function, and f() is the softmax function.
6. A natural language classification device, characterized in that the device includes:
a conversion unit configured to acquire natural language data input by a user and convert the natural language data into corresponding text data;
a word segmentation unit configured to segment the text data to obtain a word segmentation result of the text data, the word segmentation result including one or more words;
a training unit configured to take the words in the word segmentation result as input and train the word segmentation result of the text data with a preset word vector model to obtain an output result, the output result including a vector representation corresponding to each word;
a classification unit configured to input the word vector training result into a pre-trained neural network model for natural language classification to obtain a classification result for the natural language data.
7. The device according to claim 6, characterized in that the conversion unit includes:
a collection unit configured to collect the natural language data input by the user through a microphone;
a processing unit configured to digitize the natural language data to obtain a speech signal;
an extraction unit configured to extract acoustic features of the speech signal;
a generation unit configured to input the acoustic features into a predetermined acoustic model for decoding, so as to generate the text data.
8. The device according to claim 6, characterized in that the word segmentation unit includes:
a word segmentation subunit configured to segment the text data using a word segmentation method based on a probability statistics model.
9. A computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the natural language classification method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more computer programs, and the one or more computer programs can be executed by one or more processors to implement the natural language classification method according to any one of claims 1 to 5.
CN201910449416.4A 2019-05-28 2019-05-28 Natural language classification method, device, computer equipment and storage medium Pending CN110334110A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910449416.4A CN110334110A (en) 2019-05-28 2019-05-28 Natural language classification method, device, computer equipment and storage medium
PCT/CN2019/118236 WO2020238061A1 (en) 2019-05-28 2019-11-14 Natural language classification method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910449416.4A CN110334110A (en) 2019-05-28 2019-05-28 Natural language classification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110334110A true CN110334110A (en) 2019-10-15

Family

ID=68140162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910449416.4A Pending CN110334110A (en) 2019-05-28 2019-05-28 Natural language classification method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110334110A (en)
WO (1) WO2020238061A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177370A (en) * 2019-12-03 2020-05-19 北京工商大学 Algorithm for natural language processing
CN111191449A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司 Tax feedback information processing method and device
CN111209297A (en) * 2019-12-31 2020-05-29 深圳云天励飞技术有限公司 Data query method and device, electronic equipment and storage medium
CN112000803A (en) * 2020-07-28 2020-11-27 北京小米松果电子有限公司 Text classification method and device, electronic equipment and computer readable storage medium
WO2020238061A1 (en) * 2019-05-28 2020-12-03 平安科技(深圳)有限公司 Natural language classification method and apparatus, computer device, and storage medium
CN112350908A (en) * 2020-11-10 2021-02-09 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN113283232A (en) * 2021-05-31 2021-08-20 支付宝(杭州)信息技术有限公司 Method and device for automatically analyzing private information in text
CN111209297B (en) * 2019-12-31 2024-05-03 深圳云天励飞技术有限公司 Data query method, device, electronic equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735376A (en) * 2020-12-29 2021-04-30 竹间智能科技(上海)有限公司 Self-learning platform
CN113051875B (en) * 2021-03-22 2024-02-02 北京百度网讯科技有限公司 Training method of information conversion model, and text information conversion method and device
CN113360602A (en) * 2021-06-22 2021-09-07 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN106649561A (en) * 2016-11-10 2017-05-10 复旦大学 Intelligent question-answering system for tax consultation service
CN107229684A (en) * 2017-05-11 2017-10-03 合肥美的智能科技有限公司 Statement classification method, system, electronic equipment, refrigerator and storage medium
CN108124065A (en) * 2017-12-05 2018-06-05 浙江鹏信信息科技股份有限公司 A kind of method junk call content being identified with disposal
CN109471937A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal device based on machine learning
CN109492157A (en) * 2018-10-24 2019-03-19 华侨大学 Based on RNN, the news recommended method of attention mechanism and theme characterizing method
US20190088251A1 (en) * 2017-09-18 2019-03-21 Samsung Electronics Co., Ltd. Speech signal recognition system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503236B (en) * 2016-10-28 2020-09-11 北京百度网讯科技有限公司 Artificial intelligence based problem classification method and device
CN109101481B (en) * 2018-06-25 2022-07-22 北京奇艺世纪科技有限公司 Named entity identification method and device and electronic equipment
CN110334110A (en) * 2019-05-28 2019-10-15 平安科技(深圳)有限公司 Natural language classification method, device, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN106649561A (en) * 2016-11-10 2017-05-10 复旦大学 Intelligent question-answering system for tax consultation service
CN107229684A (en) * 2017-05-11 2017-10-03 合肥美的智能科技有限公司 Statement classification method, system, electronic equipment, refrigerator and storage medium
US20190088251A1 (en) * 2017-09-18 2019-03-21 Samsung Electronics Co., Ltd. Speech signal recognition system and method
CN108124065A (en) * 2017-12-05 2018-06-05 浙江鹏信信息科技股份有限公司 A kind of method junk call content being identified with disposal
CN109471937A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal device based on machine learning
CN109492157A (en) * 2018-10-24 2019-03-19 华侨大学 Based on RNN, the news recommended method of attention mechanism and theme characterizing method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020238061A1 (en) * 2019-05-28 2020-12-03 平安科技(深圳)有限公司 Natural language classification method and apparatus, computer device, and storage medium
CN111177370A (en) * 2019-12-03 2020-05-19 北京工商大学 Algorithm for natural language processing
CN111177370B (en) * 2019-12-03 2023-08-11 北京工商大学 Algorithm for natural language processing
CN111191449A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司 Tax feedback information processing method and device
CN111209297A (en) * 2019-12-31 2020-05-29 深圳云天励飞技术有限公司 Data query method and device, electronic equipment and storage medium
CN111209297B (en) * 2019-12-31 2024-05-03 深圳云天励飞技术有限公司 Data query method, device, electronic equipment and storage medium
CN112000803A (en) * 2020-07-28 2020-11-27 北京小米松果电子有限公司 Text classification method and device, electronic equipment and computer readable storage medium
CN112350908A (en) * 2020-11-10 2021-02-09 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN112350908B (en) * 2020-11-10 2021-11-23 珠海格力电器股份有限公司 Control method and device of intelligent household equipment
CN113283232A (en) * 2021-05-31 2021-08-20 支付宝(杭州)信息技术有限公司 Method and device for automatically analyzing private information in text

Also Published As

Publication number Publication date
WO2020238061A1 (en) 2020-12-03

Similar Documents

Publication Publication Date Title
CN110334110A (en) Natural language classification method, device, computer equipment and storage medium
WO2020232861A1 (en) Named entity recognition method, electronic device and storage medium
JP7302022B2 (en) A text classification method, apparatus, computer readable storage medium and text classification program.
US11321363B2 (en) Method and system for extracting information from graphs
CN104485105B (en) A kind of electronic health record generation method and electronic medical record system
CN109460737A (en) A kind of multi-modal speech-emotion recognition method based on enhanced residual error neural network
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN111222305A (en) Information structuring method and device
CN109271493A (en) A kind of language text processing method, device and storage medium
CN110363084A (en) A kind of class state detection method, device, storage medium and electronics
CN107967250B (en) Information processing method and device
CN111274797A (en) Intention recognition method, device and equipment for terminal and storage medium
CN113032552B (en) Text abstract-based policy key point extraction method and system
Tank et al. Creation of speech corpus for emotion analysis in Gujarati language and its evaluation by various speech parameters.
CN105159927B (en) Method and device for selecting subject term of target text and terminal
CN110309355A (en) Generation method, device, equipment and the storage medium of content tab
CN110019556A (en) A kind of topic news acquisition methods, device and its equipment
CN117313138A (en) Social network privacy sensing system and method based on NLP
CN115169368B (en) Machine reading understanding method and device based on multiple documents
CN110347696A (en) Data transfer device, device, computer equipment and storage medium
CN110781327A (en) Image searching method and device, terminal equipment and storage medium
CN113808577A (en) Intelligent extraction method and device of voice abstract, electronic equipment and storage medium
CN113539234A (en) Speech synthesis method, apparatus, system and storage medium
JP2000148770A (en) Device and method for classifying question documents and record medium where program wherein same method is described is recorded
KR20220015129A (en) Method and Apparatus for Providing Book Recommendation Service Based on Interactive Form

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination