CN108628974A - Public opinion information classification method, apparatus, computer device and storage medium - Google Patents

Public opinion information classification method, apparatus, computer device and storage medium

Info

Publication number
CN108628974A
CN108628974A (application CN201810380769.9A)
Authority
CN
China
Prior art keywords
trained
sentence
sentences
training
weight matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810380769.9A
Other languages
Chinese (zh)
Other versions
CN108628974B (en)
Inventor
金鑫
赵媛媛
杨雨芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810380769.9A priority Critical patent/CN108628974B/en
Priority to PCT/CN2018/097033 priority patent/WO2019205318A1/en
Publication of CN108628974A publication Critical patent/CN108628974A/en
Application granted granted Critical
Publication of CN108628974B publication Critical patent/CN108628974B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a public opinion information classification method, apparatus, computer device and storage medium. The method includes: establishing a classification model, the classification model including a word vector model and a multilayer recurrent neural network; obtaining public opinion information, the public opinion information including multiple sentences; training the word vector model to obtain the sentence vectors corresponding to the multiple sentences, and generating a weight matrix from those sentence vectors; obtaining the codes corresponding to the multiple sentences and inputting the codes into the trained multilayer recurrent neural network; performing, by the trained multilayer recurrent neural network, operations based on the codes of the multiple sentences and the weight matrix, and outputting the categories of the multiple sentences; and determining the category corresponding to the public opinion information according to the categories of the multiple sentences. With this method, large amounts of public opinion information can be classified effectively.

Description

Public opinion information classification method, apparatus, computer device and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a public opinion information classification method, apparatus, computer device and storage medium.
Background technology
With the development of Internet technology, people can follow hot events at any time. A hot event usually generates a large amount of public opinion information, and analyzing that information reveals the development trend of the event. Public opinion information takes many forms, for example microblog posts and comments. Before public opinion information can be analyzed, it must be properly classified. Such information is usually short, and its text length varies, so traditional semantic representation models struggle to classify it effectively. How to effectively classify large amounts of public opinion information has therefore become a technical problem that needs to be solved.
Summary of the invention
Based on this, in view of the above technical problem, it is necessary to provide a public opinion information classification method, apparatus, computer device and storage medium that can effectively classify large amounts of public opinion information.
A public opinion information classification method, the method comprising:
establishing a classification model, the classification model including a word vector model and a multilayer recurrent neural network;
obtaining public opinion information, the public opinion information including multiple sentences;
training the word vector model to obtain the sentence vectors corresponding to the multiple sentences, and generating a weight matrix from the sentence vectors;
obtaining the codes corresponding to the multiple sentences, and inputting the codes of the multiple sentences into the trained multilayer recurrent neural network;
performing, by the trained multilayer recurrent neural network, operations based on the codes of the multiple sentences and the weight matrix, and outputting the categories of the multiple sentences;
determining the category corresponding to the public opinion information according to the categories of the multiple sentences.
In one embodiment, the method further comprises:
obtaining a training set corresponding to the public opinion information, the training set including multiple pieces of sample information, each piece of sample information including multiple training sentences and multiple training words corresponding to the training sentences;
training the word vector model with the training words to obtain the word vectors corresponding to the training words;
training the word vector model with the word vectors corresponding to the multiple training sentences to obtain the sentence vectors corresponding to the training sentences;
training the multilayer recurrent neural network with the sentence vectors corresponding to the multiple training sentences to obtain the categories corresponding to the multiple training sentences.
In one embodiment, training the word vector model with the training words includes:
counting the number of words among the training words in the multiple training sentences, and marking the maximum word count as a first input parameter;
according to the difference between the word count of a training sentence and the maximum word count corresponding to the first input parameter, adding a corresponding number of preset characters to the training sentence;
training the word vector model with the training words in the multiple training sentences and the filled-in preset characters, obtaining the word vectors corresponding to the multiple training words.
In one embodiment, training the word vector model with the word vectors corresponding to the multiple training sentences includes:
counting the number of training sentences in each piece of sample information, and marking the maximum sentence count as a second input parameter;
according to the difference between the sentence count of a piece of sample information and the second input parameter, adding a corresponding number of preset-character sentences to the sample information;
training the word vector model with the multiple training sentences and the newly added sentences, obtaining the sentence vectors corresponding to the multiple training sentences.
In one embodiment, training the word vector model with the multiple training sentences and the newly added sentences includes:
obtaining the mapping file corresponding to the training sentences, the mapping file recording the category corresponding to each training sentence;
generating a training weight matrix from the sentence vectors corresponding to the multiple training sentences and the newly added sentences, the training weight matrix corresponding to the sample information after the sentence count has been increased;
training the multilayer recurrent neural network with the multiple training sentences, the newly added sentences and the corresponding training weight matrix, and outputting the categories corresponding to the training sentences.
In one embodiment, the multilayer recurrent neural network includes multiple hidden layers, and training the network with the multiple training sentences, the newly added sentences and the corresponding training weight matrix includes:
assigning each hidden layer a random vector as the initial weight matrix of that hidden layer;
setting, according to the second input parameter, the training weight matrix corresponding to the sample information after the sentence-count increase between the input layer and the first hidden layer;
inputting the codes corresponding to the multiple training sentences and the codes of the newly added sentences into the input layer of the multilayer recurrent neural network;
training the multiple hidden layers with the initial weight matrices and the training weight matrix, and outputting the categories corresponding to the training sentences through the output layer.
A public opinion information classification apparatus, the apparatus comprising:
a model building module, configured to establish a classification model, the classification model including a word vector model and a multilayer recurrent neural network;
an information obtaining module, configured to obtain public opinion information, the public opinion information including multiple sentences;
a weight matrix generation module, configured to train the word vector model to obtain the sentence vectors corresponding to the multiple sentences, and to generate a weight matrix from the sentence vectors;
a classification module, configured to obtain the codes corresponding to the multiple sentences and input the codes into the trained multilayer recurrent neural network; to perform, by the trained multilayer recurrent neural network, operations based on the codes of the multiple sentences and the weight matrix and output the categories of the multiple sentences; and to determine the category corresponding to the public opinion information according to the categories of the multiple sentences.
In one embodiment, the apparatus further comprises:
a first training module, configured to obtain a training set corresponding to the public opinion information, the training set including multiple pieces of sample information, each including multiple training sentences and multiple training words corresponding to the training sentences; to train the word vector model with the training words, obtaining the word vectors corresponding to the training words; and to train the word vector model with the word vectors corresponding to the multiple training sentences, obtaining the sentence vectors corresponding to the training sentences;
a second training module, configured to train the multilayer recurrent neural network with the sentence vectors corresponding to the multiple training sentences, obtaining the categories corresponding to the multiple training sentences.
A computer device, including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps in the above public opinion information classification method embodiments.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the above public opinion information classification method embodiments.
With the above public opinion information classification method, apparatus, computer device and storage medium, when public opinion information needs to be classified, the server can train the word vector model on the multiple sentences in the public opinion information to obtain the corresponding weight vectors, and then generate the weight matrix corresponding to the multiple sentences. The server inputs the codes of the multiple sentences into the trained multilayer recurrent neural network, which performs operations using the codes of the multiple sentences and the weight matrix and outputs the category of each sentence. The server then obtains the category of the public opinion information from the categories of the multiple sentences. Because the weight vector of each sentence is obtained by training the word vector model, and the multilayer recurrent neural network has been trained on the weight matrices of a massive number of sentences, natural-language descriptions are effectively mapped into a vector space, which improves the convergence of the multilayer recurrent neural network and the accuracy of classification. Large amounts of public opinion information obtained from the network can thus be classified effectively.
Description of the drawings
Fig. 1 is a diagram of the application scenario of the public opinion information classification method in one embodiment;
Fig. 2 is a flow diagram of the public opinion information classification method in one embodiment;
Fig. 3 is a diagram of a 2-layer recurrent neural network unrolled over time in one embodiment;
Fig. 4 is a diagram of a 4-layer recurrent neural network unrolled over time in one embodiment;
Fig. 5 is a diagram of a 6-layer recurrent neural network unrolled over time in one embodiment;
Fig. 6 is a flow diagram of the word vector model training and multilayer recurrent neural network training steps in one embodiment;
Fig. 7 is a structural block diagram of the public opinion information classification apparatus in one embodiment;
Fig. 8 is a diagram of the internal structure of the computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
The public opinion information classification method provided by the present application can be applied in the application environment shown in Fig. 1, in which a server 102 is connected to multiple website servers 104 through a network. The server 102 can be implemented as an independent server or as a server cluster composed of multiple servers. The server 102 can crawl various kinds of public opinion information from the multiple website servers 104 at a preset frequency, and can identify the sentences of each piece of public opinion information according to punctuation marks. A classification model is established in the server 102; the classification model includes a word vector model and a multilayer recurrent neural network. The server 102 obtains the sentence vectors corresponding to the multiple sentences by training the word vector model and generates a weight matrix from those sentence vectors. The server 102 then calls the trained multilayer recurrent neural network, obtains the codes corresponding to the sentences, and inputs the codes of the multiple sentences into the trained network. The trained multilayer recurrent neural network performs operations using the codes of the multiple sentences and the weight matrix and outputs the category of each sentence. The server 102 determines the category corresponding to the public opinion information according to the categories of the multiple sentences, thereby classifying large amounts of public opinion information effectively.
In one embodiment, as shown in Fig. 2, a public opinion information classification method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
Step 202: establish a classification model, the classification model including a word vector model and a multilayer recurrent neural network.
A classification model can be pre-established in the server; it includes a word vector model and a multilayer recurrent neural network. The word vector model may use the Skip-Gram model, i.e. a neural network structure with an input vector, a hidden layer and an output layer. In the traditional approach, the output layer of this model produces the final result, which is a probability distribution. That probability distribution is not suitable for the multilayer recurrent neural network. Therefore, in this embodiment, only the input vector and the hidden-layer structure of the model are used: the weight vectors of the words are taken from the hidden layer, and no further computation is performed through the output layer.
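As a rough illustration of the idea of stopping at the hidden layer, the sketch below trains a Skip-Gram model and reads word vectors from the learned input-to-hidden weights rather than from the softmax output. This is a minimal sketch assuming the gensim library; the toy corpus and variable names are illustrative, not from the patent.

```python
# A minimal sketch, assuming gensim's Word2Vec as the Skip-Gram implementation.
from gensim.models import Word2Vec

# Toy tokenized corpus; in the patent's setting these would be segmented
# sentences from crawled public opinion information.
corpus = [["冬奥会", "进入", "北京", "时间"],
          ["北京", "冬奥", "加油"],
          ["中国", "加油"]]

# sg=1 selects Skip-Gram; vector_size is the hidden-layer width.
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, sg=1)

# The word vectors come from the input->hidden weights (model.wv), not from
# the softmax output layer, matching the embodiment's idea of stopping at
# the hidden layer.
word_vec = model.wv["北京"]
print(word_vec.shape)  # (100,)
```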
The multilayer recurrent neural network may include multiple hidden layers, each consisting of a forward computation layer and a backward computation layer; such a hidden layer is called a bidirectional hidden layer. The first hidden layer includes the first forward computation layer and the first backward computation layer, the second hidden layer includes the second forward computation layer and the second backward computation layer, the third hidden layer includes the third forward computation layer and the third backward computation layer, and so on. Corresponding weight matrices are set between the input layer and the first hidden layer, i.e. between the input layer and the first forward computation layer, and between the input layer and the first backward computation layer.
Step 204: obtain public opinion information, the public opinion information including multiple sentences.
The server can crawl various kinds of public opinion information from multiple websites at a preset frequency. The public opinion information can be of many types, including sports, finance, entertainment, education and so on. Each piece of public opinion information may include multiple sentences, and each sentence in turn includes multiple words. The server can identify the sentences of each piece of public opinion information according to punctuation marks, and can also perform word segmentation on each sentence to obtain the words in it.
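A minimal sketch of this preprocessing step is given below. It assumes jieba as the Chinese word segmenter and a regular expression over end-of-sentence punctuation; the patent does not name specific tools.

```python
# A minimal sketch, assuming jieba for Chinese word segmentation.
import re
import jieba

text = "平昌冬奥刚刚结束,冬奥会已经进入北京时间。2022北京冬奥加油。中国加油。"

# Identify sentences by common end-of-sentence punctuation marks.
sentences = [s for s in re.split(r"[。!?!?]", text) if s]

# Segment each sentence into its words.
tokenized = [jieba.lcut(s) for s in sentences]
for words in tokenized:
    print(words)
```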
Step 206: train the word vector model to obtain the sentence vectors corresponding to the multiple sentences, and generate a weight matrix from the sentence vectors.
In the traditional approach, the weight matrices corresponding to the first forward computation layer and the first backward computation layer are initialized to random vectors, but this can leave the multilayer recurrent neural network converging poorly, so the output does not meet requirements.
In this embodiment, the server uses the weight matrix corresponding to the multiple sentences as the weight matrix between the input layer and the first hidden layer of the multilayer recurrent neural network. That weight matrix is obtained by training the word vector model. It effectively maps natural-language descriptions into a vector space, which improves the convergence of the multilayer recurrent neural network and therefore the accuracy of the output.
The weight matrices corresponding to the first forward computation layer and the first backward computation layer are different. Following the description order of the public opinion information, the server can obtain the weight vector corresponding to each sentence; each weight vector can be a vector array. Using the weight vectors of the multiple sentences, the server generates the weight matrix for forward computation. The server can also obtain the weight vector of each sentence in the reverse description order of the multiple sentences in the public opinion information, generating the weight matrix for backward computation. The forward weight matrix is the weight matrix between the input layer and the first forward computation layer of the multilayer recurrent neural network; the backward weight matrix is the weight matrix between the input layer and the first backward computation layer.
Taking a microblog post as an example, the public opinion information could be: "The PyeongChang Winter Olympics has just ended, and the Winter Olympics has entered Beijing time. Go Beijing 2022 Winter Olympics. Go China." The server can generate the forward weight matrix following the positive description order "The PyeongChang Winter Olympics has just ended, and the Winter Olympics has entered Beijing time", "Go Beijing 2022 Winter Olympics", "Go China". The server can also generate the backward weight matrix following the reverse description order "Go China", "Go Beijing 2022 Winter Olympics", "The PyeongChang Winter Olympics has just ended, and the Winter Olympics has entered Beijing time".
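The sketch below illustrates one plausible reading of this step: the forward weight matrix stacks the sentence vectors in description order, and the backward weight matrix stacks them in reverse. It assumes numpy and uses placeholder vectors in place of trained sentence vectors.

```python
# A minimal sketch of building the forward and backward weight matrices
# from sentence vectors (numpy assumed; the vectors here are placeholders
# for the output of the trained word vector model).
import numpy as np

dim = 100
# One trained sentence vector per sentence, in description order.
sentence_vectors = [np.random.rand(dim) for _ in range(3)]  # placeholder values

# Forward weight matrix: sentence vectors in the original description order.
w_forward = np.stack(sentence_vectors)         # shape (3, dim)

# Backward weight matrix: the same vectors in reverse description order.
w_backward = np.stack(sentence_vectors[::-1])  # shape (3, dim)
```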
Step 208: obtain the codes corresponding to the sentences, and input the codes of the multiple sentences into the trained multilayer recurrent neural network; the trained multilayer recurrent neural network performs operations using the codes of the multiple sentences and the weight matrix, and outputs the categories of the multiple sentences.
Step 210: determine the category corresponding to the public opinion information according to the categories of the multiple sentences.
The multilayer recurrent neural network may have 2, 4, 6 or more hidden layers, each including a forward computation layer and a backward computation layer. Figs. 3-5 show 2-layer, 4-layer and 6-layer recurrent neural networks unrolled over time. In the figures, Relu denotes the activation function, Lstm denotes a long short-term memory (LSTM) unit, Softmax denotes the classification function, and w* (where * is a positive integer) denotes a weight matrix. As the unrolled diagrams show, each forward computation layer and each backward computation layer is given a corresponding initial weight matrix, e.g. w2 and w5 in Fig. 3; w3, w5, w6 and w8 in Fig. 4; and w3, w5, w7, w8, w10 and w12 in Fig. 5.
The multilayer recurrent neural network can be trained in advance. During training it can use the mapping file corresponding to the public opinion information, which records the type corresponding to each of the multiple sentences. Because the multilayer recurrent neural network only accepts numeric input, the server encodes the multiple sentences of each piece of public opinion information during training. Specifically, before training, the server can use the sample information to generate a training table, which records multiple training sentences, each corresponding to multiple training words. The server encodes each training word and then encodes each sentence according to the codes of its training words.
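A minimal sketch of the encoding step follows: each training word gets an integer code from a training vocabulary, and each sentence is encoded as the sequence of its word codes. The concrete scheme is an assumption; the patent only requires numeric input.

```python
# A minimal sketch of the numeric encoding step (the particular scheme is
# illustrative; the patent does not fix an encoding).
tokenized = [["平昌", "冬奥", "刚", "结束"],
             ["北京", "冬奥", "加油"],
             ["中国", "加油"]]

# Build the training vocabulary: word -> integer code (0 reserved for padding).
vocab = {"<pad>": 0}
for words in tokenized:
    for w in words:
        vocab.setdefault(w, len(vocab))

# Encode each sentence via the codes of its training words.
encoded = [[vocab[w] for w in words] for words in tokenized]
print(encoded)  # [[1, 2, 3, 4], [5, 2, 6], [7, 6]]
```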
The server calls the trained multilayer recurrent neural network and inputs the codes of the multiple sentences in the public opinion information into its input layer. Through the activation function, the input layer activates the weight matrix of the first forward computation layer and the weight matrix of the first backward computation layer, and computation begins with the initial weight matrix of the first forward computation layer and the initial weight matrix of the first backward computation layer. No information flows between the forward computation layers and the backward computation layers.
Take a trained 4-layer recurrent neural network as an example. The input to the input layer can be the codes of "The PyeongChang Winter Olympics has just ended, and the Winter Olympics has entered Beijing time", "Go Beijing 2022 Winter Olympics" and "Go China". w1 is the weight matrix of the first forward computation layer and w3 its initial weight matrix; after the LSTM operation, the network outputs the forward weight matrix w3 (this w3 differs from the initial w3; the same label is used here only for brevity) and the weight matrix w4 corresponding to the second forward computation layer. w2 is the weight matrix of the first backward computation layer and w6 its initial weight matrix; after the LSTM operation, the network outputs the backward weight matrix w6 (this w6 likewise differs from the initial w6; the same label is used only for brevity) and the weight matrix w7 corresponding to the second backward computation layer. The computation proceeds layer by layer in this way until the output layer outputs the category of each sentence through the classification function.
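For orientation, here is a minimal sketch of a stacked bidirectional LSTM classifier, one plausible reading of the architecture in Figs. 3-5 (PyTorch assumed; the class name, layer sizes, vocabulary size and category count are illustrative, not from the patent).

```python
# A minimal sketch of a multilayer bidirectional LSTM sentence classifier.
import torch
import torch.nn as nn

class OpinionClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=100,
                 hidden_dim=128, num_layers=2, num_classes=5):
        super().__init__()
        # Input codes -> vectors; in the embodiment this input-side projection
        # is initialized from word-vector-model training, not randomly.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Stacked bidirectional LSTM: each layer has a forward and a backward
        # computation direction, as in Figs. 3-5.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.classify = nn.Linear(2 * hidden_dim, num_classes)  # Softmax head

    def forward(self, codes):                  # codes: (batch, seq_len)
        out, _ = self.lstm(self.embed(codes))  # (batch, seq_len, 2*hidden)
        return self.classify(out[:, -1, :])    # category logits per sentence

logits = OpinionClassifier()(torch.tensor([[1, 2, 3, 4, 0, 0]]))
print(logits.shape)  # torch.Size([1, 5])
```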
The server counts the categories of the multiple sentences in the public opinion information and sorts the category counts. In descending order of count, one or more categories are taken as the categories corresponding to the public opinion information. For example, the categories corresponding to a microblog post could be sports, or news, and so on.
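A minimal sketch of this vote over sentence categories, assuming Python's collections.Counter and a single top category (the patent allows taking one or more top categories):

```python
# A minimal sketch of the category-counting step.
from collections import Counter

sentence_categories = ["sports", "sports", "news"]
counts = Counter(sentence_categories)

# Take the highest-count category (or the top k) as the information's category.
info_category = counts.most_common(1)[0][0]
print(info_category)  # "sports"
```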
In this embodiment, when public opinion information needs to be classified, the server can train the word vector model on the multiple sentences in the public opinion information to obtain the corresponding weight vectors, and then generate the weight matrix corresponding to the multiple sentences. The server inputs the codes of the multiple sentences into the trained multilayer recurrent neural network, which performs operations using the codes and the weight matrix and outputs the category of each sentence, from which the server obtains the category of the public opinion information. Because the weight vector of each sentence is obtained by training the word vector model, and the multilayer recurrent neural network has been trained on the weight matrices of a massive number of sentences, natural-language descriptions are effectively mapped into a vector space, improving the convergence of the multilayer recurrent neural network and the accuracy of classification. Large amounts of public opinion information obtained from the network can thus be classified effectively.
In one embodiment, the method further includes word vector model training and multilayer recurrent neural network training steps, shown in Fig. 6, as follows:
Step 602: obtain a training set corresponding to the public opinion information; the training set includes multiple pieces of sample information, each including multiple training sentences and multiple training words corresponding to the training sentences.
Step 604: train the word vector model with the training words, obtaining the word vectors corresponding to the training words.
Step 606: train the word vector model with the word vectors corresponding to the multiple training sentences, obtaining the sentence vectors corresponding to the training sentences.
Step 608: train the multilayer recurrent neural network with the sentence vectors corresponding to the multiple training sentences, obtaining the categories corresponding to the multiple training sentences.
The server can crawl various kinds of public opinion information from multiple websites and store the crawled information in a database. The server preprocesses the crawled public opinion information as corpus material, including sentence splitting, word segmentation, cleaning and so on, and builds a corpus from the preprocessed material. In the corpus, the server labels a preset proportion of the preprocessed material as sample information and uses the sample information to generate a training set. The training set includes the training sentences corresponding to the pieces of sample information, and the training words corresponding to the training sentences. The word vector model and the multilayer recurrent neural network can be trained in advance on the training set. During training, the multilayer recurrent neural network depends on the sentence vectors produced by training the word vector model, and the word vector model in turn depends on the word vectors of each sentence when training the sentence vectors on the training set.
The word vector model may use the Skip-Gram model, i.e. a neural network structure including an input vector, a hidden layer and an output layer. In the traditional approach, the output layer of this model produces the final result, a probability distribution, which is not suitable for the multilayer recurrent neural network. Therefore, in this embodiment, only the input vector and hidden-layer structure of the model are used: the weight vectors of the words are taken from the hidden layer, and no further computation is performed through the output layer.
Because the word vector model and the multilayer recurrent neural network only accept numeric input, during training the server generates a training table from the sample information. The training table records multiple training sentences. The server can also generate a corresponding training vocabulary from the training words. The server encodes each training word, and then encodes each sentence according to the codes of its training words.
When training the classification model, the server first trains the word vector model with the codes of the multiple training words in the training set as input vectors, obtaining the word vectors corresponding to the training words. Next, the server trains the word vector model again using the code of each sentence in the sample information and the word vectors of its words, obtaining the sentence vectors corresponding to the training sentences. The server then generates a training weight matrix from the sentence vectors of the multiple training sentences, and trains the multilayer recurrent neural network using the training weight matrix and the codes of the multiple sentences, obtaining the category corresponding to each training sentence.
In the traditional approach, the weight matrices corresponding to the first forward computation layer and the first backward computation layer of the multilayer recurrent neural network are initialized to random vectors, which can leave the network converging poorly and unable to classify sentences effectively. In this embodiment, by training on the training words in the sample information, the word vector of each training word is obtained accurately; training again with the word vectors corresponding to the training words accurately yields the sentence vector corresponding to each training sentence. Natural language is thereby mapped into a vector space, which effectively improves the convergence of the multilayer recurrent neural network and achieves effective classification of multiple sentences.
In one embodiment, training the word vector model with the training words includes: counting the number of training words in the multiple training sentences, and marking the maximum word count among the multiple training sentences as the first input parameter; according to the difference between the word count of a training sentence and the maximum word count corresponding to the first input parameter, adding a corresponding number of preset characters to the training sentence; and training the word vector model with the training words in the multiple training sentences and the filled-in preset characters, obtaining the word vectors corresponding to the multiple training words.
Because different sentences in public opinion information have different word counts, a first input parameter is set for the word vector model in this embodiment so that the trained model applies to sentences of varying length. The server can count the training words in the multiple training sentences to obtain the word count of each training sentence, and marks the maximum word count as the first input parameter. For a training sentence whose word count is below the first input parameter, the server adds preset characters according to the difference between that sentence's word count and the first input parameter. A preset character can be any character that does not conflict with the public opinion information, such as a null character. For example, if the first input parameter is 20 (the corresponding first output parameter is also 20) and a training sentence has 10 words, the server adds 10 preset characters to that sentence. The server trains the word vector model with the codes corresponding to the training words and the codes of the filled-in preset characters, obtaining the weight vector corresponding to each training word and preset character. The filled-in preset characters may also be called newly added characters.
In one embodiment, training the word vector model with the word vectors corresponding to the multiple training sentences includes: counting the number of training sentences in each piece of sample information, and marking the maximum sentence count as the second input parameter; according to the difference between the sentence count of a piece of sample information and the second input parameter, adding a corresponding number of preset-character sentences to the sample information; and training the word vector model with the multiple training sentences and the newly added sentences, obtaining the sentence vectors corresponding to the multiple training sentences.
Because different pieces of public opinion information contain different numbers of sentences, a second input parameter is set for the word vector model in this embodiment so that the model applies to varied public opinion information. The server can count the training sentences in the pieces of sample information and mark the maximum sentence count as the second input parameter. For sample information whose sentence count is below the second input parameter, the server adds a corresponding number of sentences according to the difference between the sample's sentence count and the second input parameter. The added sentences can be composed of preset characters, i.e. characters that do not conflict with the public opinion information, such as null characters. The server trains the word vector model again with the multiple training sentences and the word vectors corresponding to the filled-in sentences, obtaining the weight vector corresponding to each training sentence. The filled-in sentences may also be called newly added sentences.
Further, before training on the training sentences, the server can also increase the word count of the training words in each training sentence according to the first input parameter, so that after adding preset characters each training sentence's word count reaches the value of the first input parameter. The server likewise increases the sentence count of each piece of sample information according to the second input parameter, so that the sentence count of every piece of sample information reaches the value of the second input parameter. The server then trains the word vector model again on the training sentences whose word counts have been increased, obtaining the sentence vectors corresponding to the multiple training sentences. This further fixes the structure of the word vector model and effectively improves the generality of the trained model.
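The padding described here can be sketched as follows; the empty-string pad token stands in for the "null character" and is an assumption, as are the parameter values.

```python
# A minimal sketch of padding to the first and second input parameters:
# each sentence is padded with a preset character up to the maximum word
# count, and each sample is padded with preset-character sentences up to
# the maximum sentence count.
PAD = ""  # the "null character" preset character (an assumption)

def pad_sample(sample, max_words, max_sentences):
    # Pad each sentence's words to max_words (the first input parameter).
    padded = [s + [PAD] * (max_words - len(s)) for s in sample]
    # Pad the sample's sentences to max_sentences (the second input parameter).
    padded += [[PAD] * max_words for _ in range(max_sentences - len(sample))]
    return padded

sample = [["平昌", "冬奥", "刚", "结束"], ["中国", "加油"]]
print(pad_sample(sample, max_words=5, max_sentences=3))
```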
In one embodiment, training the word vector model with the multiple training sentences and the newly added sentences includes: obtaining the mapping file corresponding to the training sentences, the mapping file recording the category corresponding to each training sentence; generating a training weight matrix from the sentence vectors corresponding to the multiple training sentences and the newly added sentences, the training weight matrix corresponding to the sample information after the sentence count has been increased; and training the multilayer recurrent neural network with the multiple training sentences, the newly added sentences and the corresponding training weight matrix, outputting the categories corresponding to the training sentences.
To fix the model structure of the multilayer recurrent neural network so that the trained network is general, a second input parameter is set for the multilayer recurrent neural network in this embodiment. With reference to the above embodiments, the server generates, for the sample information after each sentence-count increase (i.e. sample information padded with sentences according to the second input parameter), the corresponding forward training weight matrix and backward training weight matrix.
With reference to the above embodiments, the server obtains the codes corresponding to the training sentences and the newly added sentences, inputs the codes into the input layer of the multilayer recurrent neural network, sets the forward training weight matrix as the weight matrix of the first forward computation layer, and sets the backward training weight matrix as the weight matrix of the first backward computation layer. According to the second input parameter, the server sets multiple forward weight matrices between the input layer and the first forward computation layer, and multiple backward weight matrices between the input layer and the first backward computation layer. For example, if the second input parameter is 10, the server sets 10 forward weight matrices between the input layer and the first forward computation layer, and 10 backward weight matrices between the input layer and the first backward computation layer; that is, in Fig. 4 the server can set 10 w1 matrices and 10 w2 matrices. w1 contains the forward weight matrices corresponding to the 10 training sentences and newly added sentences in the sample information; w2 contains the corresponding backward weight matrices. The server initializes the initial weight matrix of each forward computation layer and each backward computation layer in the hidden layers. After initialization, the server trains the multilayer recurrent neural network and outputs the category corresponding to each training sentence. The output for a preset character can also be a preset character; this does not affect the training result.
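One plausible reading of "setting the trained weight matrix between the input layer and the first hidden layer" is to initialize the network's input-side projection from the trained matrix instead of randomly, sketched below (PyTorch assumed; the shapes and variable names are illustrative, not from the patent).

```python
# A minimal sketch of seeding the input-side weights from the trained
# weight matrix rather than from a random initialization.
import torch
import torch.nn as nn

num_sentences, dim = 10, 100                     # second input parameter = 10
trained_matrix = torch.rand(num_sentences, dim)  # stands in for the trained
                                                 # weight matrix from the
                                                 # word vector model

# Use the trained matrix as the (tunable) input projection weights.
input_proj = nn.Embedding.from_pretrained(trained_matrix, freeze=False)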
During training, because the sentence vector of each training sentence is obtained by training the word vector model, the vector of each training sentence is reflected more accurately, which effectively improves the convergence of the multilayer recurrent neural network and therefore the accuracy of its training. Setting the second input parameter makes the sentence count of every piece of sample information the same, so that the trained word vector model and the trained multilayer recurrent neural network are general. There is no need to train multiple models, which effectively reduces developers' workload.
Further, before training the multilayer recurrent neural network, the first input parameter can also be set for the word vector model in the manner provided in the above embodiments, so that every training sentence has the same word count. The multiple pieces of sample information used for training then have not only the same sentence count but also the same word count per sentence, which further improves the generality of the trained word vector model and the trained multilayer recurrent neural network.
In one embodiment, the multilayer recurrent neural network includes multiple hidden layers, and training the network with the multiple training sentences, the newly added sentences and the corresponding training weight matrix includes: assigning each hidden layer a random vector as its initial weight matrix; setting, according to the second input parameter, the training weight matrix corresponding to the sample information after the sentence-count increase between the input layer and the first hidden layer; inputting the codes corresponding to the multiple training sentences and the codes of the newly added sentences into the input layer of the multilayer recurrent neural network; and training the multiple hidden layers with the initial weight matrices and the training weight matrix, outputting the categories corresponding to the training sentences through the output layer.
When the server trains the multilayer recurrent neural network with the training words, every hidden layer needs to be initialized. Each hidden layer can include a forward computation layer and a backward computation layer, and both need to be initialized. In the traditional approach, the initial weight matrices corresponding to each hidden layer's forward and backward computation layers are initialized to 0, but the generalization ability of a network trained this way is limited; if public opinion information in more varied formats appears in the future, retraining may be needed.
In this embodiment, at initialization the server assigns each hidden layer's forward computation layer and backward computation layer a random vector as the initial weight matrix. The random vector can be an array of preset length, for example 200 or 300 dimensions. After initialization, the server sets the training weight matrix corresponding to the sample information after the sentence-count increase between the input layer and the first hidden layer. The server inputs the codes corresponding to the multiple training sentences and the codes of the newly added sentences into the input layer of the multilayer recurrent neural network. In the manner provided in the above embodiments, the multiple hidden layers are trained with the initial weight matrices and the training weight matrix, and the output layer outputs the category of each training sentence.
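A minimal sketch of this initialization, assuming numpy and an arbitrary uniform range (the patent specifies only that each hidden layer gets a random vector of preset length, e.g. 200 or 300 dimensions):

```python
# A minimal sketch of assigning each hidden layer a random vector as its
# initial weight matrix (the uniform range is an assumption).
import numpy as np

num_hidden_layers, dim = 4, 200  # preset length, e.g. 200 or 300 dimensions
initial_weights = [np.random.uniform(-0.1, 0.1, size=dim)
                   for _ in range(num_hidden_layers)]  # one per hidden layer
```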
Because each hidden layer is given a random vector as its initial weight matrix at initialization, the generalization ability of the multilayer recurrent neural network is effectively improved, and the network can handle more varied public opinion information in the future. There is no need to train multiple models, which effectively reduces developers' workload.
It should be understood that although the steps in the flowcharts of Fig. 2 and Fig. 6 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict ordering constraint on the execution of these steps, which can be executed in other orders. Moreover, at least some of the steps in Fig. 2 and Fig. 6 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps, or with sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 7, a public opinion information classification apparatus is provided, including a model building module 702, an information obtaining module 704, a weight matrix generation module 706 and a classification module 708, wherein:
The model building module 702 is configured to establish a classification model, the classification model including a word vector model and a multilayer recurrent neural network.
The information obtaining module 704 is configured to obtain public opinion information, the public opinion information including multiple sentences.
The weight matrix generation module 706 is configured to train the word vector model to obtain the sentence vectors corresponding to the multiple sentences, and to generate a weight matrix from the sentence vectors.
The classification module 708 is configured to obtain the codes corresponding to the multiple sentences and input the codes into the trained multilayer recurrent neural network; to perform, by the trained multilayer recurrent neural network, operations based on the codes of the multiple sentences and the weight matrix and output the categories of the multiple sentences; and to determine the category corresponding to the public opinion information according to the categories of the multiple sentences.
In one embodiment, the apparatus further includes a first training module 710 and a second training module 712, wherein:
the first training module 710 is configured to obtain a training set corresponding to the public opinion information, the training set including multiple pieces of sample information, each including multiple training sentences and multiple training words corresponding to the training sentences; to train the word vector model with the training words, obtaining the word vectors corresponding to the training words; and to train the word vector model with the word vectors corresponding to the multiple training sentences, obtaining the sentence vectors corresponding to the training sentences;
the second training module 712 is configured to train the multilayer recurrent neural network with the sentence vectors corresponding to the multiple training sentences, obtaining the categories corresponding to the multiple training sentences.
In one embodiment, the first training module 710 is further configured to count the word counts of the training words in the multiple training sentences and mark the maximum word count as the first input parameter; to add a corresponding number of preset characters to a training sentence according to the difference between that sentence's word count and the maximum word count corresponding to the first input parameter; and to train the word vector model with the training words in the multiple training sentences and the filled-in preset characters, obtaining the word vectors corresponding to the multiple training words.
In one embodiment, the first training module 710 is further configured to count the sentence counts of the training sentences in the sample information and mark the maximum sentence count as the second input parameter; to add a corresponding number of preset-character sentences to a piece of sample information according to the difference between its sentence count and the second input parameter; and to train the word vector model with the multiple training sentences and the newly added sentences, obtaining the sentence vectors corresponding to the multiple training sentences.
In one embodiment, the second training module 712 is further configured to obtain the mapping file corresponding to the training sentences, the mapping file recording the category corresponding to each training sentence; to generate a training weight matrix from the sentence vectors corresponding to the multiple training sentences and the newly added sentences, the training weight matrix corresponding to the sample information after the sentence-count increase; and to train the multilayer recurrent neural network with the multiple training sentences, the newly added sentences and the corresponding training weight matrix, outputting the categories corresponding to the training sentences.
In one embodiment, the second training module 712 is further configured to assign each hidden layer a random vector as its initial weight matrix; to set, according to the second input parameter, the training weight matrix corresponding to the sample information after the sentence-count increase between the input layer and the first hidden layer; to input the codes corresponding to the multiple training sentences and the codes of the newly added sentences into the input layer of the multilayer recurrent neural network; and to train the multiple hidden layers with the initial weight matrices and the training weight matrix, outputting the categories corresponding to the training sentences through the output layer.
For specific limitations on the public opinion information classification apparatus, see the limitations on the public opinion information classification method above; they are not repeated here. Each module in the above public opinion information classification apparatus can be implemented wholly or partly by software, hardware or a combination thereof. The modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device can be a server, and its internal structure can be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores public opinion information, sample information and the like. The network interface of the computer device communicates with external servers through a network connection. When executed by the processor, the computer program implements a public opinion information classification method.
Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps in each of the above method embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps in each of the above method embodiments.
A person of ordinary skill in the art will understand that all or part of the flows in the above embodiment methods can be completed by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the application. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A public opinion information classification method, the method comprising:
establishing a classification model, the classification model comprising a word vector model and a multilayer recurrent neural network;
obtaining public opinion information, the public opinion information comprising a plurality of sentences;
obtaining sentence vectors corresponding to the plurality of sentences through training with the word vector model, and generating a weight matrix from the sentence vectors corresponding to the plurality of sentences;
obtaining encodings corresponding to the plurality of sentences, and inputting the encodings of the plurality of sentences into the trained multilayer recurrent neural network; performing, by the trained multilayer recurrent neural network, operations based on the encodings of the plurality of sentences and the weight matrix, and outputting categories of the plurality of sentences; and
determining a category corresponding to the public opinion information according to the categories of the plurality of sentences.
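As a non-authoritative illustration of the claimed flow, the following minimal Python sketch walks through the four steps of claim 1. All helper names (sentence_vector, encode, predict) are hypothetical placeholders, and the majority vote in the last step is only one plausible way to derive the overall category from the sentence categories; the claim itself leaves both open.

import numpy as np

def build_weight_matrix(sentence_vectors):
    # Stack one sentence vector per row into a weight matrix.
    return np.vstack(sentence_vectors)

def classify_public_opinion(sentences, word_vector_model, trained_rnn):
    # Step 1: obtain one vector per sentence from the word vector model.
    sentence_vectors = [word_vector_model.sentence_vector(s) for s in sentences]
    # Step 2: generate the weight matrix from the sentence vectors.
    weight_matrix = build_weight_matrix(sentence_vectors)
    # Step 3: encode each sentence and run the trained multilayer
    # recurrent neural network on the encodings and the weight matrix.
    encodings = [word_vector_model.encode(s) for s in sentences]
    sentence_categories = trained_rnn.predict(encodings, weight_matrix)
    # Step 4: derive the document-level category from the sentence
    # categories (majority vote assumed here).
    values, counts = np.unique(sentence_categories, return_counts=True)
    return values[np.argmax(counts)]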
2. The method according to claim 1, wherein the method further comprises:
obtaining a training set corresponding to public opinion information, the training set comprising a plurality of pieces of sample information, each piece of sample information comprising a plurality of training sentences and a plurality of training words corresponding to the training sentences;
training the word vector model with the training words to obtain word vectors corresponding to the training words;
training the word vector model with the word vectors corresponding to the plurality of training sentences to obtain sentence vectors corresponding to the training sentences; and
training the multilayer recurrent neural network with the sentence vectors corresponding to the plurality of training sentences to obtain categories corresponding to the plurality of training sentences.
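For the word-level part of claim 2, the word vector model is trained on the segmented training words. The sketch below uses gensim's Word2Vec purely as an illustration (the claims do not name an implementation), and averages word vectors into a sentence vector, which is one common pooling choice rather than anything the claims prescribe.

import numpy as np
from gensim.models import Word2Vec

# One piece of sample information: training sentences, each already
# segmented into training words (toy English tokens for readability).
training_sentences = [
    ["stock", "price", "drops", "sharply"],
    ["company", "issues", "clarifying", "announcement"],
]

# Train the word vector model on the training words.
w2v = Word2Vec(sentences=training_sentences, vector_size=100,
               window=5, min_count=1, workers=2)

word_vector = w2v.wv["stock"]  # word vector of a training word
# Pool word vectors into a sentence vector (mean pooling assumed).
sentence_vector = np.mean([w2v.wv[w] for w in training_sentences[0]], axis=0)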
3. The method according to claim 2, wherein training the word vector model with the training words comprises:
counting the number of words of the training words in the plurality of training sentences, and marking the maximum number of words as a first input parameter;
adding a corresponding number of preset characters to a training sentence according to the difference between the number of words of that training sentence and the maximum number of words corresponding to the first input parameter; and
training the word vector model with the training words in the plurality of training sentences and the filled-in preset characters to obtain word vectors corresponding to the plurality of training words.
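The padding step of claim 3 can be pictured as follows; "<PAD>" is an assumed stand-in for the preset character, which the claim does not specify.

# Sketch of claim 3: pad every training sentence with a preset character
# up to the maximum word count, which becomes the first input parameter.
def pad_sentences(tokenized_sentences, pad_token="<PAD>"):
    first_input_parameter = max(len(s) for s in tokenized_sentences)
    padded = [s + [pad_token] * (first_input_parameter - len(s))
              for s in tokenized_sentences]
    return padded, first_input_parameter

padded, max_words = pad_sentences([["a", "b", "c"], ["d"]])
# padded == [['a', 'b', 'c'], ['d', '<PAD>', '<PAD>']], max_words == 3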
4. The method according to claim 2, wherein training the word vector model with the word vectors corresponding to the plurality of training sentences comprises:
counting the number of training sentences in each piece of sample information, and marking the maximum number of sentences as a second input parameter;
adding a corresponding number of sentences formed from preset characters to a piece of sample information according to the difference between the number of sentences of that sample information and the second input parameter; and
training the word vector model with the plurality of training sentences and the newly added sentences to obtain sentence vectors corresponding to the plurality of training sentences.
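Claim 4 applies the same idea one level up: each piece of sample information is filled with whole padding sentences until it reaches the maximum sentence count. A minimal sketch, again assuming "<PAD>" as the preset character:

# Sketch of claim 4: pad every piece of sample information with sentences
# made of the preset character, up to the maximum sentence count
# (the second input parameter).
def pad_samples(samples, sentence_length, pad_token="<PAD>"):
    second_input_parameter = max(len(sample) for sample in samples)
    pad_sentence = [pad_token] * sentence_length
    padded = [sample + [pad_sentence] * (second_input_parameter - len(sample))
              for sample in samples]
    return padded, second_input_parameter

samples = [[["a", "b"], ["c", "d"]], [["e", "f"]]]  # two pieces of sample info
padded_samples, max_sentences = pad_samples(samples, sentence_length=2)
# the second sample gains one ['<PAD>', '<PAD>'] sentence; max_sentences == 2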
5. The method according to claim 4, wherein training the word vector model with the plurality of training sentences and the newly added sentences comprises:
obtaining a mapping file corresponding to the training sentences, the mapping file recording the categories corresponding to the training sentences;
generating a training weight matrix from the sentence vectors corresponding to the plurality of training sentences and the newly added sentences, the training weight matrix corresponding to the sample information after the number of sentences has been increased; and
training the multilayer recurrent neural network with the plurality of training sentences, the newly added sentences, and the corresponding training weight matrix, and outputting the categories corresponding to the training sentences.
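A minimal sketch of claim 5's data preparation follows. Stacking sentence vectors row-wise is an assumed reading of how the training weight matrix "corresponds" to the padded sample information, and the dictionary stands in for the mapping file that records each training sentence's category; shapes are illustrative.

import numpy as np

def training_weight_matrix(sentence_vectors):
    # sentence_vectors: one 1-D array per sentence, covering the training
    # sentences and the newly added padding sentences alike, so the row
    # count matches the increased sentence count.
    return np.vstack(sentence_vectors)  # shape: (num_sentences, dim)

rng = np.random.default_rng(0)
vectors = [rng.standard_normal(100) for _ in range(4)]  # 4 padded sentences
W_train = training_weight_matrix(vectors)               # shape (4, 100)

# Role of the mapping file: training sentence id -> recorded category.
mapping_file = {"sentence_001": "negative", "sentence_002": "neutral"}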
6. The method according to claim 5, wherein the multilayer recurrent neural network comprises a plurality of hidden layers, and training the multilayer recurrent neural network with the plurality of training sentences, the newly added sentences, and the corresponding training weight matrix comprises:
assigning a random vector to each hidden layer as an initial weight matrix of that hidden layer;
setting, between the input layer and the first hidden layer, the training weight matrix corresponding to the sample information whose number of sentences has been increased according to the second input parameter;
inputting the encodings corresponding to the plurality of training sentences and the newly added sentences into the input layer of the multilayer recurrent neural network; and
training the plurality of hidden layers with the initial weight matrices and the training weight matrix, and outputting the categories corresponding to the training sentences through the output layer.
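To make claim 6's wiring concrete, here is a minimal PyTorch sketch. The cell type, layer sizes, and the use of a linear projection to hold the training weight matrix between the input layer and the first hidden layer are all assumptions; the claim only fixes where the training weight matrix and the random initial weight matrices sit.

import torch
import torch.nn as nn

class MultiLayerRNNClassifier(nn.Module):
    def __init__(self, train_weight_matrix, hidden_size, num_layers, num_classes):
        super().__init__()
        rows, dim = train_weight_matrix.shape
        # Input-to-first-hidden connection initialized from the training
        # weight matrix built out of the sentence vectors.
        self.input_proj = nn.Linear(dim, rows, bias=False)
        with torch.no_grad():
            self.input_proj.weight.copy_(
                torch.as_tensor(train_weight_matrix, dtype=torch.float32))
        # Stacked recurrent hidden layers; PyTorch's default random
        # initialization plays the role of the random initial weights.
        self.rnn = nn.RNN(input_size=rows, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True)
        # Output layer emits one category score vector per sample.
        self.output = nn.Linear(hidden_size, num_classes)

    def forward(self, encodings):        # (batch, seq_len, dim)
        x = self.input_proj(encodings)   # (batch, seq_len, rows)
        _, h_n = self.rnn(x)             # h_n: (num_layers, batch, hidden)
        return self.output(h_n[-1])      # category logits: (batch, classes)

# Usage with toy shapes: 4 sentence vectors of dimension 100.
W_train = torch.randn(4, 100)
model = MultiLayerRNNClassifier(W_train, hidden_size=64,
                                num_layers=3, num_classes=5)
logits = model(torch.randn(8, 4, 100))   # batch of 8 encoded samples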
7. A public opinion information classification apparatus, wherein the apparatus comprises:
a model building module, configured to establish a classification model, the classification model comprising a word vector model and a multilayer recurrent neural network;
an information acquisition module, configured to obtain public opinion information, the public opinion information comprising a plurality of sentences;
a weight matrix generation module, configured to obtain sentence vectors corresponding to the plurality of sentences through training with the word vector model, and to generate a weight matrix from the sentence vectors corresponding to the plurality of sentences; and
a classification module, configured to obtain encodings corresponding to the plurality of sentences, input the encodings of the plurality of sentences into the trained multilayer recurrent neural network, perform, by the trained multilayer recurrent neural network, operations based on the encodings of the plurality of sentences and the weight matrix, output the categories of the plurality of sentences, and determine the category corresponding to the public opinion information according to the categories of the plurality of sentences.
8. The apparatus according to claim 7, wherein the apparatus further comprises:
a first training module, configured to obtain a training set corresponding to public opinion information, the training set comprising a plurality of pieces of sample information, each piece of sample information comprising a plurality of training sentences and a plurality of training words corresponding to the training sentences; to train the word vector model with the training words to obtain word vectors corresponding to the training words; and to train the word vector model with the word vectors corresponding to the plurality of training sentences to obtain sentence vectors corresponding to the training sentences; and
a second training module, configured to train the multilayer recurrent neural network with the sentence vectors corresponding to the plurality of training sentences to obtain the categories corresponding to the plurality of training sentences.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN201810380769.9A 2018-04-25 2018-04-25 Public opinion information classification method and device, computer equipment and storage medium Active CN108628974B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810380769.9A CN108628974B (en) 2018-04-25 2018-04-25 Public opinion information classification method and device, computer equipment and storage medium
PCT/CN2018/097033 WO2019205318A1 (en) 2018-04-25 2018-07-25 Public opinion information classification method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810380769.9A CN108628974B (en) 2018-04-25 2018-04-25 Public opinion information classification method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108628974A 2018-10-09
CN108628974B CN108628974B (en) 2023-04-18

Family

ID=63694487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810380769.9A Active CN108628974B (en) 2018-04-25 2018-04-25 Public opinion information classification method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN108628974B (en)
WO (1) WO2019205318A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642302B (en) * 2020-04-27 2024-04-02 阿里巴巴集团控股有限公司 Training method and device for text filling model, text processing method and device
CN112036439B (en) * 2020-07-30 2023-09-01 平安科技(深圳)有限公司 Dependency relationship classification method and related equipment
CN111881687B (en) * 2020-08-03 2024-02-20 浪潮云信息技术股份公司 Relation extraction method and device based on context coding and multi-layer perceptron
CN112115268B (en) * 2020-09-28 2024-04-09 支付宝(杭州)信息技术有限公司 Training method and device based on feature encoder, and classifying method and device
CN112183030A (en) * 2020-10-10 2021-01-05 深圳壹账通智能科技有限公司 Event extraction method and device based on preset neural network, computer equipment and storage medium
CN114386394A (en) * 2020-10-16 2022-04-22 电科云(北京)科技有限公司 Prediction model training method, prediction method and prediction device for platform public opinion data theme
CN112417151A (en) * 2020-11-16 2021-02-26 新智数字科技有限公司 Method for generating classification model and method and device for classifying text relation
CN112632984A (en) * 2020-11-20 2021-04-09 南京理工大学 Graph model mobile application classification method based on description text word frequency
CN112560505A (en) * 2020-12-09 2021-03-26 北京百度网讯科技有限公司 Recognition method and device of conversation intention, electronic equipment and storage medium
CN112862672B (en) * 2021-02-10 2024-04-16 厦门美图之家科技有限公司 Liu-bang generation method, device, computer equipment and storage medium
CN113515626A (en) * 2021-05-19 2021-10-19 中国工商银行股份有限公司 Method, device and equipment for determining public opinion category
CN113190762B (en) * 2021-05-31 2024-08-13 南京报业集团有限责任公司 Network public opinion monitoring method
CN113468872B (en) * 2021-06-09 2024-04-16 大连理工大学 Biomedical relation extraction method and system based on sentence level graph convolution
CN113643060A (en) * 2021-08-12 2021-11-12 工银科技有限公司 Product price prediction method and device
CN113946680B (en) * 2021-10-20 2024-04-16 河南师范大学 Online network rumor identification method based on graph embedding and information flow analysis
CN117407527A (en) * 2023-10-19 2024-01-16 重庆邮电大学 Education field public opinion big data classification method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101414300B (en) * 2008-11-28 2010-06-16 电子科技大学 Method for sorting and processing internet public feelings information
CN104899335A (en) * 2015-06-25 2015-09-09 四川友联信息技术有限公司 Method for performing sentiment classification on network public sentiment of information
CN107045524B (en) * 2016-12-30 2019-12-27 中央民族大学 Method and system for classifying network text public sentiments
CN107239529B (en) * 2017-05-27 2020-06-09 中国矿业大学 Public opinion hotspot category classification method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326346A (en) * 2016-08-06 2017-01-11 上海高欣计算机系统有限公司 Text classification method and terminal device
CN107066560A (en) * 2017-03-30 2017-08-18 东软集团股份有限公司 The method and apparatus of text classification
CN107766577A * 2017-11-15 2018-03-06 北京百度网讯科技有限公司 Public sentiment monitoring method, device, equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109620154A (en) * 2018-12-21 2019-04-16 平安科技(深圳)有限公司 Borborygmus voice recognition method and relevant apparatus based on deep learning
CN110019819A (en) * 2019-03-26 2019-07-16 方正株式(武汉)科技开发有限公司 Method of generating classification model, electronic contract automatic content classification method and device
CN110377744A * 2019-07-26 2019-10-25 北京香侬慧语科技有限责任公司 Method, apparatus, storage medium and electronic device for public sentiment classification
CN112580329A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Text noise data identification method and device, computer equipment and storage medium
CN112580329B (en) * 2019-09-30 2024-02-20 北京国双科技有限公司 Text noise data identification method, device, computer equipment and storage medium
CN111581982A (en) * 2020-05-06 2020-08-25 首都师范大学 Ontology-based prediction method for public opinion early warning grade of medical dispute case
CN111581982B (en) * 2020-05-06 2023-02-17 首都师范大学 Ontology-based prediction method for public opinion early warning grade of medical dispute case
WO2021164302A1 (en) * 2020-09-07 2021-08-26 平安科技(深圳)有限公司 Sentence vector generation method, apparatus, device and storage medium
CN113723096A (en) * 2021-07-23 2021-11-30 智慧芽信息科技(苏州)有限公司 Text recognition method and device, computer-readable storage medium and electronic equipment

Also Published As

Publication number Publication date
WO2019205318A1 (en) 2019-10-31
CN108628974B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN108628974A (en) Public feelings information sorting technique, device, computer equipment and storage medium
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN110956018B (en) Training method of text processing model, text processing method, text processing device and storage medium
CN108563782B (en) Commodity information format processing method and device, computer equipment and storage medium
CN110795552B (en) Training sample generation method and device, electronic equipment and storage medium
CN111400470A (en) Question processing method and device, computer equipment and storage medium
US11288324B2 (en) Chart question answering
CN109902301A (en) Relation inference method, device and equipment based on deep neural network
CN112507039A (en) Text understanding method based on external knowledge embedding
CN114596566B (en) Text recognition method and related device
CN109711465A (en) Image method for generating captions based on MLL and ASCA-FR
CN109815331A (en) Construction method, device and the computer equipment of text emotion disaggregated model
CN111985243B (en) Emotion model training method, emotion analysis device and storage medium
CN111783478B (en) Machine translation quality estimation method, device, equipment and storage medium
CN117609444B (en) Searching question-answering method based on large model
CN110909174B (en) Knowledge graph-based method for improving entity link in simple question answering
CN116822526A (en) Implicit chapter relation identification method integrating parameter validation and relation tag hierarchical semantic priori
CN117453885A (en) Question information processing method, device, equipment, storage medium and product
CN115392237A (en) Emotion analysis model training method, device, equipment and storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN115098722B (en) Text and image matching method and device, electronic equipment and storage medium
CN115617964A (en) Dialogue method, system, computer and storage medium for marketing private domain operation scene
CN115080748A (en) Weak supervision text classification method and device based on noisy label learning
CN113918696A (en) Question-answer matching method, device, equipment and medium based on K-means clustering algorithm
CN115114909B (en) Antagonistic generation entity identification method for supply chain knowledge acquisition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant