CN108388942A - Information intelligent processing method based on big data - Google Patents
- Publication number
- CN108388942A CN108388942A CN201810163995.1A CN201810163995A CN108388942A CN 108388942 A CN108388942 A CN 108388942A CN 201810163995 A CN201810163995 A CN 201810163995A CN 108388942 A CN108388942 A CN 108388942A
- Authority
- CN
- China
- Prior art keywords
- sentence
- convolutional neural
- audio
- neural networks
- audio block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
Abstract
The present invention provides an intelligent information processing method based on big data. The method includes: training a multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain a polynomial representation of each frame; selecting a predetermined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional-neural-network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error, the block being added to the summary speech data if the error exceeds a set threshold. Built on big-data speech processing, the method is more robust to noise, more accurate, achieves a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description
Technical field
The present invention relates to speech data processing, and in particular to an intelligent information processing method based on big data.
Background technology
With advances in science, technology, and networking, networks carry massive amounts of media information, such as call recordings, WeChat voice messages, and meeting minutes. Faced with large volumes of audio data, users need to grasp voice information more quickly, saving time and improving work efficiency. With the rapid development of information retrieval techniques, speech-summary generation has also matured: from early word-frequency-based methods to the introduction of machine learning, performance has improved greatly. Existing schemes generally use supervised learning, training a classification model on a training set to obtain an optimal weight vector and then predicting labels for a test set. However, supervised models depend on labeled data, which is usually produced by manual annotation; this is time-consuming and subjective, tends to ignore the semantic similarity between sentences, and reduces the accuracy of the results.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes an intelligent information processing method based on big data, comprising:
training a multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain a polynomial representation of each frame; selecting a predetermined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional-neural-network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error, the block being added to the summary speech data if the error exceeds a set threshold.
Preferably, training the multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain the polynomial representation of each frame further comprises:
pre-training the multilayer convolutional neural network with a denoising autoencoder;
performing the following operations on each audio frame at each layer:
first, generating a noisy version of each audio frame by adding Gaussian noise, with the input variables randomly set to arbitrary values;
then, mapping the noisy audio to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
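The layer-wise denoising pre-training described above can be sketched as follows. The layer sizes, noise level, learning rate, tanh activation, and tied decoder weights are illustrative assumptions not specified in the text; the sketch only shows the corrupt-encode-decode-update loop applied one layer at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae_layer(X, hidden_dim, noise_std=0.1, lr=0.01, epochs=100):
    """Train one denoising layer: corrupt the input with Gaussian noise,
    encode, decode with tied weights, and minimise reconstruction error
    against the *clean* input."""
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden_dim))
    b = np.zeros(hidden_dim)
    c = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + rng.normal(0.0, noise_std, X.shape)  # Gaussian corruption
        H = np.tanh(X_noisy @ W + b)                       # encode
        X_hat = H @ W.T + c                                # decode
        err = X_hat - X                                    # error vs clean frame
        dH = (err @ W) * (1.0 - H ** 2)                    # back through tanh
        W -= lr * (err.T @ H + X_noisy.T @ dH) / n         # tied-weight gradient
        b -= lr * dH.mean(axis=0)
        c -= lr * err.mean(axis=0)
    return W, b

# Stack two layers: each layer is pre-trained on the output of the previous one.
X = rng.normal(size=(200, 24))        # stand-in for per-frame feature vectors
reps = X
for hidden_dim in (16, 8):            # illustrative layer sizes
    W, b = train_dae_layer(reps, hidden_dim)
    reps = np.tanh(reps @ W + b)
print(reps.shape)                     # (200, 8)
```

The stacked outputs `reps` stand in for the per-frame representations that the text calls polynomial representations.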
Preferably, reconstructing the audio blocks further comprises:
taking the first m audio blocks of the raw speech, m a positive integer, i.e. m × t audio frames in total, where X_k denotes the k-th original audio block;
obtaining the corresponding polynomial representations {Y_1, Y_2, …, Y_k, … Y_m} from the pre-trained convolutional neural network, where Y_k is the polynomial representation of the k-th audio block;
letting the initial speech library D consist of n_d elements, i.e. D = {d_j}, j ∈ [1, n_d], with d_j the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {c_i}, i ∈ [1, n_f], with C_k the coefficients of the k-th audio block and c_i those of the i-th frame of speech;
obtaining the initial speech library D and the reconstruction coefficients C by jointly minimizing a reconstruction objective whose first term is the l_2 norm (denoted ||·||_2) of the reconstruction residual, where the regularization parameter λ is a coefficient greater than 0 and the multivariate function F(Y_k, C_k, D), with coefficient γ greater than 0, measures the reconstruction of the i-th audio frame from the library D.
Specifically: parameter D is fixed first, making the above objective a convex function of parameter C; parameter C is then fixed, making the objective a convex function of parameter D; and the two parameters are updated alternately by iteration.
Compared with the prior art, the present invention has the following advantages:
the method is based on big-data speech processing; it is more robust to noise, more accurate, achieves a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description of the drawings
Fig. 1 is a flowchart of the big-data-based intelligent information processing method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings illustrating the principles of the invention. The invention is described in conjunction with such embodiments, but is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides an intelligent information processing method based on big data. Fig. 1 shows the processing flow of the big-data-based intelligent information processing method according to an embodiment of the present invention.
The present invention first obtains the raw speech data and performs the following operations:
1) segment the speech into multiple audio blocks, each containing multiple frames; extract the statistical features of each frame to form the corresponding feature vector;
2) train the multilayer convolutional neural network on the effective speech blocks to obtain the polynomial representation of each frame;
3) select the first m audio blocks as the initial result and reconstruct them to obtain the initial speech library and reconstruction coefficients;
4) update the convolutional-neural-network parameters with the next audio block while reconstructing it and computing its reconstruction error; if the error exceeds a set threshold, add the block to the summary speech library and update the library;
5) process each new audio block online as in step 4) until the end; the updated summary speech data are the generated summary speech data.
Extracting the statistical features of each audio frame in step 1) to form the corresponding feature vector specifically comprises:
1) the raw speech is uniformly divided into n audio blocks, i.e. each audio block contains t frames; each frame is converted to a uniform bit rate while keeping the original sampling rate;
2) the local features of each frame are extracted, including the zero-crossing rate, the average magnitude difference, and the LPC coefficients;
3) the above audio features of each frame are concatenated in order to form a feature vector of dimension n_f.
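The per-frame feature extraction of steps 1)-3) can be sketched as follows. The frame length, LPC order, the lag-1 form of the average magnitude difference, and the autocorrelation method for the LPC coefficients are illustrative assumptions; the text does not fix these choices.

```python
import numpy as np

def frame_features(frame, lpc_order=4):
    """Per-frame statistical features: zero-crossing rate, average magnitude
    difference, and LPC coefficients via the autocorrelation method."""
    # zero-crossing rate: fraction of sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    # average magnitude difference (lag-1 form assumed)
    amdf = np.mean(np.abs(frame[1:] - frame[:-1]))
    # LPC coefficients from the autocorrelation normal equations
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)]
                  for i in range(lpc_order)])
    lpc = np.linalg.solve(R + 1e-6 * np.eye(lpc_order), r[1:lpc_order + 1])
    return np.concatenate(([zcr, amdf], lpc))

rng = np.random.default_rng(0)
speech = rng.normal(size=8000)            # stand-in for one audio block
frames = speech.reshape(-1, 400)          # t = 20 frames of 400 samples each
feats = np.stack([frame_features(f) for f in frames])
print(feats.shape)                        # (20, 6): n_f = 6 features per frame
```

The rows of `feats` are the per-frame feature vectors fed to the network.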
The initial training of the multilayer convolutional neural network on effective speech blocks in step 2), obtaining the polynomial representation of each frame, specifically comprises:
pre-training the multilayer convolutional neural network with a denoising autoencoder;
A. performing the following operations on each audio frame at each layer: first, generating a noisy version of each audio frame by adding Gaussian noise, with the input variables randomly set to arbitrary values; then, mapping the noisy audio to obtain its polynomial representation;
B. adjusting and updating the parameters of each layer of the convolutional neural network.
Summary speech data are rebuild in step 3), specifically:
1) summary speech data by raw tone preceding m audio block set at, m is positive integer, i.e., shared m × t frame audios,
XkCorresponding k-th of original audio block;Corresponding polynomial table, which is obtained, by initial training convolutional neural networks is shown as { Y1,Y2,…,
Yk,Ym, YkThe polynomial repressentation of corresponding k-th of audio block;
2) initial speech library D is set by ndA element composition, i.e. D={ dj}j∈[1,nd], djCorresponding j-th of element;If rebuilding system
Number is C, and element number corresponds to number of frames, and dimension corresponds to the number of elements in library, i.e. C={ Ci}i∈[1,nf], CkIt is k-th corresponding
Audio block coefficient, ciCorresponding i-th frame voice;
3) initial speech library D and reconstructed coefficients C are respectively obtained using following formula, i.e.,:
Wherein, symbol | | | |2Indicate the l of variable2Norm, regularization parameter λ are the coefficient more than 0, function of many variables F
(Yk,Ck, D) be embodied as:
Wherein, parameter γ is the coefficient more than 0, and the mathematical expression in symbol indicates to carry out weight using the i-th frame audio of D pairs of library
It builds.Specially:First preset parameter D, makes above-mentioned object function become the convex function of parameter C;Then preset parameter C makes above-mentioned mesh
Scalar functions become the convex function of parameter D, and iteration alternately updates two parameters.
Updating the convolutional-neural-network parameters with the next audio block while reconstructing it and computing the reconstruction error in step 4) specifically comprises:
1) for each frame of the audio block in turn:
A. update the parameters of the last layer of the convolutional neural network, i.e. the weighting coefficients W and the bias b;
B. update the parameters of the other layers using the back-propagation (BP) algorithm;
2) update the polynomial representation of each audio frame with the new parameters;
3) based on the existing speech library D, reconstruct the current audio block and compute the error ε: the polynomial representation Y_k of the current audio block X_k is reconstructed by first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e. the l_2 norm, whose value is the current reconstruction error ε.
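The error computation of step 3) can be sketched as follows. Because the formula images are not reproduced in the text, an l2-regularised least-squares reconstruction is assumed for the minimisation over C_k, and the threshold is purely illustrative; the point of the sketch is that blocks poorly explained by the library receive a large error.

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_error(Y_k, D, lam=0.1):
    """Minimise the assumed objective over C_k for fixed library D, then
    return the first (l2-norm) term as the reconstruction error epsilon."""
    n_atoms = D.shape[1]
    C_k = np.linalg.solve(D.T @ D + lam * np.eye(n_atoms), D.T @ Y_k)
    return np.linalg.norm(Y_k - D @ C_k)

D = rng.normal(size=(24, 10))            # current speech library
Y_in = D @ rng.normal(size=(10, 20))     # block well explained by the library
Y_out = rng.normal(size=(24, 20))        # novel block
eps_in = reconstruction_error(Y_in, D)
eps_out = reconstruction_error(Y_out, D)
print(eps_in < eps_out)                  # novel blocks get larger error
theta = (eps_in + eps_out) / 2           # illustrative threshold
print(eps_out > theta)                   # -> Y_out would enter the summary
```

In the method this error feeds the threshold test that decides whether the block enters the summary speech library.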
Adding the current audio block to the summary speech library and updating the library if the error exceeds the set threshold in step 4) specifically comprises:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k exceeds the set threshold θ, the current audio block X_k is added to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, the set of frame polynomial representations used to update the library is y_q; Y_k ∈ y_q is then used to update the library D by solving the objective function, where the parameter λ is a coefficient greater than 0 that adjusts the influence of the regularization term.
For speech-block extraction, the present invention first extracts the maximum LPC coefficient of the analog speech signal in the time domain and the average magnitude difference in the frequency domain, then forms the extracted features into a two-dimensional vector as the input of the convolutional neural network, and uses the network output to judge whether the signal is an analog speech signal.
After the DC component is removed, the maximum LPC coefficient and the average magnitude difference of the speech are extracted. The mode of the network output values is set as the threshold, and the one-dimensional vector output by the network is judged against it: values greater than this threshold are classified as speech segments, and values below it as non-speech segments.
Two features are extracted from the analog speech signal: the maximum LPC coefficient and the average magnitude difference. The LPC coefficient R_w(k) of the analog speech signal s(n) is computed from the windowed speech s_w(n), where N is the effective speech-block length and k is the lag; maximizing over s_w(n) yields the maximum LPC coefficient.
The average magnitude difference Ω of the analog speech signal s(n) is computed from the frame length N, the FFT S(k) of s(n), and the mean E of the frequency-domain magnitude of the analog speech signal.
The input vector of the convolutional neural network is the 2-dimensional vector formed by the maximum LPC coefficient and the average magnitude difference, so the input layer has 2 neurons. The output is a 1-dimensional vector indicating whether the current frame is an effective speech block or a non-effective speech block, so the output layer has 1 neuron. The hidden layer has 5 neurons.
In the forward pass, the input signal is processed layer by layer through the hidden layer up to the output layer; the state of each layer's neurons depends only on the state of the previous layer. Let w_ij be the connection weighting coefficients between the input and hidden layers, w_jk those between the hidden and output layers, a_j the hidden-layer thresholds, and b_k the output-layer thresholds, where i indexes the input layer, j the hidden layer, and k the output layer. If the output layer does not produce the desired output, back-propagation is performed: the network weighting coefficients and thresholds are adjusted according to the prediction error so that the predicted output of the convolutional neural network approaches the desired output.
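The 2-5-1 forward and backward passes described above can be sketched in NumPy as follows. The sigmoid activations, learning rate, iteration count, and toy labels (frames labelled speech when both features are large) are assumptions made only to give the sketch something to learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-5-1 network: w_ij (input->hidden), a_j (hidden thresholds),
# w_jk (hidden->output), b_k (output threshold)
w_ij = rng.normal(0, 1, (2, 5))
a_j = np.zeros(5)
w_jk = rng.normal(0, 1, (5, 1))
b_k = np.zeros(1)

# toy frames: label 1 when the two features sum above 1 (stand-in for speech)
X = rng.uniform(0, 1, (200, 2))
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(float).reshape(-1, 1)

lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ w_ij + a_j)              # forward: hidden layer
    out = sigmoid(H @ w_jk + b_k)            # forward: output layer
    err = out - y                            # prediction error
    d_out = err * out * (1 - out)            # backprop through output sigmoid
    d_hid = (d_out @ w_jk.T) * H * (1 - H)   # backprop through hidden sigmoid
    w_jk -= lr * H.T @ d_out / len(X)
    b_k -= lr * d_out.mean(axis=0)
    w_ij -= lr * X.T @ d_hid / len(X)
    a_j -= lr * d_hid.mean(axis=0)

acc = np.mean((out > 0.5) == (y > 0.5))
print(acc)
```

The thresholded output plays the role of the speech / non-speech decision in the text.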
The initial weighting coefficients and thresholds of the convolutional neural network are optimized with a genetic algorithm, as follows:
(1) Individuals are encoded with coefficient coding: each individual is a numeric string composed of four parts, namely the input-to-hidden connection weighting coefficients, the hidden-layer thresholds, the hidden-to-output connection weighting coefficients, and the output-layer thresholds. An individual thus contains all the weighting coefficients and thresholds of the neural network; with the network structure known, it fully determines a neural network of given structure, weighting coefficients, and thresholds.
(2) The initial weighting coefficients and thresholds of the network are obtained from the individual, the network is trained with the training data, and the system output is predicted. The sum of the absolute errors between the predicted and desired outputs is used as the individual's fitness value, i.e. the fitness function is set as F = k · Σ_{i=1}^{n} |y_i − o_i|, where n is the number of output nodes of the convolutional neural network, y_i is the desired output of the i-th node, o_i is the predicted output of the i-th node, and k is a predefined coefficient.
(3) Selection uses a fitness-proportional strategy; the selection probability p_i of each individual i is proportional to
f_i = k / F_i
where F_i is the fitness value of individual i, k is a coefficient (10 here), and N is the population size (10 here).
(4) Crossover uses coefficient interpolation; the crossover of the k-th chromosome a_k and the l-th chromosome a_l at position j is performed as:
a_kj = a_kj(1 − b) + a_lj · b
a_lj = a_lj(1 − b) + a_kj · b
where b is a random number in [0, 1].
(5) The j-th gene a_ij of the i-th individual is mutated as:
a_ij = a_ij + (a_ij − a_max) · f(g),  if r > 0.5
a_ij = a_ij + (a_min − a_ij) · f(g),  if r ≤ 0.5
where a_max and a_min are respectively the upper and lower bounds of gene a_ij,
f(g) = r_2 (1 − g/G_max)^2,
r_2 is a random number, g is the current iteration number, G_max is the maximum number of generations, and r is a random number in [0, 1].
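Operators (1)-(5) can be sketched as follows. To keep the sketch short it evolves a small weight vector against a sum-of-absolute-errors fitness rather than a full network; the population size, gene bounds, and toy objective are assumptions, while the proportional selection, interpolation crossover, and decaying bounded mutation follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy target: find weights w minimising sum |Xw - y|, standing in for the
# flattened weights-and-thresholds string of the network
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

POP, GMAX, K = 10, 200, 10.0
A_MIN, A_MAX = -5.0, 5.0
pop = rng.uniform(A_MIN, A_MAX, (POP, 4))

def error(ind):                     # F_i: sum of absolute prediction errors
    return np.sum(np.abs(X @ ind - y))

init_err = min(error(ind) for ind in pop)   # best of the initial population
best_err = init_err
for g in range(GMAX):
    fit = np.array([K / error(ind) for ind in pop])   # f_i = k / F_i
    pop = pop[rng.choice(POP, POP, p=fit / fit.sum())]  # proportional selection
    for i in range(0, POP - 1, 2):                    # interpolation crossover
        b = rng.uniform()
        ak, al = pop[i].copy(), pop[i + 1].copy()
        pop[i], pop[i + 1] = ak * (1 - b) + al * b, al * (1 - b) + ak * b
    f_g = rng.uniform() * (1 - g / GMAX) ** 2         # mutation step shrinks with g
    for i in range(POP):                              # bounded mutation
        j = rng.integers(4)
        if rng.uniform() > 0.5:
            pop[i, j] += (pop[i, j] - A_MAX) * f_g
        else:
            pop[i, j] += (A_MIN - pop[i, j]) * f_g
    pop = np.clip(pop, A_MIN, A_MAX)
    best_err = min(best_err, min(error(ind) for ind in pop))
print(best_err <= init_err)   # the best fitness seen never degrades
```

In the method, the winning individual would be decoded back into the network's initial weights and thresholds before gradient training.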
Speech endpoint detection is implemented as follows:
1) A 2-5-1 convolutional-neural-network structure is used. First, the maximum LPC coefficient and the average magnitude difference of the raw speech are extracted, and the resulting two-dimensional vector is used as the network input; the output layer judges whether the frame is an effective speech block. The weighting coefficients and thresholds are randomly initialized.
2) Original audio blocks are randomly selected and the class of each frame of the signal is labeled: 1 if speech, 0 otherwise. The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, which serves as the input vector of the convolutional neural network.
3) The training samples are fed into the convolutional neural network to train its parameters, and the network is optimized until the error between the network output and the desired value reaches a preset standard.
4) The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, which is fed into the convolutional neural network as a test sample. An improved threshold T is used here: if all elements of the 1-dimensional output vector are greater than or equal to T, the frame is classified as an effective speech block; if less than T, it is classified as a non-effective speech block. The output values of the network are compared with the pre-assigned labels; if the accuracy is low, the network is retrained.
5) The network output determines whether a segment is speech.
Let the speech-block input signal vector be X(n) = [x_1(n), x_2(n), … x_M(n)]^T; the output y(n) of the speech-enhancement filter is expressed as
y(n) = coef · [β_1 x_1(n) + β_2 x_2(n) + … + β_{nb+1} x_{nb+1}(n)]
where B = [β_1, β_2, … β_{nb+1}] is the filter weight-coefficient vector and coef is the adaptive parameter.
This adaptive parameter coef is then introduced into the ILMS adaptive-filter model to denoise the speech block. The SNR_i of the denoised speech is computed; the coef value at which SNR_i reaches its maximum is the final trained coef output:
SNR_i = Π_snr(f_LM(coef_i, s(n)))
where coef_i is a natural number with step 1, s(n) is the speech block to be enhanced, f_LM is the adaptive-filtering function that denoises and enhances s(n) according to the value of coef_i, and Π_snr(·) is the function computing the segmental signal-to-noise ratio.
SNR_i is maximized and the index i corresponding to the maximum is assigned to coef:
coef = argmax(SNR_1, SNR_2, …)
where argmax returns the index corresponding to the maximum value.
Finally, in the adaptive noise filter, each speech block to be enhanced is enhanced according to the coef value. Speech-to-text conversion is then performed on the enhanced speech blocks.
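The coef sweep can be sketched as follows. The ILMS filter itself is not specified, so a simple moving-average smoother stands in for f_LM, and a clean reference signal is used so that the segmental SNR can be computed directly; in practice the SNR would have to be estimated, since no clean reference exists.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_lm(coef, s, taps=8):
    """Stand-in for the adaptive-filter function f_LM: a moving-average
    smoother whose strength grows with the integer parameter coef."""
    kernel = np.ones(coef * taps) / (coef * taps)
    return np.convolve(s, kernel, mode="same")

def segmental_snr(clean, enhanced, seg=160):
    """Mean per-segment SNR in dB (the role of the Pi_snr function)."""
    snrs = []
    for i in range(0, len(clean) - seg, seg):
        sig = np.sum(clean[i:i + seg] ** 2)
        noise = np.sum((clean[i:i + seg] - enhanced[i:i + seg]) ** 2) + 1e-12
        snrs.append(10 * np.log10(sig / noise + 1e-12))
    return np.mean(snrs)

t = np.linspace(0, 1, 3200)
clean = np.sin(2 * np.pi * 120 * t)               # stand-in "speech"
noisy = clean + 0.3 * rng.normal(size=t.size)     # block to be enhanced

# sweep the natural-number parameter coef_i, keep the SNR-maximising value
snr = [segmental_snr(clean, f_lm(c, noisy)) for c in range(1, 6)]
coef = int(np.argmax(snr)) + 1                    # coef = argmax(SNR_1, SNR_2, ...)
enhanced = f_lm(coef, noisy)
print(1 <= coef <= 5, enhanced.shape)
```

The selected `coef` is then reused to enhance every subsequent block, as in the final step of the text.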
Once the speech blocks have been transcribed into text, speech-summary extraction can proceed automatically. The invention first trains the feature vectors of feature words with a convolutional-neural-network algorithm, then accurately computes the similarity between sentences, iteratively updates the sentence weighting coefficients, and finally uses inter-sentence similarity to eliminate information redundancy in the summary. The specific steps are as follows:
1. Obtain the feature-vector representation of feature words by training the convolutional-neural-network model on morphemes: a morpheme set is collected from the big-data store and pre-processed, the pre-processing including splitting the morphemes into sentences, which yields the training feature-morpheme set. The training parameters are set and the convolutional-neural-network model is trained with the training feature-morpheme set as training data; through training, each word in the training feature-morpheme set is output in feature-vector form as a feature word, giving the feature-vector representation of the feature words.
To train feature-word vectors from large amounts of unstructured speech data, the present invention uses the feature vector of the current word to predict the feature vectors of the context within a specified window. Given the feature morphemes w_1, w_2, w_3, …, w_T as training data, the objective function maximizes, over the T training items, the probability of the context words within distance c of the current word, where c is the parameter determining the context-window size; the larger c is, the more training data are needed, and T is the number of training items.
The present invention takes the W words of the output layer as leaf nodes of a tree and assigns shorter paths to high-frequency words. Each feature morpheme w can be reached from the root node of the tree along a unique path. Let n(w, j) be the j-th node on the path from the root node to w, and L(w) the length of that path, so that n(w, 1) = root and n(w, L(w)) = w. For any internal node n, ch(n) is an arbitrary child node of n, and the indicator function ⟦x⟧ is defined to be 1 when x is true and −1 otherwise.
With these definitions, the objective function is solved by stochastic gradient descent, ultimately producing the feature-vector representation of each word.
S2. The morpheme set collected in step S1 is searched with a preset query word, and the retrieved speech blocks serve as the candidate-block set; the candidate-block set is split into sentences and repeated sentences are removed, giving the candidate-block set S, where S_i is an arbitrary sentence in S and N is the total number of sentences. Using the feature vectors of the feature words obtained in step S1, the semantic similarity between sentences is computed and used as the weighting coefficients of the edges of a graph, forming a sentence DAG graph model.
For any two sentences S_i and S_j in the candidate-block set, containing feature words t_i and t_j whose feature vectors v(t_i) and v(t_j) are obtained from the convolutional-neural-network model trained in step S1, the semantic similarity Sim(S_i, S_j) sums, for each feature word t_i in sentence S_i, the maximum similarity between v(t_i) and the feature vectors of all feature words in sentence S_j belonging to the same part of speech, and normalizes by the sentence lengths |S_i| and |S_j|.
S3. For the DAG graph model obtained in step S2, the weighting coefficient weight(S_i) of each node is iteratively updated from the average initial weighting coefficients and the inter-sentence semantic similarities of step S2 until convergence, as
weight(S_i) = (1 − d) + d · Σ_{S_j ∈ Assoc(S_i)} [Sim(S_j, S_i) / Σ_{S_k ∈ Assoc(S_j)} Sim(S_j, S_k)] · weight(S_j)
which yields a score reflecting the importance of each sentence. Here d is the damping coefficient, with value range [0, 1], and Assoc(S_i) denotes the set of sentences connected to S_i, i.e. the sentences whose similarity with S_i is greater than 0, with ||Assoc(S_i)|| the number of sentences in that set.
Using the similarity matrix formed from the average initial weighting coefficients of step S2 and the inter-sentence semantic similarities, the weighting coefficient of each node in the DAG graph model is iterated until convergence. Each node finally receives a score, in preparation for generating the speech summary in the next step.
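The image of the iteration formula is not reproduced in the original, so the sketch below assumes the standard damped, similarity-weighted formulation that the description of d and Assoc(S_i) implies; the toy similarity matrix is purely illustrative.

```python
import numpy as np

def rank_sentences(sim, d=0.85, iters=100):
    """Damped, similarity-weighted iteration over the sentence graph:
    weight(S_i) = (1-d) + d * sum over connected S_j of
    (sim(S_j,S_i) / sum_k sim(S_j,S_k)) * weight(S_j)."""
    n = sim.shape[0]
    sim = sim * (1 - np.eye(n))               # no self-links
    w = np.ones(n)                            # average initial weights
    out_strength = sim.sum(axis=1, keepdims=True) + 1e-12
    M = sim / out_strength                    # row-normalised edge weights
    for _ in range(iters):
        w = (1 - d) + d * (M.T @ w)
    return w

# symmetric toy similarity matrix for 4 sentences
sim = np.array([[0.0, 0.8, 0.1, 0.0],
                [0.8, 0.0, 0.2, 0.1],
                [0.1, 0.2, 0.0, 0.9],
                [0.0, 0.1, 0.9, 0.0]])
w = rank_sentences(sim)
print(w.shape, bool(np.all(w >= 1 - 0.85)))   # every score >= (1 - d)
```

The converged vector `w` is the per-sentence importance score used in the next step.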
S4 weakens the sentence, i.e., if a sentence has higher similitude with existing sentence in summary set
The maximum and irredundant sentence composition of selection weighting coefficient simplifies set, the specific steps are:
1) initialize an empty condensed speech queue, and take the sentences corresponding to the nodes of the DAG graph model as the initial candidate condensed speech queue;
2) sort the sentences in the candidate condensed speech queue in descending order of the weighting coefficients obtained in step S3; the sorted sentences form the candidate summary sentence sequence;
3) move the first-ranked sentence of the candidate summary sentence sequence into the condensed speech queue, and update the weighting coefficients of the sentences remaining in the candidate queue with the following formula:
Weight(Sj) = Weight(Sj) − ω × Sim(Si, Sj)
where i ≠ j and ω is the attenuation factor; when the sentence whose weight is being updated is similar to a sentence already in the condensed speech queue, the attenuation factor ω is 1.0; Sim(Si, Sj) is the semantic similarity obtained in step S2;
4) repeat steps 2) and 3) until the set of sentences in the condensed speech queue reaches the preset summary speech length.
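The de-redundancy steps 1)–4) above amount to a greedy selection loop: repeatedly take the top-weighted candidate and penalize the rest by their similarity to it. A minimal Python sketch (the name `condense` and the default ω are illustrative assumptions, not part of the patent):

```python
def condense(weights, sim, target_len, omega=1.0):
    """Greedy redundancy-weakening selection.

    weights: initial weighting coefficients from step S3.
    sim: pairwise semantic similarities from step S2.
    Returns sentence indices in the order they enter the condensed queue.
    """
    candidates = set(range(len(weights)))
    w = list(weights)
    queue = []
    while candidates and len(queue) < target_len:
        best = max(candidates, key=lambda j: w[j])  # step 2): top-ranked candidate
        queue.append(best)                          # step 3): move into the queue
        candidates.remove(best)
        for j in candidates:                        # Weight(Sj) -= omega * Sim(Si, Sj)
            w[j] -= omega * sim[best][j]
    return queue
```

For example, with two near-duplicate high-weight sentences, the second duplicate is penalized and a lower-weight but novel sentence is chosen instead.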
Assume the total number of sentences in the raw speech T is m, i.e. the number of sentences recognized from the raw speech, and set the condensation rate of the summary speech to λ; the number of summary sentences to extract is then n, with λ = n/m. A text is a linear combination of sentences, a sentence is a linear combination of words, and a word can be regarded as a linear combination of morphemes; the importance of a sentence can therefore be obtained indirectly from the importance of its morphemes. The summary-speech extraction process based on the predefined condensation rate is as follows:
1. Calculate the importance of each node in the morpheme network, and replace the importance of each sentence with the average importance of the morphemes it contains, thereby obtaining the sentence cluster S = {S1, S2, …, Sm} with importance values, where:
w(ni)_t is the t-th iteration of w(ni); ε is the attenuation factor; C(ni) is the set of morphemes that co-occur with the morpheme represented by node ni; Coexsit(ni, nj) is the co-occurrence rate of the morphemes represented by nodes ni and nj; and N is the total number of morphemes contained in the morpheme network.
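The iteration formula image for the morpheme importance w(ni) is not reproduced in the text; the sketch below assumes a damped update over the co-occurrence graph consistent with the listed symbols (ε, C(ni), Coexsit, N). The exact normalization used by the patent may differ, and the name `morpheme_importance` is an illustrative assumption.

```python
def morpheme_importance(coexsit, epsilon=0.15, iters=50):
    """Damped importance iteration over a morpheme co-occurrence network.

    coexsit: dict {node: {neighbor: co-occurrence rate}} covering N morphemes.
    Returns a dict of converged importance values.
    """
    nodes = list(coexsit)
    n = len(nodes)
    w = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {}
        for ni in nodes:
            # contribution of each co-occurring morpheme nj, normalized by
            # nj's total outgoing co-occurrence rate
            acc = sum(rate * w[nj] / max(sum(coexsit[nj].values()), 1e-12)
                      for nj, rate in coexsit[ni].items())
            nxt[ni] = epsilon / n + (1 - epsilon) * acc
        w = nxt
    return w
```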
2. Perform multi-domain division on the sentence cluster S. Suppose sentence clusters of k sub-domains are obtained; take the comprehensive importance of the sentences in each sub-domain cluster as the importance of that cluster, and arrange the k sub-domain clusters in descending order of importance, denoted MS1, MS2, …, MSk (k < m); the sentences within each sub-domain cluster are likewise arranged in descending order of importance;
3. Apply the above de-redundancy processing to the sentences in each sub-domain cluster. Then, according to the summary-speech condensation rate λ, extract the top ⌊λ×m/k⌋ sentences, in importance order, from each domain cluster. If λ×m is divisible by k, this directly yields the λ×m condensed sentences to be output; if it is not divisible by k, additionally extract one more sentence from each of the clusters MS1, MS2, …, MS(λ×m mod k), which together with the sentences just extracted form the summary sentences of speech T. In this way the final condensed cluster is obtained, denoted S′ = {S′1, S′2, …, S′λ×m};
4. Output the sentences in the set S′ in their original order to obtain the summary speech.
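The per-domain quota arithmetic in step 3 is garbled in the source; one consistent reading is that each cluster contributes ⌊λ×m/k⌋ sentences, with the first (λ×m mod k) clusters contributing one extra so the total is exactly λ×m. This sketch (the name `domain_quotas` is assumed) encodes that reading:

```python
def domain_quotas(m, lam, k):
    """Sentences to take from each of the k domain clusters.

    m: total sentence count; lam: condensation rate; returns a list of k
    quotas summing to lam * m.
    """
    n = int(lam * m)            # total summary sentences, n = lam * m
    base, extra = divmod(n, k)  # base = floor(n / k), extra = n mod k
    return [base + 1 if i < extra else base for i in range(k)]
```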
For the processing of social-network voice data, after recognizing the sentences and words of the speech text, the present invention preferably further combines each two adjacent phrases in a sentence into a word pair, so that each sentence is represented by a sequence of word pairs. Word pairs incorporate contextual information: a pair's likelihood of being a keyword and the importance of the whole sentence mutually reinforce each other, and summary sentences are extracted according to co-occurring word pairs to generate the summary speech data.
First, N word pairs that accurately reflect some sub-topic of the text collection are extracted as keyword pairs, yielding a keyword-pair set. The weight of each word pair is calculated by the following formula:
WTF(bi) = fre(bi) × log2(ifre(bi))
where fre(bi) is the word frequency of the word pair bi, i.e. the frequency with which bi occurs in the entire text collection, and ifre(bi) is the ratio of the total number of sentences to the number of sentences in which bi occurs.
All word pairs are arranged in descending order of their WTF value, and the top N are taken as keyword pairs.
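The WTF weighting above can be sketched directly; pairs are formed from adjacent words as described earlier, and the function name `keyword_pairs` is an illustrative assumption:

```python
import math

def keyword_pairs(sentences, top_n):
    """Rank word pairs by WTF(bi) = fre(bi) * log2(ifre(bi)).

    sentences: list of word lists; adjacent words in a sentence form a pair.
    Returns the top_n pairs by descending WTF value.
    """
    pair_freq, pair_sents = {}, {}
    for idx, words in enumerate(sentences):
        for bi in zip(words, words[1:]):
            pair_freq[bi] = pair_freq.get(bi, 0) + 1      # fre: collection frequency
            pair_sents.setdefault(bi, set()).add(idx)     # sentences containing bi
    total = len(sentences)
    # ifre: total sentence count / number of sentences containing the pair
    wtf = {bi: f * math.log2(total / len(pair_sents[bi]))
           for bi, f in pair_freq.items()}
    return sorted(wtf, key=wtf.get, reverse=True)[:top_n]
```

Note the analogy to TF-IDF: a pair occurring in every sentence gets log2(1) = 0 and is never selected.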
Calculate the distribution matrix of topics over word pairs. Each row of the matrix is a probability distribution over the word-pair set for one topic, and each element characterizes the importance of a word pair relative to that topic. Summing the matrix over its rows yields, for each word pair, a global score over the topic set. Based on this global score, the word pairs are sorted in descending order and the top N are taken to constitute the keyword-pair set.
Based on the above keyword-pair set, for each candidate sentence in the set, calculate the ratio of the number of word pairs it shares with the keyword-pair set to the size of the whole keyword-pair set.
Meanwhile, to weaken overly long or overly short sentences, this score value is regularized, the regularization factor being the larger of the candidate sentence's own length and the average sentence length of the sentence set. The candidate sentence score can be formally defined as:
Score(S) = |{bi : bi ∈ S and bi ∈ KBS}| / (|KBS| × max(|S|, Avlen))
where S denotes the candidate sentence, KBS denotes the keyword-pair set, bi denotes a keyword pair occurring in both, |S| and |KBS| denote the candidate sentence length and the size of the keyword-pair set respectively, and Avlen is the average length of all sentences in the sentence set.
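Since the patent's score formula image is not reproduced in the text, the sketch below encodes the recipe as described: keyword-pair overlap ratio, regularized by the larger of the sentence's own length and the average sentence length. The name `candidate_score` is an assumption.

```python
def candidate_score(words, kbs, avlen):
    """Score a candidate sentence against the keyword-pair set KBS.

    words: the sentence as a word list; kbs: iterable of keyword pairs;
    avlen: average sentence length over the sentence set.
    """
    pairs = set(zip(words, words[1:]))        # adjacent-word pairs of the sentence
    overlap = len(pairs & set(kbs))           # pairs shared with KBS
    # regularization factor: max(|S|, Avlen) weakens too-long/too-short sentences
    return overlap / (len(kbs) * max(len(words), avlen))
```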
To extract the summary sentences while preventing redundancy, a similarity threshold is introduced, and the M top-ranked sentences that satisfy the similarity condition are extracted as summary sentences. The flow for extracting summary sentences is as follows:
(1) initialize an empty condensed speech queue, and initialize the candidate set;
(2) take the currently top-ranked sentence as the candidate sentence Sc;
(3) if the condensed speech queue is empty, add the candidate sentence to it directly; otherwise calculate in turn the similarity between the candidate sentence Sc and each summary sentence Ss; as soon as sim(Sc, Ss) > Simtd for some Ss, where Simtd is the similarity threshold, go directly to (5);
(4) add the candidate sentence to the condensed speech queue;
(5) remove the current candidate sentence from the candidate set;
(6) if the number of sentences in the condensed speech queue is less than the preset number M, go to (2); otherwise go to (7);
(7) output the condensed speech queue.
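The extraction flow (1)–(7) above can be sketched as a single loop; `extract_summary` and its parameters are illustrative names:

```python
def extract_summary(ranked, sim, m, sim_td):
    """Threshold-filtered extraction of up to m summary sentences.

    ranked: sentence ids in descending score order; sim: pairwise similarity
    matrix; sim_td: similarity threshold Simtd.
    """
    queue = []
    candidates = list(ranked)
    while candidates and len(queue) < m:
        sc = candidates.pop(0)                          # (2) top-ranked candidate
        # (3)/(4): accept only if no kept sentence exceeds the threshold
        if all(sim[sc][ss] <= sim_td for ss in queue):
            queue.append(sc)
        # (5): the candidate is removed from the set either way
    return queue
```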
If the summary sentences contain temporal information, they are combined in chronological order; if several summary sentences belong to the same morpheme-level topic, they are combined according to their order in the raw speech.
In conclusion, the method of the present invention is based on voice big-data processing, has stronger noise immunity, higher accuracy and a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Obviously, those skilled in the art should appreciate that each module or each step of the above invention can be realized with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they can be realized with program code executable by a computing system, and thus can be stored in a storage system and executed by a computing system. The present invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplarily illustrate or explain the principle of the present invention, and not to limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all variations and modifications falling within the scope and boundary of the appended claims, or the equivalent forms of such scope and boundary.
Claims (3)
1. A big-data-based intelligent information processing method, characterized by comprising:
training a multilayer convolutional neural network on the effective voice blocks of the raw voice data to obtain the polynomial representation of each frame; selecting a predefined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; updating the parameters of the convolutional neural network according to the next audio block, while reconstructing that audio block and calculating its reconstruction error; and, if the error exceeds a set threshold, adding the audio block to the summary speech data.
2. The method according to claim 1, characterized in that training the multilayer convolutional neural network on the effective voice blocks of the raw voice data to obtain the polynomial representation of each frame further comprises:
initially training the multilayer convolutional neural network with a denoising encoder;
performing the following operations on each frame of audio in each layer:
first, generating a random input variable for each frame of audio noise by adding Gaussian noise of an arbitrarily set value;
then, mapping the audio noise to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
3. The method according to claim 1, characterized in that reconstructing the audio blocks further comprises:
obtaining the first m audio blocks of the raw speech, m being a positive integer, i.e. m × t frames of audio in total, with Xk corresponding to the k-th original audio block;
obtaining the corresponding polynomial representations {Y1, Y2, …, Yk, …, Ym} through the initially trained convolutional neural network, Yk being the polynomial representation of the k-th audio block;
letting the initial speech library D consist of nd elements, i.e. D = {dj}, j ∈ [1, nd], with dj the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {ci}, i ∈ [1, nf], with Ck the coefficients corresponding to the k-th audio block and ci those of the i-th frame of speech;
obtaining the initial speech library D and the reconstruction coefficients C by minimizing, over all audio blocks, the multivariate function F(Yk, Ck, D), in which the symbol || · ||2 denotes the l2 norm of a variable, the regularization parameter λ is a coefficient greater than 0, the parameter γ is a coefficient greater than 0, and the expression inside the norm reconstructs the i-th frame of audio using the library D;
first fixing the parameter D so that the above objective function becomes a convex function of the parameter C, then fixing the parameter C so that it becomes a convex function of the parameter D, and updating the two parameters alternately by iteration.
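The optimization formulas for D and C are not reproduced in the source; the following is a minimal one-dimensional sketch of the alternating scheme the claim describes (fix D, solve the convex problem in C; fix C, solve the convex problem in D; iterate). The function name `alternate`, the single-element library, and the ridge penalty on C are all illustrative assumptions, not the patent's actual objective.

```python
def alternate(frames, lam=0.1, iters=20):
    """Alternating minimization: reconstruct each frame y_i as d * c_i.

    frames: list of frame values; lam: regularization weight on C.
    Returns the library element d and per-frame coefficients c.
    """
    d = 1.0
    c = [0.0] * len(frames)
    for _ in range(iters):
        # C-step (D fixed): each c_i minimizes (y_i - d*c_i)^2 + lam*c_i^2,
        # a convex problem with closed form c_i = d*y_i / (d^2 + lam)
        c = [d * y / (d * d + lam) for y in frames]
        # D-step (C fixed): d minimizes sum_i (y_i - d*c_i)^2, least squares
        num = sum(y * ci for y, ci in zip(frames, c))
        den = sum(ci * ci for ci in c) or 1.0
        d = num / den
    return d, c
```

In this toy noiseless case the final D-step fits the frames exactly; a practical speech library would use many elements and typically a sparsity penalty instead of the ridge term.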
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810163995.1A CN108388942A (en) | 2018-02-27 | 2018-02-27 | Information intelligent processing method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108388942A true CN108388942A (en) | 2018-08-10 |
Family
ID=63070092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810163995.1A Pending CN108388942A (en) | 2018-02-27 | 2018-02-27 | Information intelligent processing method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108388942A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
CN117440001A (en) * | 2023-12-20 | 2024-01-23 | 国投人力资源服务有限公司 | Data synchronization method based on message |
CN117440001B (en) * | 2023-12-20 | 2024-02-27 | 国投人力资源服务有限公司 | Data synchronization method based on message |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1188957A (en) * | 1996-09-24 | 1998-07-29 | 索尼公司 | Vector quantization method and speech encoding method and apparatus |
CN1819017A (en) * | 2004-12-13 | 2006-08-16 | Lg电子株式会社 | Method for extracting feature vectors for speech recognition |
CN101546556A (en) * | 2008-03-28 | 2009-09-30 | 展讯通信(上海)有限公司 | Classification system for identifying audio content |
CN101546557A (en) * | 2008-03-28 | 2009-09-30 | 展讯通信(上海)有限公司 | Method for updating classifier parameters for identifying audio content |
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104679902A (en) * | 2015-03-20 | 2015-06-03 | 湘潭大学 | Information abstract extraction method in conjunction with cross-media fuse |
CN105989067A (en) * | 2015-02-09 | 2016-10-05 | 华为技术有限公司 | Method for generating text abstract from image, user equipment and training server |
CN106446109A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | Acquiring method and device for audio file abstract |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hinton et al. | Improving neural networks by preventing co-adaptation of feature detectors | |
CN111816156B (en) | Multi-to-multi voice conversion method and system based on speaker style feature modeling | |
Krishnamurthy et al. | Neural networks for vector quantization of speech and images | |
CN109767759A (en) | End-to-end speech recognition methods based on modified CLDNN structure | |
CN110442684A (en) | A kind of class case recommended method based on content of text | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN109214004B (en) | Big data processing method based on machine learning | |
CN110534132A (en) | A kind of speech-emotion recognition method of the parallel-convolution Recognition with Recurrent Neural Network based on chromatogram characteristic | |
CN108268449A (en) | A kind of text semantic label abstracting method based on lexical item cluster | |
CN108170848B (en) | Chinese mobile intelligent customer service-oriented conversation scene classification method | |
CN109857457B (en) | Function level embedding representation method in source code learning in hyperbolic space | |
Zhao | Evolutionary design of neural network tree-integration of decision tree, neural network and GA | |
CN111127146A (en) | Information recommendation method and system based on convolutional neural network and noise reduction self-encoder | |
CN112232087A (en) | Transformer-based specific aspect emotion analysis method of multi-granularity attention model | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN112884149B (en) | Random sensitivity ST-SM-based deep neural network pruning method and system | |
CN108647206A (en) | Chinese spam filtering method based on chaotic particle swarm optimization CNN networks | |
CN109241298A (en) | Semantic data stores dispatching method | |
CN114120041A (en) | Small sample classification method based on double-pair anti-variation self-encoder | |
CN110634476A (en) | Method and system for rapidly building robust acoustic model | |
CN109409434A (en) | The method of liver diseases data classification Rule Extraction based on random forest | |
CN108388942A (en) | Information intelligent processing method based on big data | |
CN108417204A (en) | Information security processing method based on big data | |
CN113806543B (en) | Text classification method of gate control circulation unit based on residual jump connection | |
CN116467416A (en) | Multi-mode dialogue emotion recognition method and system based on graphic neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180810 |