CN108417206A - High speed information processing method based on big data - Google Patents
- Publication number
- CN108417206A (application number CN201810161849.5A)
- Authority
- CN
- China
- Prior art keywords
- layer
- sentence
- library
- audio block
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The present invention provides a high-speed information processing method based on big data, the method comprising: for each audio frame of an audio block, updating the parameters of the last layer of a convolutional neural network, namely the weighting coefficients W and the offset b; updating the parameters of the other layers of the convolutional neural network using the BP (back-propagation) algorithm; updating the polynomial representation of each audio frame according to the new parameters; and, based on the existing speech library D, reconstructing the current audio block and computing the reconstruction error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k. The method of the present invention is based on the processing of speech big data; it is more robust to noise, has higher accuracy and a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description
Technical field
The present invention relates to speech data processing, and more particularly to a high-speed information processing method based on big data.
Background technology
With the progress of science and technology and the development of network technology, the network carries massive amounts of media information, such as call recordings, WeChat voice messages and meeting minutes. Faced with large amounts of audio data, users need to grasp voice information more quickly, saving time and improving working efficiency. With the rapid development of information retrieval technology, speech summarization techniques have also matured; from the initial methods based on word frequency to the introduction of machine learning, performance has greatly improved. Existing schemes generally adopt supervised learning: a classification model is trained on a training set to obtain an optimal weight vector, which is then used to classify the test set. However, relying on a supervised model requires labeled data, which is usually produced by manual annotation; this is time-consuming and subjective, tends to ignore the semantic similarity between sentences, and reduces the accuracy of the results.
Invention content
To solve the above problems of the prior art, the present invention proposes a high-speed information processing method based on big data, comprising:
for each audio frame of an audio block, updating the parameters of the last layer of the convolutional neural network, namely the weighting coefficients W and the offset b;
updating the parameters of the other layers of the convolutional neural network using the BP algorithm;
updating the polynomial representation of each audio frame according to the new parameters;
based on the existing speech library D, reconstructing the current audio block and computing the reconstruction error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k.
Preferably, reconstructing the polynomial representation Y_k of the current audio block X_k further comprises: first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e. the l2 norm, and computing its value as the current reconstruction error ε.
Preferably, if the error ε exceeds a given threshold, the current audio block is added to the summary speech library and the library is updated.
Preferably, adding the current audio block to the summary speech library and updating the library further comprises:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k exceeds the given threshold θ, adding the current audio block X_k to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, the set of frame-level polynomial representations used to update the library is y_q; the library D is then updated with Y_k ∈ y_q by solving the objective function:
where the parameter λ is a coefficient greater than 0 that adjusts the influence of the regularization term.
Preferably,
when speech blocks are extracted, after the DC component is removed, the maximum of the time-domain LPC coefficients of the analog speech signal and the average amplitude difference of its frequency domain are extracted;
the mode of the network output values is set as the threshold against which the one-dimensional vector output by the network is judged: values greater than or equal to this threshold are classified as speech segments, and values below it as non-speech segments;
the input vector of the convolutional neural network is the 2-dimensional vector formed by the LPC coefficient maximum and the average amplitude difference, i.e. the input layer has 2 neurons; the output is a 1-dimensional vector indicating whether the current frame is a valid speech block, i.e. the output layer has 1 neuron; the hidden layer has 5 neurons.
In forward propagation, the input signal is processed layer by layer through the hidden layer up to the output layer; the state of the neurons in each layer is affected only by the neurons of the previous layer. Let w_ij be the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer and b_k the thresholds of the output layer, where i indexes the input layer, j the hidden layer and k the output layer. If the output layer does not produce the desired output, back-propagation is performed: the network weighting coefficients and thresholds are adjusted according to the prediction error so that the prediction output of the convolutional neural network approaches the desired output.
Compared with the prior art, the present invention has the following advantages: the method of the present invention is based on the processing of speech big data; it is more robust to noise, has higher accuracy and a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description of the drawings
Fig. 1 is a flow chart of the high-speed information processing method based on big data according to an embodiment of the present invention.
Specific implementation mode
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings illustrating the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment; the scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a high-speed information processing method based on big data. Fig. 1 is a flow diagram of the high-speed information processing method based on big data according to an embodiment of the present invention.
The present invention first obtains the raw voice data and performs the following operations:
1) segmenting the speech into multiple audio blocks, each audio block containing multiple frames, and extracting the statistical features of each audio frame to form the corresponding feature vector;
2) training a multilayer convolutional neural network on valid speech blocks to obtain the polynomial representation of each frame;
3) taking the first m audio blocks as the initial result and reconstructing them to obtain the initial speech library and the reconstruction coefficients;
4) updating the convolutional neural network parameters according to the next audio block, while reconstructing that audio block and computing the reconstruction error; if the error exceeds a given threshold, adding the audio block to the summary speech library and updating the library;
5) processing each new audio block online according to step 4) until the end; the updated summary speech data constitute the generated summary speech data.
Extracting the statistical features of each audio frame in step 1) to form the corresponding feature vector is specifically:
1) the raw speech is uniformly divided into n audio blocks, each audio block containing t audio frames; each frame is converted to a unified code rate while keeping the original sampling rate;
2) the local features of each frame are extracted, including the zero-crossing rate, the average amplitude difference and the LPC coefficients;
3) the above audio features of each frame are concatenated in order to form a feature vector of dimension nf.
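The framing and per-frame feature extraction of step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it computes only the zero-crossing rate and a time-domain average magnitude difference as stand-ins for the full zero-crossing-rate / amplitude-difference / LPC feature set, and the function names are assumptions.

```python
import numpy as np

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Two simple per-frame statistical features (illustrative subset)."""
    signs = np.sign(frame)
    zcr = np.mean(np.abs(np.diff(signs)) > 0)        # zero-crossing rate
    amdf = np.mean(np.abs(frame[1:] - frame[:-1]))   # average magnitude difference
    return np.array([zcr, amdf])

def block_feature_vector(block: np.ndarray, t: int) -> np.ndarray:
    """Split an audio block into t frames and concatenate their features,
    giving the fixed-dimension vector described in step 1)."""
    frames = np.array_split(block, t)
    return np.concatenate([frame_features(f) for f in frames])
```

With t frames and 2 features per frame the resulting vector has dimension 2t, playing the role of the nf-dimensional vector in the text.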
The initial training in step 2) of the multilayer convolutional neural network on valid speech blocks to obtain the polynomial representation of each frame is specifically:
the multilayer convolutional neural network is initially trained using a denoising encoder;
A. in each layer, each audio frame is processed as follows: first, Gaussian noise is added and the input variables are randomly set to arbitrary values to generate a noisy version of each audio frame; then the noisy audio is mapped to obtain its polynomial representation;
B. the parameters of each layer of the convolutional neural network are adjusted and updated.
Summary speech data are rebuild in step 3), specifically:
1) summary speech data by raw tone preceding m audio block set at, m is positive integer, i.e., shared m × t frame audios,
XkCorresponding k-th of original audio block;Corresponding polynomial table, which is obtained, by initial training convolutional neural networks is shown as { Y1,Y2,…,
Yk,Ym, YkThe polynomial repressentation of corresponding k-th of audio block;
2) initial speech library D is set by ndA element composition, i.e. D={ dj}j∈[1,nd], djCorresponding j-th of element;If rebuilding system
Number is C, and element number corresponds to number of frames, and dimension corresponds to the number of elements in library, i.e. C={ Ci}i∈[1,nf], CkIt is k-th corresponding
Audio block coefficient, ciCorresponding i-th frame voice;
3) initial speech library D and reconstructed coefficients C are respectively obtained using following formula, i.e.,:
Wherein, symbol | | | |2Indicate the l of variable2Norm, regularization parameter λ are the coefficient more than 0, function of many variables F
(Yk,Ck, D) be embodied as:
Wherein, parameter γ is the coefficient more than 0, and the mathematical expression in symbol indicates to carry out weight using the i-th frame audio of D pairs of library
It builds.Specially:First preset parameter D, makes above-mentioned object function become the convex function of parameter C;Then preset parameter C makes above-mentioned mesh
Scalar functions become the convex function of parameter D, and iteration alternately updates two parameters.
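The alternating convex updates of step 3) can be sketched as follows. This is a ridge-regularised stand-in for the patent's objective (whose exact regulariser is not reproduced in the text): with D fixed the C-step is convex with a closed form, and with C fixed the D-step is convex with a closed form, exactly the alternation described above.

```python
import numpy as np

def dict_learn(Y, n_atoms, lam=0.1, iters=30, seed=0):
    """Alternating minimisation for min ||Y - D C||_F^2 + lam ||C||_F^2.
    Y: (n_features, n_frames) matrix of frame representations."""
    rng = np.random.default_rng(seed)
    n_feat, _ = Y.shape
    D = rng.standard_normal((n_feat, n_atoms))
    C = None
    for _ in range(iters):
        # C-step: D fixed -> ridge regression, closed form
        C = np.linalg.solve(D.T @ D + lam * np.eye(n_atoms), D.T @ Y)
        # D-step: C fixed -> regularised least squares, closed form
        D = Y @ C.T @ np.linalg.inv(C @ C.T + lam * np.eye(n_atoms))
    err = np.linalg.norm(Y - D @ C) ** 2   # first-term reconstruction error
    return D, C, err
```

The returned `err` corresponds to the l2-norm first term used later as the reconstruction error ε.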
Updating the convolutional neural network parameters according to the next audio block, reconstructing that audio block and computing the reconstruction error in step 4) is specifically:
1) each audio frame of the block is processed in turn as follows:
A. the parameters of the last layer of the convolutional neural network, namely the weighting coefficients W and the offset b, are updated;
B. the parameters of the other layers of the convolutional neural network are updated using the BP algorithm;
2) the polynomial representation of each audio frame is updated according to the new parameters;
3) based on the existing speech library D, the current audio block is reconstructed and the reconstruction error ε is computed, i.e. the polynomial representation Y_k of the current audio block X_k is reconstructed. The specific steps are: first minimize the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substitute them into the first term, i.e. the l2 norm, and compute its value as the current reconstruction error ε.
Adding the current audio block to the summary speech library and updating the library when the error exceeds the given threshold in step 4) is specifically:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k exceeds the given threshold θ, the current audio block X_k is added to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, the set of frame-level polynomial representations used to update the library is y_q; the library D is then updated with Y_k ∈ y_q by solving the objective function:
where the parameter λ is a coefficient greater than 0 that adjusts the influence of the regularization term.
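The online novelty filter of steps 1) and 2) can be sketched as follows. This is a toy version under a strong simplification: the dictionary reconstruction error is replaced by the squared distance to the nearest library element, and `nearest_err`/`online_summary` are assumed names, not the patent's.

```python
import numpy as np

def nearest_err(x, library):
    """Toy stand-in for the dictionary reconstruction error epsilon:
    squared distance to the nearest library element (inf when empty)."""
    if not library:
        return float("inf")
    return min(float(np.sum((x - d) ** 2)) for d in library)

def online_summary(blocks, theta):
    """A block joins the summary library S only when the library cannot
    reconstruct it within the threshold theta."""
    S = []
    for x in blocks:
        if nearest_err(x, S) > theta:   # novel block -> add, "updating" the library
            S.append(x)
    return S
```

Blocks well explained by what is already in S are discarded, so S accumulates only novel content, which is the summarisation mechanism the text describes.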
When speech blocks are extracted, the present invention first extracts the maximum of the time-domain LPC coefficients of the analog speech signal and the average amplitude difference of its frequency domain, then forms the extracted features into a two-dimensional vector as the input of the convolutional neural network, and uses the output of the neural network to judge whether the signal is an analog speech signal.
After the DC component is removed, the LPC coefficient maximum and the average amplitude difference of the speech are extracted. The mode of the network output values is set as the threshold against which the one-dimensional output vector of the network is judged: values greater than this threshold are optionally classified as speech segments, and values below it as non-speech segments.
Two features are extracted from the analog speech signal: the LPC coefficient maximum and the average amplitude difference. The LPC coefficients R_w(k) of the analog speech signal s(n) are:
R_w(k) = Σ_{n=0}^{N−1−k} s_w(n)·s_w(n+k)
where s_w(n) is the windowed speech, N is the valid speech block length and k is the lag. Taking the maximum over s_w(n) yields the LPC coefficient maximum.
The average amplitude difference Ω of the analog speech signal s(n) is given by:
Ω = (1/N) Σ_{k=1}^{N} | |S(k)| − E |
where N is the frame length, S(k) is the FFT of s(n), and E is the mean of the frequency-domain amplitude of the analog speech signal.
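The two endpoint-detection features can be sketched as follows. The reading of the text is hedged: the "LPC coefficient maximum" is implemented here as the maximum of the windowed autocorrelation R_w(k) over positive lags, and Ω as the mean absolute deviation of the FFT magnitude from its own mean — both are one plausible interpretation of the formulas above, not the patent's definitive definitions.

```python
import numpy as np

def lpc_like_max(frame):
    """Maximum of the windowed autocorrelation over lags k >= 1."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w):]   # lags 1 .. N-1
    return float(r.max())

def avg_amplitude_diff(frame):
    """Mean absolute deviation of the spectrum magnitude from its mean E."""
    mag = np.abs(np.fft.rfft(frame))
    return float(np.mean(np.abs(mag - mag.mean())))
```

Periodic (voiced) frames have a strong autocorrelation peak, while noise does not, which is what makes the first feature useful for separating speech from non-speech.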
The input vector of the convolutional neural network is the 2-dimensional vector formed by the LPC coefficient maximum and the average amplitude difference, i.e. the input layer has 2 neurons. The output is a 1-dimensional vector indicating whether the current frame is a valid speech block, i.e. the output layer has 1 neuron. The hidden layer has 5 neurons.
In forward propagation, the input signal is processed layer by layer through the hidden layer up to the output layer. The state of the neurons in each layer is affected only by the neurons of the previous layer. Let w_ij be the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer and b_k the thresholds of the output layer, where i indexes the input layer, j the hidden layer and k the output layer. If the output layer does not produce the desired output, back-propagation is performed: the network weighting coefficients and thresholds are adjusted according to the prediction error so that the prediction output of the convolutional neural network approaches the desired output.
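The 2-5-1 network with forward pass and error back-propagation can be sketched as follows. Note the hedge: although the patent calls it a convolutional neural network, the 2-5-1 description above is that of a small fully connected network, and this sketch (class and method names are assumptions) implements it that way with sigmoid units and mean-squared error.

```python
import numpy as np

class Net251:
    """Minimal 2-5-1 network: weights wij/wjk and thresholds aj/bk as above."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.wij = rng.standard_normal((2, 5)); self.aj = np.zeros(5)
        self.wjk = rng.standard_normal((5, 1)); self.bk = np.zeros(1)

    @staticmethod
    def sig(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self.sig(x @ self.wij + self.aj)      # hidden layer
        return self.sig(self.h @ self.wjk + self.bk)   # output layer

    def train_step(self, x, y, lr=0.5):
        o = self.forward(x)
        do = (o - y) * o * (1 - o)                       # output-layer delta
        dh = (do @ self.wjk.T) * self.h * (1 - self.h)   # hidden-layer delta
        self.wjk -= lr * self.h.T @ do; self.bk -= lr * do.sum(0)
        self.wij -= lr * x.T @ dh;      self.aj -= lr * dh.sum(0)
        return float(np.mean((o - y) ** 2))              # prediction error
```

Each `train_step` performs one forward pass followed by one back-propagation update, so repeated calls drive the prediction output toward the desired output as described.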
The initial weighting coefficients and thresholds of the convolutional neural network are optimized using a genetic algorithm, as follows:
(1) individuals are encoded by coefficient coding: each individual is a numeric string consisting of 4 parts, namely the input-to-hidden connection weighting coefficients, the hidden-layer thresholds, the hidden-to-output connection weighting coefficients and the output-layer thresholds. An individual contains all the weighting coefficients and thresholds of the neural network; with the network structure known, it fully determines a neural network of that structure with fixed weighting coefficients and thresholds.
(2) the initial weighting coefficients and thresholds of the convolutional neural network are obtained from the individual; after the convolutional neural network is trained on the training data and the system output predicted, the sum of the absolute errors between the prediction output and the desired output is taken as the individual fitness value, i.e. the fitness function F is set as:
F = k · Σ_{i=1}^{n} |y_i − o_i|
where n is the number of output nodes of the convolutional neural network, y_i is the desired output of the i-th node, o_i is the prediction output of the i-th node, and k is a predefined coefficient.
(3) the selection strategy is fitness-proportional; the selection probability p_i of each individual i is:
p_i = f_i / Σ_{j=1}^{N} f_j,  with  f_i = k/F_i
where F_i is the fitness value of individual i, k is a coefficient (here taken as 10), and N is the number of individuals in the population (here taken as 10).
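The inverse-error fitness and roulette-wheel selection of step (3) can be sketched as follows. The normalisation p_i = f_i / Σ f_j is the standard fitness-proportional form; function names are assumptions.

```python
import numpy as np

def fitness(F_err, k=10.0):
    """Inverse-error fitness f_i = k / F_i: smaller error -> larger fitness."""
    return k / np.asarray(F_err, dtype=float)

def roulette_select(F_err, rng, k=10.0):
    """Fitness-proportional (roulette-wheel) selection of one individual."""
    f = fitness(F_err, k)
    p = f / f.sum()                     # selection probabilities p_i
    return int(rng.choice(len(p), p=p))
```

Individuals with small prediction error are drawn far more often, which is what drives the population toward good initial weights.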
(4) the crossover operation uses coefficient (arithmetic) interpolation; the crossover of the k-th chromosome a_k and the l-th chromosome a_l at position j is performed as follows:
a_kj = a_kj·(1−b) + a_lj·b
a_lj = a_lj·(1−b) + a_kj·b
where b is a random number in [0, 1].
(5) the j-th gene a_ij of the i-th individual is selected for mutation:
a_ij = a_ij + (a_ij − a_max)·f(g),  for r > 0.5
a_ij = a_ij + (a_min − a_ij)·f(g),  for r ≤ 0.5
where a_max and a_min are respectively the upper and lower bounds of gene a_ij;
f(g) = r2·(1 − g/G_max)²;
where r2 is a random number, g is the current iteration number, G_max is the maximum number of generations, and r is a random number in [0, 1].
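The crossover and mutation operators can be sketched as follows. One detail worth noting: both original gene values are read before either is overwritten, since applying the two crossover assignment lines sequentially in place would use the already-updated a_kj in the second line. Function names are assumptions.

```python
import numpy as np

def arithmetic_crossover(ak, al, j, b):
    """Arithmetic-interpolation crossover of chromosomes ak, al at gene j."""
    ak, al = ak.copy(), al.copy()
    akj, alj = ak[j], al[j]            # read originals first
    ak[j] = akj * (1 - b) + alj * b
    al[j] = alj * (1 - b) + akj * b
    return ak, al

def mutate(a, j, a_min, a_max, g, g_max, rng):
    """Non-uniform mutation with shrinking step f(g) = r2 * (1 - g/g_max)^2."""
    a = a.copy()
    f = rng.random() * (1 - g / g_max) ** 2
    if rng.random() > 0.5:
        a[j] += (a[j] - a_max) * f
    else:
        a[j] += (a_min - a[j]) * f
    return a
```

With this interpolation the sum a_kj + a_lj is preserved for any b, and the mutation step shrinks to zero as g approaches G_max, giving coarse search early and fine tuning late.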
The voice endpoint detection is implemented as follows:
1) a 2-5-1 convolutional neural network structure is used. First, the two features of the raw speech, the LPC coefficient maximum and the average amplitude difference, are extracted and the resulting bivector is used as the input of the neural network. The output layer is used to judge whether the frame is a valid speech block; the weighting coefficients and thresholds are randomly initialized.
2) original audio blocks are randomly selected and the class of every frame signal is labeled: 1 if it is speech, 0 if it is not. The LPC coefficient maximum and the average amplitude difference of each audio block are extracted and combined into a two-dimensional feature vector, which serves as the input vector of the convolutional neural network.
3) the training samples are fed into the convolutional neural network to train its parameters, and the network is optimized so that the error between the network output and the desired value reaches a preset standard.
4) the LPC coefficient maximum and the average amplitude difference of each audio block are extracted, combined into a two-dimensional feature vector, and fed into the convolutional neural network as test samples. An improved threshold T is used here, namely the mode of all elements of the 1-dimensional output vector of the convolutional neural network: values greater than or equal to T are classified as valid speech blocks, and values below T as invalid speech blocks. The output of the convolutional neural network is compared with the pre-labeled values; if the accuracy is low, the network is retrained.
5) the output of the network is used to decide whether a segment is a speech segment.
Let the speech-block input signal vector be X(n) = [x_1(n), x_2(n), …, x_M(n)]^T; the output y(n) obtained after the speech enhancement filter is expressed as:
y(n) = coef·[β_1·x(n) + β_2·x(n) + … + β_{nb+1}·x(n)]
where B = [β_1, β_2, …, β_{nb+1}] is the vector of filter weighting coefficients and coef is an adaptive parameter.
This adaptive parameter coef is then introduced into the ILMS adaptive filter model to denoise the speech block. The signal-to-noise ratio SNR_i is computed for the denoised speech; the coef value corresponding to the maximal SNR_i is the final trained coef output:
SNR_i = Π_snr(f_LM(coef_i, s(n)))
where coef_i is a natural number with step 1; s(n) is the speech block to be enhanced; f_LM is the adaptive filtering function that denoises and enhances s(n) according to the value of coef_i; and Π_snr(·) is the function that computes the segmental signal-to-noise ratio.
SNR_i is maximized, and the index i corresponding to the maximum is assigned to coef:
coef = argmax(SNR_1, SNR_2, …)
where argmax returns the index corresponding to the maximum.
Finally, in the adaptive noise filter, each speech block to be enhanced is enhanced according to the coef value. Speech-to-text conversion is then performed on the enhanced speech blocks.
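The coef-selection loop above can be sketched as follows. This is a hedged illustration: the patent does not define Π_snr or f_LM concretely, so `seg_snr` is a simple segmental SNR against a reference and `filt` is an arbitrary user-supplied filter; only the exhaustive argmax over integer coef values mirrors the text.

```python
import numpy as np

def seg_snr(clean, denoised, seg=160):
    """Toy segmental SNR (dB) of a processed signal against a reference."""
    snrs = []
    for i in range(0, len(clean) - seg + 1, seg):
        s = clean[i:i + seg]
        e = s - denoised[i:i + seg]                   # residual error
        snrs.append(10 * np.log10(np.sum(s**2) / (np.sum(e**2) + 1e-12)))
    return float(np.mean(snrs))

def best_coef(noisy, clean, filt, coefs):
    """coef = argmax_i SNR_i: try each integer coef, keep the best one."""
    scores = [seg_snr(clean, filt(c, noisy)) for c in coefs]
    return coefs[int(np.argmax(scores))]
```

In practice the reference `clean` would not be available and the SNR would be estimated blindly; the grid search itself is the point of the sketch.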
After the speech blocks have been converted to text, the speech summary can be extracted automatically. The invention first uses a convolutional neural network algorithm to train the feature vectors of the feature words, then accurately computes the similarity between sentences, iteratively updates the sentence weighting coefficients, and finally eliminates the information redundancy in the summarized speech based on the inter-sentence similarity. This specifically includes the following steps:
S1. The feature vector representations of the feature words are obtained by training a convolutional neural network model on morphemes: a morpheme set is obtained from the big-data store and preprocessed; the preprocessing includes splitting the morphemes in the morpheme set into sentences, yielding the training feature morpheme set. The training parameters are set, and with the training feature morpheme set as training data, the convolutional neural network model is trained so that each word in the training feature morpheme set is output as a feature word in feature-vector form, giving the feature vector representation of the feature words.
To train the feature-word vectors from a large amount of unstructured speech data, the present invention uses the feature vector of the current word to predict the feature vectors of a context window of specified size. Given the feature morphemes w_1, w_2, w_3, …, w_T as training data, the objective function is:
(1/T) Σ_{t=1}^{T} Σ_{−c≤j≤c, j≠0} log p(w_{t+j} | w_t)
where c is the parameter determining the context window size (the larger c is, the more training data are required) and T is the number of training samples.
The present invention takes the W words of the output layer as leaf nodes and assigns shorter paths to high-frequency words. Each feature morpheme w can be reached from the root node of the tree along a unique path. Let n(w, j) be the j-th node on the path from the root to w, and L(w) the length of that path, so that n(w, 1) = root and n(w, L(w)) = w. For any inner node n, ch(n) is an arbitrary fixed child of node n. Then define:
where the function ⟦·⟧ denotes:
After defining the above formula, the objective function is solved by stochastic gradient descent, finally producing the feature-vector representation of the words.
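The tree layout above — leaves for vocabulary words, shorter root-to-leaf paths for frequent words — is the Huffman construction used by standard hierarchical-softmax embeddings. A minimal sketch (function name is an assumption; ties are broken by insertion order):

```python
import heapq
import itertools

def huffman_codes(freqs):
    """Build Huffman codes so frequent words get shorter paths L(w)."""
    counter = itertools.count()          # tiebreaker so dicts are never compared
    heap = [(f, next(counter), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in c1.items()}
        merged.update({w: "1" + code for w, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]                    # word -> bit string (root-to-leaf path)
```

The length of each code is the path length L(w) in the text, so the prediction cost per word is logarithmic in the vocabulary size rather than linear.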
S2. The morpheme set collected in step S1 is searched according to a preset query word, and the retrieved speech blocks are taken as the candidate block set; the candidate block set is split into sentences and repeated sentences are removed, giving the candidate block set S, where S_i is an arbitrary sentence in the candidate block set S and N is the total number of sentences. Using the feature vectors of the feature words obtained in step S1, the semantic similarity between sentences is computed and used as the edge weighting coefficients of a graph, forming a sentence DAG graph model.
For any two sentences S_i and S_j in the candidate block set, containing feature words t_i and t_j with feature vectors v_i and v_j respectively (obtained by the convolutional neural network model training of step S1), the semantic similarity Sim(S_i, S_j) between sentences S_i and S_j is given by:
where, for the feature vector v_i of feature word t_i in sentence S_i, the inner term denotes the maximum similarity value between v_i and the feature vectors of all feature words in sentence S_j belonging to the same part of speech as t_i; |S_i| and |S_j| denote the lengths of S_i and S_j respectively.
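The best-match sentence similarity can be sketched as follows. Two hedges apply: the part-of-speech restriction from the text is omitted, and since the Sim formula itself is not reproduced, the normalisation by |S_i| + |S_j| is an assumption consistent with the length terms mentioned.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def sent_sim(sent_i, sent_j, vecs):
    """Each word in sent_i is matched to its most similar word in sent_j;
    the matched scores are summed and normalised by the sentence lengths."""
    total = sum(max(cos(vecs[ti], vecs[tj]) for tj in sent_j) for ti in sent_i)
    return total / (len(sent_i) + len(sent_j))
```

Because matching is word-to-best-word rather than exact overlap, two sentences with no shared surface words can still score highly when their embeddings are close, which is the semantic-similarity advantage the text claims over word-frequency methods.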
S3. In the DAG graph model obtained in step S2, the weighting coefficient weight(S_i) of each node is iteratively updated from the average initial weighting coefficients and the inter-sentence semantic similarity of step S2 using the following formula, until convergence, giving a score value reflecting the importance of the sentence:
weight(S_i) = (1 − d) + d · Σ_{S_j ∈ Assoc(S_i)} [ Sim(S_j, S_i) / Σ_{S_k ∈ Assoc(S_j)} Sim(S_j, S_k) ] · weight(S_j)
where d is the damping coefficient with a value range of [0, 1]; Assoc(S_i) denotes the set of sentences connected to S_i, i.e. the set of sentences whose similarity to S_i is greater than 0, and ||Assoc(S_i)|| is the total number of sentences in that set.
Using the similarity matrix formed by the average initial weighting coefficients of the nodes of step S2 and the inter-sentence semantic similarities, the weighting coefficient of each node in the DAG graph model is iteratively computed until convergence. Finally each node obtains a score, in preparation for generating the speech summary in the next step.
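The damped iteration of step S3 can be sketched as follows, following the standard weighted-TextRank update on the similarity matrix (a sketch; the function name and convergence tolerance are assumptions).

```python
import numpy as np

def rank_sentences(sim, d=0.85, tol=1e-8, max_iter=200):
    """Iterate w_i <- (1-d) + d * sum_j sim_ij / colsum_j * w_j to convergence."""
    n = sim.shape[0]
    M = sim.astype(float).copy()
    np.fill_diagonal(M, 0.0)                 # no self-similarity edges
    col = M.sum(axis=0)
    col[col == 0] = 1.0                      # avoid division by zero
    w = np.ones(n) / n
    for _ in range(max_iter):
        w_new = (1 - d) + d * (M / col) @ w  # damped weighted update
        if np.abs(w_new - w).max() < tol:
            break
        w = w_new
    return w
```

Sentences strongly similar to many other high-scoring sentences accumulate weight, so the final scores reflect centrality in the sentence graph.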
S4. If a sentence has high similarity to a sentence already in the summary set, the sentence is weakened; the sentences with the largest weighting coefficients and without redundancy are selected to form the summary set. The specific steps are:
1) an empty summary speech queue is initialized; the sentences corresponding to the nodes of the DAG graph model form the initial candidate summary speech queue;
2) the sentence weighting coefficients corresponding to the DAG graph model nodes in the candidate summary speech queue are sorted in descending order according to step S3, and the sentences corresponding to the sorted nodes form the candidate summary sentence sequence;
3) according to the candidate summary sentence sequence, the sentence ranked first is transferred into the summary speech queue, and the weighting coefficients of the sentences remaining in the candidate summary speech queue are updated with the following formula:
Weight(S_j) = Weight(S_j) − ω × Sim(S_i, S_j)
where i ≠ j; ω is the attenuation factor, taken as 1.0 when the sentence whose weighting coefficient is being updated is similar to a sentence in the summary speech queue; and Sim(S_i, S_j) is the semantic similarity obtained in step S2;
4) steps 2) and 3) are repeated until the set of sentences in the summary speech queue reaches the preset summary speech length.
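Steps 1) to 4) above amount to a redundancy-penalised greedy selection and can be sketched as follows (function name is an assumption):

```python
def select_summary(scores, sim, k):
    """Repeatedly move the highest-weighted candidate into the summary,
    then subtract its similarity from every remaining candidate's weight
    (attenuation factor omega = 1.0), until k sentences are selected."""
    weights = dict(enumerate(scores))
    summary = []
    while weights and len(summary) < k:
        i = max(weights, key=weights.get)    # best remaining candidate
        summary.append(i)
        del weights[i]
        for j in weights:
            weights[j] -= 1.0 * sim[i][j]    # Weight(Sj) -= omega * Sim(Si, Sj)
    return summary
```

A near-duplicate of an already-selected sentence loses almost all of its weight, so the summary covers distinct content rather than repeating the single most central topic.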
Assuming that the sentence sum of raw tone T is m, the rate of simplifying of summary speech is set as λ, needs the digest sentence extracted
Sum is n, then λ=n/m.M is the sentence sum that raw tone identifies.Text is the linear combination of sentence, and sentence is word
Linear combination, and word can be considered the linear combination of morpheme, i.e., can obtain the important journey of sentence indirectly by the significance level of morpheme
Degree.Therefore, as follows based on the predefined summary speech extraction process for simplifying rate:
1. the significance level of each node in morpheme network is calculated, with the average value of the significance level of each morpheme in sentence
Instead of the significance level of corresponding sentence, the sentence cluster S={ S with significance level are thus obtained1,S2…,Sm}:
w(ni)tIt is w (ni) the t times iteration, ε is attenuation factor, C (ni) it is morpheme set, each morpheme in the set
With node niThe morpheme of expression all exists while relationship occurs;Coexsit(ni,nj) it is morpheme nodes niAnd njIt is representative
Morpheme while occurrence rate;N is the sum of morpheme included in morpheme network.
2. distich cluster S carries out multi-field division;Assuming that the sentence cluster of k subdomains is obtained, with language in each subdomains sentence cluster
The synthesis significance level of sentence replaces the significance level of each subdomains sentence cluster, and according to significance level by k subdomains sentence cluster descending
Arrangement, is denoted as MS1,MS2..., MSk(k<M), the sentence in each subdomains sentence cluster is arranged also according to significance level height descending;
3. Apply the above de-redundancy processing to the sentences in each sub-domain sentence cluster. Then, according to the summary simplification rate λ, extract the top ⌊λ·m/k⌋ sentences in significance order from each domain sentence cluster. If λ·m is exactly divisible by k, this directly yields the λ·m simplified sentences to be output; if it is not, additionally extract one more sentence from each of the clusters MS_1, MS_2, …, MS_(λ·m mod k); together with the sentences just extracted, these form the summary sentences of speech T. In this way the final simplified cluster is obtained, denoted S' = {S'_1, S'_2, …, S'_(λ·m)};
4. Output the sentences in the set S' in their original order to obtain the summary speech.
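Steps 3 and 4 can be sketched as follows; the per-cluster quotas reconstruct expressions that are elided in the source, and the (position, text) sentence representation is an assumption for illustration.

```python
# Sketch of steps 3-4: extract round(lambda*m) summary sentences from k
# sub-domain clusters, each cluster already sorted by descending sentence
# significance. When k does not divide lambda*m, the first (lambda*m mod k)
# clusters each contribute one extra sentence (reconstructed allocation).

def extract_summary(clusters, lam, m):
    total = round(lam * m)            # lambda * m sentences to output
    k = len(clusters)
    base, rem = divmod(total, k)      # floor quota and remainder
    picked = []
    for idx, cluster in enumerate(clusters):
        quota = base + (1 if idx < rem else 0)  # first `rem` clusters give one extra
        picked.extend(cluster[:quota])
    # Step 4: output in original order (each sentence is (position, text))
    return sorted(picked, key=lambda s: s[0])
```

For example, with k = 3 clusters, m = 6 sentences, and λ = 0.5, each cluster contributes its single most significant sentence and the result is re-sorted by original position.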
For the processing of social-network voice data, after identifying the sentences and words of the speech text, the present invention preferably further combines every two adjacent phrases in each sentence into a word pair, so that each sentence is represented by a sequence of word pairs. Word pairs incorporate contextual information: the two words mutually reinforce each other's likelihood of being keywords and the importance of the whole sentence, and summary sentences are extracted according to co-occurring word pairs to generate the summary speech data.
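The adjacent-phrase pairing described above can be sketched minimally as follows; the tokenized input and the function name are assumptions for illustration.

```python
# Minimal sketch: combine every two adjacent words/phrases of a sentence into
# a word pair, so each sentence is represented by a sequence of word pairs.

def word_pairs(words):
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]
```

For example, word_pairs(["big", "data", "speech"]) yields [("big", "data"), ("data", "speech")].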
First, N word pairs that accurately reflect the sub-topics of the text collection are extracted as keyword pairs, yielding a keyword-pair set. The weight of each word pair can be calculated by the following formula:
W_TF(b_i) = fre(b_i) * log2(ifre(b_i))
where fre(b_i) is the word frequency of word pair b_i, i.e., the frequency with which b_i occurs in the entire text collection, and ifre(b_i) is the ratio of the total number of sentences to the number of sentences in which b_i occurs.
All word pairs are arranged in descending order of their W_TF values, and the top N are taken as keyword pairs.
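The W_TF weighting and top-N selection can be sketched as below; the data layout (sentences given as lists of word pairs) is an assumption.

```python
import math
from collections import Counter

# Sketch of the word-pair weighting W_TF(b_i) = fre(b_i) * log2(ifre(b_i)),
# where ifre(b_i) = (total sentences) / (sentences containing b_i).

def keyword_pairs(sentences, top_n):
    """sentences: list of lists of word pairs; returns top_n keyword pairs."""
    fre = Counter(p for s in sentences for p in s)               # corpus frequency
    sent_count = Counter(p for s in sentences for p in set(s))   # sentences containing p
    total = len(sentences)
    wtf = {p: fre[p] * math.log2(total / sent_count[p]) for p in fre}
    ranked = sorted(wtf, key=wtf.get, reverse=True)              # descending W_TF
    return ranked[:top_n]
```

Note that a pair occurring in every sentence gets log2(1) = 0 weight, which matches the discriminative intent of the formula.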
Next, the distribution matrix of topics over word pairs is calculated. Each row of the matrix is the probability distribution of one topic over the word-pair set, and each element characterizes the importance of a word pair relative to that topic. The matrix is summed along the topic dimension, and the resulting value is taken as each word pair's global score over the topic set. The word pairs are sorted in descending order of this global score, and the top N constitute the keyword-pair set.
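The matrix bookkeeping can be sketched as follows, assuming rows are topics and columns are word pairs as described above; the function names are illustrative.

```python
# Sketch of the global word-pair score: rows of `matrix` are topics (each a
# probability distribution over word pairs); a word pair's global score is the
# sum of its column over all topics.

def global_scores(matrix):
    """matrix[t][p]: probability of word pair p under topic t."""
    n_pairs = len(matrix[0])
    return [sum(row[p] for row in matrix) for p in range(n_pairs)]

def top_keyword_pairs(matrix, pair_ids, n):
    scores = global_scores(matrix)
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return [pair_ids[i] for i in order[:n]]
```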
Based on the above keyword-pair set, for each candidate sentence in the sentence set, the number of its word pairs that overlap with the keyword-pair set is calculated as a ratio to the size of the keyword-pair set.
Meanwhile, to penalize overly long or overly short sentences, this score is regularized, the regularization factor being the larger of the candidate sentence's own length and the average sentence length of the sentence set. The candidate sentence score can be formally defined as:
Score(S) = |{b_i : b_i ∈ S and b_i ∈ KBS}| / (|KBS| × max(|S|, Avlen))
where S denotes the candidate sentence, KBS denotes the keyword-pair set, and b_i denotes a keyword pair occurring in both; |S| and |KBS| denote the candidate sentence length and the size of the keyword-pair set respectively, and Avlen is the average length of all sentences in the sentence set.
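Under the definitions above, the score can be sketched as follows; since the original formula appears only as an image in the source, this reconstruction from the symbol definitions is an assumption.

```python
# Sketch of the candidate-sentence score: the fraction of the keyword-pair set
# KBS occurring in the sentence, regularized by max(|S|, Avlen).

def sentence_score(pairs, kbs, avlen):
    """pairs: word pairs of candidate sentence S; kbs: keyword-pair set."""
    overlap = sum(1 for b in set(pairs) if b in kbs)  # keyword pairs shared with KBS
    return overlap / (len(kbs) * max(len(pairs), avlen))
```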
To prevent redundancy, summary sentences are extracted from the top-ranked sentences on the basis of an introduced similarity threshold: M sentences satisfying the similarity condition are extracted as summary sentences. The flow for extracting summary sentences is as follows:
(1) Initialize an empty simplified-speech queue, and initialize the candidate set;
(2) Take the currently top-ranked sentence as the candidate sentence Sc;
(3) If the simplified-speech queue is empty, add the candidate sentence to it directly; otherwise, calculate in turn the similarity between the candidate sentence Sc and each summary sentence Ss in the queue. If sim(Sc, Ss) > Sim_td for any Ss, where Sim_td is the similarity threshold, go directly to (5);
(4) Add the candidate sentence to the simplified-speech queue;
(5) Remove the current candidate sentence from the candidate set;
(6) If the number of sentences in the simplified-speech queue is less than the preset quantity M, go to (2); otherwise go to (7);
(7) Output the simplified-speech queue.
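The flow (1)-(7) can be sketched as a greedy loop; the Jaccard similarity used here is an assumption, as the text does not fix a particular similarity measure.

```python
# Sketch of the summary-sentence extraction flow (1)-(7): greedily accept
# top-ranked candidates, skipping any whose similarity to an already accepted
# summary sentence exceeds the threshold Sim_td.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def extract_sentences(ranked, m, sim_td):
    queue = []                                  # (1) empty simplified-speech queue
    candidates = list(ranked)                   # (1) candidate set, ranked by score
    while candidates and len(queue) < m:        # (6) stop at preset quantity M
        sc = candidates.pop(0)                  # (2) top candidate; (5) removed
        if all(jaccard(sc, ss) <= sim_td for ss in queue):  # (3) similarity check
            queue.append(sc)                    # (4) accept candidate
    return queue                                # (7) output
```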
If the summary sentences contain temporal information, they are combined in chronological order; if multiple summary sentences belong to the same morpheme topic, they are combined according to their order of appearance in the original speech.
In conclusion the method for the present invention is based on the processing of voice big data, noise immunity is stronger, and accuracy rate higher has more
High recall rate significantly improves the efficiency that user obtains knowledge.
Obviously, those skilled in the art should understand that each of the above modules or steps of the present invention can be realized with a general-purpose computing system; they can be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be realized with program code executable by the computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplify or explain the principle of the present invention, and not to limit it. Therefore, any modification, equivalent replacement, improvement, and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. Furthermore, the appended claims of the present invention are intended to cover all variations and modifications falling within the scope and boundary of the appended claims, or the equivalents of such scope and boundary.
Claims (5)
1. A high-speed information processing method based on big data, characterized by comprising:
for each frame of audio in an audio block, updating the parameters of the last layer of the convolutional neural network, i.e., the weighting coefficients W and the offsets b;
updating the parameters of the other layers of the convolutional neural network using the BP algorithm;
updating the polynomial representation of each audio frame according to the new parameters;
based on the existing speech library D, reconstructing the current audio block and calculating the reconstruction error ε, i.e., reconstructing the polynomial representation Y_k of the current audio block X_k.
2. The method according to claim 1, characterized in that said reconstructing of the polynomial representation Y_k of the current audio block X_k further comprises:
first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e., the l2 norm, and calculating its value, which is the current reconstruction error ε.
3. The method according to claim 1, characterized by further comprising:
if the error ε exceeds a given threshold, adding the current audio block to the summary speech library and updating that library.
4. The method according to claim 2, characterized in that said adding of the current audio block to the summary speech library and updating of that library further comprises:
1) if the reconstruction error ε calculated for the polynomial representation Y_k of the current audio block X_k exceeds the given threshold θ, adding the current audio block X_k to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, updating the set of frame-audio polynomial representations of the library to y_q, and then using Y_k ∈ y_q to update the library D by solving the objective function:
where the parameter λ is a coefficient greater than 0 that adjusts the influence of the regularization term.
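The reconstruction of claims 2-4 can be sketched as below. The objective function itself appears only as an image in the source, so the regularized least-squares form assumed here (F(Y_k, C_k, D) = ||Y_k - D·C_k||² + λ·||C_k||²) and all function names are illustrative.

```python
import numpy as np

# Sketch of claims 2-4: reconstruct the polynomial representation Y_k of audio
# block X_k over the library D, compute the reconstruction error eps as the
# first (l2-norm) term, and add the block to the summary library S when eps
# exceeds the threshold theta. The ridge-style objective is an assumption.

def reconstruction_error(y_k, D, lam=0.1):
    # minimize over the coefficients C_k, then evaluate the l2-norm term
    n = D.shape[1]
    c_k = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y_k)
    return float(np.linalg.norm(y_k - D @ c_k) ** 2)

def maybe_add_to_summary(y_k, D, S, theta):
    eps = reconstruction_error(y_k, D)
    if eps > theta:
        S.append(y_k)   # claims 3-4: a poorly reconstructed (novel) block enters S
    return eps, S
```

A block well represented by the library yields a small error and is skipped; a novel block yields a large error and is added, which is the gating behavior the claims describe.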
5. The method according to claim 2, characterized by further comprising:
when extracting speech blocks, after eliminating the DC component, extracting the maximum LPC coefficient of the time domain and the average amplitude difference of the frequency domain of the analog speech signal;
setting the mode of the network output values as the threshold, and judging the one-dimensional vector output by the network accordingly: values greater than this threshold are determined to be speech segments, and values less than this threshold are determined to be non-speech segments;
the input vector of the convolutional neural network is the 2-dimensional vector composed of the maximum LPC coefficient and the average amplitude difference, i.e., the number of input-layer neurons is 2; the output is the 1-dimensional vector judging whether the current frame is an effective speech block or a non-effective speech block, i.e., the number of output-layer neurons is 1; the number of hidden-layer neurons is 5.
In forward transmission, the input signal is processed layer by layer through the hidden layer up to the output layer, and the neuron states of each layer affect only the neuron states of the next layer. Let w_ij be the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer, and b_k the thresholds of the output layer, where i indexes the input layer, j the hidden layer, and k the output layer. If the output layer does not produce the desired output, back-propagation is invoked, and the network weighting coefficients and thresholds are adjusted according to the prediction error so that the predicted output of the convolutional neural network approaches the desired output.
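The 2-5-1 network and back-propagation update of claim 5 can be sketched minimally as below; the sigmoid activation, learning rate, and training loop are assumptions, since the claim fixes only the layer sizes and the w_ij / w_jk / a_j / b_k parameterization.

```python
import math
import random

# Minimal sketch of the 2-5-1 network of claim 5: 2 inputs (max LPC coefficient,
# average amplitude difference), 5 hidden neurons, 1 output, trained by
# back-propagation of the prediction error.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SpeechNet:
    def __init__(self, seed=1):
        rnd = random.Random(seed)
        self.w_ij = [[rnd.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
        self.a_j = [rnd.uniform(-1, 1) for _ in range(5)]   # hidden thresholds
        self.w_jk = [rnd.uniform(-1, 1) for _ in range(5)]
        self.b_k = rnd.uniform(-1, 1)                       # output threshold

    def forward(self, x):
        self.h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) - a)
                  for ws, a in zip(self.w_ij, self.a_j)]
        return sigmoid(sum(w * h for w, h in zip(self.w_jk, self.h)) - self.b_k)

    def train(self, x, target, lr=0.5):
        y = self.forward(x)
        d_out = (y - target) * y * (1 - y)        # output-layer error term
        for j in range(5):
            d_hid = d_out * self.w_jk[j] * self.h[j] * (1 - self.h[j])
            self.w_jk[j] -= lr * d_out * self.h[j]
            for i in range(2):
                self.w_ij[j][i] -= lr * d_hid * x[i]
            self.a_j[j] += lr * d_hid             # thresholds move opposite to biases
        self.b_k += lr * d_out
        return y
```

After training on labeled frames, outputs above the threshold are taken as effective speech blocks and outputs below it as non-effective blocks, as the claim describes.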
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810161849.5A CN108417206A (en) | 2018-02-27 | 2018-02-27 | High speed information processing method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108417206A true CN108417206A (en) | 2018-08-17 |
Family
ID=63129113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810161849.5A Pending CN108417206A (en) | 2018-02-27 | 2018-02-27 | High speed information processing method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108417206A (en) |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1819017A (en) * | 2004-12-13 | 2006-08-16 | Lg电子株式会社 | Method for extracting feature vectors for speech recognition |
CN101303857A (en) * | 2007-11-05 | 2008-11-12 | 华为技术有限公司 | Encoding method and encoder |
CN101393545A (en) * | 2008-11-06 | 2009-03-25 | 新百丽鞋业(深圳)有限公司 | Method for implementing automatic abstracting by utilizing association model |
CN101398814A (en) * | 2007-09-26 | 2009-04-01 | 北京大学 | Method and system for simultaneously abstracting document summarization and key words |
CN102411621A (en) * | 2011-11-22 | 2012-04-11 | 华中师范大学 | Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode |
CN103246687A (en) * | 2012-06-13 | 2013-08-14 | 苏州大学 | Method for automatically abstracting Blog on basis of feature information |
CN103699873A (en) * | 2013-09-22 | 2014-04-02 | 杭州电子科技大学 | Lower-limb flat ground walking gait recognition method based on GA-BP (Genetic Algorithm-Back Propagation) neural network |
CN103699525A (en) * | 2014-01-03 | 2014-04-02 | 江苏金智教育信息技术有限公司 | Method and device for automatically generating abstract on basis of multi-dimensional characteristics of text |
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104216875A (en) * | 2014-09-26 | 2014-12-17 | 中国科学院自动化研究所 | Automatic microblog text abstracting method based on unsupervised key bigram extraction |
CN104679730A (en) * | 2015-02-13 | 2015-06-03 | 刘秀磊 | Webpage summarization extraction method and device thereof |
US20150161995A1 (en) * | 2013-12-06 | 2015-06-11 | Nuance Communications, Inc. | Learning front-end speech recognition parameters within neural network training |
CN104778157A (en) * | 2015-03-02 | 2015-07-15 | 华南理工大学 | Multi-document abstract sentence generating method |
CN105320642A (en) * | 2014-06-30 | 2016-02-10 | 中国科学院声学研究所 | Automatic abstract generation method based on concept semantic unit |
CN105611477A (en) * | 2015-12-27 | 2016-05-25 | 北京工业大学 | Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid |
CN106407178A (en) * | 2016-08-25 | 2017-02-15 | 中国科学院计算技术研究所 | Session abstract generation method and device |
CN106446109A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | Acquiring method and device for audio file abstract |
CN106709254A (en) * | 2016-12-29 | 2017-05-24 | 天津中科智能识别产业技术研究院有限公司 | Medical diagnostic robot system |
CN106898350A (en) * | 2017-01-16 | 2017-06-27 | 华南理工大学 | A kind of interaction of intelligent industrial robot voice and control method based on deep learning |
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
CN107423398A (en) * | 2017-07-26 | 2017-12-01 | 腾讯科技(上海)有限公司 | Exchange method, device, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180817 |