CN108417206A - High speed information processing method based on big data - Google Patents


Info

Publication number
CN108417206A
Authority
CN
China
Prior art keywords
layer
sentence
library
audio block
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810161849.5A
Other languages
Chinese (zh)
Inventor
王兰鹰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Songyuan Cloud Technology Co Ltd
Original Assignee
Sichuan Songyuan Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Songyuan Cloud Technology Co Ltd filed Critical Sichuan Songyuan Cloud Technology Co Ltd
Priority to CN201810161849.5A
Publication of CN108417206A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The present invention provides a high-speed information processing method based on big data. The method includes: for each audio frame of an audio block, updating the parameters of the last layer of a convolutional neural network, i.e. the weighting coefficients W and the offsets b; updating the parameters of the other layers of the convolutional neural network using the BP (backpropagation) algorithm; updating the polynomial representation of each audio frame according to the new parameters; and, based on the existing speech library D, reconstructing the current audio block and computing the reconstruction error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k. The method of the present invention is based on the processing of speech big data; its noise immunity is stronger, its accuracy is higher, it achieves a higher recall rate, and it significantly improves the efficiency with which users obtain knowledge.

Description

High speed information processing method based on big data
Technical field
The present invention relates to speech data processing, and more particularly to a high-speed information processing method based on big data.
Background technology
With scientific and technological progress and the development of network technology, massive media information exists on the network, such as call recordings, WeChat voice messages and meeting minutes. Users face large amounts of audio data and need to understand voice information more quickly, saving time and improving working efficiency. With the rapid development of information retrieval technology, speech summary generation technology has also matured: from the initial word-frequency-based methods to the introduction of machine learning, performance has greatly improved. Existing schemes generally adopt supervised learning: a classification model is trained on a training set to obtain an optimal weight vector, which is then used to predict classifications on a test set. However, relying on a supervised learning model requires labeled data, usually produced by manual annotation, which is time-consuming and subjective; such schemes also tend to ignore the semantic similarity between sentences, reducing the accuracy of the computed results.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a high-speed information processing method based on big data, including:
for each audio frame of an audio block, updating the parameters of the last layer of a convolutional neural network, i.e. the weighting coefficients W and the offsets b;
updating the parameters of the other layers of the convolutional neural network using the BP algorithm;
updating the polynomial representation of each audio frame according to the new parameters;
based on the existing speech library D, reconstructing the current audio block and computing the reconstruction error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k.
Preferably, the reconstructing of the polynomial representation Y_k of the current audio block X_k further comprises:
first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e. the l2-norm term, and computing its value as the current reconstruction error ε.
Preferably, if the error ε is greater than a set threshold, the current audio block is added to the summary speech library and the library is updated.
Preferably, adding the current audio block to the summary speech library and updating the library further comprises:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k is greater than the set threshold θ, adding the current audio block X_k to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, taking the set of frame-level polynomial representations used to update the library as y_q, then using Y_k ∈ y_q to update the library D and solve the objective function:
where the parameter λ is a coefficient greater than 0, used to adjust the influence of the regularization term.
Preferably,
during speech block extraction, after the DC component is removed, the LPC coefficient maximum of the analog speech signal in the time domain and the average amplitude difference in the frequency domain are extracted;
the mode of the network output values is taken as the threshold against which the one-dimensional vector output by the network is judged: values greater than or equal to this threshold are determined to be speech segments, and values below it are determined to be non-speech segments;
the input vector of the convolutional neural network is the 2-dimensional vector composed of the LPC coefficient maximum and the average amplitude difference, i.e. the number of input-layer neurons is 2; the output is a 1-dimensional vector judging whether the current frame is an effective speech block or a non-effective speech block, i.e. the number of output-layer neurons is 1; the number of hidden-layer neurons is 5.
In forward transmission, the input signal is processed layer by layer through the hidden layer up to the output layer; the state of the neurons in each layer affects only the state of the neurons in the next layer. Let w_ij be the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer, and b_k the thresholds of the output layer, where i indexes the input layer, j the hidden layer and k the output layer. If the output layer does not yield the desired output, the method switches to backpropagation and adjusts the network weighting coefficients and thresholds according to the prediction error, so that the prediction output of the convolutional neural network approaches the desired output.
Compared with the prior art, the present invention has the following advantages:
the method of the present invention is based on the processing of speech big data; its noise immunity is stronger, its accuracy is higher, it achieves a higher recall rate, and it significantly improves the efficiency with which users obtain knowledge.
Description of the drawings
Fig. 1 is a flow chart of the high-speed information processing method based on big data according to an embodiment of the present invention.
Detailed description of embodiments
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings that illustrate the principles of the invention. The invention is described in conjunction with such embodiments, but the invention is not limited to any embodiment; the scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be implemented according to the claims without some or all of these details.
One aspect of the present invention provides a high-speed information processing method based on big data. Fig. 1 is a flow diagram of the high-speed information processing method based on big data according to an embodiment of the present invention.
The present invention first obtains raw speech data and carries out the following operations:
1) splitting the speech into multiple audio blocks, each audio block containing multiple frames, and extracting the statistical features of each audio frame to form the corresponding feature vectors;
2) training a multilayer convolutional neural network on effective speech blocks to obtain the polynomial representation of each frame;
3) taking the first m audio blocks as the initial result and reconstructing them to obtain the initial speech library and the reconstruction coefficients;
4) updating the convolutional neural network parameters according to the next audio block, while reconstructing that audio block and computing the reconstruction error; if the error is greater than the set threshold, adding the audio block to the summary speech library and updating the library;
5) processing new audio blocks online one by one according to step 4) until the end; the updated summary speech data are the generated summary speech data. (A sketch of this overall flow is given below.)
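For illustration only, the following Python sketch shows the control flow of steps 1)-5) on synthetic data. The toy two-component frame features and the ridge-regularized reconstruction are stand-ins (assumptions of the sketch, not the disclosed network or objective) for the polynomial representations and for minimizing F(Y_k, C_k, D).

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frames):
    """Toy per-frame statistics standing in for the zero-crossing-rate /
    amplitude features of step 1)."""
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    amp = np.mean(np.abs(frames), axis=1)
    return np.stack([zcr, amp], axis=1)               # (t, 2) feature matrix

def reconstruct(Y, D, lam=0.1):
    """Ridge-regularized stand-in for min_C F(Y_k, C_k, D); returns the
    coefficients and the l2 reconstruction error used as epsilon."""
    C = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y.T)
    eps = np.linalg.norm(Y.T - D @ C)
    return C, eps

# toy online run over synthetic "audio blocks" (n blocks of t frames each)
n, t, frame_len, m, theta = 12, 8, 160, 3, 0.5
blocks = [rng.normal(size=(t, frame_len)) for _ in range(n)]
Y_blocks = [frame_features(b) for b in blocks]        # stands in for net outputs
D = np.hstack([Y.T for Y in Y_blocks[:m]])            # step 3): initial library
summary = list(range(m))
for k in range(m, n):                                 # steps 4)-5): online loop
    C, eps = reconstruct(Y_blocks[k], D)
    if eps > theta:                                   # poorly explained by library:
        summary.append(k)                             # keep block, grow the library
        D = np.hstack([D, Y_blocks[k].T])
print("blocks kept for the summary:", summary)
```

The point of the loop is that only blocks the current library D explains poorly (ε > θ) enter the summary and enlarge D.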
The extraction of the statistical features of each audio frame described in step 1) to form the corresponding feature vector is specifically:
1) the raw speech is uniformly divided into n audio blocks, i.e. each audio block contains t audio frames; each audio frame is converted to a unified code rate while keeping the original sampling rate;
2) the local features of each frame are extracted, including the zero-crossing rate, the average amplitude difference and the LPC coefficients;
3) the above audio features of each frame are combined in order to form a feature vector of dimension n_f. (A feature-extraction sketch follows this list.)
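A minimal numpy sketch of the local features in step 2) follows. The frame length, window, lag range and LPC order are assumptions, since the text does not fix them.

```python
import numpy as np

def frame_local_features(frame, order=10):
    """Per-frame local features: zero-crossing rate, average amplitude
    difference, and LPC coefficients via the autocorrelation method."""
    frame = frame - np.mean(frame)                    # remove DC component
    # zero-crossing rate
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    # average amplitude difference over a small set of lags (assumed: 1..8)
    amdf = np.mean([np.mean(np.abs(frame[k:] - frame[:-k])) for k in range(1, 9)])
    # LPC via the autocorrelation (Yule-Walker) equations
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    lpc = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])
    return np.concatenate([[zcr, amdf], lpc])         # n_f = 2 + order

# combine the per-frame features of a t-frame block in order (step 3))
block = np.random.default_rng(1).normal(size=(8, 320))   # t = 8 frames
vec = np.concatenate([frame_local_features(f) for f in block])
print(vec.shape)                                      # (t * n_f,) = (96,)
```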
The initial training of the multilayer convolutional neural network on effective speech blocks in step 2), which obtains the polynomial representation of each frame, is specifically:
the multilayer convolutional neural network is initially trained using a denoising encoder;
a. each audio frame is processed as follows at each layer: first, noise for each audio frame is generated by adding Gaussian noise, setting the input variables to arbitrary values at random; then, the noisy audio is mapped to obtain its polynomial representation;
b. the parameters of each layer of the convolutional neural network are adjusted and updated. (A minimal sketch of one such layer follows.)
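The following numpy sketch illustrates one layer of the denoising pre-training in a./b.: Gaussian corruption of the input, a mapping to a hidden code, and reconstruction of the clean frames. It is a single fully connected layer for clarity, not the disclosed multilayer network.

```python
import numpy as np

def train_denoising_layer(X, hidden, noise_std=0.1, lr=0.01, epochs=200, seed=0):
    """One layer of denoising pre-training: corrupt the input with Gaussian
    noise, map it to a hidden code, and learn to reconstruct the clean input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden)); b = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, d)); b2 = np.zeros(d)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        Xn = X + rng.normal(scale=noise_std, size=X.shape)  # corrupt the input
        H = sig(Xn @ W + b)                                 # code for each frame
        R = H @ W2 + b2                                     # reconstruction
        dR = 2.0 * (R - X) / n                              # grad of mean sq. error
        dZ = (dR @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ dR;  b2 -= lr * dR.sum(axis=0)
        W  -= lr * Xn.T @ dZ; b  -= lr * dZ.sum(axis=0)
    return W, b   # encoder parameters; sig(X @ W + b) is the frame representation

X = np.random.default_rng(2).normal(size=(100, 24))   # 100 frames, n_f = 24
W, b = train_denoising_layer(X, hidden=8)
print(W.shape, b.shape)
```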
Summary speech data are rebuild in step 3), specifically:
1) summary speech data by raw tone preceding m audio block set at, m is positive integer, i.e., shared m × t frame audios, XkCorresponding k-th of original audio block;Corresponding polynomial table, which is obtained, by initial training convolutional neural networks is shown as { Y1,Y2,…, Yk,Ym, YkThe polynomial repressentation of corresponding k-th of audio block;
2) initial speech library D is set by ndA element composition, i.e. D={ dj}j∈[1,nd], djCorresponding j-th of element;If rebuilding system Number is C, and element number corresponds to number of frames, and dimension corresponds to the number of elements in library, i.e. C={ Ci}i∈[1,nf], CkIt is k-th corresponding Audio block coefficient, ciCorresponding i-th frame voice;
3) initial speech library D and reconstructed coefficients C are respectively obtained using following formula, i.e.,:
Wherein, symbol | | | |2Indicate the l of variable2Norm, regularization parameter λ are the coefficient more than 0, function of many variables F (Yk,Ck, D) be embodied as:
Wherein, parameter γ is the coefficient more than 0, and the mathematical expression in symbol indicates to carry out weight using the i-th frame audio of D pairs of library It builds.Specially:First preset parameter D, makes above-mentioned object function become the convex function of parameter C;Then preset parameter C makes above-mentioned mesh Scalar functions become the convex function of parameter D, and iteration alternately updates two parameters.
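The objective itself is not reproduced above; as an assumption, the sketch below uses the common sparse-coding form F(Y_k, C_k, D) = ||Y_k - D·C_k||_2^2 + λ||C_k||_1 and alternates the two convex subproblems exactly as described: an ISTA pass for C with D fixed, then a closed-form least-squares update of D with C fixed.

```python
import numpy as np

def ista_codes(Y, D, lam, steps=50):
    """Sparse coding: minimize ||Y - D C||_2^2 + lam * ||C||_1 over C
    (convex in C for fixed D) by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # spectral norm^2 bounds the curvature
    C = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(steps):
        Z = C - D.T @ (D @ C - Y) / L      # gradient step on the l2 term
        C = np.sign(Z) * np.maximum(np.abs(Z) - lam / (2 * L), 0.0)
    return C

def dict_update(Y, C, eps=1e-6):
    """Library update: minimize ||Y - D C||_2^2 over D (convex in D for
    fixed C) in closed form, then renormalize the columns."""
    D = Y @ C.T @ np.linalg.inv(C @ C.T + eps * np.eye(C.shape[0]))
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)

def learn_library(Y, n_d=16, lam=0.1, iters=20, seed=0):
    """Alternate the two convex subproblems, as described above."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(Y.shape[0], n_d))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        C = ista_codes(Y, D, lam)
        D = dict_update(Y, C)
    return D, C

Y = np.random.default_rng(3).normal(size=(12, 40))   # frame representations
D, C = learn_library(Y)
eps_k = np.linalg.norm(Y - D @ C)                    # the l2 reconstruction term
print(D.shape, C.shape, round(float(eps_k), 3))
```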
The updating of the convolutional neural network parameters according to the next audio block, together with the reconstruction of that audio block and the computation of the reconstruction error in step 4), is specifically:
1) each audio frame of the audio block is processed in turn as follows:
a. the parameters of the last layer of the convolutional neural network, i.e. the weighting coefficients W and the offsets b, are updated;
b. the parameters of the other layers of the convolutional neural network are updated using the BP algorithm;
2) the polynomial representation of each audio frame is updated according to the new parameters;
3) based on the existing speech library D, the current audio block is reconstructed and the error ε is computed, i.e. the polynomial representation Y_k of the current audio block X_k is reconstructed. The specific steps are: first minimize the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substitute them into the first term, i.e. the l2-norm term, and compute its value as the current reconstruction error ε.
The addition of the current audio block to the summary speech library and the updating of that library in step 4) when the error exceeds the set threshold are specifically:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k is greater than the set threshold θ, the current audio block X_k is added to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, the set of frame-level polynomial representations used to update the library is y_q; Y_k ∈ y_q is then used to update the library D and solve the objective function:
where the parameter λ is a coefficient greater than 0, used to adjust the influence of the regularization term. (A sketch of this thresholded update is given below.)
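A self-contained sketch of the thresholded update follows; the threshold value, the stream of representations and the one-pass least-squares refit of D are illustrative assumptions.

```python
import numpy as np

def reconstruction_error(Y_k, D, lam=0.1, steps=50):
    """Minimize F(Y_k, C_k, D) over C_k (ISTA, as in the sketch above),
    then evaluate the first, l2-norm term as the current error epsilon."""
    L = np.linalg.norm(D, 2) ** 2
    C = np.zeros((D.shape[1], Y_k.shape[1]))
    for _ in range(steps):
        Z = C - D.T @ (D @ C - Y_k) / L
        C = np.sign(Z) * np.maximum(np.abs(Z) - lam / (2 * L), 0.0)
    return np.linalg.norm(Y_k - D @ C)

rng = np.random.default_rng(4)
D = rng.normal(size=(12, 16)); D /= np.linalg.norm(D, axis=0)
S = []                                        # summary speech library S
theta = 3.0                                   # set threshold (assumed value)
for k in range(5):                            # stream of audio blocks
    Y_k = rng.normal(size=(12, 8))            # polynomial representation of X_k
    eps = reconstruction_error(Y_k, D)
    if eps > theta:                           # step 1): novel block, keep it
        S.append(Y_k)
        # step 2): refit D on the q blocks now in S (one least-squares pass
        # stands in for re-solving the objective with Y_k in y_q)
        Yq = np.hstack(S)
        C = np.linalg.pinv(D) @ Yq
        D = Yq @ np.linalg.pinv(C)
print("blocks added to S:", len(S))
```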
During speech block extraction, the present invention first extracts the LPC coefficient maximum of the analog speech signal in the time domain and the average amplitude difference in the frequency domain, composes the extracted features into a two-dimensional vector as the input of the convolutional neural network, and uses the output of the neural network to judge whether the signal is an analog speech signal.
After the DC component is removed, the LPC coefficient maximum and the average amplitude difference of the speech are extracted. The mode of the network output values is taken as the threshold against which the one-dimensional vector output by the network is judged: values greater than or equal to this threshold are determined to be speech segments, and values below it are determined to be non-speech segments.
Two features of the analog speech signal are extracted: the LPC coefficient maximum and the average amplitude difference. The LPC coefficient R_w(k) of the analog speech signal s(n) is:
where s_w(n) is the windowed speech, N is the effective speech block length, and k is the lag;
taking the maximum over k yields the LPC coefficient maximum.
The average amplitude difference Ω of the analog speech signal s(n) is given by:
where N is the frame length, S(k) is the FFT of s(n), and E is the mean of the frequency-domain amplitude of the analog speech signal.
The input vector of the convolutional neural network is the 2-dimensional vector composed of the LPC coefficient maximum and the average amplitude difference, i.e. the number of input-layer neurons is 2. The output is a 1-dimensional vector judging whether the current frame is an effective speech block or a non-effective speech block, i.e. the number of output-layer neurons is 1. The number of hidden-layer neurons is 5.
In forward transmission, the input signal is processed layer by layer through the hidden layer up to the output layer. The state of the neurons in each layer affects only the state of the neurons in the next layer. Let w_ij be the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer, and b_k the thresholds of the output layer, where i indexes the input layer, j the hidden layer, and k the output layer. If the output layer does not yield the desired output, the method switches to backpropagation and adjusts the network weighting coefficients and thresholds according to the prediction error, so that the prediction output of the convolutional neural network approaches the desired output. (A training sketch for this 2-5-1 network follows.)
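The following numpy sketch trains the 2-5-1 network by the forward pass and error backpropagation just described, on toy labeled feature pairs. For simplicity it thresholds the output at 0.5 rather than at the mode of the output values used above.

```python
import numpy as np

rng = np.random.default_rng(5)

# 2-5-1 network: inputs = [LPC coefficient maximum, average amplitude difference]
w_ij = rng.normal(scale=0.5, size=(2, 5)); a_j = np.zeros(5)   # input -> hidden
w_jk = rng.normal(scale=0.5, size=(5, 1)); b_k = np.zeros(1)   # hidden -> output
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sig(x @ w_ij + a_j)        # hidden state depends only on the input layer
    return h, sig(h @ w_jk + b_k)  # output state depends only on the hidden layer

# toy labeled frames: feature pairs with 1 = speech, 0 = non-speech
X = rng.normal(size=(200, 2)) + np.array([1.0, 1.0]) * rng.integers(0, 2, (200, 1))
y = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)

lr = 0.5
for _ in range(500):               # backpropagation of the prediction error
    h, o = forward(X)
    d_o = (o - y) * o * (1 - o)                    # output-layer delta
    d_h = (d_o @ w_jk.T) * h * (1 - h)             # hidden-layer delta
    w_jk -= lr * h.T @ d_o / len(X); b_k -= lr * d_o.mean(axis=0)
    w_ij -= lr * X.T @ d_h / len(X); a_j -= lr * d_h.mean(axis=0)

_, out = forward(X)
T = 0.5                            # threshold on the 1-dimensional output
print("training accuracy:", np.mean((out >= T) == y))
```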
The initial weighting coefficients and thresholds of the convolutional neural network are optimized using a genetic algorithm, including:
(1) Individuals are encoded using coefficient encoding. Each individual is a numeric string composed of four parts: the input-layer-to-hidden-layer connection weighting coefficients, the hidden-layer thresholds, the hidden-layer-to-output-layer connection weighting coefficients, and the output-layer thresholds. An individual thus contains all the weighting coefficients and thresholds of the neural network, so that, the network structure being known, a neural network with determined structure, weighting coefficients and thresholds can be constituted.
(2) The initial weighting coefficients and thresholds of the convolutional neural network are obtained from the individual. After the convolutional neural network has been trained with the training data, the system output is predicted, and the sum of the absolute errors between the predicted outputs and the desired outputs is taken as the individual fitness value, i.e. the fitness function F is set as:
F = k · Σ_{i=1..n} |y_i - o_i|
where n is the number of output nodes of the convolutional neural network, y_i is the desired output of the i-th node of the convolutional neural network, o_i is the predicted output of the i-th node of the convolutional neural network, and k is a predefined coefficient.
(3) The selection strategy is based on fitness proportion; the selection probability p_i of each individual i is:
p_i = f_i / Σ_{j=1..N} f_j, with f_i = k/F_i
where F_i is the fitness value of individual i, k is a coefficient (taken as 10 here), and N is the number of individuals in the population (taken as 10 here).
(4) The crossover operation uses the coefficient interpolation method; the crossover of the k-th chromosome a_k and the l-th chromosome a_l at position j is performed as follows:
a_kj = a_kj(1 - b) + a_lj·b
a_lj = a_lj(1 - b) + a_kj·b
where b is a random number in [0, 1].
(5) The j-th gene a_ij of the i-th individual is selected for mutation:
a_ij = a_ij + (a_ij - a_max)·f(g), if r > 0.5
a_ij = a_ij + (a_min - a_ij)·f(g), if r ≤ 0.5
where a_max and a_min are the upper and lower bounds of the gene a_ij, respectively;
f(g) = r2·(1 - g/G_max)^2
where r2 is a random number, g is the current iteration number, G_max is the maximum number of evolutions, and r is a random number in [0, 1]. (A sketch of this genetic optimization is given below.)
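A sketch of the genetic optimization follows, decoding each individual into the 2-5-1 network's weighting coefficients and thresholds and applying the roulette selection, interpolation crossover and mutation formulas above. The exact form of the fitness F (k times the summed absolute error) follows the ingredients listed in (2) but is an assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
N_POP, G_MAX, K = 10, 50, 10                # population 10, k = 10 (as above)
DIM = 2 * 5 + 5 + 5 * 1 + 1                 # 2-5-1 net: weights + thresholds = 21
A_MIN, A_MAX = -3.0, 3.0                    # gene bounds

def fitness(ind, X, y):
    """F = k * sum(|desired - predicted|) for the decoded 2-5-1 network."""
    w_ij = ind[:10].reshape(2, 5); a_j = ind[10:15]
    w_jk = ind[15:20].reshape(5, 1); b_k = ind[20:]
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    o = sig(sig(X @ w_ij + a_j) @ w_jk + b_k)
    return K * np.sum(np.abs(y - o))

X = rng.normal(size=(50, 2)); y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)
pop = rng.uniform(A_MIN, A_MAX, size=(N_POP, DIM))
for g in range(G_MAX):
    F = np.array([fitness(ind, X, y) for ind in pop])
    f = K / F                                         # f_i = k / F_i
    p = f / f.sum()                                   # roulette-wheel selection
    pop = pop[rng.choice(N_POP, size=N_POP, p=p)]
    for k_ in range(0, N_POP - 1, 2):                 # interpolation crossover
        j = rng.integers(DIM); b = rng.random()
        akj, alj = pop[k_, j], pop[k_ + 1, j]
        pop[k_, j] = akj * (1 - b) + alj * b
        pop[k_ + 1, j] = alj * (1 - b) + akj * b
    for i in range(N_POP):                            # mutation per the formulas
        j = rng.integers(DIM); r = rng.random()
        fg = rng.random() * (1 - g / G_MAX) ** 2      # f(g) = r2 * (1 - g/Gmax)^2
        if r > 0.5:
            pop[i, j] += (pop[i, j] - A_MAX) * fg
        else:
            pop[i, j] += (A_MIN - pop[i, j]) * fg

best = pop[np.argmin([fitness(ind, X, y) for ind in pop])]
print("best initial-parameter fitness:", fitness(best, X, y))
```

The winning individual then serves as the initial weighting coefficients and thresholds for the backpropagation training sketched earlier.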
The steps for realizing speech critical point detection are as follows:
1) A convolutional neural network with a 2-5-1 structure is used. The LPC coefficient maximum and the average amplitude difference of the raw speech are first extracted, and this two-dimensional vector is used as the input of the neural network. The output layer is used to judge whether the frame is an effective speech block; the weighting coefficients and thresholds are randomly initialized.
2) Original audio blocks are randomly selected and the class of each frame signal is labeled: 1 if it is speech, 0 if it is not. The LPC coefficient maximum and the average amplitude difference of each audio block are extracted respectively to form a two-dimensional feature vector, which serves as the input vector of the convolutional neural network.
3) The training samples are input into the convolutional neural network to train the network parameters, and the convolutional neural network is optimized so that the error between the network output values and the desired values reaches the preset standard.
4) The LPC coefficient maximum and the average amplitude difference of each audio block are extracted respectively to form a two-dimensional feature vector, which is input into the convolutional neural network as a test sample. An improved threshold T is used here, namely the mode of all elements of the 1-dimensional vector output by the convolutional neural network; values greater than or equal to T are determined to be effective speech blocks, and values less than T are determined to be non-effective speech blocks. The output values of the convolutional neural network are compared with the pre-labeled values; if the accuracy is low, the network is retrained.
5) Whether a segment is a speech segment is determined using the output values of the network.
The speech block input signal vector is X(n) = [x_1(n), x_2(n), …, x_M(n)]^T; the output y(n) obtained after passing through the speech enhancement filter is expressed as:
y(n) = coef·[β_1·x(n) + β_2·x(n-1) + … + β_{nb+1}·x(n-nb)]
where B = [β_1, β_2, …, β_{nb+1}] is the filter weighting coefficient vector and coef is an adaptive parameter.
This adaptive parameter coef is then introduced into the ILMS adaptive filter model to denoise the speech block. SNR_i is computed for the denoised speech; the coef value corresponding to the maximum SNR_i is the final coef training output value:
SNR_i = Π_snr(f_LM(coef_i, s(n)))
where coef_i is a natural number incremented with step size 1; s(n) is the speech block to be enhanced; f_LM is the adaptive filtering algorithm function, which performs speech denoising enhancement on s(n) according to the value of coef_i; and Π_snr(·) is the function that computes the segmental signal-to-noise ratio.
SNR_i is maximized, and the index i corresponding to the maximum value is assigned to coef:
coef = argmax(SNR_1, SNR_2, …)
where argmax is the function that finds the index corresponding to the maximum value.
Finally, in the adaptive noise filter, each speech block to be enhanced is enhanced according to the coef value, and speech-to-text conversion is performed on the enhanced speech blocks. (A sketch of this coef search follows.)
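The coef search can be sketched as below. A plain FIR filter stands in for the unspecified ILMS adaptive filter, and the segmental SNR is computed against a known clean reference, which is an assumption of this toy example.

```python
import numpy as np

def segmental_snr(clean, denoised, seg=160):
    """Pi_snr(.): mean per-segment SNR in dB."""
    snrs = []
    for s in range(0, len(clean) - seg + 1, seg):
        c, d = clean[s:s + seg], denoised[s:s + seg]
        noise = np.sum((c - d) ** 2) + 1e-12
        snrs.append(10 * np.log10(np.sum(c ** 2) / noise + 1e-12))
    return np.mean(snrs)

def f_lm(coef, x, B):
    """Enhancement filter: y(n) = coef * sum_i B[i] * x(n - i); a plain FIR
    stands in for the ILMS adaptive filter, which is not specified above."""
    return coef * np.convolve(x, B)[:len(x)]

rng = np.random.default_rng(7)
clean = np.sin(0.05 * np.arange(1600))
s = clean + 0.3 * rng.normal(size=1600)         # speech block to be enhanced
B = np.ones(8) / 8.0                            # filter weighting coefficients
# scan coef_i = 1, 2, ... and keep the coef maximizing the segmental SNR
snr = [segmental_snr(clean, f_lm(c, s, B)) for c in range(1, 6)]
coef = int(np.argmax(snr)) + 1                  # coef = argmax(SNR_1, SNR_2, ...)
enhanced = f_lm(coef, s, B)
print("chosen coef:", coef)
```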
After the speech blocks have been converted to text, speech summary extraction can be carried out automatically. The present invention first trains the feature vectors of feature words using a convolutional neural network algorithm, then accurately computes the similarity between sentences, iteratively updates the sentence weighting coefficients, and finally eliminates information redundancy in the condensed speech based on the similarity between sentences. This specifically includes the following steps:
S1. The feature vector representation of feature words is obtained by training a convolutional neural network model on morphemes: a morpheme set is acquired from big data storage and preprocessed, the preprocessing including sentence segmentation of the morphemes in the set, which yields the training feature morpheme set; the training parameters are set, the training feature morpheme set is used as the training data, the convolutional neural network model is trained, and each word in the training feature morpheme set is output through training as a feature word in the form of a feature vector, yielding the feature vector representation of the feature words.
To train the feature vector representations of feature words from large amounts of unstructured speech data, the present invention uses the feature vector of the current word to predict the feature vectors of the context in a specified window. Given the feature morphemes w_1, w_2, w_3, …, w_T as training data, the objective function is:
(1/T) · Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)
where c is the parameter that determines the context window size (the larger c is, the more training data are needed) and T is the number of training samples.
The present invention uses the W words of the output layer as leaf nodes and assigns shorter paths to high-frequency words. Each feature morpheme w can be reached from the root node of the tree along a unique path. Let n(w, j) be the j-th node on the path from the root node to w, and let L(w) be the length of this path, so that n(w, 1) = root and n(w, L(w)) = w. For any inner node n, let ch(n) be an arbitrary child node of n. Then define:
where the function ⟦x⟧ takes the value 1 if x is true and -1 otherwise.
With the above formula defined, the objective function is solved using stochastic gradient descent, finally producing the feature vector representation of each word. (A training sketch using an off-the-shelf implementation is given below.)
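The training just described matches a continuous skip-gram model with a hierarchical-softmax output tree, so an off-the-shelf implementation can illustrate it; the toy corpus below is a placeholder.

```python
from gensim.models import Word2Vec

# each "sentence" is a list of feature morphemes w_1 ... w_T
corpus = [["voice", "summary", "extraction"],
          ["voice", "recognition", "summary"],
          ["neural", "network", "training"]]

# sg=1: skip-gram (predict the context from the current word); hs=1:
# hierarchical softmax over a binary tree whose leaves are the W output-layer
# words, with shorter codes for high-frequency words; window plays the role of c
model = Word2Vec(corpus, vector_size=50, window=2, sg=1, hs=1, negative=0,
                 min_count=1, epochs=50, seed=0)
vec = model.wv["summary"]          # feature-vector representation of a word
print(vec.shape)
```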
S2. The morpheme set collected in step S1 is retrieved according to the default query words, and the retrieved speech blocks serve as the candidate block set; sentence segmentation is applied to the candidate block set and sentences that repeat within it are removed, yielding the candidate block set S,
where S_i is an arbitrary sentence in the candidate block set S and N is the total number of sentences. Using the feature vectors of the feature words obtained in step S1, the semantic similarity between sentences is computed and used as the weighting coefficients of the edges in the graph, constituting the sentence DAG graph model.
For any two sentences S_i and S_j in the candidate block set, containing the feature words t_i and t_j with corresponding feature vectors v(t_i) and v(t_j) obtained through the convolutional neural network model training of step S1, the semantic similarity Sim(S_i, S_j) between the sentences S_i and S_j is given by the formula:
where, for the feature vector v(t_i) of the feature word t_i in sentence S_i, the inner term denotes the maximum similarity value between v(t_i) and the feature vectors of all feature words in sentence S_j belonging to the same part of speech as t_i; |S_i| and |S_j| denote the lengths of S_i and S_j, respectively. (A sketch of this similarity computation follows.)
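A sketch of the similarity computation follows. The symmetric summation and the normalization by |S_i| + |S_j| are reconstructions of the description above, and the toy vectors and part-of-speech tags are placeholders.

```python
import numpy as np

def sim_sentences(Si, Sj, vec, pos):
    """Soft-alignment similarity: for each feature word, take the maximum
    cosine similarity to the other sentence's words with the same part of
    speech, then normalize by |S_i| + |S_j|."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    def one_side(A, B):
        t = 0.0
        for ta in A:
            t += max([cos(vec[ta], vec[tb]) for tb in B if pos[tb] == pos[ta]],
                     default=0.0)
        return t
    return (one_side(Si, Sj) + one_side(Sj, Si)) / (len(Si) + len(Sj))

rng = np.random.default_rng(8)
words = ["voice", "summary", "speech", "digest", "network"]
vec = {w: rng.normal(size=16) for w in words}            # from step S1
pos = {w: "n" for w in words}                            # placeholder POS tags
print(sim_sentences(["voice", "summary"], ["speech", "digest"], vec, pos))
```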
S3. For the DAG graph model obtained in step S2, the weighting coefficient weight(S_i) of each node is updated iteratively with the following formula, using the average initial weighting coefficients of step S2 and the semantic similarity between sentences, until convergence, thereby obtaining a score that reflects the importance of each sentence:
where d is the damping coefficient, with value range [0, 1], and Assoc(S_i) denotes the set of sentences connected to S_i, i.e. the set of sentences whose similarity to S_i is greater than 0; ||Assoc(S_i)|| is the total number of sentences in that set.
Using the similarity matrix constituted by the average initial weighting coefficients of the nodes in step S2 and the semantic similarity between sentences, the weighting coefficient of each node in the DAG graph model is computed iteratively until convergence. Each node finally obtains a score, in preparation for generating the speech summary in the next step. (An iteration sketch is given below.)
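The iteration can be sketched as follows; the text fixes d in [0, 1] and Assoc(S_i) but not the exact update, so the standard weighted-graph ranking form is assumed.

```python
import numpy as np

def rank_sentences(sim, d=0.85, tol=1e-6, max_iter=200):
    """Iterate weight(S_i) = (1 - d) + d * sum over S_j in Assoc(S_i) of
    sim(j, i) / sum_k sim(j, k) * weight(S_j) until convergence."""
    n = sim.shape[0]
    sim = sim * (1 - np.eye(n))                 # no self-links
    out = sim.sum(axis=1, keepdims=True)        # each sentence's outgoing mass
    out[out == 0] = 1.0
    w = np.ones(n)
    for _ in range(max_iter):
        w_new = (1 - d) + d * (sim / out).T @ w
        if np.abs(w_new - w).max() < tol:
            break
        w = w_new
    return w

rng = np.random.default_rng(9)
M = rng.random((6, 6)); sim = (M + M.T) / 2     # symmetric similarity matrix
scores = rank_sentences(sim)
print(np.argsort(-scores))                      # sentences by importance
```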
S4. Sentences with the maximum weighting coefficients and without redundancy are selected to compose the condensed set; that is, if a sentence has high similarity to a sentence already in the summary set, the sentence is weakened. The specific steps are:
1) an empty condensed speech queue is initialized; the sentences corresponding to the nodes of the DAG graph model serve as the initial candidate condensed speech queue;
2) the sentence weighting coefficients corresponding to the DAG graph model nodes in the candidate queue are arranged in descending order according to step S3, and the sentences corresponding to the ordered nodes serve as the candidate summary sentence sequence;
3) according to the candidate summary sentence sequence, the sentence ranked first is transferred into the condensed speech queue, and the weighting coefficients of the sentences remaining in the candidate condensed speech queue are updated using the following formula:
Weight(S_j) = Weight(S_j) - ω × Sim(S_i, S_j)
where i ≠ j and ω is the reduction factor; when the sentence whose weighting coefficient is to be updated is similar to a sentence in the condensed speech queue, the reduction factor ω is 1.0; Sim(S_i, S_j) is the semantic similarity obtained in step S2;
4) steps 2) and 3) are repeated until the sentence set in the condensed speech queue reaches the preset summary speech length. (A selection sketch follows this list.)
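A sketch of steps 1)-4) follows, using the weight-reduction formula above with ω = 1.0.

```python
import numpy as np

def select_summary(scores, sim, length, omega=1.0):
    """Repeatedly move the highest-weighted candidate into the condensed
    queue and penalize the remaining candidates by their similarity to it:
    Weight(S_j) <- Weight(S_j) - omega * Sim(S_i, S_j)."""
    weight = scores.astype(float).copy()
    candidates = set(range(len(scores)))
    summary = []
    while candidates and len(summary) < length:
        i = max(candidates, key=lambda j: weight[j])   # step 2): best candidate
        summary.append(i)                              # step 3): move into queue
        candidates.remove(i)
        for j in candidates:                           # update remaining weights
            weight[j] -= omega * sim[i, j]
    return summary

rng = np.random.default_rng(10)
M = rng.random((6, 6)); sim = (M + M.T) / 2
scores = rng.random(6)
print(select_summary(scores, sim, length=3))
```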
Suppose the total number of sentences of the raw speech T is m, the simplification rate of the speech summary is set to λ, and the total number of summary sentences to be extracted is n; then λ = n/m, where m is the total number of sentences recognized from the raw speech. Text is a linear combination of sentences, a sentence is a linear combination of words, and a word can be regarded as a linear combination of morphemes, i.e. the importance of a sentence can be obtained indirectly from the importance of its morphemes. The summary speech extraction process based on the predefined simplification rate is therefore as follows:
1. The importance of each node in the morpheme network is computed, and the importance of each sentence is replaced by the average importance of the morphemes it contains, thus obtaining the sentence cluster S = {S_1, S_2, …, S_m} with importance values:
where w(n_i)_t is the t-th iteration of w(n_i), ε is the attenuation coefficient, C(n_i) is the set of morphemes that co-occur with the morpheme represented by node n_i, Coexsit(n_i, n_j) is the co-occurrence rate of the morphemes represented by nodes n_i and n_j, and N is the total number of morphemes contained in the morpheme network.
2. The sentence cluster S is divided into multiple domains. Suppose sentence clusters of k subdomains are obtained; the comprehensive importance of the sentences in each subdomain sentence cluster replaces the importance of that subdomain sentence cluster, and the k subdomain sentence clusters are arranged in descending order of importance, denoted MS_1, MS_2, …, MS_k (k < m); the sentences within each subdomain sentence cluster are also arranged in descending order of importance.
3. The above de-redundancy processing is applied to the sentences in the subdomain sentence clusters. Then, according to the speech summary simplification rate λ, the top ⌊λ·m/k⌋ sentences in importance order are extracted from each domain sentence cluster. If λ·m is exactly divisible by k, the final λ·m condensed sentences to be output are obtained directly; if it is not divisible by k, one more sentence is extracted from each of the clusters MS_1, MS_2, …, MS_{λ·m mod k}, and together with the sentences just extracted these form the summary sentences of the speech T. In this way the final condensed cluster is obtained, denoted S' = {S'_1, S'_2, …, S'_{λ×m}}.
4. The sentences in the set S' are output in their original order, yielding the speech summary.
For the processing of social network speech data, after the sentences and words of the speech text have been recognized, the present invention preferably further combines every two adjacent phrases in each sentence into a word pair, so that each sentence is represented by a sequence of word pairs. Word pairs incorporate contextual information; the possibility of each word being a keyword and the importance of the whole sentence reinforce each other, and summary sentences are extracted according to co-occurring word pairs to generate the summary speech data.
First, N word pairs that accurately reflect the sub-topics of the text collection are extracted as keyword pairs, yielding a keyword pair set. The weight of each word pair can be computed by the following formula:
W_TF(b_i) = fre(b_i) · log2(ifre(b_i))
where fre(b_i) is the word frequency of the word pair b_i, i.e. the frequency with which b_i occurs in the entire text collection,
and ifre(b_i) is the ratio of the total number of sentences to the number of sentences in which b_i occurs.
All word pairs are arranged in descending order of their W_TF values, and the top N are taken as keyword pairs.
The distribution matrix of topics and word pairs is then computed. Each row of the matrix is the probability distribution of a topic over the word pair set, and each element characterizes the importance of the word pair relative to that topic. The matrix is summed over topics, and the resulting value is taken as the global score of each word pair on the topic set.
The word pairs are sorted in descending order by this global score, and the top N word pairs constitute the keyword pair set.
Based on the above keyword pair set, the score of a candidate sentence in the set is computed as the ratio of the number of word pairs it has in common with the keyword pair set to the size of the keyword pair set.
Meanwhile, in order to weaken overly long or overly short sentences, the score is regularized, the regularization factor being the larger of the candidate sentence's own length and the average sentence length of the sentence set. The computed candidate sentence score can be formally defined as follows:
where S denotes the candidate sentence, KBS denotes the keyword pair set, b_i is a keyword pair occurring in both, |S| and |KBS| denote the candidate sentence length and the size of the keyword pair set respectively, and Avlen is the average length of all sentences in the sentence set. (A scoring sketch is given below.)
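A sketch of the word-pair weighting and candidate-sentence scoring follows; the W_TF formula is as given above, while the exact overlap/regularization form of the sentence score is an assumption based on its description.

```python
import math

def keyword_pair_scores(sentences):
    """W_TF(b_i) = fre(b_i) * log2(ifre(b_i)): frequency of the word pair
    times log2 of (total sentences / sentences containing the pair)."""
    pairs, sent_pairs = {}, []
    for sent in sentences:
        sp = {(sent[i], sent[i + 1]) for i in range(len(sent) - 1)}
        sent_pairs.append(sp)
        for b in [(sent[i], sent[i + 1]) for i in range(len(sent) - 1)]:
            pairs[b] = pairs.get(b, 0) + 1             # fre(b_i)
    n = len(sentences)
    return {b: f * math.log2(n / sum(b in sp for sp in sent_pairs))
            for b, f in pairs.items()}

def sentence_score(sent, kbs, avlen):
    """Candidate score: overlap with the keyword-pair set over |KBS|,
    regularized by max(|S|, Avlen)."""
    sp = {(sent[i], sent[i + 1]) for i in range(len(sent) - 1)}
    return len(sp & kbs) / len(kbs) / max(len(sent), avlen)

sents = [["voice", "summary", "extraction"],
         ["voice", "summary", "method"],
         ["neural", "network", "training"]]
wtf = keyword_pair_scores(sents)
kbs = set(sorted(wtf, key=wtf.get, reverse=True)[:3])  # top-N keyword pairs
avlen = sum(map(len, sents)) / len(sents)
print([round(sentence_score(s, kbs, avlen), 3) for s in sents])
```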
Summary sentence extraction: to prevent redundancy, a similarity threshold is introduced, and M sentences satisfying the similarity condition are extracted from the top-ranked sentences as summary sentences. The flow for extracting summary sentences is as follows:
(1) an empty condensed speech queue is initialized, and the candidate set is initialized;
(2) the currently top-ranked sentence is taken as the candidate sentence Sc;
(3) if the condensed speech queue is empty, the candidate sentence is added directly to the condensed speech queue; otherwise the similarity between the candidate sentence Sc and each summary sentence Ss is computed in turn: as soon as a case of sim(Sc, Ss) > Sim_td occurs, go directly to (5), where Sim_td is the similarity threshold;
(4) the candidate sentence is added to the condensed speech queue;
(5) the current candidate sentence is removed from the candidate set;
(6) if the number of sentences in the condensed speech queue is less than the preset quantity M, go to (2); otherwise go to (7);
(7) the condensed speech queue is output.
If a summary sentence contains temporal information, the summary sentences are combined in chronological order; if multiple summary sentences belong to the same topic among the morphemes, they are combined according to their order of statement in the raw speech.
In conclusion, the method of the present invention is based on the processing of speech big data; its noise immunity is stronger, its accuracy is higher, it achieves a higher recall rate, and it significantly improves the efficiency with which users obtain knowledge.
Obviously, those skilled in the art should understand that each module or each step of the present invention described above may be implemented by a general-purpose computing system; they may be concentrated on a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be implemented by program code executable by a computing system, so that they may be stored in a storage system and executed by a computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are only used to exemplify or explain the principles of the present invention and do not limit the present invention. Therefore, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. Furthermore, the appended claims of the present invention are intended to cover all variations and modifications falling within the scope and boundary of the appended claims, or the equivalent forms of such scope and boundary.

Claims (5)

1. A high-speed information processing method based on big data, characterized by comprising:
for each audio frame of an audio block, updating the parameters of the last layer of a convolutional neural network, i.e. the weighting coefficients W and the offsets b;
updating the parameters of the other layers of the convolutional neural network using the BP algorithm;
updating the polynomial representation of each audio frame according to the new parameters;
based on the existing speech library D, reconstructing the current audio block and computing the error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k.
2. The method according to claim 1, characterized in that the reconstructing of the polynomial representation Y_k of the current audio block X_k further comprises:
first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e. the l2-norm term, and computing its value as the current reconstruction error ε.
3. The method according to claim 1, characterized by further comprising:
if the error ε is greater than a set threshold, adding the current audio block to the summary speech library and updating the library.
4. The method according to claim 2, characterized in that the adding of the current audio block to the summary speech library and the updating of the library further comprise:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k is greater than the set threshold θ, adding the current audio block X_k to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, taking the set of frame-level polynomial representations used to update the library as y_q, then using Y_k ∈ y_q to update the library D and solve the objective function:
where the parameter λ is a coefficient greater than 0, used to adjust the influence of the regularization term.
5. The method according to claim 2, characterized by further comprising:
during speech block extraction, after the DC component is removed, extracting the LPC coefficient maximum of the analog speech signal in the time domain and the average amplitude difference in the frequency domain;
taking the mode of the network output values as the threshold against which the one-dimensional vector output by the network is judged, values greater than or equal to this threshold being determined to be speech segments and values below it non-speech segments;
the input vector of the convolutional neural network being the 2-dimensional vector composed of the LPC coefficient maximum and the average amplitude difference, i.e. the number of input-layer neurons being 2; the output being a 1-dimensional vector judging whether the current frame is an effective speech block or a non-effective speech block, i.e. the number of output-layer neurons being 1; and the number of hidden-layer neurons being 5;
in forward transmission, the input signal being processed layer by layer through the hidden layer up to the output layer, the state of the neurons in each layer affecting only the state of the neurons in the next layer, where w_ij denotes the connection weighting coefficients between the input layer and the hidden layer, w_jk the connection weighting coefficients between the hidden layer and the output layer, a_j the thresholds of the hidden layer and b_k the thresholds of the output layer, i indexing the input layer, j the hidden layer and k the output layer; and, if the output layer does not yield the desired output, switching to backpropagation and adjusting the network weighting coefficients and thresholds according to the prediction error, so that the prediction output of the convolutional neural network approaches the desired output.
CN201810161849.5A 2018-02-27 2018-02-27 High speed information processing method based on big data Pending CN108417206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810161849.5A CN108417206A (en) 2018-02-27 2018-02-27 High speed information processing method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810161849.5A CN108417206A (en) 2018-02-27 2018-02-27 High speed information processing method based on big data

Publications (1)

Publication Number Publication Date
CN108417206A 2018-08-17

Family

ID=63129113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810161849.5A Pending CN108417206A (en) 2018-02-27 2018-02-27 High speed information processing method based on big data

Country Status (1)

Country Link
CN (1) CN108417206A (en)


Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1819017A (en) * 2004-12-13 2006-08-16 Lg电子株式会社 Method for extracting feature vectors for speech recognition
CN101398814A (en) * 2007-09-26 2009-04-01 北京大学 Method and system for simultaneously abstracting document summarization and key words
CN101303857A (en) * 2007-11-05 2008-11-12 华为技术有限公司 Encoding method and encoder
CN101393545A (en) * 2008-11-06 2009-03-25 新百丽鞋业(深圳)有限公司 Method for implementing automatic abstracting by utilizing association model
CN102411621A (en) * 2011-11-22 2012-04-11 华中师范大学 Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode
CN103246687A (en) * 2012-06-13 2013-08-14 苏州大学 Method for automatically abstracting Blog on basis of feature information
CN103699873A (en) * 2013-09-22 2014-04-02 杭州电子科技大学 Lower-limb flat ground walking gait recognition method based on GA-BP (Genetic Algorithm-Back Propagation) neural network
US20150161995A1 (en) * 2013-12-06 2015-06-11 Nuance Communications, Inc. Learning front-end speech recognition parameters within neural network training
CN103699525A (en) * 2014-01-03 2014-04-02 江苏金智教育信息技术有限公司 Method and device for automatically generating abstract on basis of multi-dimensional characteristics of text
CN105320642A (en) * 2014-06-30 2016-02-10 中国科学院声学研究所 Automatic abstract generation method based on concept semantic unit
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104216875A (en) * 2014-09-26 2014-12-17 中国科学院自动化研究所 Automatic microblog text abstracting method based on unsupervised key bigram extraction
CN104679730A (en) * 2015-02-13 2015-06-03 刘秀磊 Webpage summarization extraction method and device thereof
CN104778157A (en) * 2015-03-02 2015-07-15 华南理工大学 Multi-document abstract sentence generating method
CN105611477A (en) * 2015-12-27 2016-05-25 北京工业大学 Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid
CN106407178A (en) * 2016-08-25 2017-02-15 中国科学院计算技术研究所 Session abstract generation method and device
CN106446109A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Acquiring method and device for audio file abstract
CN106709254A (en) * 2016-12-29 2017-05-24 天津中科智能识别产业技术研究院有限公司 Medical diagnostic robot system
CN106898350A (en) * 2017-01-16 2017-06-27 华南理工大学 A kind of interaction of intelligent industrial robot voice and control method based on deep learning
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN107423398A (en) * 2017-07-26 2017-12-01 腾讯科技(上海)有限公司 Exchange method, device, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
Hinton et al. Improving neural networks by preventing co-adaptation of feature detectors
CN110442684A (en) A kind of class case recommended method based on content of text
CN109767759A (en) End-to-end speech recognition methods based on modified CLDNN structure
CN108647226B (en) Hybrid recommendation method based on variational automatic encoder
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN107608953B (en) Word vector generation method based on indefinite-length context
CN111127146A (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
CN109214004B (en) Big data processing method based on machine learning
CN108268449A (en) A kind of text semantic label abstracting method based on lexical item cluster
CN111816156A (en) Many-to-many voice conversion method and system based on speaker style feature modeling
CN109857457B (en) Function level embedding representation method in source code learning in hyperbolic space
CN109933808A (en) One kind is based on the decoded neural machine translation method of dynamic configuration
CN112232087A (en) Transformer-based specific aspect emotion analysis method of multi-granularity attention model
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112884149B (en) Random sensitivity ST-SM-based deep neural network pruning method and system
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
CN114120041A (en) Small sample classification method based on double-pair anti-variation self-encoder
CN115689008A (en) CNN-BilSTM short-term photovoltaic power prediction method and system based on ensemble empirical mode decomposition
CN114118369A (en) Image classification convolution neural network design method based on group intelligent optimization
CN116226626A (en) Multi-source heterogeneous data association method
CN108388942A (en) Information intelligent processing method based on big data
CN108417204A (en) Information security processing method based on big data
CN109241298A (en) Semantic data stores dispatching method
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN116756303A (en) Automatic generation method and system for multi-topic text abstract

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20180817)