CN108388942A - Information intelligent processing method based on big data - Google Patents
- Publication number
- CN108388942A CN108388942A CN201810163995.1A CN201810163995A CN108388942A CN 108388942 A CN108388942 A CN 108388942A CN 201810163995 A CN201810163995 A CN 201810163995A CN 108388942 A CN108388942 A CN 108388942A
- Authority
- CN
- China
- Prior art keywords
- sentence
- convolutional neural
- audio
- neural networks
- audio block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
Abstract
The present invention provides an intelligent information processing method based on big data. The method includes: training a multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain a polynomial representation of each frame; selecting a predetermined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional-neural-network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error, the block being added to the summary speech data if the error exceeds a set threshold. Built on big-data speech processing, the method is more robust to noise, more accurate, achieves a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description
Technical field
The present invention relates to speech data processing, and in particular to an intelligent information processing method based on big data.
Background technology
With advances in science, technology, and networking, networks carry massive amounts of media information, such as call recordings, WeChat voice messages, and meeting minutes. Faced with large volumes of audio data, users need to grasp voice information more quickly, saving time and improving work efficiency. With the rapid development of information retrieval techniques, speech-summary generation has also matured: from early word-frequency-based methods to the introduction of machine learning, performance has improved greatly. Existing schemes generally use supervised learning, training a classification model on a training set to obtain an optimal weight vector and then predicting labels for a test set. However, supervised models depend on labeled data, which is usually produced by manual annotation; this is time-consuming and subjective, tends to ignore the semantic similarity between sentences, and reduces the accuracy of the results.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes an intelligent information processing method based on big data, comprising:
training a multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain a polynomial representation of each frame; selecting a predetermined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional-neural-network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error, the block being added to the summary speech data if the error exceeds a set threshold.
Preferably, training the multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain the polynomial representation of each frame further comprises:
pre-training the multilayer convolutional neural network with a denoising autoencoder;
performing the following operations on each audio frame at each layer:
first, generating a noisy version of each audio frame by adding Gaussian noise, with the input variables randomly set to arbitrary values;
then, mapping the noisy audio to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
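The layer-wise denoising pre-training described above can be sketched as follows. The layer sizes, noise level, learning rate, tanh activation, and tied decoder weights are illustrative assumptions not specified in the text; the sketch only shows the corrupt-encode-decode-update loop applied one layer at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae_layer(X, hidden_dim, noise_std=0.1, lr=0.01, epochs=100):
    """Train one denoising layer: corrupt the input with Gaussian noise,
    encode, decode with tied weights, and minimise reconstruction error
    against the *clean* input."""
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden_dim))
    b = np.zeros(hidden_dim)
    c = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + rng.normal(0.0, noise_std, X.shape)  # Gaussian corruption
        H = np.tanh(X_noisy @ W + b)                       # encode
        X_hat = H @ W.T + c                                # decode
        err = X_hat - X                                    # error vs clean frame
        dH = (err @ W) * (1.0 - H ** 2)                    # back through tanh
        W -= lr * (err.T @ H + X_noisy.T @ dH) / n         # tied-weight gradient
        b -= lr * dH.mean(axis=0)
        c -= lr * err.mean(axis=0)
    return W, b

# Stack two layers: each layer is pre-trained on the output of the previous one.
X = rng.normal(size=(200, 24))        # stand-in for per-frame feature vectors
reps = X
for hidden_dim in (16, 8):            # illustrative layer sizes
    W, b = train_dae_layer(reps, hidden_dim)
    reps = np.tanh(reps @ W + b)
print(reps.shape)                     # (200, 8)
```

The stacked outputs `reps` stand in for the per-frame representations that the text calls polynomial representations.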
Preferably, reconstructing the audio blocks further comprises:
taking the first m audio blocks of the raw speech, m a positive integer, i.e. m × t audio frames in total, where X_k denotes the k-th original audio block;
obtaining the corresponding polynomial representations {Y_1, Y_2, …, Y_k, … Y_m} from the pre-trained convolutional neural network, where Y_k is the polynomial representation of the k-th audio block;
letting the initial speech library D consist of n_d elements, i.e. D = {d_j}, j ∈ [1, n_d], with d_j the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {c_i}, i ∈ [1, n_f], with C_k the coefficients of the k-th audio block and c_i those of the i-th frame of speech;
obtaining the initial speech library D and the reconstruction coefficients C by jointly minimizing a reconstruction objective whose first term is the l_2 norm (denoted ||·||_2) of the reconstruction residual, where the regularization parameter λ is a coefficient greater than 0 and the multivariate function F(Y_k, C_k, D), with coefficient γ greater than 0, measures the reconstruction of the i-th audio frame from the library D.
Specifically: parameter D is fixed first, making the above objective a convex function of parameter C; parameter C is then fixed, making the objective a convex function of parameter D; and the two parameters are updated alternately by iteration.
Compared with the prior art, the present invention has the following advantages:
the method is based on big-data speech processing; it is more robust to noise, more accurate, achieves a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Description of the drawings
Fig. 1 is a flowchart of the big-data-based intelligent information processing method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings illustrating the principles of the invention. The invention is described in conjunction with such embodiments, but is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides an intelligent information processing method based on big data. Fig. 1 shows the processing flow of the big-data-based intelligent information processing method according to an embodiment of the present invention.
The present invention first obtains the raw speech data and performs the following operations:
1) segment the speech into multiple audio blocks, each containing multiple frames; extract the statistical features of each frame to form the corresponding feature vector;
2) train the multilayer convolutional neural network on the effective speech blocks to obtain the polynomial representation of each frame;
3) select the first m audio blocks as the initial result and reconstruct them to obtain the initial speech library and reconstruction coefficients;
4) update the convolutional-neural-network parameters with the next audio block while reconstructing it and computing its reconstruction error; if the error exceeds a set threshold, add the block to the summary speech library and update the library;
5) process each new audio block online as in step 4) until the end; the updated summary speech data are the generated summary speech data.
Extracting the statistical features of each audio frame in step 1) to form the corresponding feature vector specifically comprises:
1) the raw speech is uniformly divided into n audio blocks, i.e. each audio block contains t frames; each frame is converted to a uniform bit rate while keeping the original sampling rate;
2) the local features of each frame are extracted, including the zero-crossing rate, the average magnitude difference, and the LPC coefficients;
3) the above audio features of each frame are concatenated in order to form a feature vector of dimension n_f.
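The per-frame feature extraction of steps 1)-3) can be sketched as follows. The frame length, LPC order, the lag-1 form of the average magnitude difference, and the autocorrelation method for the LPC coefficients are illustrative assumptions; the text does not fix these choices.

```python
import numpy as np

def frame_features(frame, lpc_order=4):
    """Per-frame statistical features: zero-crossing rate, average magnitude
    difference, and LPC coefficients via the autocorrelation method."""
    # zero-crossing rate: fraction of sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    # average magnitude difference (lag-1 form assumed)
    amdf = np.mean(np.abs(frame[1:] - frame[:-1]))
    # LPC coefficients from the autocorrelation normal equations
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)]
                  for i in range(lpc_order)])
    lpc = np.linalg.solve(R + 1e-6 * np.eye(lpc_order), r[1:lpc_order + 1])
    return np.concatenate(([zcr, amdf], lpc))

rng = np.random.default_rng(0)
speech = rng.normal(size=8000)            # stand-in for one audio block
frames = speech.reshape(-1, 400)          # t = 20 frames of 400 samples each
feats = np.stack([frame_features(f) for f in frames])
print(feats.shape)                        # (20, 6): n_f = 6 features per frame
```

The rows of `feats` are the per-frame feature vectors fed to the network.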
The initial training of the multilayer convolutional neural network on effective speech blocks in step 2), obtaining the polynomial representation of each frame, specifically comprises:
pre-training the multilayer convolutional neural network with a denoising autoencoder;
A. performing the following operations on each audio frame at each layer: first, generating a noisy version of each audio frame by adding Gaussian noise, with the input variables randomly set to arbitrary values; then, mapping the noisy audio to obtain its polynomial representation;
B. adjusting and updating the parameters of each layer of the convolutional neural network.
Summary speech data are rebuild in step 3), specifically:
1) summary speech data by raw tone preceding m audio block set at, m is positive integer, i.e., shared m × t frame audios,
XkCorresponding k-th of original audio block;Corresponding polynomial table, which is obtained, by initial training convolutional neural networks is shown as { Y1,Y2,…,
Yk,Ym, YkThe polynomial repressentation of corresponding k-th of audio block;
2) initial speech library D is set by ndA element composition, i.e. D={ dj}j∈[1,nd], djCorresponding j-th of element;If rebuilding system
Number is C, and element number corresponds to number of frames, and dimension corresponds to the number of elements in library, i.e. C={ Ci}i∈[1,nf], CkIt is k-th corresponding
Audio block coefficient, ciCorresponding i-th frame voice;
3) initial speech library D and reconstructed coefficients C are respectively obtained using following formula, i.e.,:
Wherein, symbol | | | |2Indicate the l of variable2Norm, regularization parameter λ are the coefficient more than 0, function of many variables F
(Yk,Ck, D) be embodied as:
Wherein, parameter γ is the coefficient more than 0, and the mathematical expression in symbol indicates to carry out weight using the i-th frame audio of D pairs of library
It builds.Specially:First preset parameter D, makes above-mentioned object function become the convex function of parameter C;Then preset parameter C makes above-mentioned mesh
Scalar functions become the convex function of parameter D, and iteration alternately updates two parameters.
Updating the convolutional-neural-network parameters with the next audio block while reconstructing it and computing the reconstruction error in step 4) specifically comprises:
1) for each frame of the audio block in turn:
A. update the parameters of the last layer of the convolutional neural network, i.e. the weighting coefficients W and the bias b;
B. update the parameters of the other layers using the back-propagation (BP) algorithm;
2) update the polynomial representation of each audio frame with the new parameters;
3) based on the existing speech library D, reconstruct the current audio block and compute the error ε: the polynomial representation Y_k of the current audio block X_k is reconstructed by first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term, i.e. the l_2 norm, whose value is the current reconstruction error ε.
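The error computation of step 3) can be sketched as follows. Because the formula images are not reproduced in the text, an l2-regularised least-squares reconstruction is assumed for the minimisation over C_k, and the threshold is purely illustrative; the point of the sketch is that blocks poorly explained by the library receive a large error.

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_error(Y_k, D, lam=0.1):
    """Minimise the assumed objective over C_k for fixed library D, then
    return the first (l2-norm) term as the reconstruction error epsilon."""
    n_atoms = D.shape[1]
    C_k = np.linalg.solve(D.T @ D + lam * np.eye(n_atoms), D.T @ Y_k)
    return np.linalg.norm(Y_k - D @ C_k)

D = rng.normal(size=(24, 10))            # current speech library
Y_in = D @ rng.normal(size=(10, 20))     # block well explained by the library
Y_out = rng.normal(size=(24, 20))        # novel block
eps_in = reconstruction_error(Y_in, D)
eps_out = reconstruction_error(Y_out, D)
print(eps_in < eps_out)                  # novel blocks get larger error
theta = (eps_in + eps_out) / 2           # illustrative threshold
print(eps_out > theta)                   # -> Y_out would enter the summary
```

In the method this error feeds the threshold test that decides whether the block enters the summary speech library.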
Adding the current audio block to the summary speech library and updating the library if the error exceeds the set threshold in step 4) specifically comprises:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k exceeds the set threshold θ, the current audio block X_k is added to the summary speech library S;
2) if the current summary speech library S contains q audio blocks, the set of frame polynomial representations used to update the library is y_q; Y_k ∈ y_q is then used to update the library D by solving the objective function, where the parameter λ is a coefficient greater than 0 that adjusts the influence of the regularization term.
For speech-block extraction, the present invention first extracts the maximum LPC coefficient of the analog speech signal in the time domain and the average magnitude difference in the frequency domain, then forms the extracted features into a two-dimensional vector as the input of the convolutional neural network, and uses the network output to judge whether the signal is an analog speech signal.
After the DC component is removed, the maximum LPC coefficient and the average magnitude difference of the speech are extracted. The mode of the network output values is set as the threshold, and the one-dimensional vector output by the network is judged against it: values greater than this threshold are classified as speech segments, and values below it as non-speech segments.
Two features are extracted from the analog speech signal: the maximum LPC coefficient and the average magnitude difference. The LPC coefficient R_w(k) of the analog speech signal s(n) is computed from the windowed speech s_w(n), where N is the effective speech-block length and k is the lag; maximizing over s_w(n) yields the maximum LPC coefficient.
The average magnitude difference Ω of the analog speech signal s(n) is computed from the frame length N, the FFT S(k) of s(n), and the mean E of the frequency-domain magnitude of the analog speech signal.
The input vector of the convolutional neural network is the 2-dimensional vector formed by the maximum LPC coefficient and the average magnitude difference, so the input layer has 2 neurons. The output is a 1-dimensional vector indicating whether the current frame is an effective speech block or a non-effective speech block, so the output layer has 1 neuron. The hidden layer has 5 neurons.
In the forward pass, the input signal is processed layer by layer through the hidden layer up to the output layer; the state of each layer's neurons depends only on the state of the previous layer. Let w_ij be the connection weighting coefficients between the input and hidden layers, w_jk those between the hidden and output layers, a_j the hidden-layer thresholds, and b_k the output-layer thresholds, where i indexes the input layer, j the hidden layer, and k the output layer. If the output layer does not produce the desired output, back-propagation is performed: the network weighting coefficients and thresholds are adjusted according to the prediction error so that the predicted output of the convolutional neural network approaches the desired output.
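The 2-5-1 forward and backward passes described above can be sketched in NumPy as follows. The sigmoid activations, learning rate, iteration count, and toy labels (frames labelled speech when both features are large) are assumptions made only to give the sketch something to learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-5-1 network: w_ij (input->hidden), a_j (hidden thresholds),
# w_jk (hidden->output), b_k (output threshold)
w_ij = rng.normal(0, 1, (2, 5))
a_j = np.zeros(5)
w_jk = rng.normal(0, 1, (5, 1))
b_k = np.zeros(1)

# toy frames: label 1 when the two features sum above 1 (stand-in for speech)
X = rng.uniform(0, 1, (200, 2))
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(float).reshape(-1, 1)

lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ w_ij + a_j)              # forward: hidden layer
    out = sigmoid(H @ w_jk + b_k)            # forward: output layer
    err = out - y                            # prediction error
    d_out = err * out * (1 - out)            # backprop through output sigmoid
    d_hid = (d_out @ w_jk.T) * H * (1 - H)   # backprop through hidden sigmoid
    w_jk -= lr * H.T @ d_out / len(X)
    b_k -= lr * d_out.mean(axis=0)
    w_ij -= lr * X.T @ d_hid / len(X)
    a_j -= lr * d_hid.mean(axis=0)

acc = np.mean((out > 0.5) == (y > 0.5))
print(acc)
```

The thresholded output plays the role of the speech / non-speech decision in the text.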
The initial weighting coefficients and thresholds of the convolutional neural network are optimized with a genetic algorithm, as follows:
(1) Individuals are encoded with coefficient coding: each individual is a numeric string composed of four parts, namely the input-to-hidden connection weighting coefficients, the hidden-layer thresholds, the hidden-to-output connection weighting coefficients, and the output-layer thresholds. An individual thus contains all the weighting coefficients and thresholds of the neural network; with the network structure known, it fully determines a neural network of given structure, weighting coefficients, and thresholds.
(2) The initial weighting coefficients and thresholds of the network are obtained from the individual, the network is trained with the training data, and the system output is predicted. The sum of the absolute errors between the predicted and desired outputs is used as the individual's fitness value, i.e. the fitness function is set as F = k · Σ_{i=1}^{n} |y_i − o_i|, where n is the number of output nodes of the convolutional neural network, y_i is the desired output of the i-th node, o_i is the predicted output of the i-th node, and k is a predefined coefficient.
(3) Selection uses a fitness-proportional strategy; the selection probability p_i of each individual i is proportional to
f_i = k / F_i
where F_i is the fitness value of individual i, k is a coefficient (10 here), and N is the population size (10 here).
(4) Crossover uses coefficient interpolation; the crossover of the k-th chromosome a_k and the l-th chromosome a_l at position j is performed as:
a_kj = a_kj(1 − b) + a_lj · b
a_lj = a_lj(1 − b) + a_kj · b
where b is a random number in [0, 1].
(5) The j-th gene a_ij of the i-th individual is mutated as:
a_ij = a_ij + (a_ij − a_max) · f(g),  if r > 0.5
a_ij = a_ij + (a_min − a_ij) · f(g),  if r ≤ 0.5
where a_max and a_min are respectively the upper and lower bounds of gene a_ij,
f(g) = r_2 (1 − g/G_max)^2,
r_2 is a random number, g is the current iteration number, G_max is the maximum number of generations, and r is a random number in [0, 1].
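Operators (1)-(5) can be sketched as follows. To keep the sketch short it evolves a small weight vector against a sum-of-absolute-errors fitness rather than a full network; the population size, gene bounds, and toy objective are assumptions, while the proportional selection, interpolation crossover, and decaying bounded mutation follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy target: find weights w minimising sum |Xw - y|, standing in for the
# flattened weights-and-thresholds string of the network
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

POP, GMAX, K = 10, 200, 10.0
A_MIN, A_MAX = -5.0, 5.0
pop = rng.uniform(A_MIN, A_MAX, (POP, 4))

def error(ind):                     # F_i: sum of absolute prediction errors
    return np.sum(np.abs(X @ ind - y))

init_err = min(error(ind) for ind in pop)   # best of the initial population
best_err = init_err
for g in range(GMAX):
    fit = np.array([K / error(ind) for ind in pop])   # f_i = k / F_i
    pop = pop[rng.choice(POP, POP, p=fit / fit.sum())]  # proportional selection
    for i in range(0, POP - 1, 2):                    # interpolation crossover
        b = rng.uniform()
        ak, al = pop[i].copy(), pop[i + 1].copy()
        pop[i], pop[i + 1] = ak * (1 - b) + al * b, al * (1 - b) + ak * b
    f_g = rng.uniform() * (1 - g / GMAX) ** 2         # mutation step shrinks with g
    for i in range(POP):                              # bounded mutation
        j = rng.integers(4)
        if rng.uniform() > 0.5:
            pop[i, j] += (pop[i, j] - A_MAX) * f_g
        else:
            pop[i, j] += (A_MIN - pop[i, j]) * f_g
    pop = np.clip(pop, A_MIN, A_MAX)
    best_err = min(best_err, min(error(ind) for ind in pop))
print(best_err <= init_err)   # the best fitness seen never degrades
```

In the method, the winning individual would be decoded back into the network's initial weights and thresholds before gradient training.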
Speech endpoint detection is implemented as follows:
1) A 2-5-1 convolutional-neural-network structure is used. First, the maximum LPC coefficient and the average magnitude difference of the raw speech are extracted, and the resulting two-dimensional vector is used as the network input; the output layer judges whether the frame is an effective speech block. The weighting coefficients and thresholds are randomly initialized.
2) Original audio blocks are randomly selected and the class of each frame of the signal is labeled: 1 if speech, 0 otherwise. The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, which serves as the input vector of the convolutional neural network.
3) The training samples are fed into the convolutional neural network to train its parameters, and the network is optimized until the error between the network output and the desired value reaches a preset standard.
4) The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, which is fed into the convolutional neural network as a test sample. An improved threshold T is used here: if all elements of the 1-dimensional output vector are greater than or equal to T, the frame is classified as an effective speech block; if less than T, it is classified as a non-effective speech block. The output values of the network are compared with the pre-assigned labels; if the accuracy is low, the network is retrained.
5) The network output determines whether a segment is speech.
Let the speech-block input signal vector be X(n) = [x_1(n), x_2(n), … x_M(n)]^T; the output y(n) of the speech-enhancement filter is expressed as
y(n) = coef · [β_1 x_1(n) + β_2 x_2(n) + … + β_{nb+1} x_{nb+1}(n)]
where B = [β_1, β_2, … β_{nb+1}] is the filter weight-coefficient vector and coef is the adaptive parameter.
This adaptive parameter coef is then introduced into the ILMS adaptive-filter model to denoise the speech block. The SNR_i of the denoised speech is computed; the coef value at which SNR_i reaches its maximum is the final trained coef output:
SNR_i = Π_snr(f_LM(coef_i, s(n)))
where coef_i is a natural number with step 1, s(n) is the speech block to be enhanced, f_LM is the adaptive-filtering function that denoises and enhances s(n) according to the value of coef_i, and Π_snr(·) is the function computing the segmental signal-to-noise ratio.
SNR_i is maximized and the index i corresponding to the maximum is assigned to coef:
coef = argmax(SNR_1, SNR_2, …)
where argmax returns the index corresponding to the maximum value.
Finally, in the adaptive noise filter, each speech block to be enhanced is enhanced according to the coef value. Speech-to-text conversion is then performed on the enhanced speech blocks.
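The coef sweep can be sketched as follows. The ILMS filter itself is not specified, so a simple moving-average smoother stands in for f_LM, and a clean reference signal is used so that the segmental SNR can be computed directly; in practice the SNR would have to be estimated, since no clean reference exists.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_lm(coef, s, taps=8):
    """Stand-in for the adaptive-filter function f_LM: a moving-average
    smoother whose strength grows with the integer parameter coef."""
    kernel = np.ones(coef * taps) / (coef * taps)
    return np.convolve(s, kernel, mode="same")

def segmental_snr(clean, enhanced, seg=160):
    """Mean per-segment SNR in dB (the role of the Pi_snr function)."""
    snrs = []
    for i in range(0, len(clean) - seg, seg):
        sig = np.sum(clean[i:i + seg] ** 2)
        noise = np.sum((clean[i:i + seg] - enhanced[i:i + seg]) ** 2) + 1e-12
        snrs.append(10 * np.log10(sig / noise + 1e-12))
    return np.mean(snrs)

t = np.linspace(0, 1, 3200)
clean = np.sin(2 * np.pi * 120 * t)               # stand-in "speech"
noisy = clean + 0.3 * rng.normal(size=t.size)     # block to be enhanced

# sweep the natural-number parameter coef_i, keep the SNR-maximising value
snr = [segmental_snr(clean, f_lm(c, noisy)) for c in range(1, 6)]
coef = int(np.argmax(snr)) + 1                    # coef = argmax(SNR_1, SNR_2, ...)
enhanced = f_lm(coef, noisy)
print(1 <= coef <= 5, enhanced.shape)
```

The selected `coef` is then reused to enhance every subsequent block, as in the final step of the text.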
Once the speech blocks have been transcribed into text, speech-summary extraction can proceed automatically. The invention first trains the feature vectors of feature words with a convolutional-neural-network algorithm, then accurately computes the similarity between sentences, iteratively updates the sentence weighting coefficients, and finally uses inter-sentence similarity to eliminate information redundancy in the summary. The specific steps are as follows:
1. Obtain the feature-vector representation of feature words by training the convolutional-neural-network model on morphemes: a morpheme set is collected from the big-data store and pre-processed, the pre-processing including splitting the morphemes into sentences, which yields the training feature-morpheme set. The training parameters are set and the convolutional-neural-network model is trained with the training feature-morpheme set as training data; through training, each word in the training feature-morpheme set is output in feature-vector form as a feature word, giving the feature-vector representation of the feature words.
To train feature-word vectors from large amounts of unstructured speech data, the present invention uses the feature vector of the current word to predict the feature vectors of the context within a specified window. Given the feature morphemes w_1, w_2, w_3, …, w_T as training data, the objective function maximizes, over the T training items, the probability of the context words within distance c of the current word, where c is the parameter determining the context-window size; the larger c is, the more training data are needed, and T is the number of training items.
The present invention takes the W words of the output layer as leaf nodes of a tree and assigns shorter paths to high-frequency words. Each feature morpheme w can be reached from the root node of the tree along a unique path. Let n(w, j) be the j-th node on the path from the root node to w, and L(w) the length of that path, so that n(w, 1) = root and n(w, L(w)) = w. For any internal node n, ch(n) is an arbitrary child node of n, and the indicator function ⟦x⟧ is defined to be 1 when x is true and −1 otherwise.
With these definitions, the objective function is solved by stochastic gradient descent, ultimately producing the feature-vector representation of each word.
S2. The morpheme set collected in step S1 is searched with a preset query word, and the retrieved speech blocks serve as the candidate-block set; the candidate-block set is split into sentences and repeated sentences are removed, giving the candidate-block set S, where S_i is an arbitrary sentence in S and N is the total number of sentences. Using the feature vectors of the feature words obtained in step S1, the semantic similarity between sentences is computed and used as the weighting coefficients of the edges of a graph, forming a sentence DAG graph model.
For any two sentences S_i and S_j in the candidate-block set, containing feature words t_i and t_j whose feature vectors v(t_i) and v(t_j) are obtained from the convolutional-neural-network model trained in step S1, the semantic similarity Sim(S_i, S_j) sums, for each feature word t_i in sentence S_i, the maximum similarity between v(t_i) and the feature vectors of all feature words in sentence S_j belonging to the same part of speech, and normalizes by the sentence lengths |S_i| and |S_j|.
S3. For the DAG graph model obtained in step S2, the weighting coefficient weight(S_i) of each node is iteratively updated from the average initial weighting coefficients and the inter-sentence semantic similarities of step S2 until convergence, as
weight(S_i) = (1 − d) + d · Σ_{S_j ∈ Assoc(S_i)} [Sim(S_j, S_i) / Σ_{S_k ∈ Assoc(S_j)} Sim(S_j, S_k)] · weight(S_j)
which yields a score reflecting the importance of each sentence. Here d is the damping coefficient, with value range [0, 1], and Assoc(S_i) denotes the set of sentences connected to S_i, i.e. the sentences whose similarity with S_i is greater than 0, with ||Assoc(S_i)|| the number of sentences in that set.
Using the similarity matrix formed from the average initial weighting coefficients of step S2 and the inter-sentence semantic similarities, the weighting coefficient of each node in the DAG graph model is iterated until convergence. Each node finally receives a score, in preparation for generating the speech summary in the next step.
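The image of the iteration formula is not reproduced in the original, so the sketch below assumes the standard damped, similarity-weighted formulation that the description of d and Assoc(S_i) implies; the toy similarity matrix is purely illustrative.

```python
import numpy as np

def rank_sentences(sim, d=0.85, iters=100):
    """Damped, similarity-weighted iteration over the sentence graph:
    weight(S_i) = (1-d) + d * sum over connected S_j of
    (sim(S_j,S_i) / sum_k sim(S_j,S_k)) * weight(S_j)."""
    n = sim.shape[0]
    sim = sim * (1 - np.eye(n))               # no self-links
    w = np.ones(n)                            # average initial weights
    out_strength = sim.sum(axis=1, keepdims=True) + 1e-12
    M = sim / out_strength                    # row-normalised edge weights
    for _ in range(iters):
        w = (1 - d) + d * (M.T @ w)
    return w

# symmetric toy similarity matrix for 4 sentences
sim = np.array([[0.0, 0.8, 0.1, 0.0],
                [0.8, 0.0, 0.2, 0.1],
                [0.1, 0.2, 0.0, 0.9],
                [0.0, 0.1, 0.9, 0.0]])
w = rank_sentences(sim)
print(w.shape, bool(np.all(w >= 1 - 0.85)))   # every score >= (1 - d)
```

The converged vector `w` is the per-sentence importance score used in the next step.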
S4 weakens the sentence, i.e., if a sentence has higher similitude with existing sentence in summary set
The maximum and irredundant sentence composition of selection weighting coefficient simplifies set, the specific steps are:
1) initialize an empty condensed speech queue, and take the sentences corresponding to the nodes of the DAG graph model as the initial candidate condensed speech queue;
2) sort the sentences in the candidate condensed speech queue in descending order of the weighting coefficients obtained in step S3; the sorted sentences form the candidate summary sentence sequence;
3) move the first-ranked sentence of the candidate summary sentence sequence into the condensed speech queue, and update the weighting coefficients of the sentences remaining in the candidate queue with the following formula:
Weight(Sj) = Weight(Sj) − ω × Sim(Si, Sj)
where i ≠ j and ω is the attenuation factor; when the sentence whose weight is being updated is similar to a sentence already in the condensed speech queue, the attenuation factor ω is 1.0; Sim(Si, Sj) is the semantic similarity obtained in step S2;
4) repeat steps 2) and 3) until the set of sentences in the condensed speech queue reaches the preset summary speech length.
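The de-redundancy steps 1)–4) above amount to a greedy selection loop: repeatedly take the top-weighted candidate and penalize the rest by their similarity to it. A minimal Python sketch (the name `condense` and the default ω are illustrative assumptions, not part of the patent):

```python
def condense(weights, sim, target_len, omega=1.0):
    """Greedy redundancy-weakening selection.

    weights: initial weighting coefficients from step S3.
    sim: pairwise semantic similarities from step S2.
    Returns sentence indices in the order they enter the condensed queue.
    """
    candidates = set(range(len(weights)))
    w = list(weights)
    queue = []
    while candidates and len(queue) < target_len:
        best = max(candidates, key=lambda j: w[j])  # step 2): top-ranked candidate
        queue.append(best)                          # step 3): move into the queue
        candidates.remove(best)
        for j in candidates:                        # Weight(Sj) -= omega * Sim(Si, Sj)
            w[j] -= omega * sim[best][j]
    return queue
```

For example, with two near-duplicate high-weight sentences, the second duplicate is penalized and a lower-weight but novel sentence is chosen instead.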
Assume the total number of sentences in the raw speech T is m, i.e. the number of sentences recognized from the raw speech, and set the condensation rate of the summary speech to λ; the number of summary sentences to extract is then n, with λ = n/m. A text is a linear combination of sentences, a sentence is a linear combination of words, and a word can be regarded as a linear combination of morphemes; the importance of a sentence can therefore be obtained indirectly from the importance of its morphemes. The summary-speech extraction process based on the predefined condensation rate is as follows:
1. Calculate the importance of each node in the morpheme network, and replace the importance of each sentence with the average importance of the morphemes it contains, thereby obtaining the sentence cluster S = {S1, S2, …, Sm} with importance values, where:
w(ni)_t is the t-th iteration of w(ni); ε is the attenuation factor; C(ni) is the set of morphemes that co-occur with the morpheme represented by node ni; Coexsit(ni, nj) is the co-occurrence rate of the morphemes represented by nodes ni and nj; and N is the total number of morphemes contained in the morpheme network.
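The iteration formula image for the morpheme importance w(ni) is not reproduced in the text; the sketch below assumes a damped update over the co-occurrence graph consistent with the listed symbols (ε, C(ni), Coexsit, N). The exact normalization used by the patent may differ, and the name `morpheme_importance` is an illustrative assumption.

```python
def morpheme_importance(coexsit, epsilon=0.15, iters=50):
    """Damped importance iteration over a morpheme co-occurrence network.

    coexsit: dict {node: {neighbor: co-occurrence rate}} covering N morphemes.
    Returns a dict of converged importance values.
    """
    nodes = list(coexsit)
    n = len(nodes)
    w = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {}
        for ni in nodes:
            # contribution of each co-occurring morpheme nj, normalized by
            # nj's total outgoing co-occurrence rate
            acc = sum(rate * w[nj] / max(sum(coexsit[nj].values()), 1e-12)
                      for nj, rate in coexsit[ni].items())
            nxt[ni] = epsilon / n + (1 - epsilon) * acc
        w = nxt
    return w
```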
2. Perform multi-domain division on the sentence cluster S. Suppose sentence clusters of k sub-domains are obtained; take the comprehensive importance of the sentences in each sub-domain cluster as the importance of that cluster, and arrange the k sub-domain clusters in descending order of importance, denoted MS1, MS2, …, MSk (k < m); the sentences within each sub-domain cluster are likewise arranged in descending order of importance;
3. Apply the above de-redundancy processing to the sentences in each sub-domain cluster. Then, according to the summary-speech condensation rate λ, extract the top ⌊λ×m/k⌋ sentences, in importance order, from each domain cluster. If λ×m is divisible by k, this directly yields the λ×m condensed sentences to be output; if it is not divisible by k, additionally extract one more sentence from each of the clusters MS1, MS2, …, MS(λ×m mod k), which together with the sentences just extracted form the summary sentences of speech T. In this way the final condensed cluster is obtained, denoted S′ = {S′1, S′2, …, S′λ×m};
4. Output the sentences in the set S′ in their original order to obtain the summary speech.
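The per-domain quota arithmetic in step 3 is garbled in the source; one consistent reading is that each cluster contributes ⌊λ×m/k⌋ sentences, with the first (λ×m mod k) clusters contributing one extra so the total is exactly λ×m. This sketch (the name `domain_quotas` is assumed) encodes that reading:

```python
def domain_quotas(m, lam, k):
    """Sentences to take from each of the k domain clusters.

    m: total sentence count; lam: condensation rate; returns a list of k
    quotas summing to lam * m.
    """
    n = int(lam * m)            # total summary sentences, n = lam * m
    base, extra = divmod(n, k)  # base = floor(n / k), extra = n mod k
    return [base + 1 if i < extra else base for i in range(k)]
```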
For the processing of social-network voice data, after recognizing the sentences and words of the speech text, the present invention preferably further combines each two adjacent phrases in a sentence into a word pair, so that each sentence is represented by a sequence of word pairs. Word pairs incorporate contextual information: a pair's likelihood of being a keyword and the importance of the whole sentence mutually reinforce each other, and summary sentences are extracted according to co-occurring word pairs to generate the summary speech data.
First, N word pairs that accurately reflect some sub-topic of the text collection are extracted as keyword pairs, yielding a keyword-pair set. The weight of each word pair is calculated by the following formula:
WTF(bi) = fre(bi) × log2(ifre(bi))
where fre(bi) is the word frequency of the word pair bi, i.e. the frequency with which bi occurs in the entire text collection, and ifre(bi) is the ratio of the total number of sentences to the number of sentences in which bi occurs.
All word pairs are arranged in descending order of their WTF value, and the top N are taken as keyword pairs.
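The WTF weighting above can be sketched directly; pairs are formed from adjacent words as described earlier, and the function name `keyword_pairs` is an illustrative assumption:

```python
import math

def keyword_pairs(sentences, top_n):
    """Rank word pairs by WTF(bi) = fre(bi) * log2(ifre(bi)).

    sentences: list of word lists; adjacent words in a sentence form a pair.
    Returns the top_n pairs by descending WTF value.
    """
    pair_freq, pair_sents = {}, {}
    for idx, words in enumerate(sentences):
        for bi in zip(words, words[1:]):
            pair_freq[bi] = pair_freq.get(bi, 0) + 1      # fre: collection frequency
            pair_sents.setdefault(bi, set()).add(idx)     # sentences containing bi
    total = len(sentences)
    # ifre: total sentence count / number of sentences containing the pair
    wtf = {bi: f * math.log2(total / len(pair_sents[bi]))
           for bi, f in pair_freq.items()}
    return sorted(wtf, key=wtf.get, reverse=True)[:top_n]
```

Note the analogy to TF-IDF: a pair occurring in every sentence gets log2(1) = 0 and is never selected.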
Calculate the distribution matrix of topics over word pairs. Each row of the matrix is a probability distribution over the word-pair set for one topic, and each element characterizes the importance of a word pair relative to that topic. Summing the matrix over its rows yields, for each word pair, a global score over the topic set. Based on this global score, the word pairs are sorted in descending order and the top N are taken to constitute the keyword-pair set.
Based on the above keyword-pair set, for each candidate sentence in the set, calculate the ratio of the number of word pairs it shares with the keyword-pair set to the size of the whole keyword-pair set.
Meanwhile, to weaken overly long or overly short sentences, this score value is regularized, the regularization factor being the larger of the candidate sentence's own length and the average sentence length of the sentence set. The candidate sentence score can be formally defined as:
Score(S) = |{bi : bi ∈ S and bi ∈ KBS}| / (|KBS| × max(|S|, Avlen))
where S denotes the candidate sentence, KBS denotes the keyword-pair set, bi denotes a keyword pair occurring in both, |S| and |KBS| denote the candidate sentence length and the size of the keyword-pair set respectively, and Avlen is the average length of all sentences in the sentence set.
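Since the patent's score formula image is not reproduced in the text, the sketch below encodes the recipe as described: keyword-pair overlap ratio, regularized by the larger of the sentence's own length and the average sentence length. The name `candidate_score` is an assumption.

```python
def candidate_score(words, kbs, avlen):
    """Score a candidate sentence against the keyword-pair set KBS.

    words: the sentence as a word list; kbs: iterable of keyword pairs;
    avlen: average sentence length over the sentence set.
    """
    pairs = set(zip(words, words[1:]))        # adjacent-word pairs of the sentence
    overlap = len(pairs & set(kbs))           # pairs shared with KBS
    # regularization factor: max(|S|, Avlen) weakens too-long/too-short sentences
    return overlap / (len(kbs) * max(len(words), avlen))
```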
To extract the summary sentences while preventing redundancy, a similarity threshold is introduced, and the M top-ranked sentences that satisfy the similarity condition are extracted as summary sentences. The flow for extracting summary sentences is as follows:
(1) initialize an empty condensed speech queue, and initialize the candidate set;
(2) take the currently top-ranked sentence as the candidate sentence Sc;
(3) if the condensed speech queue is empty, add the candidate sentence to it directly; otherwise calculate in turn the similarity between the candidate sentence Sc and each summary sentence Ss; as soon as sim(Sc, Ss) > Simtd for some Ss, where Simtd is the similarity threshold, go directly to (5);
(4) add the candidate sentence to the condensed speech queue;
(5) remove the current candidate sentence from the candidate set;
(6) if the number of sentences in the condensed speech queue is less than the preset number M, go to (2); otherwise go to (7);
(7) output the condensed speech queue.
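The extraction flow (1)–(7) above can be sketched as a single loop; `extract_summary` and its parameters are illustrative names:

```python
def extract_summary(ranked, sim, m, sim_td):
    """Threshold-filtered extraction of up to m summary sentences.

    ranked: sentence ids in descending score order; sim: pairwise similarity
    matrix; sim_td: similarity threshold Simtd.
    """
    queue = []
    candidates = list(ranked)
    while candidates and len(queue) < m:
        sc = candidates.pop(0)                          # (2) top-ranked candidate
        # (3)/(4): accept only if no kept sentence exceeds the threshold
        if all(sim[sc][ss] <= sim_td for ss in queue):
            queue.append(sc)
        # (5): the candidate is removed from the set either way
    return queue
```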
If the summary sentences contain temporal information, they are combined in chronological order; if several summary sentences belong to the same morpheme-level topic, they are combined according to their order in the raw speech.
In conclusion, the method of the present invention is based on voice big-data processing, has stronger noise immunity, higher accuracy and a higher recall rate, and significantly improves the efficiency with which users acquire knowledge.
Obviously, those skilled in the art should appreciate that each module or each step of the above invention can be realized with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they can be realized with program code executable by a computing system, and thus can be stored in a storage system and executed by a computing system. The present invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplarily illustrate or explain the principle of the present invention, and not to limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all variations and modifications falling within the scope and boundary of the appended claims, or the equivalent forms of such scope and boundary.
Claims (3)
1. A big-data-based intelligent information processing method, characterized by comprising:
training a multilayer convolutional neural network on the effective voice blocks of the raw voice data to obtain the polynomial representation of each frame; selecting a predefined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; updating the parameters of the convolutional neural network according to the next audio block, while reconstructing that audio block and calculating its reconstruction error; and, if the error exceeds a set threshold, adding the audio block to the summary speech data.
2. The method according to claim 1, characterized in that training the multilayer convolutional neural network on the effective voice blocks of the raw voice data to obtain the polynomial representation of each frame further comprises:
initially training the multilayer convolutional neural network with a denoising encoder;
performing the following operations on each frame of audio in each layer:
first, generating a random input variable for each frame of audio noise by adding Gaussian noise of an arbitrarily set value;
then, mapping the audio noise to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
3. The method according to claim 1, characterized in that reconstructing the audio blocks further comprises:
obtaining the first m audio blocks of the raw speech, m being a positive integer, i.e. m × t frames of audio in total, with Xk corresponding to the k-th original audio block;
obtaining the corresponding polynomial representations {Y1, Y2, …, Yk, …, Ym} through the initially trained convolutional neural network, Yk being the polynomial representation of the k-th audio block;
letting the initial speech library D consist of nd elements, i.e. D = {dj}, j ∈ [1, nd], with dj the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {ci}, i ∈ [1, nf], with Ck the coefficients corresponding to the k-th audio block and ci those of the i-th frame of speech;
obtaining the initial speech library D and the reconstruction coefficients C by minimizing, over all audio blocks, the multivariate function F(Yk, Ck, D), in which the symbol || · ||2 denotes the l2 norm of a variable, the regularization parameter λ is a coefficient greater than 0, the parameter γ is a coefficient greater than 0, and the expression inside the norm reconstructs the i-th frame of audio using the library D;
first fixing the parameter D so that the above objective function becomes a convex function of the parameter C, then fixing the parameter C so that it becomes a convex function of the parameter D, and updating the two parameters alternately by iteration.
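The optimization formulas for D and C are not reproduced in the source; the following is a minimal one-dimensional sketch of the alternating scheme the claim describes (fix D, solve the convex problem in C; fix C, solve the convex problem in D; iterate). The function name `alternate`, the single-element library, and the ridge penalty on C are all illustrative assumptions, not the patent's actual objective.

```python
def alternate(frames, lam=0.1, iters=20):
    """Alternating minimization: reconstruct each frame y_i as d * c_i.

    frames: list of frame values; lam: regularization weight on C.
    Returns the library element d and per-frame coefficients c.
    """
    d = 1.0
    c = [0.0] * len(frames)
    for _ in range(iters):
        # C-step (D fixed): each c_i minimizes (y_i - d*c_i)^2 + lam*c_i^2,
        # a convex problem with closed form c_i = d*y_i / (d^2 + lam)
        c = [d * y / (d * d + lam) for y in frames]
        # D-step (C fixed): d minimizes sum_i (y_i - d*c_i)^2, least squares
        num = sum(y * ci for y, ci in zip(frames, c))
        den = sum(ci * ci for ci in c) or 1.0
        d = num / den
    return d, c
```

In this toy noiseless case the final D-step fits the frames exactly; a practical speech library would use many elements and typically a sparsity penalty instead of the ridge term.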
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810163995.1A CN108388942A (en) | 2018-02-27 | 2018-02-27 | Information intelligent processing method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108388942A true CN108388942A (en) | 2018-08-10 |
Family
ID=63070092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810163995.1A Pending CN108388942A (en) | 2018-02-27 | 2018-02-27 | Information intelligent processing method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108388942A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
CN117440001A (en) * | 2023-12-20 | 2024-01-23 | 国投人力资源服务有限公司 | Data synchronization method based on message |
CN117440001B (en) * | 2023-12-20 | 2024-02-27 | 国投人力资源服务有限公司 | Data synchronization method based on message |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1188957A (en) * | 1996-09-24 | 1998-07-29 | 索尼公司 | Vector quantization method and speech encoding method and apparatus |
CN1819017A (en) * | 2004-12-13 | 2006-08-16 | Lg电子株式会社 | Method for extracting feature vectors for speech recognition |
CN101546556A (en) * | 2008-03-28 | 2009-09-30 | 展讯通信(上海)有限公司 | Classification system for identifying audio content |
CN101546557A (en) * | 2008-03-28 | 2009-09-30 | 展讯通信(上海)有限公司 | Method for updating classifier parameters for identifying audio content |
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104679902A (en) * | 2015-03-20 | 2015-06-03 | 湘潭大学 | Information abstract extraction method in conjunction with cross-media fuse |
CN105989067A (en) * | 2015-02-09 | 2016-10-05 | 华为技术有限公司 | Method for generating text abstract from image, user equipment and training server |
CN106446109A (en) * | 2016-09-14 | 2017-02-22 | 科大讯飞股份有限公司 | Acquiring method and device for audio file abstract |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hinton et al. | Improving neural networks by preventing co-adaptation of feature detectors | |
CN111816156B (en) | Multi-to-multi voice conversion method and system based on speaker style feature modeling | |
Krishnamurthy et al. | Neural networks for vector quantization of speech and images | |
CN109767759A (en) | End-to-end speech recognition methods based on modified CLDNN structure | |
CN110442684A (en) | A kind of class case recommended method based on content of text | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN109214004B (en) | Big data processing method based on machine learning | |
CN110534132A (en) | A kind of speech-emotion recognition method of the parallel-convolution Recognition with Recurrent Neural Network based on chromatogram characteristic | |
CN108268449A (en) | A kind of text semantic label abstracting method based on lexical item cluster | |
CN108170848B (en) | Chinese mobile intelligent customer service-oriented conversation scene classification method | |
CN109857457B (en) | Function level embedding representation method in source code learning in hyperbolic space | |
Zhao | Evolutionary design of neural network tree-integration of decision tree, neural network and GA | |
CN111127146A (en) | Information recommendation method and system based on convolutional neural network and noise reduction self-encoder | |
CN112232087A (en) | Transformer-based specific aspect emotion analysis method of multi-granularity attention model | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN112884149B (en) | Random sensitivity ST-SM-based deep neural network pruning method and system | |
CN108647206A (en) | Chinese spam filtering method based on chaotic particle swarm optimization CNN networks | |
CN109241298A (en) | Semantic data stores dispatching method | |
CN114120041A (en) | Small sample classification method based on double-pair anti-variation self-encoder | |
CN110634476A (en) | Method and system for rapidly building robust acoustic model | |
CN109409434A (en) | The method of liver diseases data classification Rule Extraction based on random forest | |
CN108388942A (en) | Information intelligent processing method based on big data | |
CN108417204A (en) | Information security processing method based on big data | |
CN113806543B (en) | Text classification method of gate control circulation unit based on residual jump connection | |
CN116467416A (en) | Multi-mode dialogue emotion recognition method and system based on graphic neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180810 |