CN108388942A - Information intelligent processing method based on big data - Google Patents

Information intelligent processing method based on big data

Info

Publication number
CN108388942A
CN108388942A
Authority
CN
China
Prior art keywords
sentence
convolutional neural
audio
neural networks
audio block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810163995.1A
Other languages
Chinese (zh)
Inventor
王兰鹰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Songyuan Cloud Technology Co Ltd
Original Assignee
Sichuan Songyuan Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Songyuan Cloud Technology Co Ltd filed Critical Sichuan Songyuan Cloud Technology Co Ltd
Priority to CN201810163995.1A
Publication of CN108388942A
Legal status: Pending

Classifications

    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G10L 15/063: Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/16: Speech recognition; speech classification or search using artificial neural networks
    • G10L 21/0216: Speech enhancement, e.g. noise reduction or echo cancellation; noise filtering characterised by the method used for estimating noise


Abstract

The present invention provides an information intelligent processing method based on big data. The method includes: training a multilayer convolutional neural network on the effective speech blocks of raw speech data to obtain a polynomial representation of each frame; selecting a predefined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional neural network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error; if the error exceeds a set threshold, the block is added to the summary speech data. Because the method operates on speech big data, it is more robust to noise, achieves higher accuracy and higher recall, and significantly improves the efficiency with which users acquire knowledge.

Description

Information intelligent processing method based on big data
Technical field
The present invention relates to speech data processing, and more particularly to an information intelligent processing method based on big data.
Background art
With advances in science and technology and the growth of networks, networks now carry massive amounts of media information, such as call recordings, WeChat voice messages, and meeting minutes. Faced with large volumes of audio data, users need to grasp voice information more quickly in order to save time and improve working efficiency. With the rapid development of information retrieval technology, speech summary generation has also matured, progressing from early word-frequency-based methods to machine-learning approaches with greatly improved performance. Existing schemes generally adopt supervised learning: a classification model is trained on a labeled training set to obtain an optimal weight vector, which is then used to perform classification prediction on the test set. However, supervised models depend on labeled data, which is usually produced by manual annotation; this is time-consuming and subjective, tends to ignore the semantic similarity between sentences, and reduces the accuracy of the results.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes an information intelligent processing method based on big data, comprising:
training a multilayer convolutional neural network on the effective speech blocks of raw speech data to obtain a polynomial representation of each frame; selecting a predefined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; and updating the convolutional neural network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error, the block being added to the summary speech data if the error exceeds a set threshold.
Preferably, training the multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain the polynomial representation of each frame further comprises:
pre-training the multilayer convolutional neural network with a denoising encoder;
performing the following operations on each frame of audio at each layer:
first, generating a noisy version of the audio frame by adding Gaussian noise, the input variable being set randomly to an arbitrary value;
then, mapping the noisy audio to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
Preferably, reconstructing the audio blocks further comprises:
taking the first m audio blocks of the raw speech, m being a positive integer, for a total of m × t audio frames, where X_k denotes the k-th original audio block;
obtaining the corresponding polynomial representations {Y_1, Y_2, …, Y_k, …, Y_m} from the pre-trained convolutional neural network, where Y_k is the polynomial representation of the k-th audio block;
letting the initial speech library D consist of n_d elements, i.e. D = {d_j}, j ∈ [1, n_d], where d_j is the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {c_i}, i ∈ [1, n_f], where C_k is the coefficient of the k-th audio block and c_i corresponds to the i-th speech frame;
obtaining the initial speech library D and the reconstruction coefficients C from the following formula:
where the symbol ||·||_2 denotes the l_2 norm of a variable, the regularization parameter λ is a coefficient greater than 0, and the multivariate function F(Y_k, C_k, D) is given by:
where the parameter γ is a coefficient greater than 0 and the expression inside the norm denotes the reconstruction of the i-th audio frame with the library D; specifically:
first fixing the parameter D so that the above objective function becomes convex in C, then fixing C so that it becomes convex in D, and alternately and iteratively updating the two parameters.
Compared with the prior art, the present invention has the following advantages:
the method is based on the processing of speech big data, is more robust to noise, achieves higher accuracy and higher recall, and significantly improves the efficiency with which users acquire knowledge.
Brief description of the drawings
Fig. 1 is a flowchart of the information intelligent processing method based on big data according to an embodiment of the present invention.
Detailed description of the embodiments
A detailed description of one or more embodiments of the invention is provided below, together with drawings illustrating the principles of the invention. The invention is described in conjunction with these embodiments, but it is not limited to any embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes; the invention may also be practiced according to the claims without some or all of these details.
One aspect of the present invention provides an information intelligent processing method based on big data. Fig. 1 is the processing flowchart of the method according to an embodiment of the invention.
The present invention first obtains raw speech data and performs the following operations:
1) segmenting the speech into multiple audio blocks, each containing multiple frames, extracting the statistical features of each frame of audio, and forming the corresponding feature vectors;
2) training a multilayer convolutional neural network on the effective speech blocks to obtain the polynomial representation of each frame;
3) selecting the first m audio blocks as the initial result and reconstructing them to obtain the initial speech library and reconstruction coefficients;
4) updating the convolutional neural network parameters with the next audio block while reconstructing that block and computing the reconstruction error; if the error exceeds a set threshold, adding the block to the summary speech library and updating the library;
5) processing new audio blocks online one by one as in step 4) until the end; the updated summary speech data is the generated summary speech data.
Extracting the statistical features of each frame of audio in step 1) to form the corresponding feature vector is specifically:
1) dividing the raw speech uniformly into n audio blocks, each containing t audio frames, and converting each frame to a unified bit rate while keeping the original sampling rate;
2) extracting the local features of each frame, including the zero-crossing rate, the average magnitude difference, and the LPC coefficients;
3) concatenating the above audio features of each frame in order to form a feature vector of dimension n_f; a sketch of this per-frame feature extraction follows.
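As an illustration of step 1), the following Python sketch computes the three per-frame features named above and concatenates them into a frame feature vector. Computing LPC coefficients via the autocorrelation method and the Levinson-Durbin recursion is one standard choice; the LPC order and the single-lag form of the average magnitude difference are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame).astype(int)
    return float(np.mean(np.abs(np.diff(signs))))

def average_magnitude_difference(frame, lag=1):
    """Mean absolute difference between the frame and a lagged copy of itself."""
    return float(np.mean(np.abs(frame[lag:] - frame[:-lag])))

def lpc_coefficients(frame, order=10):
    """LPC coefficients via autocorrelation and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        new_a = a.copy()
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # reflect previous coefficients
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def frame_feature_vector(frame, lpc_order=10):
    """Concatenate the per-frame features in order, as in step 3)."""
    return np.concatenate(([zero_crossing_rate(frame),
                            average_magnitude_difference(frame)],
                           lpc_coefficients(frame, lpc_order)))
```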
Pre-training the multilayer convolutional neural network on the effective speech blocks in step 2) to obtain the polynomial representation of each frame is specifically:
pre-training the multilayer convolutional neural network with a denoising encoder;
a. performing the following on each frame of audio at each layer: first, generating a noisy version of the frame by adding Gaussian noise, the input variable being set randomly to an arbitrary value; then, mapping the noisy audio to obtain its polynomial representation;
b. adjusting and updating the parameters of each layer of the convolutional neural network.
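A minimal NumPy sketch of this denoising pre-training: each layer is trained to reconstruct the clean frame features from a Gaussian-corrupted copy, and the layer parameters are updated by gradient descent. The layer sizes, noise level, learning rate, and tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_denoising_layer(X, hidden, epochs=50, lr=0.01, sigma=0.1):
    """Train one denoising layer: corrupt the input with Gaussian noise,
    encode it (the "mapping"), decode it, and minimise the squared
    reconstruction error against the clean input."""
    n, d = X.shape
    W = rng.normal(0, 0.01, (d, hidden)); b = np.zeros(hidden)
    W2 = rng.normal(0, 0.01, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        noisy = X + rng.normal(0, sigma, X.shape)   # add Gaussian noise
        H = np.tanh(noisy @ W + b)                  # encode
        R = H @ W2 + b2                             # decode
        G = 2 * (R - X) / n                         # d(loss)/dR
        gW2 = H.T @ G; gb2 = G.sum(0)
        GH = (G @ W2.T) * (1 - H**2)                # backprop through tanh
        gW = noisy.T @ GH; gb = GH.sum(0)
        W -= lr * gW; b -= lr * gb; W2 -= lr * gW2; b2 -= lr * gb2
    return W, b

def pretrain_stack(X, sizes=(64, 32)):
    """Layer-wise denoising pre-training; returns per-layer parameters."""
    params, H = [], X
    for h in sizes:
        W, b = train_denoising_layer(H, h)
        params.append((W, b))
        H = np.tanh(H @ W + b)   # feed the clean activations to the next layer
    return params
```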
Reconstructing the summary speech data in step 3) is specifically:
1) the summary speech data consists of the first m audio blocks of the raw speech, m being a positive integer, for a total of m × t audio frames, where X_k denotes the k-th original audio block; the corresponding polynomial representations {Y_1, Y_2, …, Y_k, …, Y_m} are obtained from the pre-trained convolutional neural network, where Y_k is the polynomial representation of the k-th audio block;
2) the initial speech library D consists of n_d elements, i.e. D = {d_j}, j ∈ [1, n_d], where d_j is the j-th element; the reconstruction coefficients are C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {c_i}, i ∈ [1, n_f], where C_k is the coefficient of the k-th audio block and c_i corresponds to the i-th speech frame;
3) the initial speech library D and the reconstruction coefficients C are obtained from the following formula:
where the symbol ||·||_2 denotes the l_2 norm of a variable, the regularization parameter λ is a coefficient greater than 0, and the multivariate function F(Y_k, C_k, D) is given by:
where the parameter γ is a coefficient greater than 0 and the expression inside the norm denotes the reconstruction of the i-th audio frame with the library D. Specifically: first fix the parameter D so that the objective function becomes convex in C; then fix C so that it becomes convex in D; and alternately and iteratively update the two parameters.
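The two formulas above appear only as images in the original publication, but the surrounding definitions describe a dictionary-learning objective minimised by alternating over C (with D fixed) and D (with C fixed), each subproblem being convex. The sketch below implements that alternation under an assumed objective ||Y − DC||² + λ||C||²; the closed-form ridge updates are one standard way to realise the two convex subproblems and are not taken from the patent.

```python
import numpy as np

def learn_library(Y, n_d, lam=0.1, iters=20):
    """Alternating minimisation of ||Y - D C||_F^2 + lam * ||C||_F^2 (assumed form).
    Y: (feat_dim, n_frames) polynomial representations of the first m blocks.
    Returns the speech library D (feat_dim, n_d) and coefficients C (n_d, n_frames)."""
    rng = np.random.default_rng(0)
    d, nf = Y.shape
    D = rng.normal(size=(d, n_d))
    for _ in range(iters):
        # C-step: with D fixed the objective is convex in C (ridge regression).
        C = np.linalg.solve(D.T @ D + lam * np.eye(n_d), D.T @ Y)
        # D-step: with C fixed the objective is convex in D (least squares).
        D = Y @ C.T @ np.linalg.inv(C @ C.T + 1e-8 * np.eye(n_d))
        # Normalise library atoms so scale lives in the coefficients.
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-8)
    return D, C
```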
Updating the convolutional neural network parameters with the next audio block, reconstructing that block, and computing the reconstruction error in step 4) is specifically:
1) processing each frame of the audio block in turn as follows:
a. updating the parameters of the last layer of the convolutional neural network, i.e. the weight coefficients W and the bias b;
b. updating the parameters of the other layers with the BP (backpropagation) algorithm;
2) updating the polynomial representation of each frame of audio according to the new parameters;
3) reconstructing the current audio block based on the existing speech library D and computing the error ε, i.e. reconstructing the polynomial representation Y_k of the current audio block X_k. The specific steps are: first minimize the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substitute them into the first term, the l_2 norm, and compute its value as the current reconstruction error ε.
Adding the current audio block to the summary speech library and updating the library if the error exceeds the set threshold in step 4) is specifically:
1) if the reconstruction error ε computed for the polynomial representation Y_k of the current audio block X_k exceeds the set threshold θ, adding the current audio block X_k to the summary speech library S;
2) if the current summary speech library S contains q audio blocks and y_q is the set of frame-level polynomial representations used to update the library, then updating the library D with Y_k ∈ y_q and solving the objective function:
where the parameter λ is a coefficient greater than 0, used to adjust the influence of the regularization term.
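Under the same assumed objective, the online step can be sketched as follows: for each new block, solve the C-subproblem against the current library, evaluate the first (l_2) term as the error ε, and add the block to the summary library when ε exceeds θ. The function names, threshold value, and the way the library re-solve is triggered are assumptions for illustration.

```python
import numpy as np

def reconstruction_error(Yk, D, lam=0.1):
    """Minimise F over Ck with D fixed, then evaluate the first (l2) term."""
    n_d = D.shape[1]
    Ck = np.linalg.solve(D.T @ D + lam * np.eye(n_d), D.T @ Yk)
    return float(np.linalg.norm(Yk - D @ Ck) ** 2), Ck

def online_update(blocks, D, theta=1.0, lam=0.1):
    """Process blocks one by one; collect those the library explains poorly
    and refresh the library from the collected summary blocks."""
    summary = []
    for Yk in blocks:
        eps, _ = reconstruction_error(Yk, D, lam)
        if eps > theta:
            summary.append(Yk)
            Y = np.concatenate(summary, axis=1)   # all summary frames so far
            C = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
            D = Y @ C.T @ np.linalg.inv(C @ C.T + 1e-8 * np.eye(D.shape[1]))
    return D, summary
```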
For speech-block extraction, the present invention first extracts the maximum LPC coefficient of the analog speech signal in the time domain and the average magnitude difference in the frequency domain, forms the extracted features into a two-dimensional vector as the input of the convolutional neural network, and uses the network output to judge whether the signal is an analog speech signal.
After the DC component is eliminated, the maximum LPC coefficient and the average magnitude difference of the speech are extracted. The mode of the network output values is set as the threshold, and the one-dimensional vector output by the network is judged against it: values above the threshold are determined to be speech segments, and values below it are determined to be non-speech segments.
Two features are extracted from the analog speech signal s(n): the maximum LPC coefficient and the average magnitude difference. The LPC coefficient R_w(k) of the analog speech signal s(n) is:
where s_w(n) is the windowed speech, N is the effective speech-block length, and k is the lag;
maximizing over s_w(n) yields the maximum LPC coefficient.
The average magnitude difference Ω of the analog speech signal s(n) is given by:
where N is the frame length, S(k) is the FFT of s(n), and E is the mean of the frequency-domain magnitude of the analog speech signal.
The input vector of the convolutional neural network is the two-dimensional vector formed by the maximum LPC coefficient and the average magnitude difference, i.e. the input layer has 2 neurons. The output is a one-dimensional vector judging whether the current frame is an effective speech block or a non-effective speech block, i.e. the output layer has 1 neuron. The hidden layer has 5 neurons.
In the forward pass, the input signal is processed layer by layer through the hidden layer up to the output layer, the neuron states of each layer being affected only by those of the previous layer. Let w_ij be the connection weights between the input layer and the hidden layer, w_jk the connection weights between the hidden layer and the output layer, a_j the threshold of the hidden layer, and b_k the threshold of the output layer, where i indexes the input layer, j the hidden layer, and k the output layer. If the output layer does not produce the desired output, backpropagation is performed: the network weights and thresholds are adjusted according to the prediction error so that the predicted output of the convolutional neural network approaches the desired output.
The initial weights and thresholds of the convolutional neural network are optimized with a genetic algorithm, as follows (a sketch of the full loop follows step (5)):
(1) Individuals are encoded as coefficient strings: each individual is a numeric string composed of four parts: the input-to-hidden connection weights, the hidden-layer thresholds, the hidden-to-output connection weights, and the output-layer thresholds. An individual thus contains all the weights and thresholds of the neural network; once the network structure is known, it determines a network with a fixed structure, weights, and thresholds.
(2) The initial weights and thresholds of the convolutional neural network are decoded from the individual, the network is trained on the training data, and the system output is predicted; the sum of the absolute errors between the predicted outputs and the desired outputs is used as the individual fitness value, i.e. the fitness function F is set as:
where n is the number of output nodes of the convolutional neural network, y_i is the desired output of the i-th node, o_i is the predicted output of the i-th node, and k is a predefined coefficient.
(3) Selection is fitness-proportional; the selection probability p_i of each individual i is based on:
f_i = k / F_i
where F_i is the fitness value of individual i;
k is a coefficient, taken as 10 here;
N is the number of individuals in the population, taken as 10 here.
(4) Crossover uses the coefficient interpolation method; the crossover operation of the k-th chromosome a_k and the l-th chromosome a_l at position j is as follows:
a_kj = a_kj(1 − b) + a_lj · b
a_lj = a_lj(1 − b) + a_kj · b
where b is a random number in [0, 1].
(5) The j-th gene a_ij of the i-th individual is selected for mutation:
a_ij = a_ij + (a_ij − a_max) · f(g),  r > 0.5
a_ij = a_ij + (a_min − a_ij) · f(g),  r ≤ 0.5
where a_max and a_min are the upper and lower bounds of gene a_ij;
f(g) = r_2(1 − g/G_max)^2
where r_2 is a random number, g is the current iteration number, G_max is the maximum number of generations, and r is a random number in [0, 1].
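A compact Python sketch of the genetic loop in (1)-(5): individuals are flat vectors holding all weights and thresholds of the 2-5-1 network, fitness is f_i = k / F_i with F_i the summed absolute prediction error, selection is fitness-proportional, crossover is the arithmetic interpolation above, and mutation pushes one gene toward a bound with strength f(g) = r_2(1 − g/G_max)². The gene bounds and the tanh activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_POP, G_MAX, A_MIN, A_MAX = 10, 50, -1.0, 1.0
DIM = 2 * 5 + 5 + 5 * 1 + 1   # w_ij, a_j, w_jk, b_k for a 2-5-1 network

def forward(ind, x):
    """Decode an individual into the 2-5-1 network and run a forward pass."""
    W1 = ind[:10].reshape(2, 5); a = ind[10:15]
    W2 = ind[15:20].reshape(5, 1); b = ind[20:21]
    return np.tanh(x @ W1 - a) @ W2 - b

def fitness(ind, X, y):
    err = np.sum(np.abs(forward(ind, X).ravel() - y))   # F_i: sum of |pred - desired|
    return 10.0 / (err + 1e-9)                          # f_i = k / F_i with k = 10

def evolve(X, y):
    pop = rng.uniform(A_MIN, A_MAX, (N_POP, DIM))
    for g in range(G_MAX):
        f = np.array([fitness(p, X, y) for p in pop])
        idx = rng.choice(N_POP, N_POP, p=f / f.sum())    # fitness-proportional selection
        pop = pop[idx].copy()
        for k in range(0, N_POP - 1, 2):                 # arithmetic crossover at gene j
            b_ = rng.random(); j = rng.integers(DIM)
            akj, alj = pop[k, j], pop[k + 1, j]
            pop[k, j] = akj * (1 - b_) + alj * b_
            pop[k + 1, j] = alj * (1 - b_) + akj * b_
        fg = rng.random() ** 2 * (1 - g / G_MAX) ** 2    # mutation strength f(g)
        for i in range(N_POP):                           # mutate one gene per individual
            j = rng.integers(DIM)
            if rng.random() > 0.5:
                pop[i, j] += (pop[i, j] - A_MAX) * fg
            else:
                pop[i, j] += (A_MIN - pop[i, j]) * fg
    f = np.array([fitness(p, X, y) for p in pop])
    return pop[np.argmax(f)]   # best initial weights/thresholds for BP training
```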
Speech endpoint detection is implemented as follows:
1) A 2-5-1 convolutional neural network structure is used. The maximum LPC coefficient and the average magnitude difference of the raw speech are first extracted, and this two-dimensional vector is used as the input of the neural network. The output layer is used to judge whether a frame is an effective speech block; the weights and thresholds are randomly initialized.
2) Original audio blocks are randomly selected and the class of each frame is labeled: 1 if speech, 0 if not. The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, used as the input vector of the convolutional neural network.
3) The training samples are fed into the convolutional neural network to train its parameters, and the network is optimized so that the error between the network output and the desired value reaches the preset standard.
4) The maximum LPC coefficient and the average magnitude difference of each audio block are extracted to form a two-dimensional feature vector, which is fed into the convolutional neural network as a test sample. An improved threshold T is used here: elements of the network's one-dimensional output vector greater than or equal to T are judged effective speech blocks, and elements below T are judged non-effective speech blocks. The network output is compared with the pre-assigned labels; if accuracy is low, the network is retrained.
5) The network output is used to decide whether a segment is a speech segment.
The speech-block input signal vector is X(n) = [x_1(n), x_2(n), …, x_M(n)]^T; the output y(n) obtained after the speech-enhancement filter is expressed as:
y(n) = coef · [β_1 x(n) + β_2 x(n) + … + β_{nb+1} x(n)]
where B = [β_1, β_2, …, β_{nb+1}] is the filter weight coefficient vector and coef is an adaptive parameter.
The adaptive parameter coef is then introduced into the ILMS adaptive filter model to denoise the speech block. The SNR_i of the denoised speech is computed; the coef value at which SNR_i reaches its maximum is the final coef training output value:
SNR_i = Π_snr(f_LM(coef_i, s(n)))
where coef_i is a natural number with step size 1; s(n) is the speech block to be enhanced; f_LM is the adaptive filtering function, which denoises and enhances s(n) according to the value of coef_i; and Π_snr(·) is the function computing the segmental signal-to-noise ratio.
SNR_i is maximized and the subscript i corresponding to the maximum is assigned to coef:
coef = argmax(SNR_1, SNR_2, …)
where argmax is the function returning the subscript corresponding to the maximum.
Finally, in the adaptive noise filter, each speech block to be enhanced is enhanced according to its coef value, and speech-to-text conversion is performed on the basis of the enhanced speech blocks.
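The coef selection can be read as a one-dimensional grid search: apply the adaptive filter for each integer coef, score the output with the segmental SNR, and keep the argmax. The patent does not spell out the filter internals or how SNR is measured without a clean reference, so the LMS-style filter and the residual-based SNR scoring below are both labelled assumptions.

```python
import numpy as np

def lms_enhance(x, coef, nb=8, mu=0.01):
    """Illustrative LMS-style filter; coef scales the output (assumption)."""
    beta = np.zeros(nb + 1)
    y = np.zeros_like(x)
    for n in range(nb, len(x)):
        window = x[n - nb:n + 1][::-1]
        y[n] = coef * beta @ window
        e = x[n] - y[n]
        beta += mu * e * window
    return y

def segmental_snr(y, x, frame=256):
    """Assumed scoring: treat the filter output as signal and the residual
    x - y as noise, averaging frame-level SNRs in dB."""
    snrs = []
    for s in range(0, len(x) - frame, frame):
        sig = np.sum(y[s:s + frame] ** 2)
        noi = np.sum((x[s:s + frame] - y[s:s + frame]) ** 2) + 1e-12
        snrs.append(10 * np.log10(sig / noi + 1e-12))
    return float(np.mean(snrs))

def best_coef(x, coef_range=range(1, 11)):
    """coef = argmax_i SNR_i over an integer grid, as in the text."""
    scores = [segmental_snr(lms_enhance(x, c), x) for c in coef_range]
    return list(coef_range)[int(np.argmax(scores))]
```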
After the speech blocks have been converted to text, summary speech extraction can be carried out automatically. The invention first trains the feature vectors of feature words with a convolutional neural network algorithm, then accurately computes the similarity between sentences, iteratively updates the sentence weight coefficients, and finally eliminates information redundancy in the summary speech based on inter-sentence similarity. The method specifically includes the following steps:
S1. The feature-vector representation of feature words is obtained by training the convolutional neural network model on morphemes: a morpheme set is obtained from big-data storage and preprocessed, the preprocessing including splitting the morphemes into sentences to obtain the training feature-morpheme set; training parameters are set, and the convolutional neural network model is trained with the training feature-morpheme set as training data; through training, each word of the training feature-morpheme set is output in feature-vector form as a feature word, yielding the feature-vector representation of the feature words.
To train feature-word vectors from large amounts of unstructured speech data, the present invention uses the feature vector of the current word to predict the feature vectors of the context within a specified window. Given the feature morphemes w_1, w_2, w_3, …, w_T as training data, the objective function is:
where c is the parameter determining the context window size (the larger c is, the more training data is needed) and T is the number of training samples.
The present invention uses the W words of the output layer as leaf nodes and assigns shorter paths to high-frequency words. Each feature morpheme w can be reached from the root node of the tree along a unique path. Let n(w, j) be the j-th node on the path from the root node to w and L(w) the length of this path, so that n(w, 1) = root and n(w, L(w)) = w. For any inner node n, ch(n) is an arbitrary child node of n. Then define:
where the function [[·]] denotes:
With the above definitions, the objective function is solved with stochastic gradient descent, finally producing the feature-vector representation of each word.
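The objective and the binary-tree output layer described above match the standard skip-gram formulation with hierarchical softmax, so in practice the feature-word vectors can be trained with an off-the-shelf implementation. Below is a sketch using gensim; the library choice, the hyperparameter values, and the toy morpheme sentences are assumptions, since the patent specifies only the training objective.

```python
from gensim.models import Word2Vec

# Each entry of the training feature-morpheme set is a tokenised sentence;
# these two sentences are purely illustrative.
sentences = [
    ["会议", "纪要", "需要", "快速", "了解"],
    ["语音", "摘要", "生成", "技术", "成熟"],
]

# sg=1 selects skip-gram (predict the context from the current word);
# hs=1 selects hierarchical softmax (the binary-tree output layer above);
# window corresponds to the context-size parameter c in the objective.
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, hs=1,
                 min_count=1, epochs=50)

vec = model.wv["语音"]   # feature-vector representation of one feature word
```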
S2. Retrieval is performed over the morpheme set collected in step S1 according to a default query word, and the retrieved speech blocks serve as the candidate block set; the candidate block set is split into sentences and the repeated sentences in it are removed, yielding the candidate block set S,
where S_i is an arbitrary sentence in the candidate block set S and N is the total number of sentences. Using the feature vectors of the feature words obtained in step S1, the semantic similarity between sentences is computed and used as the weight of the edges in the graph, forming a sentence DAG graph model;
for any two sentences S_i and S_j in the candidate block set, containing feature words t_i and t_j with feature vectors v(t_i) and v(t_j) obtained from the convolutional neural network model training of step S1, the semantic similarity Sim(S_i, S_j) between the sentences S_i and S_j is given by the formula:
where, for the feature vector v(t_i) of the feature word t_i in sentence S_i, the corresponding term denotes the maximum similarity value between v(t_i) and the feature vectors of all feature words in sentence S_j belonging to the same part of speech as t_i; |S_i| and |S_j| denote the lengths of S_i and S_j respectively.
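The similarity formula itself appears as an image in the original; from the description (for each feature word of one sentence, take the maximum similarity with same-part-of-speech words of the other, sum in both directions, and normalise by the sentence lengths) a plausible reading is implemented below. The symmetric form and the cosine measure are assumptions consistent with the text.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def sentence_sim(si, sj, vecs, pos):
    """si, sj: lists of feature words; vecs: word -> vector; pos: word -> part of speech.
    Assumed reading of Sim(S_i, S_j): directional max-similarity sums,
    normalised by |S_i| + |S_j|."""
    def directed(a, b):
        total = 0.0
        for t in a:
            cands = [cos(vecs[t], vecs[u]) for u in b if pos[u] == pos[t]]
            total += max(cands, default=0.0)
        return total
    return (directed(si, sj) + directed(sj, si)) / (len(si) + len(sj))
```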
S3. In the DAG graph model obtained in step S2, the weight weight(S_i) of each node is iteratively updated with the following formula, using the average initial weights and the inter-sentence semantic similarities of step S2, until convergence, thereby obtaining a score value reflecting the importance of each sentence:
where d is a damping coefficient with value range [0, 1];
Assoc(S_i) denotes the set of sentences connected to S_i, i.e. the set of sentences whose similarity with sentence S_i is greater than 0, and ||Assoc(S_i)|| is the total number of sentences in that set;
using the similarity matrix formed by the inter-sentence semantic similarities and the average initial weights of the nodes in step S2, the weight of each node in the DAG graph model is iteratively computed until convergence. Each node finally obtains a score, in preparation for generating the summary speech in the next step.
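This iteration is the familiar damped random-walk update over the sentence graph. The sketch below follows that reading; since the update formula is an image in the original, the similarity-weighted normalisation (as in weighted TextRank) is an assumption.

```python
import numpy as np

def rank_sentences(sim, d=0.85, tol=1e-6, max_iter=200):
    """sim: (N, N) semantic-similarity matrix from step S2.
    Returns converged importance scores weight(S_i)."""
    N = sim.shape[0]
    A = sim.copy()
    np.fill_diagonal(A, 0.0)
    col = A.sum(axis=0)
    col[col == 0] = 1.0
    M = A / col                      # similarity-weighted transition matrix
    w = np.full(N, 1.0 / N)          # average initial weights
    for _ in range(max_iter):
        w_new = (1 - d) + d * M @ w  # damped update until convergence
        if np.abs(w_new - w).max() < tol:
            break
        w = w_new
    return w
```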
S4. A sentence is weakened if it has high similarity with sentences already in the summary set; the sentences with the largest weights and without redundancy form the summary set. The specific steps are (a sketch of this greedy loop follows the list):
1) initializing an empty summary speech queue, and using the sentences corresponding to the nodes of the DAG graph model as the initial candidate summary speech queue;
2) sorting the sentence weights corresponding to the DAG graph model nodes in the candidate summary speech queue in descending order according to step S3, the sentences corresponding to the sorted nodes forming the candidate summary sentence sequence;
3) moving the sentence ranked first in the candidate summary sentence sequence into the summary speech queue, and updating the weights of the sentences remaining in the candidate summary speech queue with the following formula:
Weight(S_j) = Weight(S_j) − ω × Sim(S_i, S_j)
where i ≠ j and ω is the weakening factor; when the sentence whose weight is being updated is similar to a sentence already in the summary speech queue, the weakening factor ω is 1.0; Sim(S_i, S_j) is the semantic similarity obtained in step S2;
4) repeating steps 2) and 3) until the set of sentences in the summary speech queue reaches the preset summary speech length.
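A sketch of the greedy loop of steps 1)-4): repeatedly move the highest-weight candidate into the summary and penalise the remaining candidates by their similarity to it, until the target summary length is reached. ω = 1.0 as stated; measuring the summary length in characters is an assumption.

```python
def build_summary(sentences, weights, sim, max_len, omega=1.0):
    """sentences: list of sentence strings; weights: scores from step S3;
    sim: (N, N) similarity matrix from step S2; max_len: preset summary length."""
    w = list(weights)
    candidates = set(range(len(sentences)))
    summary, length = [], 0
    while candidates and length < max_len:
        i = max(candidates, key=lambda k: w[k])      # step 2): highest weight first
        candidates.remove(i)
        summary.append(i)                            # step 3): move into the summary
        length += len(sentences[i])                  # assumed: length in characters
        for j in candidates:                         # weaken similar candidates
            w[j] -= omega * sim[i][j]
    return [sentences[i] for i in sorted(summary)]   # keep original sentence order
```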
Suppose the total number of sentences of the raw speech T is m and the compression rate of the summary speech is set to λ; the total number of digest sentences to extract is n, so λ = n/m, where m is the total number of sentences recognized from the raw speech. Text is a linear combination of sentences, a sentence is a linear combination of words, and a word can be regarded as a linear combination of morphemes, so the importance of a sentence can be obtained indirectly from the importance of its morphemes. The summary speech extraction process based on the predefined compression rate is therefore as follows:
1. The importance of each node in the morpheme network is computed, and the importance of each sentence is replaced by the average importance of the morphemes in it, yielding the sentence cluster S = {S_1, S_2, …, S_m} with importance values:
where w(n_i)_t is the t-th iteration of w(n_i), ε is an attenuation factor, C(n_i) is the set of morphemes that co-occur with the morpheme represented by node n_i, Coexist(n_i, n_j) is the co-occurrence rate of the morphemes represented by nodes n_i and n_j, and N is the total number of morphemes contained in the morpheme network.
2. The sentence cluster S is partitioned into multiple fields. Suppose the sentence clusters of k subfields are obtained; the importance of each subfield sentence cluster is replaced by the combined importance of the sentences in it, and the k subfield clusters are sorted in descending order of importance, denoted MS_1, MS_2, …, MS_k (k < m); the sentences within each subfield cluster are likewise sorted in descending order of importance.
3. The above de-redundancy processing is applied to the sentences in the subfield clusters. Then, according to the summary compression rate λ, the top ⌊λ × m / k⌋ sentences in importance order are extracted from each field cluster. If λ × m is divisible by k, this yields the λ × m summary sentences to be output; if it is not, one further sentence is extracted from each of the clusters MS_1, MS_2, …, MS_{λ×m mod k}, and together with the sentences just extracted these form the summary sentences of the speech T. In this way the final summary cluster is obtained, denoted S' = {S'_1, S'_2, …, S'_{λ×m}}.
4. The sentences in the set S' are output in their original order to obtain the summary speech.
For the processing of social-network speech data, the present invention, after recognizing the sentences and words of the speech text, preferably further combines every two adjacent phrases in each sentence into a word pair, each sentence being represented by a sequence of word pairs. Word pairs incorporate contextual information: the possibility of a pair being a keyword and the importance of the whole sentence mutually reinforce each other, and summary sentences are extracted according to co-occurring words to generate the summary speech data.
First, N word pairs that accurately reflect some sub-topic of the text collection are extracted as keyword pairs, giving a keyword-pair set. The weight of each word pair is computed by the following formula (a sketch follows):
W_TF(b_i) = fre(b_i) × log2(ifre(b_i))
where fre(b_i) is the word frequency of the word pair b_i, i.e. the frequency with which b_i occurs in the entire text collection,
and ifre(b_i) is the ratio of the total number of sentences to the number of sentences in which b_i occurs.
All word pairs are sorted in descending order of their W_TF values, and the top N are taken as keyword pairs.
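The W_TF scoring is directly implementable. The sketch below forms adjacent-word pairs per sentence, computes fre and ifre exactly as defined, and keeps the top N pairs; tokenisation is assumed to have been done upstream by the speech-to-text step.

```python
import math
from collections import Counter

def word_pairs(sentence_tokens):
    # Adjacent phrases combined into pairs: each sentence becomes a pair sequence.
    return list(zip(sentence_tokens, sentence_tokens[1:]))

def top_keyword_pairs(sentences, n):
    """sentences: list of token lists. Returns the N highest-W_TF word pairs."""
    fre = Counter()           # total occurrences over the whole collection
    sent_count = Counter()    # number of sentences containing each pair
    for toks in sentences:
        pairs = word_pairs(toks)
        fre.update(pairs)
        sent_count.update(set(pairs))
    total = len(sentences)
    def wtf(b):
        ifre = total / sent_count[b]          # sentence total / sentences with b
        return fre[b] * math.log2(ifre)
    return sorted(fre, key=wtf, reverse=True)[:n]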
The distribution matrix of topics and word pairs is computed. Each row of the matrix is the probability distribution of a topic over the word-pair set, and each element characterizes the importance of a word pair relative to that topic. Summing the matrix over the topics gives each word pair a global score over the topic set.
The word pairs are sorted in descending order of this global score, and the top N constitute the keyword-pair set.
Based on this keyword-pair set, the score of a candidate sentence is computed as the ratio of the number of word pairs it shares with the keyword-pair set to the size of the whole keyword-pair set.
Meanwhile, to weaken overly long or overly short sentences, the score value is regularized; the regularization factor takes the larger of the length of the candidate sentence itself and the average sentence length of the sentence set. The computed candidate sentence score can be formally defined as follows:
where S denotes the candidate sentence, KBS denotes the keyword-pair set, b_i denotes a keyword pair occurring in both, |S| and |KBS| denote the candidate sentence length and the size of the keyword-pair set respectively, and Avlen is the average length of all sentences in the sentence set.
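The score formula itself is an image in the original; a reading consistent with its description (keyword-pair overlap normalised by the keyword-pair set size, regularised by max(|S|, Avlen)) is sketched below. The way the two factors are combined is an assumption.

```python
def sentence_score(sentence_tokens, kbs, avlen):
    """sentence_tokens: token list of candidate sentence S;
    kbs: set of keyword pairs (the KBS set); avlen: average sentence length.
    Assumed form: (|pairs(S) ∩ KBS| / |KBS|) / max(|S|, avlen)."""
    pairs = set(zip(sentence_tokens, sentence_tokens[1:]))
    overlap = len(pairs & kbs)
    return (overlap / len(kbs)) / max(len(sentence_tokens), avlen)
```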
Summary sentences are extracted from the top-ranked sentences; to prevent redundancy, a similarity threshold is introduced, and M sentences meeting the similarity condition are extracted as summary sentences. The flow for extracting summary sentences is as follows:
(1) an empty summary speech queue is initialized, and the candidate set is initialized;
(2) the sentence currently ranked first is taken as the candidate sentence Sc;
(3) when the summary speech queue is empty, the candidate sentence is added to it directly; otherwise the similarity between the candidate sentence Sc and each summary sentence Ss is computed in turn:
once a case of sim(Sc, Ss) > Sim_td occurs, go directly to (5), where Sim_td is the similarity threshold;
(4) the candidate sentence is added to the summary speech queue;
(5) the current candidate sentence is removed from the candidate set;
(6) if the number of sentences in the summary speech queue is less than the preset number M, go to (1); otherwise go to (7);
(7) the summary speech queue is output.
If a summary sentence contains time information, the sentences are combined chronologically; if several summary sentences belong to the same morpheme topic, they are combined according to their order of appearance in the raw speech.
In conclusion, the method of the present invention is based on the processing of speech big data; it is more robust to noise, achieves higher accuracy and higher recall, and significantly improves the efficiency with which users acquire knowledge.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention can be implemented with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems, and can optionally be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. The present invention is thus not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are only used to exemplarily illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention shall be included in its protection scope. Furthermore, the appended claims are intended to cover all variations and modifications falling within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.

Claims (3)

1. An information intelligent processing method based on big data, characterized in that it comprises:
training a multilayer convolutional neural network on the effective speech blocks of raw speech data to obtain a polynomial representation of each frame; selecting a predefined number of audio blocks as the initial result and reconstructing them to obtain an initial speech library and reconstruction coefficients; updating the convolutional neural network parameters with each subsequent audio block while reconstructing that block and computing its reconstruction error; and, if the error exceeds a set threshold, adding the block to the summary speech data.
2. The method according to claim 1, characterized in that training the multilayer convolutional neural network on the effective speech blocks of the raw speech data to obtain the polynomial representation of each frame further comprises:
pre-training the multilayer convolutional neural network with a denoising encoder;
performing the following operations on each frame of audio at each layer:
first, generating a noisy version of the audio frame by adding Gaussian noise, the input variable being set randomly to an arbitrary value;
then, mapping the noisy audio to obtain its polynomial representation;
and adjusting and updating the parameters of each layer of the convolutional neural network.
3. The method according to claim 1, characterized in that reconstructing the audio blocks further comprises:
taking the first m audio blocks of the raw speech, m being a positive integer, for a total of m × t audio frames, where X_k denotes the k-th original audio block;
obtaining the corresponding polynomial representations {Y_1, Y_2, …, Y_k, …, Y_m} from the pre-trained convolutional neural network, where Y_k is the polynomial representation of the k-th audio block;
letting the initial speech library D consist of n_d elements, i.e. D = {d_j}, j ∈ [1, n_d], where d_j is the j-th element; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of elements in the library, i.e. C = {c_i}, i ∈ [1, n_f], where C_k is the coefficient of the k-th audio block and c_i corresponds to the i-th speech frame;
obtaining the initial speech library D and the reconstruction coefficients C from the following formula:
where the symbol ||·||_2 denotes the l_2 norm of a variable, the regularization parameter λ is a coefficient greater than 0, and the multivariate function F(Y_k, C_k, D) is given by:
where the parameter γ is a coefficient greater than 0 and the expression inside the norm denotes the reconstruction of the i-th audio frame with the library D; specifically:
first fixing the parameter D so that the above objective function becomes convex in C, then fixing C so that it becomes convex in D, and alternately and iteratively updating the two parameters.
CN201810163995.1A 2018-02-27 2018-02-27 Information intelligent processing method based on big data Pending CN108388942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810163995.1A CN108388942A (en) 2018-02-27 2018-02-27 Information intelligent processing method based on big data


Publications (1)

Publication Number Publication Date
CN108388942A true CN108388942A (en) 2018-08-10

Family

ID=63070092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810163995.1A Pending CN108388942A (en) 2018-02-27 2018-02-27 Information intelligent processing method based on big data

Country Status (1)

Country Link
CN (1) CN108388942A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1188957A (en) * 1996-09-24 1998-07-29 索尼公司 Vector quantization method and speech encoding method and apparatus
CN1819017A (en) * 2004-12-13 2006-08-16 Lg电子株式会社 Method for extracting feature vectors for speech recognition
CN101546556A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Classification system for identifying audio content
CN101546557A (en) * 2008-03-28 2009-09-30 展讯通信(上海)有限公司 Method for updating classifier parameters for identifying audio content
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN105989067A (en) * 2015-02-09 2016-10-05 华为技术有限公司 Method for generating text abstract from image, user equipment and training server
CN104679902A (en) * 2015-03-20 2015-06-03 湘潭大学 Information abstract extraction method in conjunction with cross-media fuse
CN106446109A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Acquiring method and device for audio file abstract

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448726A (en) * 2019-01-14 2019-03-08 李庆湧 A kind of method of adjustment and system of voice control accuracy rate
CN117440001A (en) * 2023-12-20 2024-01-23 国投人力资源服务有限公司 Data synchronization method based on message
CN117440001B (en) * 2023-12-20 2024-02-27 国投人力资源服务有限公司 Data synchronization method based on message


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180810)