CN108010533A - Method and device for automatically identifying the bit rate of audio data

Info

Publication number
CN108010533A
Authority
CN
China
Prior art keywords
bit rate
target class
audio data
class bit rate
labeled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610957146.4A
Other languages
Chinese (zh)
Inventor
赵岩 (Zhao Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuwo Technology Co Ltd
Original Assignee
Beijing Kuwo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuwo Technology Co Ltd
Priority to CN201610957146.4A
Publication of CN108010533A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The present invention relates to a method and device for automatically identifying the bit rate of audio data. The method includes: labeling audio data to be predicted according to an automatic identification training model, to obtain labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format; and comparing the probability that the labeled data in the target-class bit-rate format occurs with a preset threshold probability and, if that probability is greater than or equal to the preset threshold probability, outputting the labeled data in the target-class bit-rate format. By labeling the audio data to be predicted with the automatic identification training model and comparing the resulting target-class probability with the preset threshold probability, embodiments of the invention automatically identify the bit rates of different audio data.

Description

Method and device for automatically identifying the bit rate of audio data
Technical field
The present invention relates to the field of audio technology, and in particular to a method and device for automatically identifying the bit rate of audio data.
Background
MP3 (MPEG-1 or MPEG-2 Audio Layer III) is currently one of the most popular digital audio coding and lossy compression formats, and it is designed to greatly reduce the amount of audio data. Because MP3 is a lossy format, it produces small music files that are convenient to transmit and store and easy for users to handle, which is why it spread rapidly. One of the key techniques used in MP3 is the psychoacoustic model, which discards the parts of pulse-code-modulated (PCM) audio data that matter little to human hearing, so that the digital audio file is compressed.
Audio files in MP3 format are compressed at different bit rates. The bit rate is the number of data bits transmitted per unit of time; it expresses how many bits are needed to represent one second of compressed audio or video. The usual unit is kbps, that is, kilobits per second. Given the relationship between data size and sound quality, the mainstream bit rates are 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps and 64 kbps. However, as music format conversion software has become widespread, the market has seen large amounts of fake high-bit-rate digital music converted from low-bit-rate sources. With such music, the quality users actually hear falls short of what they expect, which degrades the user experience.
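As a worked illustration of the definition above (not part of the application), the size of a constant-bit-rate stream follows directly from bit rate times duration. The sketch below assumes 1 kbps = 1000 bits per second and ignores headers and tags; the function name `stream_size_bytes` is hypothetical.

```python
def stream_size_bytes(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate size of a constant-bit-rate stream, ignoring headers and tags."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps means 1000 bits per second
    return bits / 8                          # 8 bits per byte


if __name__ == "__main__":
    # A 4-minute (240 s) track at each mainstream MP3 bit rate listed above.
    for rate in (320, 256, 224, 192, 128, 96, 64):
        size_mb = stream_size_bytes(rate, 240) / 1_000_000
        print(f"{rate} kbps: about {size_mb:.2f} MB")
```

A file re-encoded from 128 kbps to 320 kbps grows to the 320 kbps size without regaining the discarded audio content, which is why file size or the declared bit rate alone cannot expose fake high-bit-rate music.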
At present, digital music service providers identify audio bit rates mainly by hand. Manual identification consumes substantial labor, is inefficient and inaccurate, and its quality is difficult to monitor. A method for automatically identifying the bit rate of audio data is therefore needed, so that the bit rates of different audio data can be identified automatically.
Summary of the invention
Embodiments of the present invention provide a method and device for automatically identifying the bit rate of audio data. An automatic identification training model for the audio data bit rate is obtained by model training on collected audio data; audio data to be predicted is then labeled according to the automatic identification training model, yielding labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format, so that the bit rates of different audio data are identified automatically.
In a first aspect, an embodiment of the present invention provides a method for automatically identifying the bit rate of audio data, the method including:
performing model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;
labeling audio data to be predicted according to the automatic identification training model, to obtain labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format;
comparing the probability that the labeled data in the target-class bit-rate format occurs with a preset threshold probability and, if that probability is greater than or equal to the preset threshold probability, outputting the labeled data in the target-class bit-rate format.
Preferably, performing model training on the collected audio data to obtain the automatic identification training model for the audio data bit rate specifically includes:
labeling the audio data to generate training samples carrying labeled data in the target-class bit-rate format;
converting the audio data carrying labeled data in the target-class bit-rate format into a corresponding spectrogram;
scaling the spectrogram image to obtain a corresponding thumbnail;
performing model training on the image data of the thumbnail with a convolutional neural network algorithm, to obtain the training model for automatic identification of the audio data bit rate.
Preferably, the target-class bit rate is a target-class bit rate of the MP3 format, and is any one of the following bit rates: 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps or 64 kbps.
Preferably, the non-target-class bit rate is a non-target-class bit rate of the MP3 format, and comprises all of the remaining bit rates that differ from the target-class bit rate of the MP3 format.
Preferably, the spectrogram is scaled by bilinear interpolation to obtain the corresponding thumbnail.
Preferably, following bilinear interpolation, an AlexNet convolutional neural network model is used as the training model to perform model training on the image data of the thumbnail, yielding the training model for automatic identification of the audio data bit rate.
Preferably, the AlexNet convolutional neural network model specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.
Preferably, the automatic identification training model is deployed to a digital music storage server cluster in order to label the audio data to be predicted.
Preferably, the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.
In a second aspect, an embodiment of the present invention provides a device for automatically identifying the bit rate of audio data, the device including:
a training model acquisition module, which performs model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;
a labeled data acquisition module, which labels audio data to be predicted according to the automatic identification training model, to obtain labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format;
a comparison module, which compares the probability that the labeled data in the target-class bit-rate format occurs with a preset threshold probability and, if that probability is greater than or equal to the preset threshold probability, outputs the labeled data in the target-class bit-rate format.
An embodiment of the present invention provides a method for automatically identifying the bit rate of audio data. An automatic identification training model for the audio data bit rate is obtained by model training on collected audio data; audio data to be predicted is labeled according to that model, yielding labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format; and the probability that the labeled data in the target-class bit-rate format occurs is compared with a preset threshold probability. If that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit-rate format is output, so that the bit rates of different audio data are identified automatically.
Brief description of the drawings
Fig. 1 is a flowchart of the method for automatically identifying the bit rate of audio data provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the device for automatically identifying the bit rate of audio data provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To facilitate understanding of the embodiments of the present invention, they are explained further below with reference to the drawings and specific embodiments.
In the technical solution provided by the present invention, an automatic identification training model for the audio data bit rate is obtained by model training on collected audio data; audio data to be predicted is labeled according to that model, yielding labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format; and the probability that the labeled data in the target-class bit-rate format occurs is compared with a preset threshold probability. If that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit-rate format is output, so that the bit rates of different audio data are identified automatically.
The technical solution of the present invention is described in detail below with reference to the drawings.
Fig. 1 is a flowchart of the method for automatically identifying the bit rate of audio data provided by an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101: Perform model training on collected audio data to obtain an automatic identification training model for the audio data bit rate.
Specifically, this step includes the following:
The audio data is labeled to generate training samples carrying labeled data in the target-class bit-rate format.
To ensure the accuracy of the automatic identification training model obtained from sample training, the audio data used in this specific embodiment consists of low-bit-rate music files generated by compressing lossless music.
Further, the audio data is preprocessed as follows: tracks are ripped from high-quality CDs to produce digital music files in WAV format; each WAV file is transcoded to MP3 at each of the following bit rates: 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps and 64 kbps; the 320 kbps MP3s are used as positive samples and the MP3s at the remaining six bit rates are used as negative samples.
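The preprocessing just described can be sketched as follows. This is an illustrative sketch only: the application does not name a transcoding tool, so the use of the `ffmpeg` command line with the `libmp3lame` encoder is an assumption, and the function name `transcode_plan` and the file naming scheme are hypothetical. The code only builds the commands and labels; it does not run them.

```python
from pathlib import Path

BITRATES_KBPS = (320, 256, 224, 192, 128, 96, 64)
TARGET_KBPS = 320  # positive class in the embodiment described above


def transcode_plan(wav_path: str, out_dir: str) -> list[tuple[list[str], int]]:
    """Return (ffmpeg command, label) pairs for one ripped WAV file.

    Label 1 marks a positive sample (target bit rate), label 0 a negative one.
    Assumes ffmpeg with libmp3lame is the transcoder (not stated in the source).
    """
    wav = Path(wav_path)
    plan = []
    for rate in BITRATES_KBPS:
        out = Path(out_dir) / f"{wav.stem}_{rate}k.mp3"
        cmd = ["ffmpeg", "-i", str(wav),
               "-codec:a", "libmp3lame",   # MP3 encoder
               "-b:a", f"{rate}k",         # constant audio bit rate
               str(out)]
        plan.append((cmd, 1 if rate == TARGET_KBPS else 0))
    return plan


if __name__ == "__main__":
    for cmd, label in transcode_plan("track01.wav", "dataset"):
        print(label, " ".join(cmd))
```

Each command could then be passed to `subprocess.run` on a machine where `ffmpeg` is installed; every WAV file yields one positive and six negative samples.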
The audio data carrying labeled data in the target-class bit-rate format is converted into a corresponding spectrogram.
Note that a spectrogram represents the time, frequency and energy of a sound simultaneously. To preserve the completeness of the information carried by the audio data, this specific embodiment uses the spectrogram of each audio item as the input data of the convolutional neural network algorithm.
The short-time Fourier transform (STFT) is a standard tool for spectral analysis. Unlike the plain Fourier transform, the STFT applies a window function, so it shows how the frequency content of a signal changes over time. The resulting spectrogram represents time on the horizontal axis, frequency on the vertical axis and energy by color, using the RGB color model.
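The windowed transform described above can be sketched in a few lines. This is a dependency-free illustration, not the implementation used in the application: the frame length, hop size and Hann window are assumptions, the naive DFT is quadratic in the frame length, and the names `spectrogram_db`, `hann` and `dft_magnitudes` are hypothetical. In practice a library routine such as an FFT-based STFT would be used.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitudes of the non-negative-frequency DFT bins of one real frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram_db(signal: list[float], frame_len: int = 64, hop: int = 32):
    """Short-time Fourier transform; result[t][k] is energy in dB at frame t, bin k."""
    window = hann(frame_len)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        columns.append([20 * math.log10(m + 1e-10) for m in dft_magnitudes(frame)])
    return columns


if __name__ == "__main__":
    sample_rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(512)]
    spec = spectrogram_db(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    # Each bin spans sample_rate / frame_len Hz.
    print("peak frequency:", peak_bin * sample_rate / 64, "Hz")
```

Mapping each dB value to a color (or to a gray level) then gives the image that is fed to the network.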
In a specific embodiment of the present invention, energy in the spectrogram may also be represented in grayscale instead of with the RGB color model.
To ensure the accuracy of the automatic identification of the audio data bit rate, the conversion of the audio data carrying labeled data in the target-class bit-rate format into a spectrogram proceeds as described below:
The spectrogram image is scaled to obtain a corresponding thumbnail.
Note that the embodiment of the present invention performs model training on the thumbnail image data with a convolutional neural network algorithm, and such an algorithm accepts only image data of a fixed size. The spectrogram of each audio item must therefore be normalized to a common size before the thumbnail image data is used for training.
In a specific embodiment of the present invention, the spectrogram is scaled by bilinear interpolation to obtain the corresponding thumbnail.
Scaling the spectrogram by bilinear interpolation preserves the strong continuity between neighboring pixels in the image data while keeping the complexity of the algorithm under control, so that the resulting thumbnail stays close to the real spectrogram.
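The scaling step can be illustrated with a minimal bilinear resize over a single-channel image stored as a list of rows. This is a sketch under stated assumptions: corner-aligned sampling is one of several conventions, the application does not specify which it uses, and the function name `bilinear_resize` is hypothetical.

```python
def bilinear_resize(image: list[list[float]], out_h: int, out_w: int) -> list[list[float]]:
    """Resize a 2-D image with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional source row.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    small = [[0.0, 10.0], [20.0, 30.0]]
    for row in bilinear_resize(small, 3, 3):
        print(row)
```

Each output pixel is a weighted average of its four nearest source pixels, which is what keeps neighboring values continuous; for an RGB spectrogram the same routine would be applied per channel.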
Model training is performed on the image data of the thumbnail with a convolutional neural network algorithm, to obtain the training model for automatic identification of the audio data bit rate.
In a specific embodiment of the present invention, models were trained on data sets at four image sizes: 28×28, 56×56, 84×84 and 256×256. The results show that the larger the image, the higher the accuracy of the resulting training model for automatic identification of the audio data bit rate, and also that the larger the image, the slower the model training.
In practical applications, automatic identification of the audio data bit rate usually has modest real-time requirements, so the 256×256 image size was used, which yielded a high-accuracy training model for automatic identification of the audio data bit rate.
A convolutional neural network is a kind of feedforward neural network. It approximately simulates the human visual cognition process and is widely used in the field of image data processing.
Further, following bilinear interpolation, an AlexNet convolutional neural network model is used as the training model to perform model training on the image data of the thumbnail, yielding the training model for automatic identification of the audio data bit rate. The AlexNet convolutional neural network model specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.
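To make the layer count concrete, the sketch below traces feature-map sizes through 5 convolutional and 3 pooling layers for a 256×256 input. The kernel sizes, strides, padding and channel counts are those of the commonly cited AlexNet configuration and are assumptions here, since the application gives only the number of layers; floor rounding of the output size is likewise an assumption, and the names `out_size` and `trace` are hypothetical.

```python
def out_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling)
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace(size: int, channels: int = 3) -> list[tuple[str, int, int, int]]:
    """Return (layer, channels, height, width) after each layer for a square input."""
    shapes = []
    for name, kernel, stride, pad, out_channels in LAYERS:
        size = out_size(size, kernel, stride, pad)
        channels = out_channels or channels  # pooling keeps the channel count
        shapes.append((name, channels, size, size))
    return shapes


if __name__ == "__main__":
    for layer in trace(256):
        print(layer)
```

Under these assumptions the last pooling layer produces a 256-channel 6×6 map, that is, 9216 features, which would feed the 2 fully connected layers and an output layer with one unit per class (target and non-target).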
In a specific embodiment of the present invention, the AlexNet convolutional neural network model is used as the training model and is trained on the positive and negative samples.
In a specific embodiment of the present invention, models were trained on MP3 data sets at each of the following bit rates: 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps and 64 kbps. The results show that the recognition accuracy for 320 kbps MP3s reached 98.54%.
Note that, in a specific embodiment of the present invention, multi-bit-rate automatic identification can be performed not only on music data in MP3 format but also on music data in the WMA, AAC and OGG formats.
Note that, in a specific embodiment of the present invention, the AlexNet convolutional neural network model is used as the training model because it has about 60 million parameters, 12 times as many as the GoogLeNet model, which gives it strong representational capacity and makes it easier to learn more accurate features.
Further, the AlexNet convolutional neural network model also uses techniques such as ReLU, LRN and Dropout, which effectively mitigate activation-function saturation and model overfitting while improving the computational performance of the model.
Further, to speed up training, CUDA with GPUs is used during model training, shortening the time needed to obtain the training model for automatic identification of the audio data bit rate.
Note that, in a specific embodiment of the present invention, besides the AlexNet convolutional neural network model, other convolutional neural network models such as LeNet, GoogLeNet and VGG can also be used as the training model; technical solutions that use these models as the training model also fall within the scope of protection of the specific embodiments of the present invention.
S102: Label audio data to be predicted according to the automatic identification training model, to obtain labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format.
Note that the target-class bit rate is a target-class bit rate of the MP3 format, and is any one of the following bit rates: 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps or 64 kbps.
The non-target-class bit rate is a non-target-class bit rate of the MP3 format, and comprises all of the remaining bit rates that differ from the aforementioned target-class bit rate of the MP3 format.
S103: Compare the probability that the labeled data in the target-class bit-rate format occurs with a preset threshold probability; if that probability is greater than or equal to the preset threshold probability, output the labeled data in the target-class bit-rate format.
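Step S103 amounts to a thresholded decision on the model's target-class probability. The sketch below is illustrative only: it assumes the network emits two raw scores in the order (non-target, target) that are turned into probabilities with a softmax, and it returns nothing when the threshold is not met, because the application does not state what is output in that case. The names `softmax` and `label_if_confident` and the default threshold are hypothetical.

```python
import math
from typing import Optional


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_if_confident(scores: list[float], threshold: float = 0.5) -> Optional[str]:
    """Return the target-class label when its probability is >= threshold.

    scores is assumed to be [non_target_score, target_score].
    """
    p_target = softmax(scores)[1]
    return "target" if p_target >= threshold else None


if __name__ == "__main__":
    print(label_if_confident([0.0, 5.0], threshold=0.9))
    print(label_if_confident([2.0, 0.0], threshold=0.5))
```

Raising the threshold trades recall for precision: fewer tracks are labeled as genuinely being at the target bit rate, but those that are labeled carry higher confidence.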
In addition, in a specific embodiment of the present invention, the method for automatically identifying the bit rate of audio data further includes: deploying the automatic identification training model to a digital music storage server cluster in order to label the audio data to be predicted.
In a specific embodiment of the present invention, the automatic identification training model is deployed to the digital music storage server cluster in GPU mode.
Specifically, in GPU mode the model is deployed to a separate GPU cluster, and the digital music is moved to that GPU cluster for labeling.
The advantage of GPU mode is faster computation. Its drawbacks are that the digital music labeling task involves a large amount of audio data, which makes data migration difficult, and that the cost is high. Because automatic identification of the audio data bit rate has modest real-time requirements and calls for low cost, GPU mode is not the preferred option. For application scenarios that demand high speed or online service, however, deploying to a separate GPU cluster in GPU mode and moving the digital music to that cluster for labeling may be considered.
In a specific embodiment of the present invention, the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.
Specifically, in CPU mode the model is deployed to a separate CPU cluster, and the digital music is moved to that CPU cluster for labeling.
Because automatic identification of the audio data bit rate has modest real-time requirements and calls for low cost, CPU mode is the preferred option. For application scenarios involving offline batch processing of audio data, deploying to a separate CPU cluster in CPU mode and moving the digital music to that cluster for labeling may be considered.
In a specific embodiment of the present invention, besides CPU-cluster and GPU-cluster deployment, deployment on other hardware devices such as PCs and mobile phones also falls within the scope of the specific embodiments of the present invention.
In summary, in the method for automatically identifying the bit rate of audio data provided by the embodiment of the present invention, an automatic identification training model for the audio data bit rate is obtained by model training on collected audio data; audio data to be predicted is labeled according to that model, yielding labeled data in the target-class bit-rate format and labeled data in the non-target-class bit-rate format; and the probability that the labeled data in the target-class bit-rate format occurs is compared with a preset threshold probability. If that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit-rate format is output, so that the bit rates of different audio data are identified automatically.
Fig. 2 is a block diagram of the internal structure of the device for automatically identifying the bit rate of audio data provided by an embodiment of the present invention. As shown in Fig. 2, the device includes a training model acquisition module 201, a labeled data acquisition module 202 and a comparison module 203.
Specifically, the training model acquisition module performs model training on collected audio data to obtain an automatic identification training model for the audio data bit rate.
Further, the training model acquisition module is specifically configured to: label the audio data to generate training samples carrying labeled data in the target-class bit-rate format;
convert the audio data carrying labeled data in the target-class bit-rate format into a corresponding spectrogram;
scale the spectrogram image to obtain a corresponding thumbnail;
Further, the training model acquisition module scales the spectrogram by bilinear interpolation to obtain the corresponding thumbnail.
Model training is performed on the image data of the thumbnail with a convolutional neural network algorithm, to obtain the training model for automatic identification of the audio data bit rate.
Further, following bilinear interpolation, the training model acquisition module uses an AlexNet convolutional neural network model as the training model to perform model training on the image data of the thumbnail, yielding the training model for automatic identification of the audio data bit rate.
The AlexNet convolutional neural network model used by the training model acquisition module specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.
Labeled data acquisition module, according to automatic identification training pattern, is labeled voice data to be predicted, is had There are the labeled data of target class code check form and the labeled data with non-target class code check form.
Wherein, the target class code check for the labeled data that labeled data acquisition module is got is the target class code of MP3 format Rate, and the target class code check of MP3 format specifically includes the code check of following 320kbps, the code check of 256kbps, the code of 224kbps Rate, the code check of 192kbps, the code check of 128kbps, the code check of 96kbps and 64kbps code check in any code check.
The non-target class code check for the labeled data that labeled data acquisition module is got is the target class code check of MP3 format, And the non-target class code check of MP3 format specifically includes following remaining whole codes different from the target class code check of foregoing MP3 format Rate.
Comparison module, by the probability occurred with the labeled data of target class code check form and pre-set threshold probability It is compared, if the probability that the labeled data with target class code check form occurs is more than or equal to pre-set threshold probability, Then labeled data of the output with target class code check form.
In addition, the automatic identification equipment of voice data code check further includes training pattern deployment module and (does not mark in fig. 2 Go out).
Training pattern deployment module, is deployed to digital music storage server cluster, with right by automatic identification training pattern Voice data to be predicted is labeled.
Further, training pattern deployment module, using cpu model, digital sound is deployed to by automatic identification training pattern Happy storage server cluster.
In technical scheme, by carrying out model training to the voice data collected, voice data code is obtained The automatic identification training pattern of rate;According to automatic identification training pattern, voice data to be predicted is labeled, acquisition has mesh Mark the labeled data of class code check form and the labeled data with non-target class code check form;By with target class code check form The probability that labeled data occurs is compared with pre-set threshold probability, if the labeled data with target class code check form The probability of appearance is more than or equal to pre-set threshold probability, then labeled data of the output with target class code check form, so that Realize the process that different voice data code checks are carried out with automatic identification.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for automatically identifying the bit rate of audio data, characterized by comprising:
performing model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;
labeling audio data to be predicted according to the automatic identification training model, to obtain labeled data having a target-class bit rate format and labeled data having a non-target-class bit rate format;
comparing the probability of occurrence of the labeled data having the target-class bit rate format with a preset threshold probability, and, if the probability of occurrence of the labeled data having the target-class bit rate format is greater than or equal to the preset threshold probability, outputting the labeled data having the target-class bit rate format.
2. The method according to claim 1, characterized in that performing model training on the collected audio data to obtain the automatic identification training model for the audio data bit rate specifically comprises:
labeling the audio data to generate training samples of labeled data having the target-class bit rate format;
performing spectrogram conversion on the audio data of the labeled data having the target-class bit rate format to obtain a corresponding spectrogram;
scaling the spectrogram image to obtain a corresponding thumbnail;
performing model training on the image data of the thumbnail using a convolutional neural network algorithm to obtain the corresponding training model for automatic identification of the audio data bit rate.
3. The method according to claim 1, characterized in that the target-class bit rate is a target-class bit rate of the MP3 format, and the target-class bit rate of the MP3 format is specifically any one of the following bit rates: 320 kbps, 256 kbps, 224 kbps, 192 kbps, 128 kbps, 96 kbps, and 64 kbps.
4. The method according to claim 3, characterized in that the non-target-class bit rate is a non-target-class bit rate of the MP3 format, and the non-target-class bit rates of the MP3 format specifically comprise all remaining bit rates that differ from the target-class bit rates of the MP3 format.
5. The method according to claim 2, characterized in that the spectrogram image is scaled by bilinear interpolation to obtain the corresponding thumbnail.
6. The method according to claim 2, characterized in that, by bilinear interpolation, an AlexNet convolutional neural network model is used as the training model to perform model training on the image data of the thumbnail, to obtain the corresponding training model for automatic identification of the audio data bit rate.
7. The method according to claim 6, characterized in that the AlexNet convolutional neural network model specifically comprises 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers, and 1 output layer.
8. The method according to claim 1, characterized in that the method further comprises: deploying the automatic identification training model to a digital music storage server cluster in order to label the audio data to be predicted.
9. The method according to claim 8, characterized in that the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.
10. A device for automatically identifying the bit rate of audio data, characterized by comprising:
a training model acquisition module, which performs model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;
a labeled data acquisition module, which labels audio data to be predicted according to the automatic identification training model, to obtain labeled data having a target-class bit rate format and labeled data having a non-target-class bit rate format;
a comparison module, which compares the probability of occurrence of the labeled data having the target-class bit rate format with a preset threshold probability, and, if the probability of occurrence of the labeled data having the target-class bit rate format is greater than or equal to the preset threshold probability, outputs the labeled data having the target-class bit rate format.
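
The spectrogram conversion and bilinear scaling steps named in claims 2 and 5 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the frame length, hop size, Hann window, log-magnitude scaling, and the 227x227 thumbnail size are all illustrative choices that the claims do not specify, and the function names `spectrogram` and `resize_bilinear` are hypothetical.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Log-magnitude spectrogram; rows are frequency bins, columns are frames."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return 20.0 * np.log10(magnitude.T + 1e-10)      # small offset avoids log(0)


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Scale a 2-D array to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = img.shape
    # Sample positions in source coordinates, with corners aligned.
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


# Example: one second of a synthetic 440 Hz tone stands in for decoded MP3 audio.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
spec = spectrogram(np.sin(2 * np.pi * 440.0 * t))
thumbnail = resize_bilinear(spec, 227, 227)  # 227x227 is an assumed thumbnail size
```

In practice the input would be samples decoded from the audio file rather than a synthetic tone, and the thumbnail would be passed to the convolutional neural network as image data.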
CN201610957146.4A 2016-10-27 2016-10-27 The automatic identifying method and device of voice data code check Pending CN108010533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610957146.4A CN108010533A (en) 2016-10-27 2016-10-27 The automatic identifying method and device of voice data code check

Publications (1)

Publication Number Publication Date
CN108010533A (en) 2018-05-08

Family

ID=62048392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610957146.4A Pending CN108010533A (en) 2016-10-27 2016-10-27 The automatic identifying method and device of voice data code check

Country Status (1)

Country Link
CN (1) CN108010533A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180508