CN108010533A

CN108010533A - Method and device for automatic identification of audio data bit rate

Info

Publication number: CN108010533A
Application number: CN201610957146.4A
Authority: CN
Inventors: 赵岩
Original assignee: Beijing Kuwo Technology Co Ltd
Current assignee: Beijing Kuwo Technology Co Ltd
Priority date: 2016-10-27
Filing date: 2016-10-27
Publication date: 2018-05-08

Abstract

The present invention relates to a method and device for automatic identification of the bit rate of audio data. The method includes: labeling audio data to be predicted according to an automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format; and comparing the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputting the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability. By labeling the audio data to be predicted according to the automatic identification training model and comparing the resulting probability with the preset threshold probability, embodiments of the present invention automatically identify the bit rates of different audio data.

Description

Method and device for automatic identification of audio data bit rate

Technical field

The present invention relates to the field of audio technology, and in particular to a method and device for automatic identification of audio data bit rate.

Background art

At present, MP3 (MPEG-1 or MPEG-2 Audio Layer III, i.e., Moving Picture Experts Group-1 or Moving Picture Experts Group-2 Audio Layer III) is one of the most popular digital audio coding and lossy compression formats, designed to greatly reduce the amount of audio data. Because MP3 is a lossy compression format, it produces small music files that are easier to transmit and store and more convenient for users, and MP3 has therefore developed rapidly. One of the key technologies used in MP3 is the psychoacoustic model, which discards the parts of pulse code modulation (PCM) audio data that are unimportant to human hearing, so that the digital audio file is compressed.

Audio files in MP3 format are compressed at different bit rates. The bit rate is the number of data bits transmitted per unit time during data transmission; it indicates how many bits are needed per second to represent the compression-encoded video or audio. The unit commonly used for bit rate is kbps, i.e., kilobits per second. Based on the correspondence between data size and sound quality, the mainstream bit rates include 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps. However, with the spread of music format conversion software, a large amount of fake high-bit-rate digital music converted from low-bit-rate sources has appeared on the market. Such fake high-bit-rate digital music makes the music quality that users actually experience fall short of their expectations and degrades the user experience.
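
The relationship between bit rate and data size described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a constant-bit-rate stream and ignoring container headers and tags; the function name and the four-minute duration are illustrative assumptions, not part of the original disclosure.

```python
def encoded_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate payload size of a constant-bit-rate stream (no headers or tags)."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps uses the decimal kilo (1000 bits)
    return int(bits // 8)


# The mainstream MP3 bit rates listed in the text.
MAINSTREAM_KBPS = [320, 256, 224, 192, 128, 96, 64]

if __name__ == "__main__":
    for kbps in MAINSTREAM_KBPS:
        size = encoded_size_bytes(kbps, 240.0)
        print(f"{kbps} kbps -> {size} bytes for a 4-minute track")
```

A file re-encoded from 128kbps to 320kbps occupies the larger size without regaining the discarded audio information, which is why the file size or declared bit rate alone cannot reveal a fake.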

At present, digital music service providers identify audio bit rates mainly by hand. Manual identification, however, consumes substantial labor, is inefficient and has low accuracy, and its quality is difficult to monitor. A method for automatic identification of audio data bit rate is therefore needed to automatically identify the bit rates of different audio data.

Summary of the invention

Embodiments of the present invention provide a method and device for automatic identification of audio data bit rate. Model training is performed on collected audio data to obtain an automatic identification training model for audio data bit rate; audio data to be predicted is then labeled according to the automatic identification training model, yielding labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format, so that the bit rates of different audio data are identified automatically.

In a first aspect, an embodiment of the present invention provides a method for automatic identification of audio data bit rate, the method including:

performing model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;

labeling audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format;

comparing the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputting the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.

Preferably, performing model training on the collected audio data to obtain the automatic identification training model for the audio data bit rate specifically includes:

labeling the audio data to generate training samples of labeled data in the target-class bit rate format;

performing spectrogram conversion on the audio data of the labeled data in the target-class bit rate format to obtain a corresponding spectrogram;

scaling the spectrogram image to obtain a corresponding thumbnail;

performing model training on the image data of the thumbnail using a convolutional neural network algorithm to obtain the corresponding training model for automatic identification of audio data bit rate.

Preferably, the target-class bit rate is a target-class bit rate of the MP3 format, and specifically is any one of the following bit rates: 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps.

Preferably, the non-target-class bit rate is a bit rate of the MP3 format, and specifically includes all remaining bit rates that differ from the target-class bit rate of the MP3 format.

Preferably, the spectrogram image is scaled by bilinear interpolation to obtain the corresponding thumbnail.

Preferably, by means of bilinear interpolation and with an AlexNet convolutional neural network model as the training model, model training is performed on the image data of the thumbnail to obtain the corresponding training model for automatic identification of audio data bit rate.

Preferably, the AlexNet convolutional neural network model specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.

Preferably, the automatic identification training model is deployed to a digital music storage server cluster to label the audio data to be predicted.

Preferably, the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.

In a second aspect, an embodiment of the present invention provides a device for automatic identification of audio data bit rate, the device including:

a training model acquisition module, which performs model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;

a labeled data acquisition module, which labels audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format;

a comparison module, which compares the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputs the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.

An embodiment of the present invention provides a method for automatic identification of audio data bit rate. Model training is performed on collected audio data to obtain an automatic identification training model for audio data bit rate; audio data to be predicted is labeled according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format; the probability of occurrence of the labeled data in the target-class bit rate format is compared with a preset threshold probability, and if that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit rate format is output, thereby automatically identifying the bit rates of different audio data.

Brief description of the drawings

Fig. 1 is a flowchart of the method for automatic identification of audio data bit rate provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the device for automatic identification of audio data bit rate provided by an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

To facilitate understanding of the embodiments of the present invention, they are further explained below through specific embodiments with reference to the accompanying drawings.

In the technical solution provided by the present invention, model training is performed on collected audio data to obtain an automatic identification training model for audio data bit rate; audio data to be predicted is labeled according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format; the probability of occurrence of the labeled data in the target-class bit rate format is compared with a preset threshold probability, and if that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit rate format is output, thereby automatically identifying the bit rates of different audio data.

The technical solution of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of the method for automatic identification of audio data bit rate provided by an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:

S101: Perform model training on collected audio data to obtain an automatic identification training model for audio data bit rate.

Specifically, this step includes the following sub-steps:

The audio data is labeled to generate training samples of labeled data in the target-class bit rate format.

To ensure the accuracy of the automatic identification training model obtained by sample training, the audio data used in this specific embodiment consists of low-bit-rate music files generated by compressing lossless music.

Further, the audio data is preprocessed as follows: tracks are ripped from high-quality CDs to generate digital music files in WAV format; the WAV files are transcoded into MP3 files at each of the bit rates 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps; the 320kbps MP3 files are used as positive samples, and the MP3 files at the remaining six bit rates are used as negative samples.
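
The preprocessing described above can be sketched as a small job builder. This is only an illustration: the original does not name a transcoding tool, so the use of ffmpeg with the libmp3lame encoder, the output file naming scheme and the function name are all assumptions. The code only constructs the commands and labels; it does not run them.

```python
from pathlib import Path

TARGET_KBPS = 320  # positive class in the embodiment
ALL_KBPS = [320, 256, 224, 192, 128, 96, 64]


def build_transcode_jobs(wav_path: str, out_dir: str):
    """Return one (command, output_path, label) tuple per bit rate.

    label is 1 for the positive class (320kbps) and 0 for the negative class.
    """
    stem = Path(wav_path).stem
    jobs = []
    for kbps in ALL_KBPS:
        out_path = str(Path(out_dir) / f"{stem}_{kbps}k.mp3")
        # Assumed ffmpeg invocation: MP3 encoding at the requested bit rate.
        cmd = ["ffmpeg", "-y", "-i", wav_path,
               "-codec:a", "libmp3lame", "-b:a", f"{kbps}k", out_path]
        label = 1 if kbps == TARGET_KBPS else 0
        jobs.append((cmd, out_path, label))
    return jobs
```

Each command list could then be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.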

Spectrogram conversion is performed on the audio data of the labeled data in the target-class bit rate format to obtain a corresponding spectrogram.

It should be noted that a spectrogram can represent the time, frequency and energy information of a sound simultaneously. To preserve the completeness of the information carried by the audio data, in this specific embodiment the spectrogram corresponding to the audio data is used as the input data of the convolutional neural network algorithm.

The short-time Fourier transform (STFT) is a common means of spectral analysis. Compared with the Fourier transform, the STFT introduces a window function and can therefore show how the frequency content of a signal changes over time. In the resulting spectrogram, the horizontal axis represents time, the vertical axis represents frequency, and color represents energy magnitude; the energy is rendered using the RGB color model.
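
To illustrate how a window function yields time-localized frequency information, the following is a minimal pure-Python sketch of a magnitude STFT. The Hann window, the frame length of 64 samples and the hop of 32 samples are assumptions chosen for readability; the original does not specify the window or frame parameters, and a practical pipeline would use an FFT routine rather than the naive DFT shown here.

```python
import cmath
import math


def hann(n: int):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(samples, n_fft=64, hop=32):
    """Return a list of frames; each frame holds n_fft // 2 + 1 magnitudes."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        # Window one short segment so the spectrum describes only this time slice.
        frame = [samples[start + i] * window[i] for i in range(n_fft)]
        mags = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT bin k, written out for clarity rather than speed.
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                      for i in range(n_fft))
            mags.append(abs(acc))
        frames.append(mags)
    return frames
```

Each returned frame is one column of the spectrogram (one time step), and each entry within a frame is one frequency bin.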

In a specific embodiment of the present invention, besides the RGB color model, the energy of the spectrogram may also be rendered in grayscale.
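
A grayscale rendering of the energy can be sketched as a mapping from magnitudes to 8-bit pixel values. The decibel scale and the -80 dB floor used here are assumptions for illustration; the original does not say how energy is mapped to gray levels.

```python
import math


def to_gray_levels(magnitudes, floor_db=-80.0):
    """Map a 2-D list of magnitudes to integers in 0..255 on a dB scale."""
    peak = max(max(row) for row in magnitudes)
    if peak <= 0.0:
        return [[0 for _ in row] for row in magnitudes]
    image = []
    for row in magnitudes:
        out_row = []
        for m in row:
            # dB relative to the loudest cell, clamped to [floor_db, 0].
            db = 20.0 * math.log10(m / peak) if m > 0.0 else floor_db
            db = max(db, floor_db)
            out_row.append(round(255.0 * (db - floor_db) / -floor_db))
        image.append(out_row)
    return image
```

The loudest cell maps to 255 (white) and anything at or below the floor maps to 0 (black).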

To ensure the accuracy of automatic identification of audio data bit rate, the spectrogram obtained by converting the audio data of the labeled data in the target-class bit rate format is further processed as follows:

The spectrogram image is scaled to obtain a corresponding thumbnail.

It should be noted that the embodiments of the present invention use a convolutional neural network algorithm to train a model on the image data of the thumbnails, and a convolutional neural network algorithm accepts only image data of a fixed size. Therefore, before model training, the size of the spectrogram corresponding to each piece of audio data must be normalized.

In a specific embodiment of the present invention, the spectrogram image is scaled by bilinear interpolation to obtain the corresponding thumbnail.

Scaling the spectrogram by bilinear interpolation not only preserves the high continuity of pixels in the image data but also takes the complexity of the algorithm into account, so that the resulting thumbnail more closely approximates the original spectrogram.
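
Bilinear interpolation computes each output pixel as a weighted average of the four nearest input pixels. The sketch below operates on a single-channel image stored as a list of rows; the corner-aligned coordinate mapping is an assumption chosen for simplicity, since the original does not state which convention is used.

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D list of numbers to out_h x out_w by bilinear interpolation.

    Corner pixels of the input map onto corner pixels of the output.
    """
    in_h, in_w = len(image), len(image[0])
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for r in range(out_h):
        y = r * sy
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0  # vertical weight of the lower neighbour
        row = []
        for c in range(out_w):
            x = c * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0  # horizontal weight of the right neighbour
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

For example, `resize_bilinear(spectrogram, 256, 256)` would produce a thumbnail of the 256*256 size used in the embodiment.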

Model training is performed on the image data of the thumbnails using a convolutional neural network algorithm to obtain the corresponding training model for automatic identification of audio data bit rate.

In a specific embodiment of the present invention, model training was carried out on data sets of four image sizes: 28*28, 56*56, 84*84 and 256*256. The results show that the larger the image, the higher the accuracy of the resulting training model, and also that the larger the image, the slower the model training.

In practical applications, the real-time requirements for automatic identification of audio data bit rate are usually not high. An image size of 256*256 was therefore adopted, yielding a high-accuracy training model for automatic identification of audio data bit rate.

The convolutional neural network algorithm is a feedforward neural network algorithm that approximately simulates the human visual cognition process and is widely used in the field of image data processing.

Further, by means of bilinear interpolation and with the AlexNet convolutional neural network model as the training model, model training is performed on the image data of the thumbnails to obtain the corresponding training model for automatic identification of audio data bit rate. The AlexNet convolutional neural network model specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.
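
The layer counts above match the commonly published AlexNet configuration, whose feature-map sizes can be traced with simple arithmetic. The kernel sizes, strides, paddings and channel counts below are taken from that widely used reference configuration and are assumptions here, as is the 227*227 input crop; the original gives only the number of layers and a 256*256 image size.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling layer on a square input."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad, out_channels); pooling keeps the channel count.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=227, channels=3):
    """Return (name, channels, height, width) after each layer."""
    shapes = []
    size = input_size
    for name, kernel, stride, pad, out_ch in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, channels, size, size))
    return shapes
```

Under these assumptions the last pooling layer produces a 256-channel 6*6 feature map, which is flattened and fed to the two fully connected layers and the output layer.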

In a specific embodiment of the present invention, the positive and negative samples are trained with the AlexNet convolutional neural network model as the training model.

In a specific embodiment of the present invention, model training was carried out on MP3 data sets at each of the bit rates 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps. The results show that the recognition accuracy for 320kbps MP3 files reached 98.54%.

It should be noted that, in a specific embodiment of the present invention, in addition to multi-bit-rate automatic identification of music data in MP3 format, multi-bit-rate automatic identification may also be performed on music data in WMA, AAC and OGG formats.

It should be noted that, in a specific embodiment of the present invention, the AlexNet convolutional neural network model is used as the training model because it has about 60 million parameters, 12 times as many as the GoogleNet model; its representational capacity is strong, and it readily learns more accurate features.
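
The figure of about 60 million parameters can be checked by counting weights and biases layer by layer. The sketch below assumes the commonly published AlexNet configuration (including its grouped convolutions and a 1000-class output layer); the original does not list these details, and a two-class output layer for this task would make the total slightly smaller.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Weights plus biases of a (possibly grouped) convolutional layer."""
    return out_ch * (in_ch // groups) * kernel * kernel + out_ch


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def alexnet_params(num_classes=1000):
    total = 0
    total += conv_params(3, 96, 11)
    total += conv_params(96, 256, 5, groups=2)
    total += conv_params(256, 384, 3)
    total += conv_params(384, 384, 3, groups=2)
    total += conv_params(384, 256, 3, groups=2)
    total += fc_params(256 * 6 * 6, 4096)    # first fully connected layer
    total += fc_params(4096, 4096)           # second fully connected layer
    total += fc_params(4096, num_classes)    # output layer
    return total
```

Most of the parameters sit in the first fully connected layer, which is one reason the input image size has a strong effect on model size and training speed.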

Further, the AlexNet convolutional neural network model also uses techniques such as ReLU, LRN (local response normalization) and Dropout, which effectively alleviate the problems of activation function saturation and model over-fitting while improving the computational performance of the model.
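
Two of the techniques named above are simple enough to sketch directly. ReLU avoids saturation because its output keeps growing for positive inputs, and Dropout reduces over-fitting by randomly silencing units during training. The dropout shown is the "inverted" variant that rescales surviving units at training time, a common formulation assumed here for illustration; LRN is omitted.

```python
import random


def relu(values):
    """Rectified linear unit: negative inputs become zero."""
    return [v if v > 0.0 else 0.0 for v in values]


def dropout(values, p=0.5, rng=None):
    """Inverted dropout for training: zero each value with probability p
    and scale the survivors by 1 / (1 - p)."""
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

At inference time dropout is simply skipped, so the trained network behaves deterministically.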

Further, to speed up model training, CUDA with GPU acceleration is used during training, shortening the time needed to obtain the training model for automatic identification of audio data bit rate.

It should be noted that, in a specific embodiment of the present invention, besides the AlexNet convolutional neural network model, other convolutional neural network models such as LeNet, GoogleNet and VGG may also be used as the training model; technical solutions that use these other convolutional neural network models as the training model also fall within the protection scope of the specific embodiments of the present invention.

S102: Label audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format.

It should be noted that the target-class bit rate is a target-class bit rate of the MP3 format, and specifically is any one of the following bit rates: 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps.

The non-target-class bit rate is a bit rate of the MP3 format, and specifically includes all remaining bit rates that differ from the aforementioned target-class bit rate of the MP3 format.

S103: Compare the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and output the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.
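
Step S103 amounts to a single comparison. The following is a minimal sketch; the label strings, the dictionary representation of the model output and the threshold value of 0.9 are hypothetical, since the original does not state a concrete threshold.

```python
from typing import Dict, Optional


def select_target_label(probabilities: Dict[str, float],
                        target: str = "320kbps",
                        threshold: float = 0.9) -> Optional[str]:
    """Return the target label when its probability is >= threshold, else None."""
    p_target = probabilities.get(target, 0.0)
    return target if p_target >= threshold else None
```

Raising the threshold makes the method more conservative about declaring a file to be genuine target-class audio, at the cost of rejecting more borderline files.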

In addition, in a specific embodiment of the present invention, the method for automatic identification of audio data bit rate further includes: deploying the automatic identification training model to a digital music storage server cluster to label the audio data to be predicted.

In a specific embodiment of the present invention, the automatic identification training model is deployed in GPU mode.

Specifically, in GPU mode the model is deployed to a separate GPU cluster, and the digital music is migrated to that GPU cluster for labeling.

The advantage of GPU mode is faster computation. However, the digital music labeling task involves a large amount of audio data, which makes data migration difficult, and a further drawback of GPU mode is its high cost. Because automatic identification of audio data bit rate has low real-time requirements and calls for low cost, GPU mode is not the preferred approach. For application scenarios that demand high speed, such as online services, deploying to a separate GPU cluster in GPU mode and migrating the digital music to that GPU cluster for labeling may be considered.

In a specific embodiment of the present invention, the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.

Specifically, in CPU mode the model is deployed to a separate CPU cluster, and the digital music is migrated to that CPU cluster for labeling.

Because automatic identification of audio data bit rate has low real-time requirements and calls for low cost, CPU mode is the preferred approach. For application scenarios involving offline batch processing of audio data, deploying to a separate CPU cluster in CPU mode and migrating the digital music to that CPU cluster for labeling may be considered.
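
Offline batch labeling on CPU can be sketched as a worker pool that applies the trained model to each stored file. The `predict` callable, the threshold of 0.9 and the worker count are hypothetical placeholders; a thread pool is used here only to keep the sketch self-contained, and a real CPU-bound deployment would more likely use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def label_files(paths, predict, threshold=0.9, workers=4):
    """Return {path: bool}; True means predict(path) reached the threshold.

    predict is any callable that maps a file path to the probability of the
    target-class bit rate (for example, a wrapper around the trained model).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        probabilities = list(pool.map(predict, paths))
    return {path: prob >= threshold
            for path, prob in zip(paths, probabilities)}
```

Running such a job on the storage servers themselves avoids moving the audio files, which is the data migration cost the text attributes to a separate GPU cluster.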

In a specific embodiment of the present invention, besides CPU cluster and GPU cluster deployment, deployment on other hardware devices such as PCs and mobile phones also falls within the scheme of the specific embodiments of the present invention.

In summary, in the method for automatic identification of audio data bit rate provided by the embodiments of the present invention, model training is performed on collected audio data to obtain an automatic identification training model for audio data bit rate; audio data to be predicted is labeled according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format; the probability of occurrence of the labeled data in the target-class bit rate format is compared with a preset threshold probability, and if that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit rate format is output, thereby automatically identifying the bit rates of different audio data.

Fig. 2 is a structural block diagram of the device for automatic identification of audio data bit rate provided by an embodiment of the present invention. As shown in Fig. 2, the device includes a training model acquisition module 201, a labeled data acquisition module 202 and a comparison module 203.

Specifically, the training model acquisition module performs model training on collected audio data to obtain an automatic identification training model for audio data bit rate.

Further, the training model acquisition module is specifically configured to: label the audio data to generate training samples of labeled data in the target-class bit rate format;

perform spectrogram conversion on the audio data of the labeled data in the target-class bit rate format to obtain a corresponding spectrogram;

scale the spectrogram image to obtain a corresponding thumbnail.

Further, the training model acquisition module scales the spectrogram image by bilinear interpolation to obtain the corresponding thumbnail.

Further, by means of bilinear interpolation and with the AlexNet convolutional neural network model as the training model, the training model acquisition module performs model training on the image data of the thumbnail to obtain the corresponding training model for automatic identification of audio data bit rate.

The AlexNet convolutional neural network model used by the training model acquisition module specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.

The labeled data acquisition module labels audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format.

The target-class bit rate of the labeled data obtained by the labeled data acquisition module is a target-class bit rate of the MP3 format, and specifically is any one of the following bit rates: 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps.

The non-target-class bit rate of the labeled data obtained by the labeled data acquisition module is a bit rate of the MP3 format, and specifically includes all remaining bit rates that differ from the aforementioned target-class bit rate of the MP3 format.

The comparison module compares the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputs the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.
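
The division of the device into three modules can be sketched as one class with a method per module. The class name, the injected `trainer` callable and the default threshold of 0.9 are hypothetical and serve only to show how modules 201, 202 and 203 hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A model maps an audio file path to the probability of the target-class bit rate.
Model = Callable[[str], float]


@dataclass
class BitRateIdentificationDevice:
    trainer: Callable[[Sequence[str], Sequence[int]], Model]
    threshold: float = 0.9  # hypothetical preset threshold probability
    model: Optional[Model] = None

    def acquire_training_model(self, paths, labels) -> None:
        """Module 201: train on collected, labeled audio data."""
        self.model = self.trainer(paths, labels)

    def acquire_labeled_data(self, path: str) -> float:
        """Module 202: label one piece of audio data to be predicted."""
        if self.model is None:
            raise RuntimeError("train the model before labeling")
        return self.model(path)

    def compare(self, probability: float) -> bool:
        """Module 203: True when the target-class label should be output."""
        return probability >= self.threshold
```

Passing the trainer in as a callable keeps the sketch independent of any particular neural network library.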

In addition, the device for automatic identification of audio data bit rate further includes a training model deployment module (not shown in Fig. 2).

The training model deployment module deploys the automatic identification training model to a digital music storage server cluster to label the audio data to be predicted.

Further, the training model deployment module deploys the automatic identification training model to the digital music storage server cluster in CPU mode.

In the technical solution of the present invention, model training is performed on collected audio data to obtain an automatic identification training model for audio data bit rate; audio data to be predicted is labeled according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format; the probability of occurrence of the labeled data in the target-class bit rate format is compared with a preset threshold probability, and if that probability is greater than or equal to the preset threshold probability, the labeled data in the target-class bit rate format is output, thereby automatically identifying the bit rates of different audio data.

The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for automatic identification of audio data bit rate, characterized by including:

performing model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;

labeling audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format;

comparing the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputting the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.

2. The method according to claim 1, characterized in that performing model training on the collected audio data to obtain the automatic identification training model for the audio data bit rate specifically includes:

labeling the audio data to generate training samples of labeled data in the target-class bit rate format;

performing spectrogram conversion on the audio data of the labeled data in the target-class bit rate format to obtain a corresponding spectrogram;

performing model training on the image data of the thumbnail using a convolutional neural network algorithm to obtain the corresponding training model for automatic identification of audio data bit rate.

3. The method according to claim 1, characterized in that the target-class bit rate is a target-class bit rate of the MP3 format, and specifically is any one of the following bit rates: 320kbps, 256kbps, 224kbps, 192kbps, 128kbps, 96kbps and 64kbps.

4. The method according to claim 3, characterized in that the non-target-class bit rate is a bit rate of the MP3 format, and specifically includes all remaining bit rates that differ from the target-class bit rate of the MP3 format.

5. The method according to claim 2, characterized in that the spectrogram image is scaled by bilinear interpolation to obtain a corresponding thumbnail.

6. The method according to claim 2, characterized in that, by means of bilinear interpolation and with an AlexNet convolutional neural network model as the training model, model training is performed on the image data of the thumbnail to obtain the corresponding training model for automatic identification of audio data bit rate.

7. The method according to claim 6, characterized in that the AlexNet convolutional neural network model specifically includes 1 input layer, 5 convolutional layers, 3 pooling layers, 2 fully connected layers and 1 output layer.

8. The method according to claim 1, characterized in that the method further includes: deploying the automatic identification training model to a digital music storage server cluster to label the audio data to be predicted.

9. The method according to claim 8, characterized in that the automatic identification training model is deployed to the digital music storage server cluster in CPU mode.

10. A device for automatic identification of audio data bit rate, characterized by including:

a training model acquisition module, which performs model training on collected audio data to obtain an automatic identification training model for the audio data bit rate;

a labeled data acquisition module, which labels audio data to be predicted according to the automatic identification training model, to obtain labeled data in a target-class bit rate format and labeled data in a non-target-class bit rate format;

a comparison module, which compares the probability of occurrence of the labeled data in the target-class bit rate format with a preset threshold probability, and outputs the labeled data in the target-class bit rate format if that probability is greater than or equal to the preset threshold probability.