CN108257614A - Method and system for audio data annotation - Google Patents

Method and system for audio data annotation Download PDF

Info

Publication number
CN108257614A
CN108257614A (application CN201611247230.3A)
Authority
CN
China
Prior art keywords
audio
classification label
audio data
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611247230.3A
Other languages
Chinese (zh)
Inventor
晁卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuwo Technology Co Ltd
Original Assignee
Beijing Kuwo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuwo Technology Co Ltd filed Critical Beijing Kuwo Technology Co Ltd
Priority to CN201611247230.3A
Publication of CN108257614A
Legal status: Pending

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques for comparison or discrimination
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/12 - The extracted parameters being prediction coefficients
    • G10L25/24 - The extracted parameters being the cepstrum
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Analysis technique using neural networks
    • G10L25/45 - Speech or voice analysis techniques characterised by the type of analysis window

Abstract

The present invention provides a method and system for audio data annotation. The method includes: receiving audio data to be annotated; obtaining an audio segment of the audio data to be annotated; analyzing the audio segment with at least one pre-trained model to determine the classification label of the audio segment; and annotating the audio data to be annotated that corresponds to the audio segment with the classification label. This automates audio data annotation and improves its accuracy.

Description

Method and system for audio data annotation
Technical field
The present invention relates to the field of audio analysis and processing, and in particular to a method and system for annotating audio data.
Background technology
With the rapid development of audio capture and Internet technologies, large volumes of audio data (for example, songs) are uploaded to the network every day. Genre classification of audio data helps users quickly find the audio they like, but traditional audio data classification, that is, tagging audio data, requires manual screening and annotation, which consumes substantial labor and time. Moreover, personal subjective factors produce widely divergent classification results, so the accuracy of audio data annotation is low.
Invention content
The present invention provides a method and system for audio data annotation that extracts the feature vector of part of the audio data to automate audio data annotation and improve its accuracy.
In a first aspect, an embodiment of the present invention provides a method of audio data annotation, the method including:
receiving audio data to be annotated;
obtaining an audio segment of the audio data to be annotated, analyzing the audio segment with at least one pre-trained model, and determining the classification label of the audio segment;
annotating the audio data to be annotated that corresponds to the audio segment with the classification label.
By obtaining an audio segment of the audio data to be annotated, analyzing the segment with a trained model, and annotating the corresponding audio data to be annotated with the resulting classification label, the method automates audio data annotation and improves its accuracy.
Optionally, in one design, before the audio segment is analyzed with the at least one pre-trained model, the method further includes:
obtaining, according to at least one classification label, multiple audio data to be trained corresponding to each classification label;
obtaining the audio segments of the multiple audio data to be trained corresponding to each classification label, and extracting the feature vectors of the audio segments;
training on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
Optionally, in one design, extracting the feature vector of the audio segment includes:
extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
Optionally, in one design, before the feature vector of the audio segment is extracted, the method further includes:
applying Hamming window processing to the audio segment.
Optionally, in one design, training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes:
training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
In a second aspect, an embodiment of the present invention provides a system, the system including:
a receiving unit for receiving audio data to be annotated;
a processing unit for obtaining an audio segment of the audio data to be annotated, analyzing the audio segment with at least one pre-trained model, and determining the classification label of the audio segment;
the processing unit being further configured to annotate the audio data to be annotated that corresponds to the audio segment with the classification label.
By obtaining an audio segment of the audio data to be annotated, analyzing the segment with a trained model, and annotating the corresponding audio data to be annotated with the resulting classification label, the system automates audio data annotation and improves its accuracy.
Optionally, in one design, the system further includes a training unit;
the processing unit being further configured to obtain, according to at least one classification label, multiple audio data to be trained corresponding to each classification label;
the processing unit being further configured to obtain the audio segments of the multiple audio data to be trained corresponding to each classification label, and to extract the feature vectors of the audio segments;
the training unit being configured to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
Optionally, in one design, the processing unit extracting the feature vector of the audio segment includes:
extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
Optionally, in one design, the processing unit is further configured to apply Hamming window processing to the audio segment.
Optionally, in one design, the training unit training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes:
the training unit training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
With the audio data annotation method and system provided by the present invention, an audio segment of the audio data to be classified is taken and, through a pre-trained model, the audio data is classified and annotated, which automates audio data annotation and improves its accuracy.
Description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for the embodiments are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method of audio data annotation provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a model training method provided by an embodiment of the present invention;
Fig. 3 shows annotation results provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a system provided by an embodiment of the present invention.
Specific embodiment
The present invention provides a method and system for audio data annotation, applicable to classifying audio data, for example classifying songs by genre and annotating the genre type.
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method of audio data annotation provided by an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
S110: receive audio data to be annotated.
The audio data to be annotated is audio data awaiting classification, for example when the audio data in an audio database needs to be classified by genre. More specifically, the songs in a music library are classified by genre, that is, assigned classification labels for style categories such as pop (POP), rock (Rock), hip-hop (Rap), jazz (Jazz), blues (Blues), classical (Classical), punk (Punk), metal (Metal), Latin (Latin Music), reggae (Reggae), new age (New Age), country (Folk Music or Country Music), electronic dance (Electronic Dance), children's songs (Child Music), folk music, folk songs, world (World) music, HiFi music, and so on.
S120: obtain an audio segment of the audio data to be annotated, analyze the audio segment with at least one pre-trained model, and determine the classification label of the audio segment.
In the embodiment of the present invention, part of the audio data to be annotated is obtained to speed up acquisition; specifically, a 30-second segment of the audio data is taken. The acquisition process is: the audio is sampled at a 16 kHz sample rate (one frame of audio data can hold 512 samples) with a frame shift of 16 ms, i.e., 256 samples per shift, to obtain the audio segment of the audio data. In the embodiment of the present invention, about 1875 frames can be obtained from one song, consistent with the original audio data.
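As a concrete illustration of the framing arithmetic above, the following is a minimal sketch (assuming NumPy; the function name is illustrative, not part of the patent) of splitting a 30-second, 16 kHz clip into 512-sample frames with a 256-sample (16 ms) shift:

```python
# Minimal framing sketch under the stated parameters: 16 kHz sample rate,
# 512-sample frames, 256-sample (16 ms) frame shift. Illustrative only.
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Split a 1-D audio signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

clip = np.zeros(30 * 16000)          # 30 s at 16 kHz = 480000 samples
print(frame_signal(clip).shape)      # (1874, 512)
```

Note that strict framing yields 1874 full frames here; the patent's figure of 1875 presumably counts 480000 / 256 shifts without trimming the final partial frame.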
In the embodiment of the present invention, before the audio segment is analyzed with the at least one pre-trained model, the at least one training model must be trained; the specific training process is described under Fig. 2.
The audio segment is analyzed with the trained model(s) to determine its class. Optionally, in the embodiment of the present invention, AlexNet is used as the training model for analyzing the audio segment. Compared with other models such as LeNet, AlexNet has the advantages that the network is larger (5 convolutional layers + 3 fully connected layers + 1 softmax layer), it mitigates overfitting (dropout, data augmentation, local response normalization), and it can compute on multiple graphics processing units (Graphics Processing Unit, GPU) simultaneously, which speeds up computation and shortens the training time, i.e., the time needed to analyze an audio segment.
In the embodiment of the present invention, the audio data annotation system may be deployed in a client/server (Client/Server, CS) architecture, with the server side deployed in a distributed manner. The client performs S110 and S120: after obtaining the audio segment of the audio data to be annotated in S120, it sends the server a request to invoke at least one training model; the server invokes the training model according to the request, analyzes the audio segment, and determines the classification label of the audio segment. The CS deployment lets the training models process the audio data in parallel and improves the response speed to client requests.
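The patent names the CS architecture but no concrete protocol; a minimal sketch of the client-side call, assuming an HTTP endpoint and JSON schema of our own invention, might look like:

```python
# Hedged client sketch for the CS deployment. The URL, JSON fields and use
# of the `requests` library are illustrative assumptions only.
import requests

def request_label(segment_features: list, server_url: str = "http://inference-server/label") -> str:
    """Send an audio segment's features to the server; return the predicted label."""
    resp = requests.post(server_url, json={"features": segment_features}, timeout=10)
    resp.raise_for_status()
    return resp.json()["label"]      # hypothetical response schema
```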
S130: annotate the audio data to be annotated that corresponds to the audio segment with the classification label.
With the audio data annotation method provided by the embodiment of the present invention, an audio segment of the audio data to be annotated is obtained, the trained model analyzes the segment, and the classification label is annotated for the corresponding audio data to be annotated, which automates audio data annotation and improves its accuracy.
Fig. 2 is a flowchart of a model training method provided by an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
S210: obtain, according to at least one classification label, multiple audio data to be trained corresponding to each classification label.
In the deep learning field for audio data, the basic principle for choosing the training set must first be determined. Here, the training set used to train the model is the collection of multiple audio data to be trained obtained for each classification label according to the at least one classification label.
For example, the at least one classification label comprises 20 classification labels, or 20 style categories, such as pop (POP), rock (Rock), hip-hop (Rap), jazz (Jazz), blues (Blues), classical (Classical), punk (Punk), metal (Metal), Latin (Latin), reggae (Reggae), new age (New Age), country (Folk Music or Country Music), electronic dance (Electronic Dance), children's songs (Child Music), folk music, folk songs, world (World) music, HiFi music, and other music style types. According to the 20 style categories, training sets for the 20 style types are chosen from the audio database, with multiple audio data to be trained per style category. In the embodiment of the present invention, 1000 songs to be trained can be selected per style category, and manual screening may assist the selection to improve the quality of the music to be trained.
S220: obtain the audio segments of the multiple audio data to be trained corresponding to each classification label, and extract the feature vectors of the audio segments.
In the embodiment of the present invention, to speed up processing, a 30-second audio segment is intercepted from each audio data. Specifically, the audio can be sampled at a 16 kHz sample rate (one frame of audio data can hold 512 samples) with a frame shift of 16 ms, i.e., 256 samples per shift, to obtain the audio segment of the audio data.
Optionally, in the embodiment of the present invention, Hamming window processing is applied to the acquired audio segment. Hamming windowing is a common signal-processing step and, for brevity, is not described further here.
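Since the patent treats windowing as a standard step, a one-function sketch (assuming the (n_frames, frame_len) output of the framing sketch above) suffices:

```python
# Minimal Hamming-window sketch: multiply each frame by a Hamming window of
# matching length. Assumes frames shaped (n_frames, frame_len) as above.
import numpy as np

def apply_hamming(frames: np.ndarray) -> np.ndarray:
    window = np.hamming(frames.shape[1])
    return frames * window           # broadcasts across all frames
```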
The feature vector of the processed audio segment is then extracted. Optionally, in the embodiment of the present invention, Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) and perceptual linear prediction (PLP) may be used to extract the feature vector of the audio segment. For example: for each preprocessed song, the first 20 MFCC dimensions are extracted, along with 9 dimensions of RASTA-PLP cepstrum and 21 dimensions of RASTA-PLP spectrum; the mean and variance of the resulting MFCC and RASTA-PLP feature vectors are then computed, so that each music segment can be represented by a 100-dimensional feature vector ((20 + 9 + 21) × 2 = 100).
It should be noted that Mel-frequency cepstral coefficients (MFCC) model the auditory characteristics of the human ear. For music, MFCC represents the music signal more accurately than other short-time feature parameters, which is why this application uses it. Perceptual linear prediction (PLP) is a robust feature parameter that simulates the characteristics of human hearing and is more robust than other speech feature parameters; the RASTA filtering step additionally smooths the variation between adjacent time frames of the short-time spectrum. In addition, spectral emphasis is applied to the resulting PLP cepstral parameters to sharpen the spectral peaks. Finally, the mean and variance of the resulting short-time feature parameters are computed to establish the correlation between feature frames.
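Assembling the 100-dimensional descriptor described above could be sketched as follows. librosa provides MFCCs but has no RASTA-PLP, so those features are taken as inputs from whatever external implementation is used; the function name and layout are assumptions, not the patent's:

```python
# Hedged sketch of the 100-dim segment descriptor: means and variances of
# 20 MFCC dims plus 9 RASTA-PLP cepstrum and 21 RASTA-PLP spectrum dims.
import numpy as np
import librosa

def segment_descriptor(y: np.ndarray, plp_cep: np.ndarray, plp_spec: np.ndarray,
                       sr: int = 16000) -> np.ndarray:
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # (20, n_frames)
    # plp_cep (9, n_frames) and plp_spec (21, n_frames) come from an external
    # RASTA-PLP implementation and must share the MFCC frame count.
    feats = np.vstack([mfcc, plp_cep, plp_spec])          # (50, n_frames)
    return np.concatenate([feats.mean(axis=1), feats.var(axis=1)])  # (100,)
```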
S230: train on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
Optionally, in the embodiment of the present invention, a convolutional neural network (Convolutional Neural Network, CNN) is used to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, obtaining at least one training model corresponding to the at least one classification label. A CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a local receptive field; it performs outstandingly on image processing. It consists of alternating convolutional layers and pooling layers.
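A minimal CNN classifier over the 100-dimensional descriptors, with alternating convolutional and pooling layers as described above, could be sketched in PyTorch as follows; the layer sizes are illustrative assumptions, since the patent fixes no exact architecture beyond naming AlexNet as one option:

```python
# Hedged CNN sketch: 1-D convolution + pooling over 100-dim descriptors,
# 20 output classes as in the embodiment. Layer sizes are assumptions.
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    def __init__(self, n_classes: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),   # convolutional layer
            nn.ReLU(),
            nn.MaxPool1d(2),                              # pooling layer
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * 25, n_classes)   # 100 -> 50 -> 25 after pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x.unsqueeze(1))                 # (batch, 100) -> (batch, 1, 100)
        return self.classifier(h.flatten(1))

logits = GenreCNN()(torch.randn(8, 100))                  # batch of 8 descriptors
print(logits.shape)                                       # torch.Size([8, 20])
```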
The training method provided by the embodiment of the present invention trains the convolutional neural network model on the extracted feature vectors, successfully reducing the manual annotation that carries subjective factors.
A model trained with this training method can reach a recognition accuracy of 98.58%, as shown in Fig. 3.
Fig. 3 shows the annotation results provided by the embodiment of the present invention. Fig. 3(a) shows the annotation results for folk songs; Fig. 3(b) for classical songs; Fig. 3(c) for DJ songs; Fig. 3(d) for children's songs. The abscissa in Figs. 3(a) to 3(d) represents the dimension; the ordinate represents the corresponding dimension value.
As Figs. 3(a) to 3(d) show, except for the DJ style in Fig. 3(c), whose fluctuation is larger, the other three styles present a generally rising trend. The annotation accuracies for Fig. 3(a), Fig. 3(b), Fig. 3(c) and Fig. 3(d) reach 98.73%, 98.97%, 99.73% and 98.17%, respectively.
Figs. 1 to 3 above describe in detail the training process of the training model, the annotation process for audio data to be annotated, and the results of annotating audio data with the model trained as in Fig. 2. The system provided by the embodiment of the present invention is described below with reference to Fig. 4.
Fig. 4 is a structural diagram of a system provided by an embodiment of the present invention. As shown in Fig. 4, the system may include a receiving unit 310 and a processing unit 320.
Receiving unit 310 is configured to receive audio data to be annotated.
Processing unit 320 is configured to obtain an audio segment of the audio data to be annotated, analyze the audio segment with at least one pre-trained model, and determine the classification label of the audio segment; and to annotate the audio data to be annotated that corresponds to the audio segment with the classification label.
The detailed process is identical to S110, S120 and S130 in Fig. 1; refer to their descriptions under Fig. 1, which for brevity are not repeated here.
By obtaining an audio segment of the audio data to be annotated, analyzing the segment with the trained model, and annotating the corresponding audio data to be annotated with the classification label, the system automates audio data annotation and improves its accuracy.
Optionally, in the embodiment of the present invention, as shown in Fig. 4, the system may further include a training unit 330.
Processing unit 320 obtains, according to at least one classification label, multiple audio data to be trained corresponding to each classification label; it obtains the audio segments of the multiple audio data to be trained corresponding to each classification label and extracts the feature vectors of the audio segments.
Training unit 330 trains on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
During training, the training samples corresponding to each classification label (i.e., the multiple audio data to be trained) must first be obtained according to the classification labels. The music segments of the multiple audio data to be trained are then obtained, and the feature vectors of the audio segments are extracted.
Optionally, in the embodiment of the present invention, processing unit 320 applies Hamming window processing to the acquired audio segments; from the processed audio, the audio segments of each classification label are extracted according to their classification labels.
In the embodiment of the present invention, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) may be used to extract the feature vectors of the audio segments.
Then, training unit 330 trains on the feature vectors of the multiple audio segments corresponding to the at least one classification label, including:
training unit 330 training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
The detailed process is identical to S210, S220 and S230 in Fig. 2; refer to their descriptions under Fig. 2, which for brevity are not repeated here.
With the system provided by the embodiment of the present invention, an audio segment of the audio data to be annotated is obtained, the trained model analyzes the audio segment, and the classification label is annotated for the corresponding audio data to be annotated, which automates audio data annotation and improves its accuracy.
The specific embodiments described above further detail the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A method of audio data annotation, characterized in that the method includes:
    receiving audio data to be annotated;
    obtaining an audio segment of the audio data to be annotated, analyzing the audio segment with at least one pre-trained model, and determining the classification label of the audio segment;
    annotating the audio data to be annotated that corresponds to the audio segment with the classification label.
  2. The method according to claim 1, characterized in that before the audio segment is analyzed with the at least one pre-trained model, the method further includes:
    obtaining, according to at least one classification label, multiple audio data to be trained corresponding to each classification label;
    obtaining the audio segments of the multiple audio data to be trained corresponding to each classification label, and extracting the feature vectors of the audio segments;
    training on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
  3. The method according to claim 2, characterized in that extracting the feature vector of the audio segment includes:
    extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
  4. The method according to claim 2, characterized in that before extracting the feature vector of the audio segment, the method further includes:
    applying Hamming window processing to the audio segment.
  5. The method according to any one of claims 2 to 4, characterized in that training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes:
    training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
  6. A system, characterized in that the system includes:
    a receiving unit for receiving audio data to be annotated;
    a processing unit for obtaining an audio segment of the audio data to be annotated, analyzing the audio segment with at least one pre-trained model, and determining the classification label of the audio segment;
    the processing unit being further configured to annotate the audio data to be annotated that corresponds to the audio segment with the classification label.
  7. The system according to claim 6, characterized in that the system further includes a training unit;
    the processing unit being further configured to obtain, according to at least one classification label, multiple audio data to be trained corresponding to each classification label;
    the processing unit being further configured to obtain the audio segments of the multiple audio data to be trained corresponding to each classification label, and to extract the feature vectors of the audio segments;
    the training unit being configured to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label to obtain at least one training model corresponding to the at least one classification label.
  8. The system according to claim 7, characterized in that the processing unit extracting the feature vector of the audio segment includes:
    extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
  9. The system according to claim 7, characterized in that
    the processing unit is further configured to apply Hamming window processing to the audio segment.
  10. The system according to any one of claims 7 to 9, characterized in that the training unit training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes:
    the training unit training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
CN201611247230.3A 2016-12-29 2016-12-29 Method and system for audio data annotation Pending CN108257614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611247230.3A 2016-12-29 2016-12-29 Method and system for audio data annotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611247230.3A 2016-12-29 2016-12-29 Method and system for audio data annotation

Publications (1)

Publication Number Publication Date
CN108257614A 2018-07-06

Family

ID=62720722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611247230.3A Method and system for audio data annotation 2016-12-29 2016-12-29

Country Status (1)

Country Link
CN (1) CN108257614A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065075A (en) * 2018-09-26 2018-12-21 广州势必可赢网络科技有限公司 A speech processing method, device, system and computer-readable storage medium
CN109166593A (en) * 2018-08-17 2019-01-08 腾讯音乐娱乐科技(深圳)有限公司 audio data processing method, device and storage medium
CN109408660A (en) * 2018-08-31 2019-03-01 安徽四创电子股份有限公司 A method for automatic music classification based on audio features
CN110517671A (en) * 2019-08-30 2019-11-29 腾讯音乐娱乐科技(深圳)有限公司 An audio information assessment method, device and storage medium
CN110584701A (en) * 2019-08-23 2019-12-20 杭州智团信息技术有限公司 Labeling identification system and method for bowel sounds
CN110689040A (en) * 2019-08-19 2020-01-14 广州荔支网络技术有限公司 Sound classification method based on anchor portrait
CN110782917A (en) * 2019-11-01 2020-02-11 广州美读信息技术有限公司 Poetry reciting style classification method and system
CN110930997A (en) * 2019-12-10 2020-03-27 四川长虹电器股份有限公司 Method for labeling audio by using deep learning model
CN112420070A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Automatic labeling method and device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
CN105872855A (en) * 2016-05-26 2016-08-17 广州酷狗计算机科技有限公司 Labeling method and device for video files
CN105895110A (en) * 2016-06-30 2016-08-24 北京奇艺世纪科技有限公司 Method and device for classifying audio files

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
CN105872855A (en) * 2016-05-26 2016-08-17 广州酷狗计算机科技有限公司 Labeling method and device for video files
CN105895110A (en) * 2016-06-30 2016-08-24 北京奇艺世纪科技有限公司 Method and device for classifying audio files

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Meng Zihou et al.: "Analysis of the Distinctive Features of Chinese Speech" (《汉语语音区别特征分析》), 30 June 2016, National Defense Industry Press *
迷之飞翔: "Caffe deep learning notes and examples (Xue Kaiyu): sound recognition based on convolutional neural networks (CNN)", 《HTTPS://WWW.DOCIN.COM/P-1441307242.HTML》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166593A (en) * 2018-08-17 2019-01-08 腾讯音乐娱乐科技(深圳)有限公司 audio data processing method, device and storage medium
CN109408660A (en) * 2018-08-31 2019-03-01 安徽四创电子股份有限公司 A method for automatic music classification based on audio features
CN109065075A (en) * 2018-09-26 2018-12-21 广州势必可赢网络科技有限公司 A speech processing method, device, system and computer-readable storage medium
CN110689040A (en) * 2019-08-19 2020-01-14 广州荔支网络技术有限公司 Sound classification method based on anchor portrait
CN110689040B (en) * 2019-08-19 2022-10-18 广州荔支网络技术有限公司 Sound classification method based on anchor portrait
CN112420070A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Automatic labeling method and device, electronic equipment and computer readable storage medium
CN110584701A (en) * 2019-08-23 2019-12-20 杭州智团信息技术有限公司 Labeling identification system and method for bowel sounds
CN110517671A (en) * 2019-08-30 2019-11-29 腾讯音乐娱乐科技(深圳)有限公司 An audio information assessment method, device and storage medium
CN110782917A (en) * 2019-11-01 2020-02-11 广州美读信息技术有限公司 Poetry reciting style classification method and system
CN110782917B (en) * 2019-11-01 2022-07-12 广州美读信息技术有限公司 Poetry reciting style classification method and system
CN110930997A (en) * 2019-12-10 2020-03-27 四川长虹电器股份有限公司 Method for labeling audio by using deep learning model
CN110930997B (en) * 2019-12-10 2022-08-16 四川长虹电器股份有限公司 Method for labeling audio by using deep learning model

Similar Documents

Publication Publication Date Title
CN108257614A (en) Method and system for audio data annotation
Chen et al. The AMG1608 dataset for music emotion recognition
Rozgic et al. Emotion Recognition using Acoustic and Lexical Features.
CN105895087A (en) Voice recognition method and apparatus
Tran et al. Ensemble application of ELM and GPU for real-time multimodal sentiment analysis
Dissanayake et al. Speech emotion recognition 'in the wild' using an autoencoder
Mokhsin et al. Automatic music emotion classification using artificial neural network based on vocal and instrumental sound timbres.
CN107221344A (en) A kind of speech emotional moving method
CN113813609A (en) Game music style classification method and device, readable medium and electronic equipment
Wu et al. The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge.
Zhang et al. Convolutional neural network with spectrogram and perceptual features for speech emotion recognition
CN111462774B (en) Music emotion credible classification method based on deep learning
Wang Research on recognition and classification of folk music based on feature extraction algorithm
CN111859008B (en) Music recommending method and terminal
Chaudhary et al. Automatic music emotion classification using hashtag graph
CN111402919A (en) Game cavity style identification method based on multiple scales and multiple views
Unni et al. A Technique to Detect Music Emotions Based on Machine Learning Classifiers
Mezghani et al. Multifeature speech/music discrimination based on mid-term level statistics and supervised classifiers
Matsane et al. The use of automatic speech recognition in education for identifying attitudes of the speakers
Ricard et al. Bag of MFCC-based Words for Bird Identification.
Khanna et al. Recognizing emotions from human speech
Li et al. Multi-modal emotion recognition based on speech and image
Choudhury et al. Music Genre Classification Using Convolutional Neural Network
Kamińska et al. Polish emotional speech recognition based on the committee of classifiers
CN114446323B (en) Dynamic multi-dimensional music emotion analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180706
