CN108257614A - Method and system for labeling audio data - Google Patents
Method and system for labeling audio data
- Publication number
- CN108257614A CN108257614A CN201611247230.3A CN201611247230A CN108257614A CN 108257614 A CN108257614 A CN 108257614A CN 201611247230 A CN201611247230 A CN 201611247230A CN 108257614 A CN108257614 A CN 108257614A
- Authority
- CN
- China
- Prior art keywords
- audio
- classification label
- audio data
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/51—specially adapted for particular use, for comparison or discrimination
- G10L25/12—characterised by the type of extracted parameters, the extracted parameters being prediction coefficients
- G10L25/24—characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/30—characterised by the analysis technique, using neural networks
- G10L25/45—characterised by the type of analysis window
Abstract
The present invention provides a method and system for labeling audio data. The method includes: receiving audio data to be labeled; obtaining an audio segment of the audio data to be labeled; analyzing the segment with at least one pre-trained training model to determine the segment's classification label; and labeling the audio data corresponding to the segment with that classification label. This realizes automated labeling of audio data and improves labeling accuracy.
Description
Technical field
The present invention relates to the field of audio analysis and processing technology, and in particular to a method and system for labeling audio data.
Background technology
With the rapid development of audio collection and Internet technologies, a large volume of audio data (for example, songs) is uploaded to the network every day. Genre classification of audio data helps users quickly find the audio they like. Traditional audio classification, however, labels audio data through manual screening and annotation, which requires substantial labor and time; moreover, personal subjective factors cause classification results to vary widely, so the accuracy of audio data labeling is low.
Invention content
The present invention provides a method and system for labeling audio data. By extracting feature vectors from part of the audio data, it completes automated labeling of audio data and improves labeling accuracy.
In a first aspect, an embodiment of the present invention provides a method for labeling audio data, the method including:
Receiving audio data to be labeled;
Obtaining an audio segment of the audio data to be labeled, analyzing the segment with at least one pre-trained training model, and determining the classification label of the segment;
Labeling the audio data corresponding to the segment with the classification label.
By obtaining an audio segment of the audio data to be labeled, analyzing the segment with a trained model, and labeling the corresponding audio data with the resulting classification label, the method realizes automated labeling of audio data and improves labeling accuracy.
Optionally, in one design, before the audio segment is analyzed with the at least one pre-trained training model, the method further includes:
Obtaining, according to at least one classification label, multiple audio data to be trained for each classification label;
Obtaining the audio segments of the multiple audio data to be trained for each classification label, and extracting feature vectors of the audio segments;
Training on the feature vectors of the multiple audio segments corresponding to the at least one classification label, to obtain at least one training model corresponding to the at least one classification label.
Optionally, in one design, extracting the feature vector of an audio segment includes:
Extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
Optionally, in one design, before the feature vector of the audio segment is extracted, the method further includes:
Applying a Hamming window to the audio segment.
Optionally, in one design, training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes:
Training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
In a second aspect, an embodiment of the present invention provides a system, the system including:
A receiving unit, configured to receive audio data to be labeled;
A processing unit, configured to obtain an audio segment of the audio data to be labeled, analyze the segment with at least one pre-trained training model, and determine the classification label of the segment;
The processing unit being further configured to label the audio data corresponding to the segment with the classification label.
By obtaining an audio segment of the audio data to be labeled, analyzing it with a trained model, and labeling the corresponding audio data with the resulting classification label, the system realizes automated labeling of audio data and improves labeling accuracy.
Optionally, in one design, the system further includes a training unit;
The processing unit is further configured to obtain, according to at least one classification label, multiple audio data to be trained for each classification label;
The processing unit is further configured to obtain the audio segments of the multiple audio data to be trained for each classification label, and to extract the feature vectors of the audio segments;
The training unit is configured to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, to obtain at least one training model corresponding to the at least one classification label.
Optionally, in one design, the processing unit extracts the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
Optionally, in one design, the processing unit is further configured to apply a Hamming window to the audio segment.
Optionally, in one design, the training unit trains on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
Based on the method and system for labeling audio data provided by the present invention, an audio segment of the audio data to be classified is taken and, by means of a pre-trained training model, the audio data is classified and labeled, realizing automated labeling of audio data and improving labeling accuracy.
Description of the drawings
In order to illustrate the technical solution of the embodiments of the present invention more clearly, it will make below to required in the embodiment of the present invention
Attached drawing is briefly described, it should be apparent that, drawings described below is only some embodiments of the present invention, for
For those of ordinary skill in the art, without creative efforts, other are can also be obtained according to these attached drawings
Attached drawing.
Fig. 1 is a flowchart of a method for labeling audio data according to an embodiment of the present invention;
Fig. 2 is a flowchart of a model training method according to an embodiment of the present invention;
Fig. 3 shows labeling results according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a system according to an embodiment of the present invention.
Specific embodiment
The present invention provides a method and system for labeling audio data, suitable for classifying audio data, for example classifying songs by genre and labeling the genre.
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for labeling audio data according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
S110: Receive audio data to be labeled.
The audio data to be labeled is audio data pending classification, for example audio data in an audio database that needs to be classified by genre. More specifically, songs in a music library are classified by genre, i.e., given genre classification labels, such as pop (POP), rock (Rock), hip-hop (Rap), jazz (Jazz), blues (Blues), classical (Classical), punk (Punk), metal (Metal), Latin music (Latin Music), reggae (Reggae), new age (New Age), country or folk music (Folk Music or Country Music), electronic dance music (Electronic Dance), children's songs (Child Music), folk instrumental music, folk songs, world (World) music, HiFi music, and so on.
S120: Obtain an audio segment of the audio data to be labeled, analyze the segment with at least one pre-trained training model, and determine the classification label of the segment.
In the embodiment of the present invention, only part of the audio data to be labeled is obtained, to speed up acquisition: a 30-second audio segment is extracted from the audio data to be labeled. The specific acquisition process is: sample at a rate of 16 kHz with a frame length of 512 samples and a frame shift of 16 ms (i.e., 256 samples), to obtain the frames of the audio segment. In the embodiment of the present invention, 1875 frames are obtained per song, consistent with the original audio data.
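The framing scheme described above (16 kHz sampling, 512-sample frames, a 16 ms / 256-sample frame shift, and 1875 frames from a 30-second clip) can be sketched as follows. This is an illustrative reconstruction rather than code from the patent; in particular, zero-padding the tail is an assumption needed to arrive at exactly 1875 frames.

```python
import numpy as np

SR = 16_000      # sample rate stated in the description
FRAME_LEN = 512  # samples per frame (32 ms at 16 kHz)
HOP = 256        # frame shift: 16 ms = 256 samples

def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = int(np.ceil(len(x) / hop))
    pad = (n_frames - 1) * hop + frame_len - len(x)
    x = np.pad(x, (0, max(pad, 0)))
    # index matrix: row i covers samples [i*hop, i*hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

clip = np.zeros(30 * SR)          # stand-in for a 30-second audio clip
frames = frame_signal(clip)
print(frames.shape)               # (1875, 512)
```

With these constants, a 30-second clip (480,000 samples) yields exactly the 1875 frames the description mentions.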
In the embodiment of the present invention, before the audio segment is analyzed with the at least one pre-trained training model, the at least one training model must be trained; the specific training process is described with reference to Fig. 2.
The audio segment is analyzed with the trained training model(s) to determine its classification. Optionally, in the embodiment of the present invention, AlexNet is used as the training model to analyze the audio segment. Compared with other models such as LeNet, AlexNet has the following advantages: a larger network (5 convolutional layers + 3 fully connected layers + 1 softmax layer); measures against overfitting (dropout, data augmentation, and local response normalization, LRN); and the ability to compute on multiple graphics processing units (Graphics Processing Unit, GPU) in parallel, which speeds up computation and shortens the training time, and thus the analysis time per audio segment.
In the embodiment of the present invention, the audio data labeling system may be deployed in a client/server (Client/Server, CS) structure, with the server side deployed in a distributed manner. The client executes S110 and S120: after obtaining the audio segment of the audio data to be labeled, it sends the server a call request for the at least one training model; the server calls the training model according to the request, analyzes the audio segment, and determines its classification label. The CS deployment enables the training models to process audio data in parallel and improves the response speed to client requests.
S130: Label the audio data corresponding to the audio segment with the classification label.
With the labeling method provided by the embodiment of the present invention, an audio segment of the audio data to be labeled is obtained, a trained model analyzes the segment, and the audio data corresponding to the segment is labeled with the classification label, realizing automated labeling of audio data and improving labeling accuracy.
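The final labeling step can be sketched as follows. The patent does not specify the form of the model's output, so the label names and score values below are hypothetical; the sketch only shows turning per-class scores into probabilities and tagging the audio data with the highest-scoring label.

```python
import numpy as np

LABELS = ["Pop", "Rock", "Rap", "Jazz"]   # a subset of the genre labels, for illustration

def softmax(z):
    e = np.exp(z - z.max())               # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, 0.1, -1.0])  # stand-in model outputs for one segment
probs = softmax(logits)
label = LABELS[int(np.argmax(probs))]     # classification label for the audio data
print(label)                              # Pop
```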
Fig. 2 is a flowchart of a model training method according to an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
S210: Obtain, according to at least one classification label, multiple audio data to be trained for each classification label.
In deep learning on audio data, the basic principle for selecting the training set must first be determined: when training the models, the training set is the collection of multiple audio data to be trained, obtained for each classification label according to the at least one classification label.
For example, the at least one classification label may be 20 classification labels, or 20 genre labels: pop (POP), rock (Rock), hip-hop (Rap), jazz (Jazz), blues (Blues), classical (Classical), punk (Punk), metal (Metal), Latin (Latin), reggae (Reggae), new age (New Age), country or folk music (Folk Music or Country Music), electronic dance music (Electronic Dance), children's songs (Child Music), folk instrumental music, folk songs, world (World) music, HiFi music, and other music genres. According to the 20 genres, a training set for each genre is selected from the audio database, each genre contributing multiple audio data to be trained. In the embodiment of the present invention, 1000 songs to be trained may be selected per genre, with manual screening during selection to improve the quality of the music to be trained.
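The training-set bookkeeping above (20 genre labels with about 1000 songs each) amounts to roughly 20,000 labeled clips. A minimal sketch, in which the exact label strings and the clip identifiers are assumptions for illustration (the description names about eighteen genres and says "and so on", so the list is padded to 20 with placeholder names):

```python
# 20 genre labels; names loosely follow the description, last two are placeholders
genres = ["Pop", "Rock", "Rap", "Jazz", "Blues", "Classical", "Punk", "Metal",
          "Latin", "Reggae", "New Age", "Country", "Electronic Dance",
          "Child Music", "Folk Instrumental", "Folk Song", "World", "HiFi",
          "Genre19", "Genre20"]
SONGS_PER_GENRE = 1000

# map each (synthetic) clip id to its genre label
training_set = {f"{g}/{i:04d}": g for g in genres for i in range(SONGS_PER_GENRE)}
print(len(training_set))   # 20000
```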
S220: Obtain the audio segments of the multiple audio data to be trained for each classification label, and extract the feature vectors of the audio segments.
In the embodiment of the present invention, to speed up processing, a 30-second segment is cut from each audio data. Specifically, the segment is sampled at a rate of 16 kHz with a frame length of 512 samples and a frame shift of 16 ms (256 samples), to obtain the frames of the audio segment.
Optionally, in the embodiment of the present invention, the obtained audio segment is processed with a Hamming window. Hamming windowing is a common function processing step and, for brevity, is not described further here.
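As a sketch of the Hamming-window step (an illustrative reconstruction, not code from the patent), each 512-sample frame is multiplied element-wise by a Hamming window before spectral analysis, tapering the frame edges to reduce spectral leakage:

```python
import numpy as np

FRAME_LEN = 512
window = np.hamming(FRAME_LEN)           # w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
frames = np.ones((4, FRAME_LEN))         # stand-in for framed audio data
windowed = frames * window               # taper every frame toward its edges

print(window[0])                         # edge value, 0.54 - 0.46 = 0.08
```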
Feature vectors are then extracted from the processed audio segment. Optionally, in the embodiment of the present invention, Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) and perceptual linear prediction (PLP) may be used to extract the feature vector of the audio segment. For example: for each preprocessed song, the first 20 MFCC dimensions are extracted, along with 9 dimensions of RASTA-PLP cepstrum and 21 dimensions of RASTA-PLP spectrum; the mean and variance of the resulting MFCC and RASTA-PLP feature vectors are then computed, so that each music segment is represented by a 100-dimensional feature vector.
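The 100-dimensional clip representation described above (20 MFCC + 9 RASTA-PLP cepstrum + 21 RASTA-PLP spectrum dimensions per frame, summarized by per-dimension mean and variance) can be sketched as follows. Computing real MFCC and RASTA-PLP values is outside the scope of this sketch, so random arrays stand in for the per-frame features; only the pooling into a fixed 100-dimensional vector is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 1875                                     # frames in one 30-second clip

mfcc = rng.standard_normal((n_frames, 20))          # first 20 MFCC dims (stand-in)
plp_cepstrum = rng.standard_normal((n_frames, 9))   # 9-dim RASTA-PLP cepstrum (stand-in)
plp_spectrum = rng.standard_normal((n_frames, 21))  # 21-dim RASTA-PLP spectrum (stand-in)

per_frame = np.hstack([mfcc, plp_cepstrum, plp_spectrum])   # (1875, 50)
# per-dimension mean and variance over time -> fixed-length clip vector
clip_vector = np.concatenate([per_frame.mean(axis=0), per_frame.var(axis=0)])
print(clip_vector.shape)   # (100,)
```

The mean/variance pooling is what makes clips of any length comparable: 50 per-frame dimensions become 50 means plus 50 variances, i.e. 100 numbers per segment.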
It should be noted that Mel-frequency cepstral coefficients (MFCC) model the auditory characteristics of the human ear. For musical characteristics, MFCC represents a music signal more accurately than other short-time characteristic parameters, which is why this application uses MFCC. Perceptual linear prediction (PLP) is a robust characteristic parameter that simulates properties of the human auditory system; it is more robust than other speech feature parameters, and the RASTA filtering it passes through smooths the variation between adjacent frames in the short-time spectral analysis. In addition, spectral emphasis is applied to the obtained PLP cepstral parameters to sharpen the spectral peaks. Finally, the mean and variance of the obtained short-time characteristic parameters are computed, to capture the correlation between the frames of each characteristic parameter.
S230: Train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, to obtain at least one training model corresponding to the at least one classification label.
Optionally, in the embodiment of the present invention, a convolutional neural network (Convolutional Neural Network, CNN) is used to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, obtaining at least one training model corresponding to the at least one classification label. A CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage area; it performs outstandingly on image processing. It consists of alternating convolutional layers and pooling layers.
The training method provided by the embodiment of the present invention trains a convolutional neural network model on the extracted feature vectors, successfully reducing manual labeling with its subjective factors.
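As a minimal illustration of the convolutional and pooling layers a CNN is built from (not the patent's actual network; a single random filter stands in for a learned one), one convolution / ReLU / max-pooling pass over a 100-dimensional clip vector looks like:

```python
import numpy as np

def conv1d(x, w, b=0.0):
    """'Valid' 1-D convolution (cross-correlation, as CNN libraries implement it)."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)]) + b

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(100)    # a 100-dim clip feature vector
w = rng.standard_normal(5)      # one 5-tap filter (random stand-in for a learned filter)

h = max_pool(relu(conv1d(x, w)))
print(h.shape)                  # (48,): 100-5+1 = 96 conv outputs, halved by pooling
```

A real network stacks several such conv/pool stages, followed by fully connected layers and a softmax over the classification labels; the weights are learned from the labeled training set rather than drawn at random.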
A model trained with this training method can reach a recognition accuracy of 98.58%, as shown in Fig. 3.
Fig. 3 shows labeling results according to an embodiment of the present invention. Fig. 3(a) shows the labeling result for folk songs; Fig. 3(b) for classical songs; Fig. 3(c) for DJ songs; Fig. 3(d) for children's songs. In Figs. 3(a) to 3(d), the abscissa represents the dimension and the ordinate the corresponding dimension value.
As Figs. 3(a) to 3(d) show, among the labeling results for these styles, only the DJ style in Fig. 3(c) fluctuates strongly; the other three styles present a broadly rising trend. The labeling accuracies for Figs. 3(a), 3(b), 3(c), and 3(d) reach 98.73%, 98.97%, 99.73%, and 98.17%, respectively.
Figs. 1 to 3 above describe in detail the training process of the training models, the labeling process for audio data to be labeled, and the analysis of the results of labeling audio data with the training models trained as in Fig. 2. The system provided by the embodiment of the present invention is described in detail below with reference to Fig. 4.
Fig. 4 is a structural diagram of a system according to an embodiment of the present invention. As shown in Fig. 4, the system may include a receiving unit 310 and a processing unit 320.
The receiving unit 310 is configured to receive audio data to be labeled.
The processing unit 320 is configured to obtain an audio segment of the audio data to be labeled, analyze the segment with at least one pre-trained training model, determine the classification label of the segment, and label the audio data corresponding to the segment with the classification label.
The detailed process is identical to S110, S120, and S130 in Fig. 1; for details, refer to S110, S120, and S130 of Fig. 1. For brevity, it is not repeated here.
By obtaining an audio segment of the audio data to be labeled, analyzing it with a trained model, and labeling the audio data corresponding to the segment with the classification label, the system realizes automated labeling of audio data and improves labeling accuracy.
Optionally, in the embodiment of the present invention, as shown in Fig. 4, the system may further include a training unit 330.
The processing unit 320 obtains, according to at least one classification label, the multiple audio data to be trained for each classification label; it obtains the audio segments of the multiple audio data to be trained for each classification label and extracts the feature vectors of the audio segments.
The training unit 330 is configured to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, obtaining at least one training model corresponding to the at least one classification label.
During training, the training samples corresponding to each classification label, i.e., the multiple audio data to be trained, must first be obtained according to the classification labels; the music segments of the multiple audio data to be trained are then obtained, and the feature vectors of the audio segments are extracted.
Optionally, in the embodiment of the present invention, the processing unit 320 applies a Hamming window to the obtained audio segments, and the processed audio segments of each classification label are extracted according to the classification labels.
In the embodiment of the present invention, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) may be used to extract the feature vectors of the audio segments.
The training unit 330 then trains on the feature vectors of the multiple audio segments corresponding to the at least one classification label, using a convolutional neural network (CNN).
The detailed process is identical to S210, S220, and S230 of Fig. 2; for details, refer to S210, S220, and S230 of Fig. 2. For brevity, it is not repeated here.
With the system provided by the embodiment of the present invention, an audio segment of the audio data to be labeled is obtained, a trained model analyzes the segment, and the audio data corresponding to the segment is labeled with the classification label, realizing automated labeling of audio data and improving labeling accuracy.
The specific embodiments described above further describe in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
- 1. A method for labeling audio data, characterized in that the method includes: receiving audio data to be labeled; obtaining an audio segment of the audio data to be labeled; analyzing the audio segment with at least one pre-trained training model to determine the classification label of the audio segment; and labeling the audio data corresponding to the audio segment with the classification label.
- 2. The method according to claim 1, characterized in that, before the audio segment is analyzed with the at least one pre-trained training model, the method further includes: obtaining, according to at least one classification label, multiple audio data to be trained for each classification label; obtaining the audio segments of the multiple audio data to be trained for each classification label, and extracting the feature vectors of the audio segments; and training on the feature vectors of the multiple audio segments corresponding to the at least one classification label, to obtain at least one training model corresponding to the at least one classification label.
- 3. The method according to claim 2, characterized in that extracting the feature vector of the audio segment includes: extracting the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
- 4. The method according to claim 2, characterized in that, before the feature vector of the audio segment is extracted, the method further includes: applying a Hamming window to the audio segment.
- 5. The method according to any one of claims 2 to 4, characterized in that training on the feature vectors of the multiple audio segments corresponding to the at least one classification label includes: training on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
- 6. A system, characterized in that the system includes: a receiving unit, configured to receive audio data to be labeled; and a processing unit, configured to obtain an audio segment of the audio data to be labeled, analyze the audio segment with at least one pre-trained training model, and determine the classification label of the audio segment; the processing unit being further configured to label the audio data corresponding to the audio segment with the classification label.
- 7. The system according to claim 6, characterized in that the system further includes a training unit; the processing unit is further configured to obtain, according to at least one classification label, multiple audio data to be trained for each classification label; the processing unit is further configured to obtain the audio segments of the multiple audio data to be trained for each classification label, and to extract the feature vectors of the audio segments; and the training unit is configured to train on the feature vectors of the multiple audio segments corresponding to the at least one classification label, to obtain at least one training model corresponding to the at least one classification label.
- 8. The system according to claim 7, characterized in that the processing unit extracts the feature vector of the audio segment using Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP).
- 9. The system according to claim 7, characterized in that the processing unit is further configured to apply a Hamming window to the audio segment.
- 10. The system according to any one of claims 7 to 9, characterized in that the training unit trains on the feature vectors of the multiple audio segments corresponding to the at least one classification label using a convolutional neural network (CNN).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611247230.3A CN108257614A (en) | 2016-12-29 | 2016-12-29 | The method and its system of audio data mark |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611247230.3A CN108257614A (en) | 2016-12-29 | 2016-12-29 | The method and its system of audio data mark |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108257614A true CN108257614A (en) | 2018-07-06 |
Family
ID=62720722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611247230.3A Pending CN108257614A (en) | 2016-12-29 | 2016-12-29 | The method and its system of audio data mark |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108257614A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065075A (en) * | 2018-09-26 | 2018-12-21 | 广州势必可赢网络科技有限公司 | A kind of method of speech processing, device, system and computer readable storage medium |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
CN109408660A (en) * | 2018-08-31 | 2019-03-01 | 安徽四创电子股份有限公司 | A method of the music based on audio frequency characteristics is classified automatically |
CN110517671A (en) * | 2019-08-30 | 2019-11-29 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of appraisal procedure of audio-frequency information, device and storage medium |
CN110584701A (en) * | 2019-08-23 | 2019-12-20 | 杭州智团信息技术有限公司 | Labeling identification system and method for bowel sounds |
CN110689040A (en) * | 2019-08-19 | 2020-01-14 | 广州荔支网络技术有限公司 | Sound classification method based on anchor portrait |
CN110782917A (en) * | 2019-11-01 | 2020-02-11 | 广州美读信息技术有限公司 | Poetry reciting style classification method and system |
CN110930997A (en) * | 2019-12-10 | 2020-03-27 | 四川长虹电器股份有限公司 | Method for labeling audio by using deep learning model |
CN112420070A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Automatic labeling method and device, electronic equipment and computer readable storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN105872855A (en) * | 2016-05-26 | 2016-08-17 | 广州酷狗计算机科技有限公司 | Labeling method and device for video files |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
Non-Patent Citations (2)
Title |
---|
Meng Zihou (孟子厚) et al.: *Analysis of the Distinctive Features of Chinese Speech* (《汉语语音区别特征分析》), National Defense Industry Press, 30 June 2016 * |
迷之飞翔 (Mizhi Feixiang): "Caffe deep learning notes by Xue Kaiyu: sound recognition based on convolutional neural networks (CNN)", https://www.docin.com/p-1441307242.html * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio data processing method, device and storage medium |
CN109408660A (en) * | 2018-08-31 | 2019-03-01 | 安徽四创电子股份有限公司 | Automatic music classification method based on audio features |
CN109065075A (en) * | 2018-09-26 | 2018-12-21 | 广州势必可赢网络科技有限公司 | Speech processing method, device, system and computer-readable storage medium |
CN110689040A (en) * | 2019-08-19 | 2020-01-14 | 广州荔支网络技术有限公司 | Sound classification method based on anchor portrait |
CN110689040B (en) * | 2019-08-19 | 2022-10-18 | 广州荔支网络技术有限公司 | Sound classification method based on anchor portrait |
CN112420070A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Automatic labeling method and device, electronic equipment and computer readable storage medium |
CN110584701A (en) * | 2019-08-23 | 2019-12-20 | 杭州智团信息技术有限公司 | Labeling identification system and method for bowel sounds |
CN110517671A (en) * | 2019-08-30 | 2019-11-29 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information evaluation method, device and storage medium |
CN110782917A (en) * | 2019-11-01 | 2020-02-11 | 广州美读信息技术有限公司 | Poetry reciting style classification method and system |
CN110782917B (en) * | 2019-11-01 | 2022-07-12 | 广州美读信息技术有限公司 | Poetry reciting style classification method and system |
CN110930997A (en) * | 2019-12-10 | 2020-03-27 | 四川长虹电器股份有限公司 | Method for labeling audio by using deep learning model |
CN110930997B (en) * | 2019-12-10 | 2022-08-16 | 四川长虹电器股份有限公司 | Method for labeling audio by using deep learning model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108257614A (en) | Method and system for audio data labeling | |
Chen et al. | The AMG1608 dataset for music emotion recognition | |
Rozgic et al. | Emotion Recognition using Acoustic and Lexical Features. | |
CN105895087A (en) | Voice recognition method and apparatus | |
Tran et al. | Ensemble application of ELM and GPU for real-time multimodal sentiment analysis | |
Dissanayake et al. | Speech emotion recognition ‘in the wild’ using an autoencoder |
Mokhsin et al. | Automatic music emotion classification using artificial neural network based on vocal and instrumental sound timbres. | |
CN107221344A (en) | Speech emotion transfer method |
CN113813609A (en) | Game music style classification method and device, readable medium and electronic equipment | |
Wu et al. | The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge. | |
Zhang et al. | Convolutional neural network with spectrogram and perceptual features for speech emotion recognition | |
CN111462774B (en) | Music emotion credible classification method based on deep learning | |
Wang | Research on recognition and classification of folk music based on feature extraction algorithm | |
CN111859008B (en) | Music recommendation method and terminal |
Chaudhary et al. | Automatic music emotion classification using hashtag graph | |
CN111402919A (en) | Game cavity style identification method based on multiple scales and multiple views | |
Unni et al. | A Technique to Detect Music Emotions Based on Machine Learning Classifiers | |
Mezghani et al. | Multifeature speech/music discrimination based on mid-term level statistics and supervised classifiers | |
Matsane et al. | The use of automatic speech recognition in education for identifying attitudes of the speakers | |
Ricard et al. | Bag of MFCC-based Words for Bird Identification. | |
Khanna et al. | Recognizing emotions from human speech | |
Li et al. | Multi-modal emotion recognition based on speech and image | |
Choudhury et al. | Music Genre Classification Using Convolutional Neural Network | |
Kamińska et al. | Polish emotional speech recognition based on the committee of classifiers | |
CN114446323B (en) | Dynamic multi-dimensional music emotion analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180706 |