CN108205535A - The method and its system of Emotion tagging - Google Patents
The method and its system of Emotion tagging
- Publication number
- CN108205535A CN108205535A CN201611169265.XA CN201611169265A CN108205535A CN 108205535 A CN108205535 A CN 108205535A CN 201611169265 A CN201611169265 A CN 201611169265A CN 108205535 A CN108205535 A CN 108205535A
- Authority
- CN
- China
- Prior art keywords
- audio data
- training
- trained
- marked
- sonograph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000008451 emotion Effects 0.000 title claims abstract description 28
- 238000012549 training Methods 0.000 claims abstract description 137
- 238000012545 processing Methods 0.000 claims description 27
- 241001269238 Data Species 0.000 claims description 20
- 230000009466 transformation Effects 0.000 claims description 9
- 238000001228 spectrum Methods 0.000 claims description 6
- 230000002996 emotional effect Effects 0.000 abstract description 18
- 238000011161 development Methods 0.000 abstract description 3
- 230000008569 process Effects 0.000 description 10
- 238000004458 analytical method Methods 0.000 description 7
- 238000010586 diagram Methods 0.000 description 5
- 238000013527 convolutional neural network Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 208000019901 Anxiety disease Diseases 0.000 description 3
- 230000036506 anxiety Effects 0.000 description 3
- 238000012512 characterization method Methods 0.000 description 3
- 238000010224 classification analysis Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000012938 design process Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000013508 migration Methods 0.000 description 1
- 230000005012 migration Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 238000012800 visualization Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/686—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The present invention provides an emotion tagging method and system. The method includes: receiving audio data to be tagged; analyzing the audio data to be tagged using at least one pre-trained model, converting the audio data into a spectrogram to determine its emotion label; and tagging the audio data with that emotion label. By passing the audio data to be recognized through a pre-trained model and recognizing it as a spectrogram image, the emotions of audio data are labeled automatically. This realizes automated emotion classification of audio data, reduces learning cost, and shortens the development cycle.
Description
Technical field
The present invention relates to the field of data analysis technologies, and in particular to an emotion tagging method and system.
Background technology
The automatic tagging of music emotion draws on music theory, psychology, signal processing, pattern recognition, data mining, and other fields. The professional span is large, the automatic recognition process is complex, and the design process places excessive demands on technical staff.
Invention content
The present invention provides an emotion tagging method and system that converts the recognition of audio data into image recognition to complete the automatic tagging of song emotions, reducing learning cost and shortening the development cycle.
In a first aspect, an embodiment of the present invention provides an emotion tagging method, including:
receiving audio data to be tagged;
analyzing the audio data to be tagged using at least one pre-trained model, converting the audio data into a spectrogram to determine its emotion label;
tagging the audio data to be tagged with the emotion label.
By passing the audio data to be recognized through a pre-trained model and recognizing it as a spectrogram, the emotions of audio data are labeled automatically, realizing automated emotion classification of audio data, reducing learning cost, and shortening the development cycle.
Optionally, before the audio data to be tagged is analyzed using the at least one pre-trained model and its emotion label is determined, the method further includes:
obtaining, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data;
converting each piece of training audio data in the training set into a spectrogram;
performing model training on the spectrograms converted from the training audio data in the training set to obtain a trained model.
Optionally, converting the training audio data in the training set into spectrograms includes: applying a Fourier transform to each piece of training audio data to obtain the spectrograms of the training audio data.
Optionally, after the training audio data in the training set are converted into spectrograms, the method further includes:
scaling the spectrograms converted from the training audio data;
performing model training on the scaled spectrograms to obtain the trained model.
Optionally, the method performs model training on the spectrograms using an AlexNet model to obtain the trained model.
In a second aspect, an embodiment of the present invention provides a system, including:
a receiving unit, configured to receive audio data to be tagged;
an analysis unit, configured to analyze the audio data to be tagged using at least one pre-trained model, converting the audio data into a spectrogram to determine its emotion label;
a processing unit, configured to tag the audio data to be tagged with the emotion label.
By passing the audio data to be recognized through a pre-trained model and recognizing it as a spectrogram, the emotions of audio data are labeled automatically, realizing automated emotion classification of audio data, reducing learning cost, and shortening the development cycle.
Optionally, the system further includes a training unit.
The processing unit is further configured to obtain, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data.
The processing unit is further configured to convert each piece of training audio data in the training set into a spectrogram.
The training unit is configured to perform model training on the spectrograms converted from the training audio data in the training set to obtain the trained model.
Optionally, the processing unit is specifically configured to apply a Fourier transform to each piece of training audio data to obtain the spectrograms of the training audio data.
Optionally, the processing unit is further configured to scale the spectrograms converted from the training audio data, and the training unit is further configured to perform model training on the scaled spectrograms to obtain the trained model.
Optionally, the training unit performs model training on the spectrograms using an AlexNet model to obtain the trained model.
With the emotion tagging method and system provided by the present invention, the audio data to be recognized is passed through a pre-trained model, converted into a spectrogram, and recognized; the emotions of the audio data are thereby labeled, automated emotion classification of audio data is realized, learning cost is reduced, and the development cycle is shortened.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments are briefly introduced below. Apparently, the drawings described below show merely some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an emotion tagging method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a model training method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of another model training method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a system provided by an embodiment of the present invention.
Specific embodiment
The present invention provides an emotion tagging method and system suitable for emotion classification of audio data, for example, songs.
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an emotion tagging method provided by an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
S110: Receive audio data to be tagged.
The audio data to be tagged is audio data to be recognized, that is, audio data awaiting emotion classification. Emotion classification is performed when there is audio data that needs to be recognized, for example the audio data in an audio database. More specifically, songs in a music library are classified by emotion labels such as "vivifying", "gratifying", "dejected", and "anxious".
S120: Analyze the audio data to be tagged using at least one pre-trained model, and determine the emotion label of the audio data to be recognized.
In the embodiment of the present invention, before the audio data to be tagged is analyzed using the at least one trained model — in other words, before the emotion classification analysis is performed — at least one model needs to be trained; the specific training process is described with reference to Fig. 2.
According to the at least one trained model, the audio data to be tagged is converted into a spectrogram and recognized, its emotion classification is determined, and its emotion label is thereby determined.
During the emotion analysis performed by the at least one trained model, the deployment of the model may adopt either of the following two schemes:
Scheme one: the at least one trained model is deployed in graphics processor (Graphics Processing Unit, GPU) mode to a separate GPU cluster, and the audio data, such as digital music, is migrated to that cluster for tagging.
Scheme two: the at least one trained model is deployed to the CPU cluster where the audio data, such as digital music, resides, and tagging is performed locally; the model deployment mode is CPU mode.
Since an audio tagging task involves a large amount of audio data, data migration is difficult; and although a GPU computes faster, its cost is high. Therefore, the deployment of scheme two is usually used to analyze the audio data to be tagged.
S130: Tag the audio data to be recognized with the emotion label.
By passing the audio data to be recognized through a pre-trained model and recognizing it as a spectrogram, the emotions of audio data are labeled automatically, realizing automated emotion classification of audio data, reducing learning cost, and shortening the development cycle.
Fig. 2 is a flowchart of a model training method provided by an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
S210: Obtain, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data.
Using the popular Thayer emotion model, the emotions of audio data are each split into two classes along two axes, energy and stress, which together form multiple emotion classes such as "vivifying", "gratifying", "dejected", and "anxious" — that is, the emotion labels.
For each of the at least one emotion label (for example vivifying, gratifying, dejected, anxious), a predetermined number of training audio files, such as 1000, are extracted from the audio database. The embodiment of the present invention does not limit the number of training audio files extracted per emotion label; it may be determined according to the required training precision. In general, the more training audio data extracted, the higher the precision of the trained model.
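The two-axis split described above can be sketched as a small helper. Note that the axis names, the zero thresholds, and the assignment of the four example labels to quadrants are illustrative assumptions for this sketch — the patent names the Thayer model and the labels but does not fix the mapping.

```python
def thayer_label(energy: float, stress: float) -> str:
    """Map an (energy, stress) pair to one of four emotion labels.

    Values are relative to the axis midpoints; the quadrant-to-label
    mapping below is a hypothetical assignment, not from the patent.
    """
    if energy >= 0 and stress >= 0:
        return "anxious"      # high energy, high stress
    if energy >= 0 and stress < 0:
        return "vivifying"    # high energy, low stress
    if energy < 0 and stress >= 0:
        return "dejected"     # low energy, high stress
    return "gratifying"       # low energy, low stress

if __name__ == "__main__":
    print(thayer_label(0.8, -0.3))  # "vivifying"
```

In practice these per-label buckets would name the folders or index keys from which the 1000-odd training files per label are drawn.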
S220: Convert each piece of training audio data in the training set into a spectrogram.
The sound corresponding to audio data mainly involves three dimensions: time, frequency, and energy. Common sound visualizations include the waveform, the spectrum, and the spectrogram. The waveform characterizes the time-domain information of the audio and loses frequency information; the spectrum characterizes the frequency-domain information but loses the time dimension; the spectrogram characterizes time, frequency, and energy simultaneously. Since the lossy compression of audio data involves all three dimensions of sound, to ensure the completeness of the information representation, the present application uses the spectrograms of the training audio data as the input to model training.
In the embodiment of the present invention, the training audio data may be converted into spectrograms by a Fourier transform — specifically, by a short-time Fourier transform (STFT). Compared with the ordinary Fourier transform, the STFT introduces a window function and can provide information on how the frequency content of the signal changes over time. In the end, each piece of training audio data is converted into a spectrogram whose horizontal axis represents time, whose vertical axis represents frequency, and whose intensity represents energy.
It should be noted that the window function serves to reduce spectral energy leakage when the training audio data is Fourier-transformed. Different truncation functions may be used to truncate the signal; in the embodiment of the present invention such a truncation function may be called a window function, or simply a "window".
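The STFT-based conversion above can be sketched in a few lines of NumPy. The Hann window, frame length of 512 samples, and hop of 256 are illustrative choices — the patent does not fix these parameters.

```python
# Minimal sketch of converting audio into a log-magnitude spectrogram
# via a short-time Fourier transform, using only NumPy.
import numpy as np

def spectrogram(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Return a (time-frames x frequency-bins) log-magnitude spectrogram.

    The Hann window reduces spectral leakage when each frame is
    Fourier-transformed, as the description notes.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # energy per time/frequency cell
    return 20.0 * np.log10(mag + 1e-10)        # dB scale for display

# Example: a 1 kHz tone sampled at 16 kHz for one second.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (frames, frame_len // 2 + 1)
```

Rendering `spec` as an image (time on the horizontal axis, frequency on the vertical axis, dB magnitude as intensity) yields exactly the kind of spectrogram described above; the energy of the 1 kHz tone appears as a bright horizontal line.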
S230: Perform model training on the spectrograms converted from the training audio data in the training set to obtain a trained model.
In the embodiment of the present invention, a convolutional neural network (Convolutional Neural Network, CNN) may be used for model training. A convolutional neural network is a feedforward neural network that approximately simulates the human visual cognition process and is widely used in image processing.
Optionally, the present invention adopts AlexNet as the training model, because the model has about 60 million parameters and strong expressive capacity, so it easily learns rich features. In addition, AlexNet uses techniques such as ReLU, LRN, and Dropout, which effectively alleviate problems such as activation-function saturation and model over-fitting while improving computational performance. The training process is accelerated with CUDA and graphics processors (Graphics Processing Unit, GPU) to shorten the training time.
The model trained by the training method of the embodiment of the present invention analyzes the audio data to be tagged and completes the emotion classification of the audio data, thereby realizing S120 in Fig. 1.
Optionally, in the embodiment of the present invention, as shown in Fig. 3, after the training audio data in the training set are converted into spectrograms, the method may further include:
S240: Scale the spectrograms converted from the training audio data.
In practical applications, the accuracy of the trained model and the speed at which the trained model classifies the emotions of the audio data to be tagged must be considered together. In the embodiment of the present invention, image scaling may be used to increase the speed at which the trained model classifies the audio data to be tagged.
Optionally, in the embodiment of the present invention, the scaling strategy may use bilinear interpolation, which balances pixel continuity and computational complexity. Meanwhile, to retain as much audio information as possible, the image should not be compressed too much. For example, considering that the song emotion analysis scenario has modest real-time requirements, an image size of 256*256 may be used to maximize the expressive power of the audio data and thus ensure high recognition accuracy.
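The bilinear scaling step can be sketched as follows: each output pixel is a weighted average of the four nearest input pixels. A real pipeline would typically use a library resizer (e.g. PIL or OpenCV); this NumPy sketch only illustrates the strategy the text describes.

```python
# Minimal NumPy sketch of bilinear-interpolation image scaling.
import numpy as np

def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    h, w = img.shape
    # Map each output coordinate back into the input grid.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend horizontally along the top and bottom rows, then vertically.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# Example: shrink a 512x512 spectrogram image to the 256*256 size
# mentioned above (the 512x512 input size is an assumed example).
spec = np.arange(512 * 512, dtype=float).reshape(512, 512)
small = bilinear_resize(spec, 256, 256)
print(small.shape)  # (256, 256)
```

Averaging neighboring pixels rather than simply dropping them is what preserves pixel continuity at modest computational cost, as the text notes.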
S250: Perform model training on the scaled spectrograms to obtain the trained model.
The processing of this method/step is the same as that of method/step S230 in Fig. 2 and, for brevity of description, is not repeated here.
Fig. 1 to Fig. 3 above describe in detail the training process of the model and the emotion tagging process of the audio data to be tagged. The system provided by an embodiment of the present invention is described in detail below with reference to Fig. 4.
Fig. 4 shows a system provided by an embodiment of the present invention. The system may include: a receiving unit 410, an analysis unit 420, and a processing unit 430.
The receiving unit 410 is configured to receive audio data to be tagged.
The analysis unit 420 is configured to analyze the audio data to be tagged using at least one pre-trained model and determine the emotion label of the audio data to be tagged.
The processing unit 430 is configured to tag the audio data to be tagged with the emotion label.
Specifically, when audio data to be tagged is received and emotion classification and tagging are required, emotion analysis is performed on the audio data to be tagged by the at least one pre-trained model: the audio data to be tagged is converted into a spectrogram, the spectrogram is recognized, and the emotion analysis of the audio data to be tagged is thereby completed and its result tagged.
It should be noted that the at least one pre-trained model used in the embodiment of the present invention may be the at least one model trained by S210, S220, and S230 in Fig. 2, or by S210, S220, S240, and S250 in Fig. 3.
Optionally, in the embodiment of the present invention, as shown in Fig. 4, the system may further include a training unit 440.
The processing unit 430 is further configured to obtain, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data.
The processing unit 430 is further configured to convert each piece of training audio data in the training set into a spectrogram.
The training unit 440 is configured to perform model training on the spectrograms converted from the training audio data in the training set to obtain the trained model.
In the embodiment of the present invention, the processing unit 430 and the training unit 440 may carry out the model-training method/steps S210, S220, and S230 of Fig. 2, or S210, S220, S240, and S250 of Fig. 3; for details, refer to the descriptions of the methods/steps of Fig. 2 and Fig. 3, which, for brevity of description, are not repeated here.
Optionally, in the embodiment of the present invention, the processing unit 430 is specifically configured to apply a Fourier transform to each piece of training audio data to obtain the spectrograms of the training audio data.
It should be noted that this processing by the processing unit 430 is the same as the description of S220 in Fig. 2 and, for brevity of description, is not repeated here.
Optionally, in the embodiment of the present invention, the processing unit 430 is further configured to scale the spectrograms converted from the training audio data, and the training unit 440 is further configured to perform model training on the scaled spectrograms to obtain the trained model.
This process is the same as the description of S240 and S250 in Fig. 3 and, for brevity of description, is not repeated here. With the method provided by the embodiment of the present invention, both the accuracy with which the trained model classifies the emotions of the audio data to be tagged and the speed of that emotion classification analysis can be improved.
Optionally, in the embodiment of the present invention, the training unit 440 may perform model training on the spectrograms using an AlexNet model to obtain the trained model, thereby improving computational performance. In the embodiment of the present invention, the training may also use CUDA+GPU to shorten the training time.
The specific embodiments above further describe in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the above is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
- 1. An emotion tagging method, characterized in that the method includes: receiving audio data to be tagged; analyzing the audio data to be tagged using at least one pre-trained model to determine the emotion label of the audio data to be tagged; and tagging the audio data to be tagged with the emotion label.
- 2. The method according to claim 1, characterized in that before the audio data to be tagged is analyzed using the at least one pre-trained model and the emotion label of the audio data to be tagged is determined, the method further includes: obtaining, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data; converting each piece of training audio data in the training set into a spectrogram; and performing model training on the spectrograms converted from the training audio data in the training set to obtain a trained model.
- 3. The method according to claim 2, characterized in that converting the training audio data in the training set into spectrograms includes: applying a Fourier transform to each piece of training audio data to obtain the spectrograms of the training audio data.
- 4. The method according to claim 2 or 3, characterized in that after the training audio data in the training set are converted into spectrograms, the method further includes: scaling the spectrograms converted from the training audio data; and performing model training on the scaled spectrograms to obtain the trained model.
- 5. The method according to claim 2, characterized in that the method includes performing model training on the spectrograms using an AlexNet model to obtain the trained model.
- 6. A system, characterized in that the system comprises: a receiving unit, configured to receive audio data to be tagged; an analysis unit, configured to analyze the audio data to be tagged using at least one pre-trained model and determine the emotion label of the audio data to be tagged; and a processing unit, configured to tag the audio data to be tagged with the emotion label.
- 7. The system according to claim 6, characterized in that the system further comprises a training unit; the processing unit is further configured to obtain, for each of at least one emotion label, a corresponding training set comprising multiple pieces of training audio data; the processing unit is further configured to convert each piece of training audio data in the training set into a spectrogram; and the training unit is configured to perform model training on the spectrograms converted from the training audio data in the training set to obtain the trained model.
- 8. The system according to claim 7, characterized in that the processing unit is specifically configured to apply a Fourier transform to each piece of training audio data to obtain the spectrograms of the training audio data.
- 9. The system according to claim 7 or 8, characterized in that the processing unit is further configured to scale the spectrograms converted from the training audio data; and the training unit is further configured to perform model training on the scaled spectrograms to obtain the trained model.
- 10. The system according to claim 7, characterized in that the training unit performs model training on the spectrograms using an AlexNet model to obtain the trained model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611169265.XA CN108205535A (en) | 2016-12-16 | 2016-12-16 | The method and its system of Emotion tagging |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611169265.XA CN108205535A (en) | 2016-12-16 | 2016-12-16 | The method and its system of Emotion tagging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108205535A true CN108205535A (en) | 2018-06-26 |
Family
ID=62601677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611169265.XA Pending CN108205535A (en) | 2016-12-16 | 2016-12-16 | The method and its system of Emotion tagging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108205535A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036465A (en) * | 2018-06-28 | 2018-12-18 | 南京邮电大学 | Speech-emotion recognition method |
CN109800720A (en) * | 2019-01-23 | 2019-05-24 | 平安科技(深圳)有限公司 | Emotion identification model training method, Emotion identification method, apparatus, equipment and storage medium |
CN112233700A (en) * | 2020-10-09 | 2021-01-15 | 平安科技(深圳)有限公司 | Audio-based user state identification method and device and storage medium |
CN112420070A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Automatic labeling method and device, electronic equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201980A (en) * | 2007-12-19 | 2008-06-18 | 北京交通大学 | Remote Chinese language teaching system based on voice affection identification |
CN101599271A (en) * | 2009-07-07 | 2009-12-09 | 华中科技大学 | A kind of recognition methods of digital music emotion |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN104882144A (en) * | 2015-05-06 | 2015-09-02 | 福州大学 | Animal voice identification method based on double sound spectrogram characteristics |
-
2016
- 2016-12-16 CN CN201611169265.XA patent/CN108205535A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201980A (en) * | 2007-12-19 | 2008-06-18 | 北京交通大学 | Remote Chinese language teaching system based on voice affection identification |
CN101599271A (en) * | 2009-07-07 | 2009-12-09 | 华中科技大学 | A kind of recognition methods of digital music emotion |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN104882144A (en) * | 2015-05-06 | 2015-09-02 | 福州大学 | Animal voice identification method based on double sound spectrogram characteristics |
Non-Patent Citations (1)
Title |
---|
W.Q.ZHENG ET AL: "An experimental study of speech emotion recognition based on deep convolutional neural networks", 《2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036465A (en) * | 2018-06-28 | 2018-12-18 | Nanjing University of Posts and Telecommunications | Speech emotion recognition method |
CN109036465B (en) * | 2018-06-28 | 2021-05-11 | Nanjing University of Posts and Telecommunications | Speech emotion recognition method |
CN109800720A (en) * | 2019-01-23 | 2019-05-24 | Ping An Technology (Shenzhen) Co., Ltd. | Emotion recognition model training method, emotion recognition method, apparatus, device and storage medium |
CN109800720B (en) * | 2019-01-23 | 2023-12-22 | Ping An Technology (Shenzhen) Co., Ltd. | Emotion recognition model training method, emotion recognition method, apparatus, device and storage medium |
CN112420070A (en) * | 2019-08-22 | 2021-02-26 | Beijing Fengqu Internet Information Service Co., Ltd. | Automatic labeling method and device, electronic equipment and computer-readable storage medium |
CN112233700A (en) * | 2020-10-09 | 2021-01-15 | Ping An Technology (Shenzhen) Co., Ltd. | Audio-based user state identification method and device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Coffey et al. | DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations | |
US10565983B2 (en) | Artificial intelligence-based acoustic model training method and apparatus, device and storage medium | |
US20200097820A1 (en) | Method and apparatus for classifying class, to which sentence belongs, using deep neural network | |
CN108205535A (en) | Emotion tagging method and system |
CN108509411A (en) | Semantic analysis method and device |
CN108763326A (en) | Method for building a sentiment analysis model based on feature-diversified convolutional neural networks |
CN105373529A (en) | Intelligent word segmentation method based on hidden Markov models |
CN109800720A (en) | Emotion recognition model training method, emotion recognition method, apparatus, device and storage medium |
ATE386989T1 (en) | Method and apparatus for decoding handwritten characters |
CN106886580A (en) | Deep-learning-based image sentiment polarity analysis method |
CN106816147A (en) | Speech recognition system based on a binary neural network acoustic model |
CN108875045 (en) | Method and system for executing a machine learning process for text classification |
CN110010136A (en) | Training and text analysis method, apparatus, medium and device for a prosody prediction model |
CN115294427A (en) | Stylized image description generation method based on transfer learning | |
CN110610698A (en) | Voice labeling method and device | |
CN108090099A (en) | Text processing method and device |
CN109522413B (en) | Construction method and device of medical term library for guided medical examination | |
CN103345623B (en) | Activity recognition method based on robust relative priority |
CN111883101B (en) | Model training and speech synthesis method, device, equipment and medium | |
Helaly et al. | Deep convolution neural network implementation for emotion recognition system | |
CN116959492A (en) | Dance motion determination method and device, electronic equipment and storage medium | |
CN111402919A (en) | Game cavity style identification method based on multiple scales and multiple views | |
CN113591472B (en) | Lyric generation method, lyric generation model training method and device and electronic equipment | |
CN113626614B (en) | Method, device, equipment and storage medium for constructing information text generation model | |
CN112347150B (en) | Method and device for labeling academic label of student and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-06-26 |