CN109977255A - Model generating method, audio-frequency processing method, device, terminal and storage medium - Google Patents
Model generating method, audio-frequency processing method, device, terminal and storage medium
- Publication number
- CN109977255A CN201910134014.5A CN201910134014A
- Authority
- CN
- China
- Prior art keywords
- audio
- sample
- mark
- audio data
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Auxiliary Devices For Music (AREA)
Abstract
Embodiments of the present invention provide a model generation method, an audio processing method, an apparatus, a terminal, and a computer-readable storage medium. The model generation method includes: annotating sample audio data according to preset music style labels to generate annotated audio samples; cutting the annotated audio samples into multiple annotated audio data segments of a preset length; processing each annotated audio data segment into annotated sample audio segment feature vectors of multiple preset dimensions, to serve as an annotated sample set; updating the preset music style label of each annotated sample audio segment feature vector in the annotated sample set, to obtain an annotated sample audio training set; and training the annotated sample audio training set using a deep learning method, to obtain a first music style annotation model. Target audio data can then be input into the first music style annotation model to obtain its music style label.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a model generation method, an audio processing method, a terminal, and a computer-readable storage medium.
Background technique
With the popularization and development of video and audio networks, many video and audio websites have emerged, making it convenient for users to search for videos or audio of interest and greatly enriching users' lives.
Currently, video and audio websites store large amounts of user-generated and officially produced audio and video data for users to use, and the function of recommending audio and video according to the music style of the data is in great demand. In the prior art, however, the music style of the content on such websites is usually annotated manually, which is inefficient and costly.
Therefore, how to efficiently and accurately annotate the music style of the audio and video data stored on audio and video websites is a technical problem to be solved.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a model generation method, an audio processing method, an apparatus, a terminal, and a computer-readable storage medium, so as to solve the technical problem of annotating the music style of the music-related video data or audio data stored on video websites.
To solve the above-mentioned problems, the present invention is achieved through the following technical solutions:
In a first aspect, a model generation method is provided, the method comprising:
annotating sample audio data according to preset music style labels to generate annotated audio samples;
cutting the annotated audio samples into multiple annotated audio data segments of a preset length;
processing each annotated audio data segment into annotated sample audio segment feature vectors of multiple preset dimensions, to serve as an annotated sample set;
updating the preset music style label of each annotated sample audio segment feature vector in the annotated sample set, to obtain an annotated sample audio training set;
training the annotated sample audio training set using a deep learning method, to obtain a first music style annotation model.
In a second aspect, an audio processing method is provided, the method comprising:
receiving a request to annotate target audio data with a music style;
according to the annotation request, annotating the music style of the target audio data using a music style annotation model.
In a third aspect, a model generation apparatus is provided, the apparatus comprising:
an annotated audio sample generation module, configured to annotate sample audio data according to preset music style labels to generate annotated audio samples;
an annotated audio data segment obtaining module, configured to cut the annotated audio samples into multiple annotated audio data segments of a preset length;
an annotated sample set determining module, configured to process each annotated audio data segment into annotated sample audio segment feature vectors of multiple preset dimensions, to serve as an annotated sample set;
an annotated sample audio training set generation module, configured to update the preset music style label of each annotated sample audio segment feature vector in the annotated sample set, to obtain an annotated sample audio training set;
a first music style annotation model training module, configured to train the annotated sample audio training set using a deep learning method, to obtain a first music style annotation model.
In a fourth aspect, an audio processing apparatus is provided, the apparatus comprising:
a music style annotation request receiving module, configured to receive a request to annotate target audio data with a music style;
a music style annotation module, configured to annotate the music style of the target audio data according to the annotation request, using a music style annotation model.
In a fifth aspect, a terminal is provided, comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the computer program is executed by the processor, the steps of the above model generation method, or the steps of the above audio processing method, are implemented.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the above model generation method, or the steps of the above audio processing method, are implemented.
Compared with the prior art, the embodiments of the present invention have the following advantages:
In the embodiment of the present invention, the audio data on an audio/video website is annotated with preset music style labels, preprocessed, cut into segments, and processed into feature vectors of a preset dimension; after the music style labels are updated, an annotated sample audio training set is obtained, and the annotated sample audio training set is trained using a deep learning method to obtain a first music style annotation model. Target audio data is then input into the above first music style annotation model, and the music style output by the first music style annotation model is obtained. The music styles are preset, such as pop music, hip-hop music, rock music, rhythm and blues, and so on. In this way, the purpose of applying music style labels to audio and video data is achieved through the set of music style annotations, realizing accurate and efficient watching-point type annotation for various video data, with the beneficial effect of efficiently and accurately labeling the music style of audio and video data.
The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly, to implement them in accordance with the contents of the specification, and to make the above and other objects, features, and advantages of the present invention more comprehensible, specific embodiments of the present invention are given below.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the application.
Detailed description of the invention
Fig. 1 is a flowchart of a model generation method provided by an embodiment of the present invention;
Fig. 1A is a schematic diagram of an audio signal provided by an embodiment of the present invention;
Fig. 1B is a schematic diagram of audio data windowing provided by an embodiment of the present invention;
Fig. 2 is a flowchart of an audio processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a model generation apparatus provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present invention.
Specific embodiment
In order to make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, which is a flowchart of a model generation method provided by an embodiment of the present invention, the method specifically includes:
Step 101: annotate sample audio data according to preset music style labels to generate annotated audio samples.
In the embodiment of the present invention, the sample audio data is extracted from the audio and video data sets stored in the back end of an audio/video website. Such data sets are generally stored with a time stamp; for example, the user-uploaded and officially produced audio and video data of the first quarter of a year form one set, and the audio data in these sets is extracted as audio samples.
For example, audio data may be extracted from video data as audio samples, or audio data may be used directly as audio samples; the audio data extracted from video data and the natively stored audio data sets may also be combined into audio samples.
The specific method for extracting the audio data from video data is described as follows. Audio and the corresponding video can be obtained via the packet-reading method RTMP_ReadPacket of the Real Time Messaging Protocol (RTMP):
1. Obtain the audio sync packet in the video data.
2. Parse the audio header decoding data AACDecoderSpecificInfo and the audio data configuration information AudioSpecificConfig in the audio sync packet. The AudioSpecificConfig is used to generate the ADTS header (including the sample rate, channel count, and frame length of the audio data).
3. Obtain the other audio packets in the video data and parse out the raw audio data (i.e., the ES stream).
4. Package the AAC ES stream into ADTS format via the audio data header of the AAC decoder; that is, prepend a 7-byte ADTS header before the AAC ES stream, so that the audio data content can be parsed.
In this way, the audio data packets in the video data are parsed and the specific content of the audio data is obtained; that is, the audio content in the video data is extracted.
It should be understood that the way of extracting audio data from video data is not limited to the foregoing description; the embodiment of the present invention places no restriction on the extraction method.
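For illustration only, the ADTS packaging in step 4 above can be sketched as follows. This is a minimal sketch assuming AAC-LC (profile field 1) at 44.1 kHz (sampling frequency index 4) stereo, per the conventional ADTS bit layout; the helper name `adts_header` is hypothetical and not part of the claimed method:

```python
def adts_header(frame_len: int, samplerate_idx: int = 4, channels: int = 2,
                profile: int = 1) -> bytes:
    """Build the 7-byte ADTS header prepended to each raw AAC (ES) frame.

    frame_len is the length of the raw AAC payload; the 13-bit frame-length
    field of ADTS must include the 7 header bytes themselves.
    """
    full_len = frame_len + 7
    hdr = bytearray(7)
    hdr[0] = 0xFF                    # syncword 0xFFF, high 8 bits
    hdr[1] = 0xF1                    # syncword low 4 bits, MPEG-4, layer 00, no CRC
    hdr[2] = ((profile & 0x3) << 6) | ((samplerate_idx & 0xF) << 2) | ((channels >> 2) & 0x1)
    hdr[3] = ((channels & 0x3) << 6) | ((full_len >> 11) & 0x3)
    hdr[4] = (full_len >> 3) & 0xFF
    hdr[5] = ((full_len & 0x7) << 5) | 0x1F   # length low 3 bits + buffer fullness high bits
    hdr[6] = 0xFC                    # buffer fullness low bits + one raw frame per packet
    return bytes(hdr)

# Prepending the header to every ES frame yields a parseable .aac stream:
es_frame = b"\x21\x10\x05\x00"       # placeholder raw AAC payload
adts_frame = adts_header(len(es_frame)) + es_frame
assert len(adts_frame) == len(es_frame) + 7
assert adts_frame[0] == 0xFF and (adts_frame[1] & 0xF0) == 0xF0
```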
After the audio samples are obtained by the above method, 25 music style labels are predefined, for example: pop music, hip-hop music, rock music, rhythm and blues, soul music, reggae, country music, funk, folk, Middle Eastern music, disco, classical music, electronic music, Latin music, blues, children's music, new age music, vocal music, African music, Christmas music, Asian music, ska, independent music, traditional music, and so on. The above audio samples are manually annotated with these labels to obtain annotated audio samples.
Step 102: cut the annotated audio samples into multiple annotated audio data segments of a preset length.
In practical applications, the lengths of the annotated audio samples are not uniform, which causes data errors during batch processing, so the audio data needs to be cut to obtain training samples that meet a preset standard; for example, 140825 samples in total, an average of 5633 per class, each sample being about 10 seconds long.
Specifically, the annotated audio samples are split into N annotated audio data segments of a preset size. The annotated audio samples may be imported into a preset audio cutting tool; the duration of the cut segments can be preset, and the tool can then cut in batches according to that duration.
Of course, the embodiment of the present invention places no restriction on the type of audio cutting tool.
It should be understood that different models have different preset requirements for training samples, so the embodiment of the present invention places no restriction on the specific length of the audio segments.
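As an illustrative sketch only (the helper name `cut_segments` is hypothetical), batch cutting into fixed-length segments, with any remainder shorter than the preset length dropped, can be written as:

```python
def cut_segments(samples, sample_rate, seg_seconds=10):
    """Split a mono sample sequence into fixed-length segments so every
    training sample has a uniform length; the short tail is discarded."""
    seg_len = sample_rate * seg_seconds
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

clip = list(range(44100 * 25))            # 25 s of placeholder samples at 44.1 kHz
segments = cut_segments(clip, 44100)
assert len(segments) == 2                 # two full 10 s segments; the 5 s tail is dropped
assert all(len(s) == 441000 for s in segments)
```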
Step 103: process each annotated audio data segment into annotated sample audio segment feature vectors of multiple preset dimensions, to serve as an annotated sample set.
Preferably, step 103 further comprises:
Sub-step 1031: perform framing on each annotated audio data segment, to obtain multiple framed annotated audio data segments of each annotated audio data segment.
Specifically, as shown in Fig. 1A, a speech signal is non-stationary macroscopically but stationary microscopically, exhibiting short-term stationarity (for the signal selected in the box in the figure, the speech signal can be considered approximately unchanged within 10–30 ms). The signal can therefore be divided into short segments for processing, each of which is called a frame (chunk); the duration of each segment is not limited to the 10–30 ms described above, and the embodiment of the present invention places no restriction on the frame duration. Thus, the annotated audio data segments are further divided into smaller framed annotated audio data in units of frames.
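A minimal sketch of the framing described above, assuming a 25 ms frame with a 10 ms hop (both values are illustrative; as noted, the embodiment places no restriction on frame duration, and `frame_signal` is a hypothetical helper name):

```python
def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a signal into short overlapping frames (on the order of
    10-30 ms), within which speech/music is approximately stationary."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

frames = frame_signal([0.0] * 16000, 16000)    # 1 s of silence at 16 kHz
assert len(frames[0]) == 400                   # 25 ms frame = 400 samples
assert len(frames) == (16000 - 400) // 160 + 1
```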
Sub-step 1032: multiply each framed annotated audio data segment by a windowing function, to obtain a windowed annotated audio data segment of each framed annotated audio data segment.
Specifically, during framing, each frame can repeat part of its neighbor: the tail of the previous frame and the head of the current frame each take an overlapping part, after which windowing is applied. Because the overlap between frames is realized during framing, the global speech signal is not distorted by the windowing attenuating both ends of each frame, and the audio signal after windowing is more continuous.
The framed annotated audio data obtained above is then windowed: the original audio signal, shown in the left part of Fig. 1B, is multiplied by the windowing function shown in the middle part of Fig. 1B, yielding the log spectrum of each frame of audio data in the frequency domain, shown in the right part of Fig. 1B. In this way the originally aperiodic speech signal (such as the framed annotated audio data) exhibits some characteristics of a periodic function; the result is determined as the windowed annotated audio data segment of the framed audio data.
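The windowing operation can be sketched as follows, assuming a Hamming window (one common choice of windowing function; the embodiment does not fix a particular window, and `window_frames` is a hypothetical helper name):

```python
import numpy as np

def window_frames(frames):
    """Multiply each frame by a Hamming window so the frame edges taper
    toward zero; combined with overlapping frames, this keeps the signal
    continuous after windowing."""
    w = np.hamming(len(frames[0]))
    return [f * w for f in frames]

frames = [np.ones(400)]                    # one 25 ms frame of ones at 16 kHz
windowed = window_frames(frames)
assert np.isclose(windowed[0][0], 0.08)    # Hamming endpoint: 0.54 - 0.46 = 0.08
assert np.isclose(windowed[0][200], np.hamming(400)[200])
```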
Sub-step 1033: apply a Mel transform to each windowed annotated audio data segment, to obtain the annotated Mel spectrum data of each annotated audio data segment.
Further, so that the sound characteristics of the windowed annotated audio data obtained after framing and windowing can be displayed intuitively, a Mel transform is applied to convert the audio data into annotated Mel spectrum data. The unit of frequency is hertz (Hz), and the range audible to the human ear is 20–20000 Hz, but the ear does not perceive the Hz scale linearly. For example, if a person has adapted to a 1000 Hz tone and the pitch frequency is raised to 2000 Hz, the ear perceives only a slight increase in frequency, not a doubling at all. If the ordinary frequency scale is converted to the Mel frequency scale, the ear's perception of frequency becomes approximately linear. That is, on the Mel scale, if the Mel frequencies of two speech segments differ by a factor of two, the pitch perceived by the human ear also differs by roughly a factor of two. This has the beneficial effect of making the audio data visualizable.
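The Hz-to-Mel mapping underlying the Mel transform can be illustrated with the standard conversion formula (the 2595 and 700 constants are the conventional mel-scale definition, not specific to this embodiment):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Standard mel-scale mapping: equal steps in mels approximate equal
    perceived pitch steps, unlike linear steps in Hz."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

# Doubling the frequency from 1000 Hz to 2000 Hz adds far less than
# double the mels, matching the perceptual example in the text above:
m1, m2 = hz_to_mel(1000.0), hz_to_mel(2000.0)
assert m2 / m1 < 2.0
```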
Sub-step 1034: convert each annotated Mel spectrum data into a feature vector of a preset dimension, to obtain the annotated sample audio segment feature vector of each annotated Mel spectrum data.
In this step, the above annotated Mel spectrum image data is converted into feature vectors that a machine can recognize. A common model for converting image data into machine-readable feature vectors is the BVLC GoogLeNet model; of course, in practical applications the conversion is not limited to the foregoing description, and the embodiment of the present invention is not limited thereto.
Preferably, step 1034 further comprises:
Sub-step 10341: determine the Mel spectrum data corresponding to each frame of audio data in the annotated Mel spectrum data as the sample framed Mel spectrum data.
In this step, the Mel spectrum data map corresponding to each frame of audio in the annotated audio data segment obtained above is extracted as the framed Mel spectrum data, and determined as the annotated framed Mel spectrum data.
Sub-step 10342: convert the sample framed Mel spectrum data into sample framed audio feature vectors.
In this step, each annotated framed Mel spectrogram is converted into a feature vector.
Specifically, the annotated framed Mel spectrum image data is converted into framed audio feature vectors by an image feature vector conversion model. A well-known model of this kind is the BVLC GoogLeNet model, a 22-layer deep convolutional network that can produce feature vectors for 1000 different image types.
Of course, the image feature conversion method is not limited to the foregoing description, and the embodiment of the present invention is not limited thereto.
Sub-step 10343: splice the sample framed audio feature vectors of a preset number of frames, to obtain an annotated sample audio segment feature vector of a preset dimension.
In this step, after the sample framed audio feature vectors of the annotated framed Mel spectrum data are obtained in step 10342, multiple sample framed audio feature vectors are merged into one annotated sample audio segment feature vector of a preset dimension. For example, a framed audio feature vector is a 128-dimensional feature vector for one second of audio; for audio data processing, the information contained in one second is not enough to characterize the concrete type of the audio data, so the context-related framed audio feature vectors are merged. That is, the feature vectors corresponding to 3 seconds of audio data, i.e., 3 framed audio feature vectors, are spliced into a feature vector of 128 × 3 = 384 dimensions.
Of course, the preset dimension is not necessarily the 384 dimensions mentioned above; it may also be the audio feature vector composed of five framed audio feature vectors, or of ten framed audio feature vectors. The setting of the preset dimension mainly depends on whether the audio data contains enough information for subsequent processing; therefore, the embodiment of the present invention places no limit on the specific value of the preset dimension.
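The splicing of sub-step 10343 can be sketched as follows (the helper name `splice_frames` is hypothetical; 3 frames of 128 dimensions are the example values above, and the preset count is a parameter):

```python
import numpy as np

def splice_frames(frame_vectors, n=3):
    """Concatenate n consecutive per-second feature vectors into one
    context vector, e.g. 3 x 128 dims = 384 dims."""
    return [np.concatenate(frame_vectors[i:i + n])
            for i in range(0, len(frame_vectors) - n + 1, n)]

vecs = [np.zeros(128) for _ in range(9)]   # 9 s of placeholder 128-dim vectors
spliced = splice_frames(vecs)
assert len(spliced) == 3
assert spliced[0].shape == (384,)
```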
Sub-step 1035: combine the annotated sample audio segment feature vectors into the annotated sample set.
In this step, all of the above annotated sample audio segment feature vectors are stored as one set, which serves as the annotated sample set.
Step 104: update the preset music style label of each annotated sample audio segment feature vector in the annotated sample set, to obtain the annotated sample audio training set.
In this step, directly downloaded audio data typically contains strong noise (data noise and label noise, respectively); if a music style annotation model were trained on it directly, the accuracy would be low. The annotated sample audio training set is therefore further cleaned: a music style model is trained on the annotated sample audio training set, and that model is then used to clean the label noise of the samples of each class, finally obtaining a high-quality music style data set. The specific steps are described as follows:
Preferably, step 104 further comprises:
Sub-step 1041: according to a preset ratio, extract annotated sample audio segment feature vectors from the annotated sample set, to serve as a training sample feature set.
In this step, suppose the annotated sample set contains 140825 annotated sample audio segment feature vectors in total, an average of 5633 per music style class, each sample being about 10 seconds long. A part (for example, 20%) of a preset ratio (for example, 50%) of the set is extracted as the training sample feature set; that is, 20% is extracted to serve as the training sample features.
Sub-step 1042: train the training sample feature set by a predetermined deep learning method, to obtain a second music style annotation model.
In this step, the training sample features are trained by a predetermined deep learning algorithm to obtain the second music style annotation model. The predetermined deep learning algorithm may be a Softmax classifier; of course, in practical applications it is not limited to a Softmax classifier, and the embodiment of the present invention places no restriction on the specific deep learning method.
Sub-step 1043: take the remaining annotated sample audio segment feature vectors in the annotated sample set as a test sample feature set, and input the test sample feature set into the second music style annotation model, so that the second music style annotation model outputs the music style label of each annotated sample audio segment feature vector in the test sample feature set, generating an updated annotated sample set.
For example, the remaining 30% of the above 50% of the 140825 samples is extracted and divided into three parts as the test sets to be cleaned, and input into the trained second music style annotation model.
Specifically, music style labels are annotated each time, and the annotated test set is added back into the training set; training is performed again to generate an updated second music style annotation model. Then another 10% test set is extracted for genre label annotation, and after annotation it is put into the training set to train the updated second music style annotation model a second time. This continues until all the test sets have been returned to the training set, at which point the data in the training set is the sample data that has completed cleaning, i.e., the updated annotated sample set.
Sub-step 1044: merge the updated annotated sample set with the training sample feature set, to obtain the annotated sample audio training set.
In this step, the updated annotated sample set that has completed cleaning is merged with the training sample feature set, to serve as the annotated sample audio training set.
It should be understood that repeatedly inputting unlabeled sample data into the second music style annotation model for annotation, and after annotation adding it to the training samples to update and retrain the model, can effectively improve annotation accuracy: the larger the training sample, the higher the annotation accuracy of the trained model. With the second music style annotation model obtained by repeated training, the music style labels of all the annotated test sets, combined with the above updated annotated sample set, finally yield the annotated sample audio training set.
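The iterative cleaning of step 104 can be sketched abstractly as follows. Here `train` and `relabel` stand in for the second music style annotation model's training and prediction steps; all names are hypothetical and the sketch only shows the control flow (train on a trusted seed, relabel one slice of the noisy remainder, fold it back in, retrain, repeat):

```python
def clean_labels(seed, remainder, rounds, train, relabel):
    """seed: initially trusted labeled samples; remainder: noisy samples.
    train(samples) -> model; relabel(model, samples) -> samples with
    model-assigned labels. rounds: number of cleaning passes."""
    train_set = list(seed)
    chunk = max(1, len(remainder) // rounds)
    for i in range(rounds):
        model = train(train_set)                 # (re)train the cleaning model
        batch = remainder[i * chunk:(i + 1) * chunk]
        train_set.extend(relabel(model, batch))  # cleaned labels rejoin training
    return train_set

# Dummy stand-ins just to exercise the loop:
result = clean_labels(seed=[1, 2], remainder=[3, 4, 5, 6], rounds=2,
                      train=len, relabel=lambda model, batch: batch)
assert result == [1, 2, 3, 4, 5, 6]
```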
Step 105: train the annotated sample audio training set using a deep learning method, to obtain the first music style annotation model.
In this step, the annotated sample audio training set obtained above is trained again by the predetermined deep learning method to finally obtain the first music style annotation model. This effectively reduces the labor cost of music labeling in manually annotated training samples, increases the amount of training sample data, and improves model training efficiency and annotation accuracy.
In the embodiment of the present invention, sample audio data is annotated according to preset music style labels to generate annotated audio samples; the annotated audio samples are cut into multiple annotated audio data segments of a preset length; each annotated audio data segment is processed into annotated sample audio segment feature vectors of multiple preset dimensions, to serve as an annotated sample set; the preset music style label of each annotated sample audio segment feature vector in the annotated sample set is updated, to obtain the annotated sample audio training set; and the annotated sample audio training set is trained using a deep learning method, to obtain the first music style annotation model, which can efficiently and accurately apply music style labels to audio data that has no music style label.
Referring to Fig. 2, which is a flowchart of an audio processing method provided by an embodiment of the present invention, the method may specifically include the following steps:
Step 201: receive a request to annotate target audio data with a music style.
In the embodiment of the present invention, a back-end server receives a music style annotation request sent by a user through an application interface. The music style annotation request usually applies to one or more of the large video or audio data sets stored by the server. The audio and video data sets are usually stored by date, or may be stored by uploading user identifier. For example, the audio and video uploaded by users in February are stored as one set, the officially uploaded audio and video are stored as another set, and a video watching-point annotation request is initiated for one or more selected sets.
In practical applications, a music style annotation request is initiated for an audio data set or a video data set. For a video data set, the audio data in the video data set must be extracted as the target audio data; an audio data set is processed further directly as the target audio data.
The way of extracting the audio data from video data is described in detail in step 101 and is not repeated here.
Of course, the specific manner of storing the audio and video sets is not limited to the foregoing description, and the embodiment of the present invention places no restriction on it.
The music style, i.e., the style of a song, refers to the representative stylistic characteristics that a musical work exhibits as a whole.
Therefore, performing music style analysis on large data sets makes it possible to automatically and efficiently analyze the music style of massive short video or short audio data, thereby achieving the purpose of personalized recommendation to users.
Step 202: according to the annotation request, annotate the music style of the target audio data using the music style annotation model.
Preferably, step 202 further comprises:
Sub-step 2021: according to the annotation request, divide the target audio data into audio data segments of a preset length.
In the embodiment of the present invention, the extracted audio data is split to obtain N audio data segments of a preset size. The audio data may be imported into a preset audio cutting tool; the duration of the cut audio data segments can be manually selected, and the tool can cut in batches.
Of course, the embodiment of the present invention places no restriction on the type of audio cutting method.
Sub-step 2022: process each audio data segment into a feature vector of a preset dimension.
In this step, each audio data segment obtained above is preprocessed and converted into an audio feature vector of a preset dimension, as described below:
Preferably, sub-step 2022 further comprises:
Sub-step 20221: perform framing on each audio data segment, to obtain framed audio data segments.
In this step, framing, windowing, and the Mel transform are applied to the audio signal in each of the above audio data segments.
The framing is as shown in Fig. 1A: a speech signal is non-stationary macroscopically but stationary microscopically, exhibiting short-term stationarity (as shown in the box, the speech signal can be considered approximately unchanged within 10–30 ms), so the speech signal can be divided into short segments for processing, each of which is called a frame (chunk). The duration of each segment is not limited to the 10–30 ms described above, and the embodiment of the present invention places no restriction on the frame duration.
Sub-step 20222: multiply the framed audio data segments by a windowing function, to obtain windowed audio data segments.
During framing, frames are not taken back-to-back but overlap by a part: the tail of the previous frame and the head of the current frame each take an overlapping part, after which windowing is applied. Because the overlap between frames is realized during framing, the global speech signal is not distorted by the windowing attenuating both ends of each frame, and the audio signal after windowing is more continuous.
The framed audio segment data obtained above is then windowed: the original audio signal, shown in the left part of Fig. 1B, is multiplied by the windowing function shown in the middle part of Fig. 1B, yielding the log spectrum of each frame of audio data in the frequency domain, shown in the right part of Fig. 1B, so that the originally aperiodic speech signal exhibits some characteristics of a periodic function; the windowed audio segment data is thus obtained.
Sub-step 20223: apply a Mel transform to the windowed audio data segments, to obtain the Mel spectrum data of the audio data segments.
Further, so that the sound characteristics in the windowed audio segment data obtained after framing and windowing can be displayed intuitively, a Mel transform is applied to the windowed audio data segments to convert the audio data into Mel spectrum data, which has the beneficial effect of displaying the sound characteristics linearly and intuitively.
Sub-step 20224: the Mel spectrum data is converted into a feature vector of a preset dimension.
Preferably, sub-step 20224 further comprises:
Sub-step 202241: the Mel spectrum data corresponding to each frame of audio data in the Mel spectrum data is determined as framed Mel spectrum data.
In this step, the Mel spectrum data corresponding to each frame of audio in the audio data segment obtained above is extracted; that is, the segmented Mel spectrogram data is determined as the framed Mel spectrum data.
Sub-step 202242: the framed Mel spectrum data is converted into a framed audio feature vector.
In this step, each piece of framed Mel spectrogram data is converted into a feature vector. Specifically, the framed Mel spectrogram image data is converted into a framed audio feature vector by an image feature vector transformation model. A well-known image feature vector transformation model is the BVLC GoogLeNet model, a 22-layer deep convolutional network that can convert images of 1000 different classes into machine-readable feature vectors. Of course, the image feature conversion method is not limited to the foregoing description, and the embodiments of the present invention are not limited thereto.
Sub-step 202243: the framed audio feature vectors of a preset number of frames are spliced to obtain the feature vector of the preset dimension.
In this step, after obtaining the framed audio feature vectors of the framed Mel spectrum data, multiple framed audio feature vectors are merged into one audio feature vector of the preset dimension. For example, each framed audio feature vector may be a 128-dimensional feature vector for one second of audio; for the purpose of processing the audio data, the information contained in one second is not enough to characterize the specific type of the audio data, so a framed audio feature vector is merged with its context-adjacent framed audio feature vectors. That is, the feature vectors corresponding to 3 seconds of audio data, i.e. 3 framed audio feature vectors, are spliced to generate one 128*3=384-dimensional feature vector.
Of course, the preset dimension is not necessarily the 384 dimensions mentioned above; it may also be the combination of five framed audio feature vectors, or an audio feature vector of a preset dimension composed of ten framed audio feature vectors. The setting of the preset dimension depends mainly on whether the audio data contains enough information for subsequent processing; therefore, the embodiment of the present invention does not limit the specific value of the preset dimension.
Sub-step 2023: the feature vector is input to the music style marking model, so that the music style marking model outputs the music style label of the feature vector.
In this step, the audio feature vectors of the preset dimension obtained by the splicing above are input into the trained first music style marking model, which outputs the music style label of each audio feature vector.
The music styles include preset pop music, hip-hop music, rock music, rhythm and blues, soul music, reggae, country music, funk, folk, Middle East music, disco, classical music, electronic music, Latin music, blues, children's music, new age music, vocal music, African music, Christmas music, Asian music, ska, independent music, traditional music, and so on. Of course, the music styles are not limited to those enumerated above, and the present invention places no restriction on this.
Sub-step 2024: the count of each music style label of the audio data segments in the target audio data is obtained.
In this step, for the multiple audio feature vectors obtained from each piece of audio data of non-fixed duration, after a music style label has been output for each audio feature vector, the entire audio data carries multiple music style labels. At this point a voting mechanism is needed: the music style labels of all audio feature vectors in the entire audio data are counted.
The audio data is divided into small segments of 3s–5s (small segments of 8s–10s are also commonly used); the small segments are then framed, windowed, and Mel-transformed to obtain image feature data, and each piece of image feature data yields one music style label, so one piece of audio data may contain multiple music style labels.
For example, in a video with a duration of 5 minutes, each 3-second data segment corresponds to its own label, so the entire 5-minute video data yields 100 type labels, and the count corresponding to each type is obtained.
Sub-step 2025: the music style corresponding to the music style label with the maximum count, or whose count is greater than or equal to a preset threshold, is determined as the music style of the target audio data.
In this step, as described above, after obtaining the respective counts of the 100 music style labels in the 5-minute video data, the music style label with the largest count is determined as the music style label of the 5-minute video data; alternatively, the music style label counts are sorted and the top-N labels are taken as the music styles of the audio data segment.
Of course, in practical applications, a count threshold may also be preset; when the count of a certain music style label exceeds the threshold, that label is set as a music style label of the video data. For example, among 100 music style labels, with a preset label count threshold of 30, if the labels exceeding 30 are rock music and traditional music, then the music style labels of the audio data are rock music and traditional music, and these labels are determined as the music style labels of the video data corresponding to the audio data; in subsequent recommendation operations they may be merged into traditional rock music.
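The two voting schemes above — majority (or top-N) voting and threshold voting — can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation; the label strings are hypothetical examples.

```python
from collections import Counter

def vote_labels(segment_labels, threshold=None, top_n=1):
    # segment_labels: one predicted label per audio segment, e.g. 100
    # labels for a 5-minute video cut into 3-second segments.
    counts = Counter(segment_labels)
    if threshold is not None:
        # Threshold mode: every label whose count reaches the threshold wins.
        return [lab for lab, c in counts.items() if c >= threshold]
    # Majority mode: take the top-N most frequent labels.
    return [lab for lab, _ in counts.most_common(top_n)]
```

With 60 "rock" and 40 "pop" segment labels, majority voting returns only "rock", while a threshold of 30 returns both labels, matching the two behaviors described above.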
The music style labeling method of the present invention is illustrated below by way of a specific example:
1) when labeling video data with a music style, first obtain the audio data of the video data;
2) apply framing, windowing, and the Mel transform to the acquired audio signal to obtain the Mel spectrogram of the audio data;
3) input the Mel spectrogram into the VGGish depth model to obtain feature vectors of the preset dimension for the Mel spectrogram;
4) input the preset-dimension feature vectors into the music style markup model trained in advance with the machine learning algorithm Softmax classifier, obtaining the preset type label of each preset-dimension feature vector, such as hip-hop, rock, pop, folk, classical, electronic, etc.;
5) finally, determine the type with the largest number of music style labels in the audio data, or the labels exceeding the preset threshold, as the music style corresponding to the audio data.
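Step 4) mentions a Softmax classifier trained on the preset-dimension feature vectors. As a minimal sketch of what such a classifier looks like — plain softmax regression via gradient descent in numpy, standing in for whatever training procedure the patent actually uses, with toy 2-dimensional features instead of 384-dimensional VGGish embeddings:

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    # X: (n, d) feature vectors; y: (n,) integer genre labels.
    n, d = X.shape
    W = np.zeros((d, n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
        W -= lr * X.T @ (p - onehot) / n             # cross-entropy gradient
    return W

def predict(W, X):
    # Label of each feature vector = argmax over class logits.
    return np.argmax(X @ W, axis=1)
```

A bias term and regularization are omitted for brevity; any off-the-shelf softmax/logistic-regression implementation would serve the same role.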
The embodiment of the invention provides an audio processing method: after receiving a request to label target audio data with a music style, the target audio data is obtained and, according to the labeling request, divided into audio data segments of a preset length; each audio data segment is processed into a feature vector of the preset dimension; the audio segment feature vectors are input into the trained first music style marking model, which marks the music style labels; the count of the music style label of each audio data segment in the target audio data is obtained; and according to the music style label counts, the final music style of the corresponding video data is determined. This achieves the purpose of efficiently labeling, in batches, the music styles of the audio in video data, saving the labor cost of music style labeling and improving music style labeling efficiency.
It should be noted that, for simplicity of description, the method embodiments are stated as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described sequence of actions, because according to the embodiments of the present invention, some steps may be performed in other sequences or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 3, which is a structural schematic diagram of a model generating means 300 provided in an embodiment of the present invention, the means may specifically include the following modules:
an annotated audio sample generation module 301, configured to label sample audio data according to preset music style labels, generating annotated audio samples;
an annotated audio data segment obtaining module 302, configured to cut the annotated audio samples into multiple annotated audio data segments of a preset length;
a mark sample set determining module 303, configured to process each annotated audio data segment into mark sample audio section feature vectors of multiple preset dimensions, to serve as a mark sample set.
Preferably, the mark sample set determining module 303 comprises:
a framing annotated audio data segment generation submodule, configured to perform framing processing on each annotated audio data segment, obtaining multiple framing annotated audio data segments of each annotated audio data segment;
a mark windowed audio data section generation submodule, configured to multiply each framing annotated audio data segment by a windowing function, obtaining the mark windowed audio data section of each framing annotated audio data segment;
a mark Mel spectrum data obtaining submodule, configured to perform a Mel transform on each mark windowed audio data section, obtaining the mark Mel spectrum data of each annotated audio data segment;
a mark sample audio section feature vector obtaining submodule, configured to convert each mark Mel spectrum data into a feature vector of a preset dimension, obtaining the mark sample audio section feature vector of each mark Mel spectrum data.
Preferably, the mark sample audio section feature vector obtaining submodule comprises:
a sample framed Mel spectrum data determination unit, configured to determine the Mel spectrum data corresponding to each frame of audio data in the mark Mel spectrum data as sample framed Mel spectrum data;
a sample framed audio feature vector acquiring unit, configured to convert the sample framed Mel spectrum data into sample framed audio feature vectors;
a mark sample audio section feature vector obtaining unit, configured to splice the sample framed audio feature vectors of a preset number of frames, obtaining the mark sample audio section feature vector of the preset dimension.
A mark sample set determining submodule is configured to combine the mark sample audio section feature vectors into the mark sample set.
A mark sample audio training set generation module 304 is configured to update the preset music style labels of each mark sample audio section feature vector in the mark sample set, obtaining a mark sample audio training set.
Preferably, the mark sample audio training set generation module 304 comprises:
a training sample feature generation submodule, configured to extract mark sample audio section feature vectors from the mark sample set according to a preset ratio, to serve as a training sample feature set;
a second music style marking model training module, configured to train the training sample feature set by a predetermined deep learning method, obtaining a second music style marking model;
an update mark sample set submodule, configured to take the mark sample audio section feature vectors remaining in the mark sample set as a test sample feature set and input the test sample feature set into the second music style marking model, so that the second music style marking model outputs the music style label of each mark sample audio section feature vector in the test sample feature set, generating an updated mark sample set;
a mark sample audio training set acquisition submodule, configured to merge the updated mark sample set with the training sample feature set, obtaining the mark sample audio training set.
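The label-updating flow above is a self-training scheme: a fraction of hand-labeled vectors trains a second model, the second model labels the remainder, and both parts are merged. A minimal sketch under assumed stand-ins — a nearest-centroid classifier replaces the deep second model, and the 0.3 ratio is illustrative:

```python
import numpy as np

def expand_training_set(features, labels, labeled_ratio=0.3, seed=0):
    # features: (n, d) mark sample audio section feature vectors;
    # labels: n hand-made music style labels (only a fraction is kept).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(features))
    n_lab = int(len(features) * labeled_ratio)
    lab_idx, unlab_idx = idx[:n_lab], idx[n_lab:]      # train / test split
    # "Second model": nearest class centroid, standing in for a deep model.
    classes = sorted({labels[i] for i in lab_idx})
    centroids = {c: features[[i for i in lab_idx if labels[i] == c]].mean(axis=0)
                 for c in classes}
    merged = {int(i): labels[i] for i in lab_idx}      # human labels kept
    for i in unlab_idx:                                # model-generated labels
        merged[int(i)] = min(classes,
                             key=lambda c: np.linalg.norm(features[i] - centroids[c]))
    return merged  # index -> label for the whole merged training set
```

The merged set then serves as the mark sample audio training set for the first music style marking model.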
A first music style marking model training module 305 is configured to train the mark sample audio training set using a deep learning method, obtaining the first music style marking model.
In the embodiment of the present invention, the annotated audio sample generation module labels sample audio data according to preset music style labels, generating annotated audio samples; the annotated audio data segment obtaining module cuts the annotated audio samples into multiple annotated audio data segments of a preset length; the mark sample set determining module processes each annotated audio data segment into mark sample audio section feature vectors of multiple preset dimensions, to serve as a mark sample set; the mark sample audio training set generation module updates the preset music style labels of each mark sample audio section feature vector in the mark sample set, obtaining a mark sample audio training set; and the first music style marking model training module trains the mark sample audio training set using a deep learning method, obtaining the first music style marking model. Audio data without music style labels can thus be labeled with music styles efficiently and accurately.
Optionally, in another embodiment, as shown in Figure 4, an audio processing device 400 is provided, the device comprising:
a music style labeling request receiving module 401, configured to receive a request to label target audio data with a music style;
a music style labeling module 402, configured to label, according to the labeling request and using a music style marking model, the music style of the target audio data.
Preferably, the music style labeling module 402 comprises:
an audio data section acquisition submodule, configured to divide, according to the labeling request, the target audio data into audio data sections of a preset length;
a feature vector acquisition submodule, configured to process each audio data section into an audio section feature vector of a preset dimension.
Preferably, the feature vector acquisition submodule includes:
a framed audio data section obtaining unit, configured to perform framing processing on each audio data section, obtaining framed audio data sections;
a windowed audio data section obtaining unit, configured to multiply the framed audio data section by a windowing function, obtaining a windowed audio data section;
a Mel spectrum data obtaining unit, configured to perform a Mel transform on the windowed audio data section, obtaining the Mel spectrum data of the audio data section;
a feature vector obtaining unit, configured to convert the Mel spectrum data into the audio section feature vector of the preset dimension.
Preferably, the feature vector obtaining unit comprises:
a framed Mel spectrum data determining subunit, configured to determine the Mel spectrum data corresponding to each frame of audio data in the Mel spectrum data as framed Mel spectrum data;
a framed audio feature vector obtaining subunit, configured to convert the framed Mel spectrum data into framed audio feature vectors;
a feature vector obtaining subunit, configured to splice the framed audio feature vectors of a preset number of frames, obtaining the feature vector of the preset dimension.
A music style label acquisition submodule is configured to input the feature vector into the music style marking model, so that the music style marking model outputs the music style label of the feature vector;
a music style label count acquisition submodule is configured to obtain the count of the music style label of each audio data section in the target audio data;
a music style label determining submodule is configured to determine the music style corresponding to the music style label with the maximum count, or whose count is greater than or equal to a preset threshold, as the music style of the target audio data.
In the embodiment of the present invention, the music style labeling request receiving module receives a request to label target audio data with a music style, and the music style labeling module labels, according to the labeling request and using the music style marking model, the music style of the target audio data. This achieves the purpose of efficiently labeling, in batches, the music styles of the audio in video data, saving the labor cost of music style labeling and improving music style labeling efficiency.
As the device embodiments are basically similar to the method embodiments, their description is relatively simple; for related details, refer to the partial description of the method embodiments.
Optionally, an embodiment of the present invention further provides a terminal, including a processor and a memory, with a computer program stored in the memory and runnable on the processor; when executed by the processor, the computer program realizes each process of the above model generating method or audio processing method embodiments and can achieve the same technical effect, which, to avoid repetition, is not described again here.
Optionally, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program realizes each process of the above model generating method or audio processing method embodiments and can achieve the same technical effect, which, to avoid repetition, is not described again here. The computer-readable storage medium may be, for example, a read-only memory (Read-Only Memory, abbreviated ROM), a random access memory (Random Access Memory, abbreviated RAM), a magnetic disk, or an optical disk.
All the embodiments in this specification are described in a progressive manner; each embodiment highlights its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device generate an apparatus for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a manufacture including an instruction apparatus, which realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although the preferred embodiments of the embodiments of the present invention have been described, once persons skilled in the art learn the basic creative concept, additional changes and modifications can be made to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements not only includes those elements but also includes other elements not explicitly listed, or further includes elements intrinsic to the process, method, article, or terminal device. In the absence of further restrictions, an element limited by the sentence "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device including that element.
The model generating method, audio processing method, device, terminal, and computer-readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, for persons skilled in the art, there will be changes in the specific implementation and application scope according to the ideas of the present invention. In summary, the contents of this specification shall not be construed as limiting the present invention.
Claims (18)
1. A model generating method, characterized by comprising:
labeling sample audio data according to preset music style labels, generating annotated audio samples;
cutting the annotated audio samples into multiple annotated audio data segments of a preset length;
processing each annotated audio data segment into mark sample audio section feature vectors of multiple preset dimensions, to serve as a mark sample set;
updating the preset music style labels of each mark sample audio section feature vector in the mark sample set, obtaining a mark sample audio training set;
training the mark sample audio training set using a deep learning method, obtaining a first music style marking model.
2. The method according to claim 1, characterized in that updating the preset music style labels of each mark sample audio section feature vector in the mark sample set to obtain the mark sample audio training set comprises:
extracting, according to a preset ratio, mark sample audio section feature vectors from the mark sample set, to serve as a training sample feature set;
training the training sample feature set by a predetermined deep learning method, obtaining a second music style marking model;
taking the mark sample audio section feature vectors remaining in the mark sample set as a test sample feature set, and inputting the test sample feature set into the second music style marking model, so that the second music style marking model outputs the music style label of each mark sample audio section feature vector in the test sample feature set, generating an updated mark sample set;
merging the updated mark sample set with the training sample feature set, obtaining the mark sample audio training set.
3. The method according to claim 1, characterized in that processing each annotated audio data segment into mark sample audio section feature vectors of multiple preset dimensions, to serve as the mark sample set, comprises:
performing framing processing on each annotated audio data segment respectively, obtaining multiple framing annotated audio data segments of each annotated audio data segment;
multiplying each framing annotated audio data segment by a windowing function respectively, obtaining the mark windowed audio data section of each framing annotated audio data segment;
performing a Mel transform on each mark windowed audio data section respectively, obtaining the mark Mel spectrum data of each annotated audio data segment;
converting each mark Mel spectrum data into a feature vector of a preset dimension respectively, obtaining the mark sample audio section feature vector of each mark Mel spectrum data;
combining the mark sample audio section feature vectors into the mark sample set.
4. The method according to claim 3, characterized in that converting each mark Mel spectrum data into the feature vector of the preset dimension, obtaining the mark sample audio section feature vector of each mark Mel spectrum data, comprises:
determining the Mel spectrum data corresponding to each frame of audio data in the mark Mel spectrum data as sample framed Mel spectrum data;
converting the sample framed Mel spectrum data into sample framed audio feature vectors;
splicing the sample framed audio feature vectors of a preset number of frames, obtaining the mark sample audio section feature vector of the preset dimension.
5. An audio processing method, characterized by comprising:
receiving a request to label target audio data with a music style;
labeling, according to the labeling request and using a music style marking model, the music style of the target audio data; the music style marking model being obtained by the method of any one of claims 1 to 4.
6. The method according to claim 5, characterized in that labeling, according to the labeling request and using the music style marking model, the music style of the target audio data comprises:
dividing, according to the labeling request, the target audio data into audio data sections of a preset length;
processing each audio data section into a feature vector of a preset dimension;
inputting the feature vector into the music style marking model, so that the music style marking model outputs the music style label of the feature vector;
obtaining the count of the music style label of each audio data section in the target audio data;
determining the music style corresponding to the music style label with the maximum count, or whose count is greater than or equal to a preset threshold, as the music style of the target audio data.
7. The method according to claim 6, characterized in that processing each audio data section into the feature vector of the preset dimension comprises:
performing framing processing on each audio data section, obtaining framed audio data sections;
multiplying the framed audio data section by a windowing function, obtaining a windowed audio data section;
performing a Mel transform on the windowed audio data section, obtaining the Mel spectrum data of the audio data section;
converting the Mel spectrum data into the feature vector of the preset dimension.
8. The method according to claim 7, characterized in that converting the Mel spectrum data into the feature vector of the preset dimension comprises:
determining the Mel spectrum data corresponding to each frame of audio data in the Mel spectrum data as framed Mel spectrum data;
converting the framed Mel spectrum data into framed audio feature vectors;
splicing the framed audio feature vectors of a preset number of frames, obtaining the feature vector of the preset dimension.
9. A model generating means, characterized by comprising:
an annotated audio sample generation module, configured to label sample audio data according to preset music style labels, generating annotated audio samples;
an annotated audio data segment obtaining module, configured to cut the annotated audio samples into multiple annotated audio data segments of a preset length;
a mark sample set determining module, configured to process each annotated audio data segment into mark sample audio section feature vectors of multiple preset dimensions, to serve as a mark sample set;
a mark sample audio training set generation module, configured to update the preset music style labels of each mark sample audio section feature vector in the mark sample set, obtaining a mark sample audio training set;
a first music style marking model training module, configured to train the mark sample audio training set using a deep learning method, obtaining a first music style marking model.
10. The apparatus according to claim 9, wherein the annotated sample audio training set generation module comprises:
a training sample feature generation submodule, configured to extract annotated sample feature vectors from the annotated sample set according to a preset ratio, to serve as a training sample feature set;
a second music style marking model training submodule, configured to train on the training sample feature set by a predetermined deep learning method, obtaining a second music style marking model;
an annotated sample set update submodule, configured to take the annotated sample audio segment feature vectors remaining in the annotated sample set as a test sample feature set and input the test sample feature set into the second music style marking model, so that the second music style marking model outputs a music style label for each annotated sample audio segment feature vector in the test sample feature set, generating an updated annotated sample set;
an annotated sample audio training set acquisition submodule, configured to merge the updated annotated sample set with the training sample feature set, obtaining the annotated sample audio training set.
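The flow in claim 10 is a semi-supervised bootstrapping loop: train on a manually labeled fraction, relabel the remainder with the interim model, then merge. A minimal pure-Python sketch, using a toy nearest-centroid classifier as a stand-in for the second deep-learning model (the patent does not specify the model architecture, and the 0.5 ratio below is illustrative):

```python
import random

def nearest_centroid_fit(samples):
    """Toy stand-in for the second marking model: per-label feature means."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(centroids, vec):
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def build_training_set(annotated, ratio=0.8, seed=0):
    """Claim 10's flow: train on a preset ratio, relabel the rest, merge."""
    rng = random.Random(seed)
    shuffled = annotated[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    model = nearest_centroid_fit(train)                  # second model
    relabeled = [(vec, nearest_centroid_predict(model, vec))
                 for vec, _ in test]                     # updated labels
    return train + relabeled                             # merged training set

annotated = [([0.0, 0.0], "rock"), ([0.2, 0.1], "rock"),
             ([9.9, 10.0], "jazz"), ([10.1, 9.8], "jazz")]
training_set = build_training_set(annotated, ratio=0.5, seed=1)
print(len(training_set))  # -> 4 (same size as the input set)
```

The point of the extra pass is label consistency: the merged set carries labels that the interim model itself would assign, which smooths noisy manual annotations before the first model is trained.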
11. The apparatus according to claim 9, wherein the annotated sample set determination module comprises:
an annotated audio data segment framing submodule, configured to perform framing on each annotated audio data segment, obtaining a plurality of framed annotated audio data segments for each annotated audio data segment;
an annotated windowed audio data segment generation submodule, configured to multiply each framed annotated audio data segment by a window function, obtaining an annotated windowed audio data segment for each framed annotated audio data segment;
an annotated Mel spectrum data acquisition submodule, configured to perform a Mel transform on each annotated windowed audio data segment, obtaining the annotated Mel spectrum data of each annotated audio data segment;
an annotated sample audio segment feature vector acquisition submodule, configured to convert each set of annotated Mel spectrum data into a feature vector of a preset dimension, obtaining the annotated sample audio segment feature vector of each set of annotated Mel spectrum data;
an annotated sample set determination submodule, configured to assemble the annotated sample audio segment feature vectors into the annotated sample set.
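The framing and windowing submodules in claim 11 correspond to standard short-time analysis. A minimal pure-Python sketch, assuming a Hann window (the patent only says "window function") and illustrative frame and hop sizes:

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames (the framing submodule)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hann(n):
    """Hann window of length n; one common choice of window function."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def window_frames(frames):
    """Multiply each frame by the window (the windowing submodule)."""
    w = hann(len(frames[0]))
    return [[s * wk for s, wk in zip(frame, w)] for frame in frames]

# 1024 samples of a 440 Hz tone at 8 kHz, framed into 256-sample frames
# with 50% overlap.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1024)]
frames = frame_signal(signal, frame_len=256, hop=128)
windowed = window_frames(frames)
print(len(frames), len(windowed[0]))  # -> 7 256
```

Windowing tapers each frame to zero at its edges, which suppresses spectral leakage in the Mel transform that follows.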
12. The apparatus according to claim 11, wherein the annotated sample audio segment feature vector acquisition submodule comprises:
a sample framed Mel spectrum data determination unit, configured to determine the Mel spectrum data corresponding to each frame of audio data in the annotated Mel spectrum data as sample framed Mel spectrum data;
a sample framed audio feature vector acquisition unit, configured to convert the sample framed Mel spectrum data into sample framed audio feature vectors;
an annotated sample audio segment feature vector acquisition unit, configured to concatenate the sample framed audio feature vectors of a preset number of frames, obtaining the annotated sample audio segment feature vector of the preset dimension.
13. An audio processing apparatus, comprising:
a music style annotation request receiving module, configured to receive a request to annotate target audio data with a music style;
a music style annotation module, configured to annotate, according to the request and using a music style marking model, the music style of the target audio data.
14. The apparatus according to claim 13, wherein the music style annotation module comprises:
an audio data segment acquisition submodule, configured to divide, according to the annotation request, the target audio data into audio data segments of a preset length;
a feature vector acquisition submodule, configured to process each audio data segment into a feature vector of a preset dimension;
a music style label acquisition submodule, configured to input the feature vectors into the music style marking model, so that the music style marking model outputs a music style label for each feature vector;
a music style label count acquisition submodule, configured to count the occurrences of each music style label among the audio data segments of the target audio data;
a music style determination submodule, configured to determine, as the music style of the target audio data, the music style corresponding to the music style label whose count is the largest, or whose count is greater than or equal to a preset threshold.
15. The apparatus according to claim 14, wherein the feature vector acquisition submodule comprises:
a framed audio data segment acquisition unit, configured to perform framing on each audio data segment, obtaining framed audio data segments;
a windowed audio data segment acquisition unit, configured to multiply the framed audio data segments by a window function, obtaining windowed audio data segments;
a Mel spectrum data acquisition unit, configured to perform a Mel transform on the windowed audio data segments, obtaining the Mel spectrum data of the audio data segment;
a feature vector acquisition unit, configured to convert the Mel spectrum data into the feature vector of the preset dimension.
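The Mel transform unit maps linear frequency onto the perceptual Mel scale before the spectrum is summarized. A sketch of the widely used HTK-style mapping; the patent does not specify which Mel formula it uses, so this particular formula is an assumption:

```python
import math

def hz_to_mel(f_hz):
    """HTK-style Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping back to linear frequency in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Equally spaced points on the Mel axis bunch up at low frequencies in Hz,
# which is what gives Mel filter banks their log-like spacing.
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_mel = [lo + i * (hi - lo) / 4 for i in range(5)]
edges_hz = [round(mel_to_hz(m)) for m in edges_mel]
print(edges_hz)
```

In a full pipeline these band edges define triangular filters applied to the short-time power spectrum; the filter outputs are the per-frame Mel spectrum data the claims refer to.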
16. The apparatus according to claim 15, wherein the feature vector acquisition unit comprises:
a framed Mel spectrum data determination subunit, configured to determine the Mel spectrum data corresponding to each frame of audio data in the Mel spectrum data as framed Mel spectrum data;
a framed audio feature vector acquisition subunit, configured to convert the framed Mel spectrum data into framed audio feature vectors;
a feature vector acquisition subunit, configured to concatenate the framed audio feature vectors of a preset number of frames, obtaining the feature vector of the preset dimension.
17. A terminal, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the model generating method according to any one of claims 1 to 4, or the steps of the audio processing method according to any one of claims 5 to 8, are implemented.
18. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the model generating method according to any one of claims 1 to 4, or the steps of the audio processing method according to any one of claims 5 to 8, are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910134014.5A CN109977255A (en) | 2019-02-22 | 2019-02-22 | Model generating method, audio-frequency processing method, device, terminal and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977255A true CN109977255A (en) | 2019-07-05 |
Family
ID=67077283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910134014.5A Pending CN109977255A (en) | 2019-02-22 | 2019-02-22 | Model generating method, audio-frequency processing method, device, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977255A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180276540A1 (en) * | 2017-03-22 | 2018-09-27 | NextEv USA, Inc. | Modeling of the latent embedding of music using deep neural network |
KR101943075B1 (en) * | 2017-11-06 | 2019-01-28 | 주식회사 아티스츠카드 | Method for automatical tagging metadata of music content using machine learning |
CN108053836A (en) * | 2018-01-18 | 2018-05-18 | 成都嗨翻屋文化传播有限公司 | A kind of audio automation mask method based on deep learning |
CN108764372A (en) * | 2018-06-08 | 2018-11-06 | Oppo广东移动通信有限公司 | Construction method and device, mobile terminal, the readable storage medium storing program for executing of data set |
CN108897829A (en) * | 2018-06-22 | 2018-11-27 | 广州多益网络股份有限公司 | Modification method, device and the storage medium of data label |
CN108962279A (en) * | 2018-07-05 | 2018-12-07 | 平安科技(深圳)有限公司 | New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data |
CN109147826A (en) * | 2018-08-22 | 2019-01-04 | 平安科技(深圳)有限公司 | Music emotion recognition method, device, computer equipment and computer storage medium |
Non-Patent Citations (1)
Title |
---|
韩凝: ""基于深度神经网络的音乐自动标注技术研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555117B (en) * | 2019-09-10 | 2022-05-31 | 联想(北京)有限公司 | Data processing method and device and electronic equipment |
CN110555117A (en) * | 2019-09-10 | 2019-12-10 | 联想(北京)有限公司 | data processing method and device and electronic equipment |
CN110853605A (en) * | 2019-11-15 | 2020-02-28 | 中国传媒大学 | Music generation method and device and electronic equipment |
CN111026908A (en) * | 2019-12-10 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Song label determination method and device, computer equipment and storage medium |
CN111026908B (en) * | 2019-12-10 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Song label determining method, device, computer equipment and storage medium |
CN111261244A (en) * | 2020-01-19 | 2020-06-09 | 戴纳智慧医疗科技有限公司 | Sample information acquisition and storage system and method |
CN111326136A (en) * | 2020-02-13 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN111326136B (en) * | 2020-02-13 | 2022-10-14 | 腾讯科技(深圳)有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN111429942B (en) * | 2020-03-19 | 2023-07-14 | 北京火山引擎科技有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN111429942A (en) * | 2020-03-19 | 2020-07-17 | 北京字节跳动网络技术有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN111859011A (en) * | 2020-07-16 | 2020-10-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method and device, storage medium and electronic equipment |
CN111859011B (en) * | 2020-07-16 | 2024-08-23 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method and device, storage medium and electronic equipment |
CN113593362A (en) * | 2021-06-30 | 2021-11-02 | 江苏第二师范学院 | Vocal music teaching equipment with degree of depth learning function |
CN114582366A (en) * | 2022-03-02 | 2022-06-03 | 浪潮云信息技术股份公司 | Method for realizing audio segmentation labeling based on LapSVM |
CN114820888A (en) * | 2022-04-24 | 2022-07-29 | 广州虎牙科技有限公司 | Animation generation method and system and computer equipment |
CN116959393A (en) * | 2023-09-18 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Training data generation method, device, equipment and medium of music generation model |
CN116959393B (en) * | 2023-09-18 | 2023-12-22 | 腾讯科技(深圳)有限公司 | Training data generation method, device, equipment and medium of music generation model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977255A (en) | Model generating method, audio-frequency processing method, device, terminal and storage medium | |
CN110008372A (en) | Model generating method, audio-frequency processing method, device, terminal and storage medium | |
CN111182347B (en) | Video clip cutting method, device, computer equipment and storage medium | |
CN111488489B (en) | Video file classification method, device, medium and electronic equipment | |
CN105869629B (en) | Audio recognition method and device | |
CN107526809B (en) | Method and device for pushing music based on artificial intelligence | |
CN110019852A (en) | Multimedia resource searching method and device | |
CN113573161B (en) | Multimedia data processing method, device, equipment and storage medium | |
CN112802446B (en) | Audio synthesis method and device, electronic equipment and computer readable storage medium | |
CN107680584B (en) | Method and device for segmenting audio | |
CN111125384B (en) | Multimedia answer generation method and device, terminal equipment and storage medium | |
CN111626049A (en) | Title correction method and device for multimedia information, electronic equipment and storage medium | |
CN111508466A (en) | Text processing method, device and equipment and computer readable storage medium | |
CN113259763B (en) | Teaching video processing method and device and electronic equipment | |
CN106098081B (en) | Sound quality identification method and device for sound file | |
CN109982137A (en) | Model generating method, video marker method, apparatus, terminal and storage medium | |
CN105718486A (en) | Online query by humming method and system | |
CN109376145B (en) | Method and device for establishing movie and television dialogue database and storage medium | |
CN110889008B (en) | Music recommendation method and device, computing device and storage medium | |
CN111354325A (en) | Automatic word and song creation system and method thereof | |
CN110555117B (en) | Data processing method and device and electronic equipment | |
CN114117096B (en) | Multimedia data processing method and related equipment | |
CN113299271B (en) | Speech synthesis method, speech interaction method, device and equipment | |
CN113112993B (en) | Audio information processing method and device, electronic equipment and storage medium | |
CN113781988A (en) | Subtitle display method, subtitle display device, electronic equipment and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190705 |