CN110400559A - Audio synthesis method, apparatus and device - Google Patents
Audio synthesis method, apparatus and device
- Publication number
- CN110400559A CN110400559A CN201910579288.5A CN201910579288A CN110400559A CN 110400559 A CN110400559 A CN 110400559A CN 201910579288 A CN201910579288 A CN 201910579288A CN 110400559 A CN110400559 A CN 110400559A
- Authority
- CN
- China
- Prior art keywords
- slice
- audio
- candidate
- candidate audio
- sub-slice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
Abstract
The present disclosure relates to an audio synthesis method, apparatus and device, for improving the efficiency of producing a synthesized audio file. The audio synthesis method includes: obtaining at least two candidate audio slices; obtaining a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is a slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset duration ending at the end time of the corresponding candidate audio slice; calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices; and splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain the synthesized audio file.
Description
Technical field
The present disclosure relates to the field of audio signal processing, and in particular to an audio synthesis method, apparatus and device.
Background technique
With the rapid development of application software, users of electronic devices can use such software for various entertainment activities that enrich their lives. For example, a user can use software to perform audio synthesis, that is, splice different audio segments together to form a new, distinctive audio file.

In existing audio synthesis methods, the user mostly relies on audio compositing software to manually cut the parts to be synthesized out of candidate audio files and then splice them together. The whole process is complicated and production efficiency is low.
Summary of the invention
The present disclosure provides an audio synthesis method, apparatus and device, to at least solve the technical problem in the related art that producing a synthesized audio file is inefficient. The technical solution of the disclosure is as follows.
According to a first aspect of embodiments of the present disclosure, an audio synthesis method is provided, including:

obtaining at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files;

obtaining a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is a slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset duration ending at the end time of the corresponding candidate audio slice;

calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices; and

splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a synthesized audio file.
Optionally, calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices includes:

inputting the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice of each candidate audio slice;

grouping the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file among the at least two audio files; and

calculating the distance between every two feature vector sets in the at least two feature vector sets, to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
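As a concrete illustration, the set-to-set distance above can be sketched in a few lines. This is a minimal sketch under assumptions the text leaves open: Euclidean distance between sub-feature vectors, and a simple average over all cross-set pairs as the aggregate (the disclosure only states that the set distance is built from the pairwise sub-feature-vector distances). The function name `set_distance` is illustrative.

```python
import numpy as np

def set_distance(set_a, set_b):
    """Distance between two feature-vector sets, taken here as the mean
    Euclidean distance over all cross-set vector pairs."""
    dists = [np.linalg.norm(a - b) for a in set_a for b in set_b]
    return float(np.mean(dists))

# two toy "audio file" feature sets (each entry: one sub-slice feature vector)
file_a = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
file_b = [np.array([0.0, 1.0])]
d = set_distance(file_a, file_b)
```

Computing this for every pair of feature vector sets yields the distance sets from which the splice decisions below are made.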
Optionally, splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain the synthesized audio file, includes:

determining a minimum value from the at least one distance set;

determining, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;

determining a subsequent candidate audio slice according to a preset rule, where the preset rule at least includes that the similarity between the first sub-slice of the subsequent candidate audio slice and the second sub-slice of the second candidate audio slice is the highest; and

splicing the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice together, to obtain the synthesized audio file.
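The selection rule described above — seed the order with the closest tail/head pair, then always append the unused slice whose head best matches the current tail — can be sketched as a greedy loop. Euclidean distance between the tail ("second sub-feature") and head ("first sub-feature") vectors is an assumption; the function name `greedy_order` and the toy 1-D vectors are illustrative.

```python
import numpy as np

def greedy_order(tail_vecs, head_vecs):
    """Choose a splice order for n candidate slices: start with the
    (tail, head) pair at minimum distance, then repeatedly append the
    unused slice whose head vector is closest to the current tail."""
    n = len(tail_vecs)
    best = min(((i, j) for i in range(n) for j in range(n) if i != j),
               key=lambda p: np.linalg.norm(tail_vecs[p[0]] - head_vecs[p[1]]))
    order = [best[0], best[1]]
    used = set(order)
    while len(order) < n:
        last = order[-1]
        nxt = min((j for j in range(n) if j not in used),
                  key=lambda j: np.linalg.norm(tail_vecs[last] - head_vecs[j]))
        order.append(nxt)
        used.add(nxt)
    return order

# toy 1-D tail and head vectors: slice 0's tail is closest to slice 1's
# head, and slice 1's tail to slice 2's head
tails = [np.array([0.0]), np.array([4.0]), np.array([8.0])]
heads = [np.array([2.0]), np.array([0.5]), np.array([5.0])]
order = greedy_order(tails, heads)
```

The preset synthesis strategy mentioned next would act as an extra filter inside this loop, rejecting candidates that violate the producer's constraints.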
Optionally, determining the subsequent candidate audio slice according to the preset rule includes:

determining the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy;

where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
Optionally, splicing the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice together, to obtain the synthesized audio file, includes:

normalizing the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice, to obtain a normalized subsequent candidate audio slice, a normalized second candidate audio slice and a normalized first candidate audio slice; and

splicing the normalized subsequent candidate audio slice, the normalized second candidate audio slice and the normalized first candidate audio slice together, to obtain the synthesized audio file.
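The normalize-then-concatenate step can be sketched on raw sample arrays. The disclosure does not specify the normalization; peak normalization (scaling each slice so its loudest sample matches a common level) is one common choice assumed here, and the names `peak_normalize` and `splice` are illustrative.

```python
import numpy as np

def peak_normalize(x, peak=0.9):
    """Scale a waveform so its maximum absolute sample equals `peak`
    (an assumed normalization; the patent only says slices are normalized)."""
    m = np.max(np.abs(x))
    return x * (peak / m) if m > 0 else x

def splice(slices):
    """Normalize each candidate slice, then concatenate them in order."""
    return np.concatenate([peak_normalize(s) for s in slices])

first = np.array([0.2, -0.4])   # toy samples of the first candidate slice
second = np.array([0.1, 0.3])   # toy samples of the second candidate slice
out = splice([first, second])
```

Normalizing before splicing keeps the loudness of adjacent slices comparable, which supports the continuity goal stated above.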
Optionally, obtaining the at least two candidate audio slices includes:

obtaining the at least two audio files; and

extracting, from the at least two audio files, the at least two candidate audio slices whose repetition degree exceeds a preset repetition degree.
According to a second aspect of embodiments of the present disclosure, an audio synthesis apparatus is provided, including:

a first obtaining module, configured to obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files;

a second obtaining module, configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is a slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset duration ending at the end time of the corresponding candidate audio slice;

a computing module, configured to calculate, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices; and

a third obtaining module, configured to splice some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a synthesized audio file.
Optionally, when calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices, the computing module is specifically configured to:

input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice of each candidate audio slice;

group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file among the at least two audio files; and

calculate the distance between every two feature vector sets in the at least two feature vector sets, to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Optionally, when splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices to obtain the synthesized audio file, the third obtaining module is specifically configured to:

determine a minimum value from the at least one distance set;

determine, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;

determine a subsequent candidate audio slice according to a preset rule, where the preset rule at least includes that the similarity between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is the highest; and

splice the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice together, to obtain the synthesized audio file.
Optionally, when determining the subsequent candidate audio slice according to the preset rule, the third obtaining module is specifically configured to:

determine the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy;

where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
Optionally, when splicing the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file, the third obtaining module is specifically configured to:

normalize the subsequent candidate audio slice, the second candidate audio slice and the first candidate audio slice, to obtain a normalized subsequent candidate audio slice, a normalized second candidate audio slice and a normalized first candidate audio slice; and

splice the normalized subsequent candidate audio slice, the normalized second candidate audio slice and the normalized first candidate audio slice together, to obtain the synthesized audio file.
Optionally, when obtaining the at least two candidate audio slices, the first obtaining module is specifically configured to:

obtain the at least two audio files; and

extract, from the at least two audio files, the at least two candidate audio slices whose repetition degree exceeds a preset repetition degree.
According to a third aspect of the present disclosure, an audio synthesis device is provided, including:

a processor; and

a memory for storing instructions executable by the processor;

where the processor is configured to execute the instructions to implement the method involved in the first aspect or any possible design of the first aspect.
According to a fourth aspect of the present disclosure, a storage medium is provided, for storing computer software instructions used by the audio synthesis apparatus of the second aspect or the audio synthesis device of the third aspect, including a program designed to execute the method of the first aspect or any design of the first aspect.

According to a fifth aspect of the present disclosure, a computer program product is provided which, when invoked and executed by a computer, causes the computer to perform the method of the first aspect or any design of the first aspect.
The technical solution provided by embodiments of the present disclosure has at least the following beneficial effects:

In the embodiments of the present disclosure, after at least two candidate audio slices are obtained, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices is calculated according to the first sub-slice and the second sub-slice of each candidate audio slice, and then some or all of the at least two candidate audio slices are spliced together according to the calculated similarity, to obtain a synthesized audio file. The producer of the synthesized audio file does not need to manually cut out the parts to be synthesized and then splice them together, which improves the production efficiency of the synthesized audio file and lowers the skill required of the producer, making the technical solution provided by the present disclosure more universally applicable.
Further, in the embodiments of the present disclosure, the similarity between every two candidate audio slices from different audio files is calculated according to the first sub-slice and the second sub-slice of each candidate audio slice, which helps ensure continuity between the tail of the preceding candidate audio slice and the head of the following candidate audio slice to be spliced, thereby improving the accuracy of splicing. Moreover, since the first sub-slice and the second sub-slice have the same duration, interference with the subsequently generated feature vectors of the sub-slices caused by inconsistent slice lengths can be avoided, which improves the accuracy of the similarity calculation and thus further improves the accuracy of splicing.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings

The drawings herein are incorporated into and form part of this specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the disclosure; they do not unduly limit the disclosure.
Fig. 1 is a flowchart of an audio synthesis method according to an exemplary embodiment;

Fig. 2 is a flowchart of an audio synthesis method according to an exemplary embodiment;

Fig. 3 is a flowchart of an audio synthesis method according to an exemplary embodiment;

Fig. 4 is a schematic diagram of an audio synthesis method according to an exemplary embodiment;

Fig. 5 is a block diagram of an audio synthesis apparatus according to an exemplary embodiment;

Fig. 6 is a block diagram of an audio synthesis device according to an exemplary embodiment.
Detailed description

To help those of ordinary skill in the art better understand the technical solution of the present disclosure, the technical solution in the embodiments of the disclosure is described below clearly and completely with reference to the accompanying drawings.
First, some terms in the embodiments of the present disclosure are explained for the convenience of those of ordinary skill in the art.

(1) The audio synthesis apparatus and the audio synthesis device may be portable devices, for example mobile devices such as mobile phones, tablet computers, notebook computers, or wearable devices with wireless communication functions (such as smartwatches or smart glasses). Exemplary embodiments of such mobile devices include, but are not limited to, devices running any of various operating systems. It should be further understood that in some other embodiments of the present disclosure, the audio synthesis apparatus and the audio synthesis device may also be non-portable devices, such as desktop computers.
(2) A candidate audio slice refers to a certain sentence or a certain segment in an audio file. The present disclosure places no restriction on the name "candidate audio slice" as long as it expresses the above concept; for example, it may also be called a candidate audio segment.
(3) The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate the three cases of A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it, unless otherwise specified. In the description of the embodiments of the present disclosure, words such as "first" and "second" are used only to distinguish objects of description, and are not to be understood as indicating or implying relative importance or order.
The embodiments described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. On the contrary, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart of an audio synthesis method according to an exemplary embodiment. The flow of the method is described as follows:

S11: obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files.
In the embodiments of the present disclosure, the at least two audio files may be Chinese audio files, English audio files, classical audio files, pop audio files, rock audio files, light-music audio files, and so on, or audio files of other types. In addition, the formats of the at least two audio files include, but are not limited to, Moving Picture Experts Group Audio Layer III (MP3), MP4, Windows Media Video (WMV), and other formats, which are not restricted here.
In the embodiments of the present disclosure, the ways of obtaining the at least two candidate audio slices include, but are not limited to, the following three. Example one: receiving the producer's input; in this example, the at least two candidate audio slices may be manually cut out in advance by the producer using production software, which demands more of the producer. Example two: obtaining them from another electronic device, for example from the cloud. Example three: obtaining the at least two audio files, and extracting from them the at least two candidate audio slices whose repetition degree exceeds a preset repetition degree.

In the following, the process of obtaining the at least two candidate audio slices in example three is described in detail with a specific example. The at least two audio files may be imported from a music application (APP), or received as input from the producer.
The at least two audio files are first analyzed to obtain their repetition patterns, and the segments whose repetition degree exceeds the preset repetition degree are then determined in the at least two audio files according to those patterns. Since an audio file includes a vocal part and an accompaniment part, the repetition degree of an audio file may refer to the repetition degree of the vocals or of the accompaniment melody. Specifically, the repetition degree of an audio file may be the number of repetitions of the same lyric line, the same melody, or the same segment.

Taking repetition of the same lyric line as an example, the repetition degree of a lyric line may be calculated by counting the total number of times that line appears in an audio file. The repetition degree of the line is then compared with the preset repetition degree, to determine the lyric lines whose repetition degree exceeds the preset value. The preset repetition degree may be set in advance by the producer; for example, it may be a repetition count of 2, 3, 4, or another value.
Assume the at least two audio files are audio file A, audio file B and audio file C, candidate audio slices whose repetition degree exceeds the preset value are to be extracted from audio file A, and the preset repetition degree is 3. The repetition pattern of audio file A is analyzed first; suppose it is then determined that one lyric line in audio file A repeats 3 times and another repeats 4 times, both meeting the preset repetition degree, so these two lines can be extracted from audio file A. Audio files B and C are handled in the same way as audio file A, which is not repeated here. Here, the candidate audio slices extracted from audio file A are denoted A1 and A2, those extracted from audio file B are denoted B1 and B2, and those extracted from audio file C are denoted C1 and C2.

In the embodiments of the present disclosure, when the candidate audio slices whose repetition degree exceeds the preset value are determined in the at least two audio files, the start time and end time of each candidate audio slice may also be determined, and each candidate audio slice may then be extracted according to its start time and end time, to obtain the at least two candidate audio slices.
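The repetition-threshold filter in example three can be sketched directly. This is a toy under an explicit assumption: strings stand in for segments, whereas in practice the repeated lines would come from lyric or melody analysis of the audio; the name `repeated_segments` is illustrative.

```python
from collections import Counter

def repeated_segments(lines, threshold=3):
    """Keep only the segments whose repetition count meets the preset
    repetition degree (here 3, matching the example in the text)."""
    counts = Counter(lines)
    return [line for line, c in counts.items() if c >= threshold]

# toy audio file A: the hook repeats 3 times, the verse only twice
song_a = ["hook", "verse", "hook", "bridge", "hook", "verse"]
picked = repeated_segments(song_a, threshold=3)
```

Each surviving segment, together with its start and end times in the file, becomes one candidate audio slice.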
With the way of obtaining the at least two candidate audio slices provided in example three, the producer does not need to determine cut points manually, which improves the efficiency of audio synthesis, lowers the audio-processing skill required of the producer, and broadens the scope of application of the technical solution provided by the present disclosure.
When splicing audio, the producer usually cares most about whether the tail and head of two adjacent candidate audio slices to be spliced can be joined smoothly, so as to guarantee continuity. Therefore, in the embodiments of the present disclosure, after the at least two candidate audio slices are obtained, step S12 is performed: obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is a slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset duration ending at the end time of the corresponding candidate audio slice.
In the embodiments of the present disclosure, the preset duration may be 8 seconds (s), 10 s, 15 s, or another value. Taking a preset duration of 15 s as an example, the first 15 s and the last 15 s of each candidate audio slice in the at least two candidate audio slices are extracted. Continuing the example above, the first sub-slice and second sub-slice extracted from candidate audio slice A1 are denoted A11 and A12, those extracted from candidate audio slice A2 are denoted A21 and A22, ..., and those extracted from candidate audio slice C2 are denoted C21 and C22.

In the embodiments of the present disclosure, the first sub-slice and the second sub-slice have the same duration, which avoids interference with the subsequently generated feature vectors of the sub-slices caused by inconsistent slice lengths, thereby improving the accuracy of the similarity calculation and, in turn, the accuracy of splicing.
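Extracting the two equal-length sub-slices is a simple slicing operation on the sample array. A minimal sketch, assuming the candidate slice is given as a list of samples at a known sample rate; the name `sub_slices` and the toy 2 Hz rate are illustrative.

```python
def sub_slices(samples, sr, seconds=15):
    """Return the first and last `seconds` of a candidate slice as the
    first and second sub-slices; 15 s matches the example in the text."""
    n = seconds * sr
    return samples[:n], samples[-n:]

sr = 2                    # toy sample rate: 2 samples per second
clip = list(range(80))    # a 40-second candidate slice
head, tail = sub_slices(clip, sr, seconds=15)
```

Because both sub-slices use the same `seconds` value, they have identical lengths, which is the property the paragraph above relies on.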
S13: calculate, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices.
Referring to Fig. 2, the specific implementation of step S13 may include the following steps:

S131: input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice of each candidate audio slice;

S132: group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file among the at least two audio files;

S133: calculate the distance between every two feature vector sets in the at least two feature vector sets, to obtain at least one distance set.
In embodiments of the present disclosure, the Free Music Archive (FMA) database may be used as the training set. The audio features of each audio file in the training set are extracted, including but not limited to audio energy, audio rhythm, audio style, beat, emotion, etc. The extracted audio features are then used as training samples to train a convolutional neural network (CNN), yielding a trained CNN; this trained CNN is the feature extraction model mentioned above. In embodiments of the present disclosure, the CNN to be trained may include four convolutional layers, a max pooling layer, a recurrent layer (such as a Long Short-Term Memory (LSTM) layer), and a fully connected layer. The four convolutional layers and the max pooling layer extract a local feature sequence from the audio features, which is fed into the LSTM layer to further extract the temporal sequence features of the local features; this layer outputs 256 features, which the fully connected layer then maps to 64 output features.
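As a rough illustration, the tail of the architecture just described (a local-feature sequence fed into an LSTM whose 256-dimensional final state is mapped to 64 features by a fully connected layer) can be sketched as below. This is a minimal sketch, not the patented model: the convolutional front end is replaced by a placeholder local-feature sequence, the weights are random rather than trained, and every dimension other than 256 and 64 is an assumption.

```python
import numpy as np

D_LOCAL, H, D_OUT = 32, 256, 64  # local-feature width (assumed), LSTM width, output width
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(seq, Wx, Wh, b):
    """Run a minimal LSTM over seq of shape (T, D_LOCAL); return the final hidden state (H,)."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x in seq:
        gates = Wx @ x + Wh @ h + b          # all four gate pre-activations, shape (4H,)
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def extract_embedding(local_feats, params):
    """local_feats: (T, D_LOCAL) sequence standing in for the conv/pool front end."""
    h = lstm_last_hidden(local_feats, *params["lstm"])  # 256 sequence features
    W, b = params["dense"]
    return W @ h + b                                    # 64 output features

params = {
    "lstm": (rng.normal(0, 0.1, (4 * H, D_LOCAL)),
             rng.normal(0, 0.1, (4 * H, H)),
             np.zeros(4 * H)),
    "dense": (rng.normal(0, 0.1, (D_OUT, H)), np.zeros(D_OUT)),
}
emb = extract_embedding(rng.normal(size=(10, D_LOCAL)), params)
```

In the disclosure this embedding would be computed once for each first and second sub-slice, producing the sub-feature vectors A11, A12, and so on.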
The FMA database is, to date, the largest and most completely annotated database for music style classification. It contains 50,000 valid audio fragments in 16 general styles, such as blues, classical, country, electronic, folk, jazz, rock, rap, etc.
Alternatively, in embodiments of the present disclosure, the FMA database may be used as the training set, the audio features of each audio file in the training set may be extracted, and the extracted audio features may then be used as training samples to construct a model function f(x, w), where x denotes the feature vector corresponding to the extracted audio features and w denotes the feature parameters, which are the parameters to be computed. After the parameters w are determined, the feature extraction model of the present disclosure can be constructed.
It should be understood here that, in embodiments of the present disclosure, it is also possible to separately extract a feature sub-vector for each dimension of the first sub-slice and for each dimension of the second sub-slice of each candidate audio slice. Taking the extraction of the feature sub-vectors of the first sub-slice as an example, a different weight value is set for the feature sub-vector of each dimension; through these different weight values, a feature sub-vector characterizing each dimensional feature of the first sub-slice is obtained.
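The per-dimension weighting just described can be pictured as a weighted combination of the per-dimension feature sub-vectors. The sketch below is an assumption about one plausible form (an element-wise weighted sum), since the passage does not fix the exact operation; the dimension names in the comment are illustrative.

```python
def combine_dim_subvectors(dim_subvectors, weights):
    """Combine one feature sub-vector per dimension (e.g. energy, rhythm)
    into a single sub-vector for the sub-slice, using per-dimension weights."""
    assert len(dim_subvectors) == len(weights)
    length = len(dim_subvectors[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, dim_subvectors))
        for k in range(length)
    ]

# e.g. two dimensions, each contributing a 3-element sub-vector,
# weighted 0.75 and 0.25
first_sub = combine_dim_subvectors([[1.0, 0.0, 2.0], [0.0, 4.0, 2.0]], [0.75, 0.25])
```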
Continuing with the example above, after the first sub-slice and the second sub-slice of each candidate audio slice are input into the feature extraction model, the first sub-feature vector corresponding to the first sub-slice and the second sub-feature vector corresponding to the second sub-slice of each candidate audio slice are obtained; the first and second sub-feature vectors corresponding to the first and second sub-slices of the candidate audio slices are denoted A11, A12, ..., C21, C22.
In embodiments of the present disclosure, the distance between every two feature vector sets among the at least two feature vector sets is computed in the same way; the first feature vector set and the second feature vector set among the at least two feature vector sets are taken here as an example.
For the first feature vector set and the second feature vector set among the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set is the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, together with the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
As mentioned when describing how the sub-slices of each candidate audio slice are obtained, what a producer usually cares about most when splicing two candidate audio slices is the similarity between the tail of the former candidate audio slice and the head of the latter candidate audio slice. Therefore, computing the distance between the first feature vector set and the second feature vector set means computing the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Continuing with the example above, if the first feature vector set is denoted {A11, A12, A21, A22} and the second feature vector set is denoted {B11, B12, B21, B22}, then the distance between the first feature vector set and the second feature vector set comprises: the distance between the first sub-feature vector A11 and the second sub-feature vector B12, the distance between A11 and B22, the distance between A21 and B12, the distance between A21 and B22, the distance between the second sub-feature vector A12 and the first sub-feature vector B11, the distance between A12 and B21, the distance between A22 and B11, and the distance between A22 and B21.
The distance between any other two feature vector sets among the at least two feature vector sets is computed in the same way as the distance between the first feature vector set and the second feature vector set, and is not described again here.
Continuing with the example above, and taking the distance between the first sub-feature vector A11 and the second sub-feature vector B12 as an example, this may specifically be the Euclidean distance between A11 and B12, that is, the actual distance between two points in m-dimensional space, or the natural length of the vector between them. After the distance between every two of the at least two feature vector sets is computed, three distance sets are obtained, which may be denoted AB {A11B12, A11B22, A21B12, A21B22, A12B11, A12B21, A22B11, A22B21}; AC {A11C12, A11C22, A21C12, A21C22, A12C11, A12C21, A22C11, A22C21}; BC {B11C12, B11C22, B21C12, B21C22, B12C11, B12C21, B22C11, B22C21}.
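Under the pairing rule above (first sub-vectors of one set against second sub-vectors of the other, in both directions), a distance set such as AB can be computed as follows. A minimal sketch: the dictionary keys, the two-slices-per-file setup, and the 2-dimensional toy vectors are illustrative assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance: natural length of the difference vector in m-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_set(first_x, second_x, first_y, second_y):
    """first_x/second_x map slice name -> first/second sub-feature vector for one file;
    pair the first sub-vectors of X with the second sub-vectors of Y and vice versa."""
    dists = {}
    for xa, va in first_x.items():
        for yb, vb in second_y.items():
            dists[(xa, yb)] = euclidean(va, vb)
    for xa, va in second_x.items():
        for yb, vb in first_y.items():
            dists[(xa, yb)] = euclidean(va, vb)
    return dists

AB = distance_set(
    {"A11": [0.0, 0.0], "A21": [1.0, 1.0]},   # first sub-vectors of file A
    {"A12": [2.0, 0.0], "A22": [0.0, 2.0]},   # second sub-vectors of file A
    {"B11": [3.0, 4.0], "B21": [1.0, 0.0]},   # first sub-vectors of file B
    {"B12": [3.0, 4.0], "B22": [0.0, 1.0]},   # second sub-vectors of file B
)
```

With two slices per file this yields the eight pairs of the set AB listed above.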
It should be understood here that, taking A11B12 as an example, the smaller A11B12 is, the higher the similarity between the first sub-slice of candidate audio slice A1 and the second sub-slice of candidate audio slice B1.
After the distance between every two feature vector sets among the at least two feature vector sets is computed, step S14 is executed: splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain the composite audio file.
The specific implementation of step S14 includes, but is not limited to, the following two modes:
Mode one
To make the produced composite audio file more personalized, diverse, or interesting, the producer may preset a synthesis strategy, including but not limited to: setting the first candidate audio slice of the composite audio file, setting how many candidate audio slices the composite audio file contains, setting the last candidate audio slice of the composite audio file, setting the splicing rule within the composite audio file to A-B-C-C-B-A or the like, or other synthesis strategies.
In this case, it is first necessary to determine whether the synthesis strategy set by the producer specifies the first candidate audio slice of the composite audio file. If the first candidate audio slice of the composite audio file is specified, the following steps are executed:
Step 1: take the first candidate audio slice as the current candidate slice; according to the at least one distance set obtained by the computation, determine the first sub-feature vector with the smallest distance to the second sub-feature vector of the second sub-slice of the current candidate slice, and then take the candidate audio slice corresponding to that first sub-feature vector as the second candidate audio slice;
Step 2: repeat step 1 to determine the third candidate audio slice;
…
until the at least two candidate audio slices have been traversed; the determined candidate audio slices are then spliced together.
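The traversal in steps 1 and 2 amounts to a greedy nearest-neighbour ordering: starting from the preset first slice, repeatedly pick the unused candidate whose first sub-slice is closest to the current slice's second sub-slice. A minimal sketch, with `dist[(x, y)]` assumed to hold the distance between the second sub-feature vector of slice x and the first sub-feature vector of slice y:

```python
def greedy_order(slices, dist, first_slice):
    """Order candidate audio slices for splicing, as in mode one:
    traverse all slices, always appending the nearest remaining head."""
    order = [first_slice]
    remaining = set(slices) - {first_slice}
    while remaining:
        current = order[-1]
        nearest = min(remaining, key=lambda s: dist[(current, s)])
        order.append(nearest)
        remaining.remove(nearest)
    return order

# illustrative tail-to-head distances between three candidate slices
dist = {
    ("A1", "B1"): 0.2, ("A1", "C1"): 0.9,
    ("B1", "A1"): 0.5, ("B1", "C1"): 0.1,
    ("C1", "A1"): 0.7, ("C1", "B1"): 0.3,
}
order = greedy_order(["A1", "B1", "C1"], dist, "A1")
```

Here A1 is the preset first slice; B1 follows because 0.2 is the smallest distance from A1's tail, and C1 follows B1 for the same reason.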
Mode two
Referring to Fig. 3, in specific implementation, mode two may include the following steps:
S141: determining a minimum value from the at least one distance set;
S142: determining, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the composite audio file from the at least two candidate audio slices;
S143: determining the subsequent candidate audio slices according to a preset rule; where the preset rule at least includes that the similarity between the first sub-slice of the subsequent candidate audio slice and the second sub-slice of the second candidate audio slice is the highest;
S144: splicing the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice together, to obtain the composite audio file.
In mode two, a minimum value is first determined from the at least one distance set. Continuing with the example above, this is the minimum value determined from the three distance sets AB, AC, and BC. Taking the minimum value A11B12 as an example, the candidate audio slice A1 corresponding to A11 is taken as the first candidate audio slice of the composite audio, and the candidate audio slice B1 corresponding to B12 is taken as the second candidate audio slice of the composite audio.
In this mode, after the first candidate audio slice and the second candidate audio slice are determined, the subsequent candidate audio slices are determined. Determining the subsequent candidate audio slices may specifically include the following steps:
when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy, determining the subsequent candidate audio slices according to the preset rule;
where the preset synthesis strategy is a strategy preset by the producer of the composite audio file.
The preset synthesis strategy here is the same as in mode one above and is not described again.
That is, before a subsequent candidate audio slice is determined, it is first judged whether the first candidate audio slice and the second candidate audio slice satisfy the preset synthesis strategy; only when the preset synthesis strategy is satisfied is the subsequent candidate audio slice determined. In specific implementation, the subsequent candidate audio slice is determined according to the preset rule, which is that the similarity between the first sub-slice of the next candidate audio slice and the second sub-slice of the previous candidate audio slice is the highest, that is, the distance between the first sub-feature vector of that first sub-slice and the second sub-feature vector is the smallest. For example, the second candidate audio slice is taken as the current candidate audio slice, and the minimum distance to the second sub-feature vector corresponding to the second sub-slice of the current candidate audio slice is determined from the at least one distance set; taking B12C11 as an example, candidate audio slice C1 is then taken as the third candidate audio slice.
In specific implementation, after the third candidate audio slice is determined and before the fourth candidate audio slice is determined, it is also necessary to judge whether the third candidate audio slice satisfies the preset synthesis strategy; only when the preset synthesis strategy is satisfied is the fourth candidate audio slice determined. The fourth candidate audio slice is determined in the same way as the third candidate audio slice, as are all subsequent candidate audio slices, which is not described again here.
In embodiments of the present disclosure, after the first candidate audio slice, the second candidate audio slice, and the subsequent candidate audio slices are determined, the determined first candidate audio slice, second candidate audio slice, and subsequent candidate audio slices are spliced together in order to form the composite audio file.
In specific implementation, however, since the composite audio file involves multiple candidate audio slices whose volumes may differ, the volume of the composite audio file may fluctuate between loud and quiet, giving the user a poor experience. To avoid this, in embodiments of the present disclosure, before the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice are spliced together, they are also processed as follows:
normalization is applied to the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice, yielding normalized subsequent candidate audio slices, a normalized second candidate audio slice, and a normalized first candidate audio slice;
the normalized subsequent candidate audio slices, the normalized second candidate audio slice, and the normalized first candidate audio slice are spliced together to obtain the composite audio file.
In the embodiments of the present disclosure, to subsequent candidate audio slice, second candidate audio slice and the first candidate
When audio slice does normalized, the candidate audio slice, second candidate audio slice and first place of calculated for subsequent can be
The average value of the intensity of the volume of candidate audio slice, then by adjusting the mode of gain, subsequent candidate audio is sliced,
The volume intensity of second candidate audio slice and the first candidate audio slice is adjusted to target volume intensity, that is, is calculated
Volume intensity average value, after being normalized the first candidate audio slice, normalization after second candidate audio
The first candidate audio after normalization, is then sliced, after normalization by the subsequent candidate audio slice after slice and normalization
Second candidate audio slice and normalization after subsequent candidate audio slice be stitched together, obtain Composite tone text
Part.
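The gain adjustment described above can be sketched as follows. The passage does not define "volume intensity"; per-slice RMS amplitude is assumed here, and each slice is scaled so its RMS matches the average RMS before the slices are concatenated.

```python
import math

def rms(samples):
    """Root-mean-square amplitude, taken as the slice's volume intensity (an assumption)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def normalize_and_splice(slices):
    """slices: candidate audio slices as lists of float samples, already in
    splicing order. Scale each slice's RMS to the average RMS, then concatenate."""
    levels = [rms(s) for s in slices]
    target = sum(levels) / len(levels)   # target volume intensity = the average
    out = []
    for samples, level in zip(slices, levels):
        gain = target / level if level > 0 else 1.0
        out.extend(x * gain for x in samples)
    return out

# a loud slice (RMS 0.5) and a quiet slice (RMS 0.1); both end up at RMS 0.3
composite = normalize_and_splice([[0.5, -0.5, 0.5, -0.5], [0.1, -0.1, 0.1, -0.1]])
```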
To facilitate understanding by those skilled in the art, the technical solution of the present disclosure is illustrated below with a specific example. Referring to Fig. 4, at least two audio files are first obtained from a music library, where each of the at least two audio files includes an original track and an accompaniment; the original track is the original-vocal part mentioned above. After the at least two audio files are obtained, the audio files are cut to obtain at least two candidate audio slices, denoted music_1, music_2, ..., music_n. Each of the at least two candidate audio slices can be split into two parts, namely the original-vocal part and the accompaniment part; as shown in Fig. 4, the original-vocal part is denoted by text (TXT), and the accompaniment part is denoted by m1_a, m1_b, etc. The similarity between the at least two candidate audio slices (specifically expressed as a distance) is then computed, and the computed distances are sorted; after the computed distances are sorted, the candidate audio slices to be spliced are determined in combination with the synthesis strategy (splicing logic strategy) preset by the producer of the composite audio file. Finally, the determined candidate audio slices are normalized to obtain normalized candidate audio slices, and the normalized candidate audio slices are spliced together to obtain a medley, that is, the above-mentioned composite audio file; as shown in Fig. 4, the normalized candidate audio slices are denoted m...1 through m...N. In embodiments of the present disclosure, the obtained medley can be used as material for a song video, or as a karaoke (KTV) song.
Fig. 5 is a block diagram of an audio synthesis apparatus 500 according to an exemplary embodiment. Referring to Fig. 5, the apparatus includes a first acquisition module 501, a second acquisition module 502, a computing module 503, and a third acquisition module 504.
The first acquisition module 501 is configured to obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files;
the second acquisition module 502 is configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice among the at least two candidate audio slices, where the first sub-slice is a slice of a preset time length starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset time length before the end time of the corresponding candidate audio slice;
the computing module 503 is configured to compute, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices;
the third acquisition module 504 is configured to splice some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a composite audio file.
Optionally, when the computing module 503 is configured to compute, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices, it is specifically configured to:
input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice of each candidate audio slice;
group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set among the at least two feature vector sets corresponds to one of the at least two audio files;
compute the distance between every two feature vector sets among the at least two feature vector sets, to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set among the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set is the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Optionally, when the third acquisition module 504 is configured to splice some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices to obtain the composite audio file, it is specifically configured to:
determine a minimum value from the at least one distance set;
determine, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the composite audio file from the at least two candidate audio slices;
determine the subsequent candidate audio slices according to a preset rule, where the preset rule at least includes that the similarity between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is the highest;
splice the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice together, to obtain the composite audio file.
Optionally, when the third acquisition module 504 is configured to determine the subsequent candidate audio slices according to the preset rule, it is specifically configured to:
when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy, determine the subsequent candidate audio slices according to the preset rule;
where the preset synthesis strategy is a strategy preset by the producer of the composite audio file.
Optionally, when the third acquisition module 504 is configured to splice the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice together to obtain the composite audio file, it is specifically configured to:
apply normalization to the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice, to obtain normalized subsequent candidate audio slices, a normalized second candidate audio slice, and a normalized first candidate audio slice;
splice the normalized subsequent candidate audio slices, the normalized second candidate audio slice, and the normalized first candidate audio slice together, to obtain the composite audio file.
Optionally, when the first acquisition module 501 is configured to obtain the at least two candidate audio slices, it is specifically configured to:
obtain the at least two audio files;
extract, from the at least two audio files, the at least two candidate audio slices whose repetition degree exceeds a preset repetition degree.
Regarding the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
Fig. 6 is a block diagram of an audio synthesis device 600 according to an exemplary embodiment. The device includes a processor 601 and a memory 602.
The memory 602 may be used to store computer-executable program code, where the executable program code includes instructions. By running the instructions stored in the memory 602, the processor 601 executes the various functional applications and data processing of the audio synthesis device 600. The memory 602 may include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as an audio playback function), and the like, and the data storage area may store data created during the use of the audio synthesis device 600 (such as audio files, candidate audio slices, etc.). The memory 602 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
In embodiments of the present disclosure, the audio synthesis device 600 may also include some peripheral devices, such as an external memory interface 603, an audio module 604 (including a speaker, a receiver, a microphone, an earphone interface, etc., not shown in the figure), a sensor module 605, keys 606, a display screen 607, and the like. It can be understood that the structure illustrated in embodiments of the present disclosure does not constitute a specific limitation on the audio synthesis device 600. In other embodiments of the present disclosure, the audio synthesis device 600 may include more or fewer components than illustrated, or combine certain components, or split certain components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The display screen is used to display images, videos, and the like; in some embodiments, the audio synthesis device 600 may include 1 or N display screens 607, where N is a positive integer greater than 1. In addition, the audio synthesis device 600 can implement audio functions, such as playing and recording the composite audio file, through the audio module 604 (speaker, receiver, microphone, earphone interface) and the processor 601. The audio synthesis device 600 can receive key input and generate key signal input related to user settings and function control of the audio synthesis device 600.
Although not shown in Fig. 6, the audio synthesis device 600 may also include a camera, such as a front camera or a rear camera; a motor for generating a vibration prompt (such as an incoming-call vibration prompt); and an indicator, such as an indicator light, used to indicate the charging state or battery level change, or to indicate a message, a missed call, a notification, etc.
In an exemplary embodiment, a storage medium including instructions is also provided, for example a memory including instructions; the above instructions can be executed by the processor of the device 600 to complete the above method. Optionally, the storage medium may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the disclosure. The disclosure is intended to cover any variations, uses, or adaptations of the disclosure; these variations, uses, or adaptations follow the general principles of the disclosure and include common knowledge or conventional techniques in the art not disclosed by the disclosure. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. An audio synthesis method, characterized by comprising:
obtaining at least two candidate audio slices, wherein the at least two candidate audio slices come from at least two audio files;
obtaining a first sub-slice and a second sub-slice of each candidate audio slice among the at least two candidate audio slices, wherein the first sub-slice is a slice of a preset time length starting from a start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset time length before an end time of the corresponding candidate audio slice;
computing, according to the first sub-slice and the second sub-slice of each candidate audio slice, a similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices;
splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a composite audio file.
2. The method according to claim 1, wherein computing, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices comprises:
inputting the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice of each candidate audio slice;
grouping the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, wherein each feature vector set among the at least two feature vector sets corresponds to one of the at least two audio files;
computing a distance between every two feature vector sets among the at least two feature vector sets, to obtain at least one distance set.
3. The method according to claim 2, wherein, for a first feature vector set and a second feature vector set among the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set is the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
4. The method according to claim 2 or 3, wherein splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain the composite audio file, comprises:
determining a minimum value from the at least one distance set;
determining, according to the minimum value, a first candidate audio slice and a second candidate audio slice of the composite audio file from the at least two candidate audio slices;
determining subsequent candidate audio slices according to a preset rule, wherein the preset rule at least includes that the distance between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is the smallest;
splicing the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice together, to obtain the composite audio file.
5. The method according to claim 4, wherein determining the subsequent candidate audio slices according to the preset rule comprises:
when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy, determining the subsequent candidate audio slices according to the preset rule;
wherein the preset synthesis strategy is a strategy preset by a producer of the composite audio file.
6. The method according to claim 5, wherein splicing the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice together, to obtain the composite audio file, comprises:
applying normalization to the subsequent candidate audio slices, the second candidate audio slice, and the first candidate audio slice, to obtain normalized subsequent candidate audio slices, a normalized second candidate audio slice, and a normalized first candidate audio slice;
splicing the normalized subsequent candidate audio slices, the normalized second candidate audio slice, and the normalized first candidate audio slice together, to obtain the composite audio file.
7. The method according to any one of claims 1-6, wherein obtaining the at least two candidate audio slices comprises:
obtaining the at least two audio files;
extracting, from the at least two audio files, the at least two candidate audio slices whose repetition degree exceeds a preset repetition degree.
8. An audio synthesis apparatus, characterized by comprising:
a first acquisition module, configured to obtain at least two candidate audio slices, wherein the at least two candidate audio slices come from at least two audio files;
a second acquisition module, configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice among the at least two candidate audio slices, wherein the first sub-slice is a slice of a preset time length starting from a start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset time length before an end time of the corresponding candidate audio slice;
a computing module, configured to compute, according to the first sub-slice and the second sub-slice of each candidate audio slice, a similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices;
a third acquisition module, configured to splice some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a composite audio file.
9. An audio synthesis device, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions, to implement the method of any one of claims 1-7.
10. A storage medium, wherein when instructions in the storage medium are executed by a processor of an audio synthesis device, the audio synthesis device is enabled to perform the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910579288.5A CN110400559B (en) | 2019-06-28 | 2019-06-28 | Audio synthesis method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910579288.5A CN110400559B (en) | 2019-06-28 | 2019-06-28 | Audio synthesis method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110400559A true CN110400559A (en) | 2019-11-01 |
CN110400559B CN110400559B (en) | 2020-09-29 |
Family
ID=68323643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910579288.5A Active CN110400559B (en) | 2019-06-28 | 2019-06-28 | Audio synthesis method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110400559B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112382310A (en) * | 2020-11-12 | 2021-02-19 | 北京猿力未来科技有限公司 | Human voice audio recording method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090048841A1 (en) * | 2007-08-14 | 2009-02-19 | Nuance Communications, Inc. | Synthesis by Generation and Concatenation of Multi-Form Segments |
CN102024033A (en) * | 2010-12-01 | 2011-04-20 | 北京邮电大学 | Method for automatically detecting audio templates and chaptering videos |
CN106652997A (en) * | 2016-12-29 | 2017-05-10 | 腾讯音乐娱乐(深圳)有限公司 | Audio synthesis method and terminal |
CN108419123A (en) * | 2018-03-28 | 2018-08-17 | 广州市创新互联网教育研究院 | A kind of virtual sliced sheet method of instructional video |
CN108831424A (en) * | 2018-06-15 | 2018-11-16 | 广州酷狗计算机科技有限公司 | Audio splicing method, apparatus and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110400559B (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101452696B (en) | Signal processing device, signal processing method and program | |
EP2816550B1 (en) | Audio signal analysis | |
CN104395953B (en) | The assessment of bat, chord and strong beat from music audio signal | |
US12027165B2 (en) | Computer program, server, terminal, and speech signal processing method | |
CN105788589A (en) | Audio data processing method and device | |
CN107666638B (en) | A kind of method and terminal device for estimating tape-delayed | |
CN1937462A (en) | Content-preference-score determining method, content playback apparatus, and content playback method | |
WO2015114216A2 (en) | Audio signal analysis | |
CN111292717B (en) | Speech synthesis method, speech synthesis device, storage medium and electronic equipment | |
CN110675886A (en) | Audio signal processing method, audio signal processing device, electronic equipment and storage medium | |
WO2012171583A1 (en) | Audio tracker apparatus | |
WO2019162703A1 (en) | Method of combining audio signals | |
CN110010159B (en) | Sound similarity determination method and device | |
CN113691909B (en) | Digital audio workstation with audio processing recommendations | |
US20240004606A1 (en) | Audio playback method and apparatus, computer readable storage medium, and electronic device | |
CN113936629B (en) | Music file processing method and device and music singing equipment | |
CN113781989A (en) | Audio animation playing and rhythm stuck point identification method and related device | |
WO2016102738A1 (en) | Similarity determination and selection of music | |
US20180173400A1 (en) | Media Content Selection | |
CN112037739B (en) | Data processing method and device and electronic equipment | |
CN106775567B (en) | Sound effect matching method and system | |
CN110400559A (en) | Audio synthesis method, device and equipment | |
JP6288197B2 (en) | Evaluation apparatus and program | |
JP6102076B2 (en) | Evaluation device | |
CN107025902B (en) | Data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||