CN110400559A - Audio synthesis method, apparatus and device - Google Patents

Audio synthesis method, apparatus and device

Info

Publication number
CN110400559A
CN110400559A (application CN201910579288.5A; granted as CN110400559B)
Authority
CN
China
Prior art keywords
slice
audio
candidate
candidate audio
sub-slice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910579288.5A
Other languages
Chinese (zh)
Other versions
CN110400559B (en)
Inventor
王晨辉
刘子岳
张文文
苑少飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910579288.5A
Publication of CN110400559A
Application granted
Publication of CN110400559B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/06: Elementary speech units used in speech synthesisers; Concatenation rules
    • G10L13/07: Concatenation rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure relates to an audio synthesis method, apparatus and device for improving the efficiency of producing a synthesized audio file. The audio synthesis method includes: obtaining at least two candidate audio slices; obtaining a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is the slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is the slice of the preset duration before the end time of the corresponding candidate audio slice; computing, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices; and splicing some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file.

Description

Audio synthesis method, apparatus and device
Technical field
The present disclosure relates to the field of audio signal processing, and in particular to an audio synthesis method, apparatus and device.
Background
With the rapid development of application software, users of electronic devices can use applications for various entertainment activities that enrich their lives. For example, a user may use software to perform audio synthesis, that is, to splice different audio segments together to form a new, distinctive audio file.
Existing audio synthesis methods mostly rely on manual work with audio editing software: the parts to be synthesized are cut out of candidate audio files by hand and then spliced together. The whole process is complicated and production efficiency is low.
Summary of the invention
The present disclosure provides an audio synthesis method, apparatus and device, at least to solve the technical problem in the related art that the production of a synthesized audio file is inefficient. The technical solution of the present disclosure is as follows.
According to a first aspect of the embodiments of the present disclosure, an audio synthesis method is provided, including:
obtaining at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files;
obtaining a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is the slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is the slice of the preset duration before the end time of the corresponding candidate audio slice;
computing, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices;
splicing some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file.
Optionally, computing, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices includes:
inputting the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model to obtain, for each candidate audio slice, a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice;
grouping the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file of the at least two audio files;
computing the distance between every two feature vector sets in the at least two feature vector sets to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Optionally, splicing some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file includes:
determining a minimum value from the at least one distance set;
determining, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;
determining subsequent candidate audio slices according to a preset rule, where the preset rule at least includes that the similarity between the first sub-slice of the subsequent candidate audio slice and the second sub-slice of the second candidate audio slice is the highest;
splicing the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file.
Optionally, determining a subsequent candidate audio slice according to the preset rule includes:
determining the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy;
where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
Optionally, splicing the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file includes:
normalizing the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice to obtain the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice;
splicing the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice together to obtain the synthesized audio file.
Optionally, obtaining the at least two candidate audio slices includes:
obtaining the at least two audio files;
extracting, from the at least two audio files, the at least two candidate audio slices whose repetition count exceeds a preset repetition count.
According to a second aspect of the embodiments of the present disclosure, an audio synthesis apparatus is provided, including:
a first obtaining module, configured to obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files;
a second obtaining module, configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is the slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is the slice of the preset duration before the end time of the corresponding candidate audio slice;
a computing module, configured to compute, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices;
a third obtaining module, configured to splice some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file.
Optionally, when computing, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices, the computing module is specifically configured to:
input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model to obtain, for each candidate audio slice, a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice;
group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file of the at least two audio files;
compute the distance between every two feature vector sets in the at least two feature vector sets to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Optionally, when splicing some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain the synthesized audio file, the third obtaining module is specifically configured to:
determine a minimum value from the at least one distance set;
determine, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;
determine subsequent candidate audio slices according to a preset rule, where the preset rule at least includes that the similarity between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is the highest;
splice the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file.
Optionally, when determining a subsequent candidate audio slice according to the preset rule, the third obtaining module is specifically configured to:
determine the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy;
where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
Optionally, when splicing the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file, the third obtaining module is specifically configured to:
normalize the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice to obtain the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice;
splice the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice together to obtain the synthesized audio file.
Optionally, when obtaining the at least two candidate audio slices, the first obtaining module is specifically configured to:
obtain the at least two audio files;
extract, from the at least two audio files, the at least two candidate audio slices whose repetition count exceeds a preset repetition count.
According to a third aspect of the present disclosure, an audio synthesis device is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
where the processor is configured to execute the instructions to implement the method involved in the first aspect or any possible design of the first aspect.
According to a fourth aspect of the present disclosure, a storage medium is provided for storing computer software instructions used by the audio synthesis apparatus described in the second aspect or the audio synthesis device described in the third aspect, and containing a program designed to execute the method of the first aspect or any design of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided which, when invoked and executed by a computer, causes the computer to perform the method of the first aspect or any design of the first aspect.
The technical solution provided by the embodiments of the present disclosure has at least the following beneficial effects:
In the embodiments of the present disclosure, after at least two candidate audio slices are obtained, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices is computed based on the first sub-slice and the second sub-slice of each candidate audio slice, and some or all of the at least two candidate audio slices are then spliced together according to the computed similarities to obtain a synthesized audio file. The producer of the synthesized audio file does not need to manually segment and cut out the parts to be synthesized and then splice them together, which improves the production efficiency of the synthesized audio file and lowers the requirements on the producer, making the technical solution provided by the present disclosure more widely applicable.
Further, in the embodiments of the present disclosure, the similarity between every two candidate audio slices from different audio files is computed based on the first sub-slice and the second sub-slice of each candidate audio slice, which helps ensure coherence and continuity between the tail of the preceding candidate audio slice and the head of the following candidate audio slice to be spliced, thereby improving the accuracy of splicing. In addition, since the first sub-slice and the second sub-slice have the same duration, interference with the subsequently generated sub-slice feature vectors caused by inconsistent slice lengths is avoided, which improves the accuracy of the similarity computation and thus further improves the accuracy of splicing.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure; they do not constitute an improper limitation of the present disclosure.
Fig. 1 is a flow diagram of an audio synthesis method according to an exemplary embodiment;
Fig. 2 is a flow diagram of an audio synthesis method according to an exemplary embodiment;
Fig. 3 is a flow diagram of an audio synthesis method according to an exemplary embodiment;
Fig. 4 is a schematic diagram of an audio synthesis method according to an exemplary embodiment;
Fig. 5 is a block diagram of an audio synthesis apparatus according to an exemplary embodiment;
Fig. 6 is a block diagram of an audio synthesis device according to an exemplary embodiment.
Detailed description of embodiments
To enable those of ordinary skill in the art to better understand the technical solution of the present disclosure, the technical solution in the embodiments of the present disclosure is described clearly and completely below with reference to the accompanying drawings.
First, some terms used in the embodiments of the present disclosure are explained for the convenience of those of ordinary skill in the art.
(1) Audio synthesis apparatus and audio synthesis device: these may be portable devices, for example mobile devices such as mobile phones, tablet computers, notebook computers, or wearable devices with wireless communication capability (such as smartwatches or smart glasses). Exemplary embodiments of such mobile devices include, but are not limited to, devices running various operating systems. It should also be understood that in some other embodiments of the present disclosure the audio synthesis apparatus and the audio synthesis device may be non-portable devices, such as desktop computers.
(2) Candidate audio slice: a sentence or a segment in an audio file. The present disclosure places no restriction on the name "candidate audio slice" as long as it expresses the above concept; for example, it may also be called a candidate audio segment.
(3) The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it, unless otherwise specified. In the description of the embodiments of the present disclosure, terms such as "first" and "second" are used only to distinguish the objects being described and should not be understood as indicating or implying relative importance, nor as indicating or implying an order.
The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flow chart of an audio synthesis method according to an exemplary embodiment. The process of the method is described as follows:
S11: obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files.
In the embodiments of the present disclosure, the at least two audio files may be Chinese audio files, English audio files, classical audio files, pop audio files, rock audio files, light-music audio files, or other types of audio files. In addition, the formats of the at least two audio files include, but are not limited to, Moving Picture Experts Group Audio Layer III (MP3), MP4, Windows Media Video (WMV) and other formats, which are not restricted herein.
In the embodiments of the present disclosure, the ways of obtaining the at least two candidate audio slices include, but are not limited to, the following three. Example 1: receive them as input from the producer. In this example, the at least two candidate audio slices may have been manually cut out in advance by the producer using production software, which places relatively high demands on the producer. Example 2: obtain them from another electronic device, for example from the cloud. Example 3: obtain the at least two audio files and extract from them the at least two candidate audio slices whose repetition count exceeds a preset repetition count.
In the following, the process of obtaining the at least two candidate audio slices in Example 3 is described in detail with a specific example. The at least two audio files may be imported from a music application (app) or received as input from the producer.
First, the at least two audio files are analyzed to obtain their repetition patterns, and the segments whose repetition count exceeds the preset repetition count are then determined in the at least two audio files according to those patterns. Since an audio file contains a vocal part and an accompaniment part, the repetition count of an audio file may refer to the repetition count of the vocals in the vocal part or the repetition count of a melody in the accompaniment part. Specifically, the repetition may be the repetition of the same lyric line, the same melody or the same segment.
Taking the case where the repetition is the repetition of the same lyric line as an example, the repetition count of a lyric line may be computed by counting the total number of times that line occurs in an audio file. The repetition count of each line is then compared with the preset repetition count to determine the lyric lines whose repetition count exceeds the preset repetition count. The preset repetition count may be set in advance by the producer, for example 2, 3, 4 or another value.
Suppose the at least two audio files are audio file A, audio file B and audio file C, the candidate audio slices to be extracted from audio file A are those whose repetition count exceeds the preset repetition count, and the preset repetition count is 3. The repetition pattern of audio file A is analyzed first, and from it one lyric line in audio file A is found to repeat 3 times and another to repeat 4 times, both of which satisfy the extraction condition, so these two lyric lines can be extracted from audio file A. Audio files B and C are processed in the same way as audio file A, and details are not repeated here. Here, the candidate audio slices extracted from audio file A whose repetition count exceeds the preset repetition count are denoted A1 and A2, those extracted from audio file B are denoted B1 and B2, and those extracted from audio file C are denoted C1 and C2.
In the embodiments of the present disclosure, when the candidate audio slices whose repetition count exceeds the preset repetition count are determined in the at least two audio files, the start time and the end time of each candidate audio slice may also be determined, and each candidate audio slice is then extracted according to its start time and end time, yielding the at least two candidate audio slices. A minimal sketch of this extraction is given below.
With the way of obtaining the at least two candidate audio slices provided in Example 3, the producer does not need to cut the slices manually, which improves the efficiency of audio synthesis, lowers the requirements on the producer's audio processing skills, and broadens the scope of application of the technical solution provided by the present disclosure.
When splicing audio, what the producer usually cares most about is whether the tail of the preceding candidate audio slice and the head of the following candidate audio slice can be spliced smoothly, so as to ensure continuity. Therefore, in the embodiments of the present disclosure, after the at least two candidate audio slices are obtained, step S12 is performed: obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is the slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is the slice of the preset duration before the end time of the corresponding candidate audio slice.
In the embodiments of the present disclosure, the preset duration may be 8 seconds (s), 10 s, 15 s or another value. Taking a preset duration of 15 s as an example, the first 15 s and the last 15 s of each candidate audio slice in the at least two candidate audio slices are extracted. Continuing the example above, the first sub-slice and the second sub-slice of candidate audio slice A1 are denoted A11 and A12, the first sub-slice and the second sub-slice of candidate audio slice A2 are denoted A21 and A22, and so on, up to the first sub-slice and the second sub-slice of candidate audio slice C2, which are denoted C21 and C22. A sketch of this sub-slice extraction follows.
In the embodiments of the present disclosure, the first sub-slice and the second sub-slice have the same duration, which avoids interference with the subsequently generated sub-slice feature vectors caused by inconsistent slice lengths, thereby improving the accuracy of the similarity computation and hence the accuracy of splicing.
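A minimal sketch of the sub-slice extraction, assuming each candidate slice is held as a one-dimensional sample array; the 15 s value is simply the example duration used above.

```python
# Take the first and last `preset_s` seconds of a candidate slice as its first
# and second sub-slice. Equal lengths keep the downstream feature vectors
# comparable, as the disclosure notes.
def split_sub_slices(candidate, sample_rate, preset_s=15):
    n = int(preset_s * sample_rate)
    first_sub = candidate[:n]    # head: preset duration from the start time
    second_sub = candidate[-n:]  # tail: preset duration before the end time
    return first_sub, second_sub
```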
S13: compute, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices.
Referring to Fig. 2, a specific implementation of step S13 may include the following steps:
S131: input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model to obtain, for each candidate audio slice, a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice;
S132: group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file of the at least two audio files;
S133: compute the distance between every two feature vector sets in the at least two feature vector sets to obtain at least one distance set.
In the embodiments of the present disclosure, the Free Music Archive (FMA) database may be used as the training set. Audio features of each audio file in the training set are extracted, including but not limited to audio energy, rhythm, style, beat and emotion. The extracted audio features are then used as training samples to train a convolutional neural network (CNN); the trained convolutional neural network is the feature extraction model mentioned above. In the embodiments of the present disclosure, the convolutional neural network to be trained may include 4 convolutional layers, a max pooling layer, a recurrent layer (for example a long short-term memory (LSTM) layer) and a fully connected layer. The 4 convolutional layers and the max pooling layer extract local feature sequences from the audio features and feed them into the LSTM layer, which further extracts the temporal sequence features of the local features and outputs 256 features; the fully connected layer then outputs 64 features. A sketch of a network with this structure is given below.
The FMA database is, to date, one of the largest and most completely labeled music genre classification databases. It contains 50,000 valid audio clips covering 16 general genres, such as blues, classical, country, electronic, folk, jazz, rock and rap.
Alternatively, in the embodiments of the present disclosure, the FMA database may be used as the training set, the audio features of each audio file in the training set are extracted, and the extracted audio features are used as training samples to construct a model function f(x, w), where x characterizes the feature vectors corresponding to the extracted audio features and w is the model parameter to be learned; once w is determined, the feature extraction model of the present disclosure is constructed.
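The sketch below shows one way a feature extractor with the described shape (4 convolutional layers, max pooling, an LSTM producing 256 features, and a fully connected layer producing 64 features) could be written. The channel counts, kernel sizes and the choice of a mel-spectrogram input are assumptions not given in the disclosure; only the layer types and the 256/64 output sizes come from the text above.

```python
# Rough sketch of a sub-slice encoder matching the described architecture.
import torch
import torch.nn as nn

class SubSliceEncoder(nn.Module):
    def __init__(self, n_mels=128):  # n_mels: assumed input feature dimension
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
        self.fc = nn.Linear(256, 64)          # final 64-dimensional sub-feature vector

    def forward(self, spec):                  # spec: (batch, n_mels, time)
        local = self.conv(spec)               # local feature sequence
        seq = local.transpose(1, 2)           # (batch, time, 128) for the LSTM
        _, (h_n, _) = self.lstm(seq)          # h_n: (1, batch, 256)
        return self.fc(h_n.squeeze(0))        # (batch, 64)
```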
It should be understood here that in the embodiments of the present disclosure, the feature sub-vector of each dimension of the first sub-slice and of the second sub-slice of each candidate audio slice may also be extracted separately. Taking the feature sub-vectors of the dimensions of the first sub-slice as an example, different weights are set for the feature sub-vectors of different dimensions, and by setting different weights a feature sub-vector that characterizes each dimension of the first sub-slice is obtained.
Continuing the example above, after the first sub-slice and the second sub-slice of each candidate audio slice are input into the feature extraction model, the first sub-feature vector corresponding to the first sub-slice and the second sub-feature vector corresponding to the second sub-slice of each candidate audio slice are obtained; these first and second sub-feature vectors are denoted A11, A12, ..., C21, C22.
In the embodiments of the present disclosure, the distance between every two feature vector sets in the at least two feature vector sets is computed in the same way; the first feature vector set and the second feature vector set in the at least two feature vector sets are taken as an example here.
For the first feature vector set and the second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
As mentioned when describing how the sub-slices of each candidate audio slice are obtained, what the producer usually cares most about when splicing two candidate audio slices is the match between the tail of the preceding candidate audio slice and the head of the following candidate audio slice. Therefore, when the distance between the first feature vector set and the second feature vector set is computed, what is computed is the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Continuing the example above, if the first feature vector set is denoted {A11, A12, A21, A22} and the second feature vector set is denoted {B11, B12, B21, B22}, the distance between the first feature vector set and the second feature vector set consists of the distances between A11 and B12, A11 and B22, A21 and B12, A21 and B22, A12 and B11, A12 and B21, A22 and B11, and A22 and B21.
Since the distances between the other pairs of feature vector sets in the at least two feature vector sets are computed in the same way as the distance between the first feature vector set and the second feature vector set, details are not repeated here.
Continuing the example above and taking the distance between the first sub-feature vector A11 and the second sub-feature vector B12 as an example, the Euclidean distance between A11 and B12 may be computed, that is, the actual distance between the two points in m-dimensional space, or the natural length of their difference vector. After the distances between every two feature vector sets in the at least two feature vector sets are computed, 3 distance sets are obtained, which may be written as AB = {A11B12, A11B22, A21B12, A21B22, A12B11, A12B21, A22B11, A22B21}; AC = {A11C12, A11C22, A21C12, A21C22, A12C11, A12C21, A22C11, A22C21}; BC = {B11C12, B11C22, B21C12, B21C22, B12C11, B12C21, B22C11, B22C21}.
It should be understood here that, taking A11B12 as an example, the smaller A11B12 is, the higher the similarity between the first sub-slice of candidate audio slice A1 and the second sub-slice of candidate audio slice B1. A sketch of this distance computation is given below.
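A hedged sketch of step S133 under the convention above: given the head ("first") and tail ("second") sub-feature vectors of the slices of two audio files, the distance set pairs heads of one file with tails of the other, in both directions, using Euclidean distance, exactly as in the AB/AC/BC example.

```python
# Minimal sketch: pairwise head-tail Euclidean distances between two files.
import numpy as np

def distance_set(heads_x, tails_x, heads_y, tails_y):
    """Each argument: array of shape (num_slices_in_file, feature_dim)."""
    pairs = {}
    for i, h in enumerate(heads_x):          # head of file X vs tail of file Y
        for j, t in enumerate(tails_y):
            pairs[("X", i, "Y", j)] = float(np.linalg.norm(h - t))
    for i, h in enumerate(heads_y):          # head of file Y vs tail of file X
        for j, t in enumerate(tails_x):
            pairs[("Y", i, "X", j)] = float(np.linalg.norm(h - t))
    return pairs
```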
After the distances between every two feature vector sets in the at least two feature vector sets are computed, step S14 is performed: splice some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file.
The specific implementations of step S14 include, but are not limited to, the following two.
Mode 1
To make the produced synthesized audio file more personalized, diverse or interesting, the producer may set a synthesis strategy in advance, including but not limited to: setting the first candidate audio slice of the synthesized audio file, setting the maximum number of candidate audio slices the synthesized audio file may contain, setting the last candidate audio slice of the synthesized audio file, setting the splicing pattern of the synthesized audio file (for example A-B-C-C-B-A), or other synthesis strategies.
In this case, it is first determined whether the synthesis strategy set by the producer specifies the first candidate audio slice of the synthesized audio file. If the first candidate audio slice of the synthesized audio file is specified, the following steps are performed:
Step 1: take the first candidate audio slice as the current candidate slice; according to the at least one computed distance set, determine the first sub-feature vector with the smallest distance to the second sub-feature vector of the second sub-slice of the current candidate slice, and take the candidate audio slice corresponding to that first sub-feature vector as the second candidate audio slice;
Step 2: repeat step 1 to determine the third candidate audio slice, and so on.
Once the at least two candidate audio slices have been traversed, the determined candidate audio slices are spliced together.
Mode 2
Referring to Fig. 3, a specific implementation of Mode 2 may include the following steps:
S141: determine a minimum value from the at least one distance set;
S142: determine, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;
S143: determine subsequent candidate audio slices according to a preset rule, where the preset rule at least includes that the similarity between the first sub-slice of the subsequent candidate audio slice and the second sub-slice of the second candidate audio slice is the highest;
S144: splice the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file.
In Mode 2, the minimum value is first determined from the at least one distance set. Continuing the example above, this is the minimum value determined from the three distance sets AB, AC and BC. Taking the case where the minimum value is A11B12 as an example, the candidate audio slice A1 corresponding to A11 is taken as the first candidate audio slice of the synthesized audio file, and candidate audio slice B1 is taken as the second candidate audio slice of the synthesized audio file.
In this mode, after the first candidate audio slice and the second candidate audio slice are determined, the subsequent candidate audio slices are determined. Determining a subsequent candidate audio slice may specifically include the following:
when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy, determining the subsequent candidate audio slice according to the preset rule;
where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
The preset synthesis strategy here is the same as in Mode 1 above and is not repeated.
That is, before a subsequent candidate audio slice is determined, it is first judged whether the first candidate audio slice and the second candidate audio slice satisfy the preset synthesis strategy; only when they do is the subsequent candidate audio slice determined. In a specific implementation, the subsequent candidate audio slice is determined according to the preset rule, namely that the similarity between the first sub-slice of the next candidate audio slice and the second sub-slice of the previous candidate audio slice is the highest, that is, the distance between the first sub-feature vector of that first sub-slice and the second sub-feature vector of that second sub-slice is the smallest. For example, the second candidate audio slice is taken as the current candidate audio slice, and the minimum distance to the second sub-feature vector corresponding to the second sub-slice of the current candidate audio slice is determined from the at least one distance set; taking B12C11 as an example, candidate audio slice C1 is then taken as the third candidate audio slice.
In a specific implementation, after the third candidate audio slice is determined and before the fourth candidate audio slice is determined, it is also necessary to judge whether the third candidate audio slice satisfies the preset synthesis strategy; only when it does is the fourth candidate audio slice determined. The fourth candidate audio slice is determined in the same way as the third, and the remaining subsequent candidate audio slices are determined in the same way, which is not repeated here. The sketch after this paragraph puts these steps together.
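The following sketch assembles Mode 2 (steps S141 to S143) as a greedy selection. The strategy_ok callback stands in for the producer's preset synthesis strategy, and placing the head-owner of the minimum-distance pair first mirrors the A1/B1 worked example above; both are assumptions about details the disclosure leaves open.

```python
# Hedged sketch: greedy slice ordering driven by head-tail feature distances.
import numpy as np

def build_order(heads, tails, file_of, max_len=10, strategy_ok=lambda order: True):
    """heads/tails: dict slice_id -> feature vector; file_of: slice_id -> source file."""
    ids = list(heads)
    # first/second slice: minimum head-tail distance across slices of different files
    best_a, best_b = min(
        ((a, b) for a in ids for b in ids if file_of[a] != file_of[b]),
        key=lambda p: np.linalg.norm(heads[p[0]] - tails[p[1]]))
    order = [best_a, best_b]   # head-owner first, tail-owner second (per the A1/B1 example)
    while len(order) < max_len and strategy_ok(order):
        used = set(order)
        cands = [s for s in ids if s not in used and file_of[s] != file_of[order[-1]]]
        if not cands:
            break
        # next slice: the one whose head feature is closest to the current slice's tail
        order.append(min(cands,
                         key=lambda s: np.linalg.norm(heads[s] - tails[order[-1]])))
    return order
```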
In the embodiments of the present disclosure, after the first candidate audio slice, the second candidate audio slice and the subsequent candidate audio slices are determined, the determined first candidate audio slice, second candidate audio slice and subsequent candidate audio slices are spliced together in order to form the synthesized audio file.
In practice, however, since the synthesized audio file involves multiple candidate audio slices whose volumes may differ, the volume of the synthesized audio file may fluctuate, giving the user a poor experience. To avoid this, in the embodiments of the present disclosure, before the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice are spliced together, the following processing is also applied to them:
normalize the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice to obtain the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice;
splice the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice together to obtain the synthesized audio file.
In the embodiments of the present disclosure, when the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice are normalized, the average volume intensity of the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice may be computed, and the volume intensity of each of these slices is then adjusted, by adjusting the gain, to the target volume intensity, that is, the computed average volume intensity, yielding the normalized first candidate audio slice, the normalized second candidate audio slice and the normalized subsequent candidate audio slices. The normalized slices are then spliced together to obtain the synthesized audio file. A sketch of this normalization and concatenation follows.
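A minimal sketch of the normalization and final concatenation, assuming the volume intensity of a slice is measured as its RMS level (the disclosure only says "volume intensity" without naming a measure): each selected slice is scaled toward the average level by a gain factor, and the scaled slices are joined in splice order.

```python
# Gain-based loudness normalization followed by concatenation.
import numpy as np

def normalize_and_concat(ordered_slices):
    """ordered_slices: list of 1-D float sample arrays in splice order."""
    rms = [np.sqrt(np.mean(s ** 2)) + 1e-12 for s in ordered_slices]
    target = float(np.mean(rms))              # target = average volume intensity
    gains = [target / r for r in rms]         # gain that brings each slice to the target
    return np.concatenate([g * s for g, s in zip(gains, ordered_slices)])
```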
To facilitate understanding by those skilled in the art, the technical solution of the present disclosure is illustrated below with a specific example. Referring to Fig. 4, at least two audio files are first obtained from a music library, where each of the at least two audio files includes an original vocal track and an accompaniment; the original vocal track is the vocal part mentioned above. After the at least two audio files are obtained, the audio files are cut to obtain at least two candidate audio slices, denoted music_1, music_2, ..., music_n, where each candidate audio slice in the at least two candidate audio slices can be split into two parts, namely the vocal part and the accompaniment part; as shown in Fig. 4, the vocal part is represented as text (TXT) and the accompaniment part is represented as m1_a, m1_b, and so on. The similarities between the at least two candidate audio slices (concretely represented as distances) are then computed, and the computed distances are sorted. After the distances are sorted, the candidate audio slices to be spliced are determined in combination with the synthesis strategy (splicing logic strategy) set in advance by the producer of the synthesized audio file. Finally, the determined candidate audio slices are normalized to obtain the normalized candidate audio slices, and the normalized candidate audio slices are spliced together to obtain a medley, that is, the synthesized audio file mentioned above; as shown in Fig. 4, the normalized candidate audio slices are denoted m...1 through m...N. In the embodiments of the present disclosure, the obtained medley can be used as material for a music video or as a karaoke (KTV) song.
Fig. 5 is a block diagram 500 of an audio synthesis apparatus according to an exemplary embodiment. Referring to Fig. 5, the apparatus includes a first obtaining module 501, a second obtaining module 502, a computing module 503 and a third obtaining module 504.
The first obtaining module 501 is configured to obtain at least two candidate audio slices, where the at least two candidate audio slices come from at least two audio files.
The second obtaining module 502 is configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, where the first sub-slice is the slice of a preset duration starting from the start time of the corresponding candidate audio slice, and the second sub-slice is the slice of the preset duration before the end time of the corresponding candidate audio slice.
The computing module 503 is configured to compute, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices.
The third obtaining module 504 is configured to splice some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain a synthesized audio file.
Optionally, when computing, based on the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files among the at least two candidate audio slices, the computing module 503 is specifically configured to:
input the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model to obtain, for each candidate audio slice, a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice;
group the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, where each feature vector set in the at least two feature vector sets corresponds to one audio file of the at least two audio files;
compute the distance between every two feature vector sets in the at least two feature vector sets to obtain at least one distance set.
Optionally, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set consists of the distances between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distances between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
Optionally, when splicing some or all of the at least two candidate audio slices together according to the similarities between every two candidate audio slices to obtain the synthesized audio file, the third obtaining module 504 is specifically configured to:
determine a minimum value from the at least one distance set;
determine, according to the minimum value, the first candidate audio slice and the second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;
determine subsequent candidate audio slices according to a preset rule, where the preset rule at least includes that the similarity between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is the highest;
splice the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file.
Optionally, when determining a subsequent candidate audio slice according to the preset rule, the third obtaining module 504 is specifically configured to:
determine the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy;
where the preset synthesis strategy is a strategy set in advance by the producer of the synthesized audio file.
Optionally, when splicing the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice together to obtain the synthesized audio file, the third obtaining module 504 is specifically configured to:
normalize the subsequent candidate audio slices, the second candidate audio slice and the first candidate audio slice to obtain the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice;
splice the normalized subsequent candidate audio slices, the normalized second candidate audio slice and the normalized first candidate audio slice together to obtain the synthesized audio file.
Optionally, when obtaining the at least two candidate audio slices, the first obtaining module 501 is specifically configured to:
obtain the at least two audio files;
extract, from the at least two audio files, the at least two candidate audio slices whose repetition count exceeds a preset repetition count.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the method and is not elaborated here.
Fig. 6 is a block diagram 600 of an audio synthesis device according to an exemplary embodiment. The device includes a processor 601 and a memory 602.
The memory 602 may be used to store computer-executable program code, the executable program code including instructions. The processor 601 executes the instructions stored in the memory 602 to perform the various functional applications and data processing of the audio synthesis device 600. The memory 602 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (such as an audio playback function), and the like. The data storage area may store data created during the use of the audio synthesis device 600 (such as audio files and candidate audio slices). The memory 602 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
In the embodiments of the present disclosure, the audio synthesis device 600 may also include peripherals, such as an external memory interface 603, an audio module 604 (including a speaker, a receiver, a microphone, a headphone jack, etc., not shown in the figure), a sensor module 605, keys 606 and a display screen 607. It can be understood that the structure illustrated in the embodiments of the present disclosure does not constitute a specific limitation on the audio synthesis device 600. In other embodiments of the present disclosure, the audio synthesis device 600 may include more or fewer components than shown, may combine certain components, may split certain components, or may use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The display screen is used to display images, video, and the like. In some embodiments, the audio synthesis device 600 may include 1 or N display screens 607, where N is a positive integer greater than 1. In addition, the audio synthesis device 600 may implement audio functions, such as playing and recording the synthesized audio file, through the audio module 604 (speaker, receiver, microphone, headphone jack) and the processor 601. The audio synthesis device 600 may receive key input and generate key signal input related to the user settings and function control of the audio synthesis device 600.
Although not shown in Fig. 6, the audio synthesis device 600 may also include a camera, such as a front camera or a rear camera; a motor for generating vibration alerts (such as incoming-call vibration alerts); and an indicator, such as an indicator light, used to indicate the charging state and battery level changes, and to indicate messages, missed calls, notifications, and the like.
In an exemplary embodiment, a storage medium including instructions is also provided, for example a memory including instructions; the instructions can be executed by the processor of the device 600 to complete the above method. Optionally, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art will readily arrive at other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and the embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An audio synthesis method, comprising:
obtaining at least two candidate audio slices, wherein the at least two candidate audio slices come from at least two audio files;
obtaining a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, wherein the first sub-slice is a slice of a preset time length starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset time length before the end time of the corresponding candidate audio slice;
calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, a similarity between every two candidate audio slices from different audio files in the at least two candidate audio slices; and
splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a synthesized audio file.
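For illustration only, and not as part of the claims, the following Python sketch shows one way to cut the first and second sub-slices described in claim 1. The sample-array representation, the `sample_rate` argument, and the default `preset_seconds` value are assumptions introduced for the example, not details fixed by the disclosure.

```python
import numpy as np

def sub_slices(candidate: np.ndarray, sample_rate: int, preset_seconds: float = 1.0):
    """Return the first sub-slice (the head of the candidate slice) and the
    second sub-slice (its tail), each of the preset time length."""
    n = int(preset_seconds * sample_rate)
    first_sub = candidate[:n]    # preset time length from the start time
    second_sub = candidate[-n:]  # preset time length before the end time
    return first_sub, second_sub
```

Comparing only these short head and tail windows, rather than entire slices, is presumably what keeps the later similarity computation inexpensive.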
2. The method according to claim 1, wherein calculating, according to the first sub-slice and the second sub-slice of each candidate audio slice, the similarity between every two candidate audio slices from different audio files in the at least two candidate audio slices comprises:
inputting the first sub-slice and the second sub-slice of each candidate audio slice into a feature extraction model, to obtain, for each candidate audio slice, a first sub-feature vector corresponding to the first sub-slice and a second sub-feature vector corresponding to the second sub-slice;
grouping the first sub-feature vectors and the second sub-feature vectors of the candidate audio slices into at least two feature vector sets, wherein each feature vector set in the at least two feature vector sets corresponds to one audio file in the at least two audio files; and
calculating a distance between every two feature vector sets in the at least two feature vector sets, to obtain at least one distance set.
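As a non-claim illustration of claim 2, the sketch below extracts a feature vector for each sub-slice and groups the vectors by source audio file. The disclosure only refers to a feature extraction model; the MFCC-based `extract_feature` used here (via librosa) is a hypothetical stand-in, and `preset_seconds` is again an assumed parameter.

```python
import numpy as np
import librosa  # assumed stand-in feature extractor; the actual model is not specified

def extract_feature(sub_slice: np.ndarray, sample_rate: int) -> np.ndarray:
    """Hypothetical feature extraction: the mean MFCC vector of a sub-slice."""
    mfcc = librosa.feature.mfcc(y=sub_slice, sr=sample_rate, n_mfcc=20)
    return mfcc.mean(axis=1)

def build_feature_sets(slices_by_file: dict, sample_rate: int, preset_seconds: float = 1.0) -> dict:
    """Return one feature vector set per audio file: a list of
    (first_sub_feature, second_sub_feature) pairs, one pair per candidate slice."""
    n = int(preset_seconds * sample_rate)
    feature_sets = {}
    for file_id, candidates in slices_by_file.items():
        pairs = []
        for candidate in candidates:
            first_vec = extract_feature(candidate[:n], sample_rate)    # head window
            second_vec = extract_feature(candidate[-n:], sample_rate)  # tail window
            pairs.append((first_vec, second_vec))
        feature_sets[file_id] = pairs
    return feature_sets
```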
3. The method according to claim 2, wherein, for a first feature vector set and a second feature vector set in the at least two feature vector sets, the distance between the first feature vector set and the second feature vector set comprises the distance between each first sub-feature vector in the first feature vector set and each second sub-feature vector in the second feature vector set, and the distance between each second sub-feature vector in the first feature vector set and each first sub-feature vector in the second feature vector set.
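Continuing the same illustration, the distance set for a pair of feature vector sets described in claim 3 can be read as the cross distances between head vectors of one set and tail vectors of the other, in both directions. Euclidean distance is an assumption made here; the claims do not fix the metric.

```python
import itertools
import numpy as np

def pairwise_set_distances(feature_sets: dict) -> dict:
    """For every pair of feature vector sets (i.e. every pair of audio files),
    collect the distances between each first sub-feature vector of one set and
    each second sub-feature vector of the other, in both directions."""
    distance_sets = {}
    for (file_a, vecs_a), (file_b, vecs_b) in itertools.combinations(feature_sets.items(), 2):
        distances = []
        for first_a, second_a in vecs_a:
            for first_b, second_b in vecs_b:
                distances.append(float(np.linalg.norm(first_a - second_b)))
                distances.append(float(np.linalg.norm(second_a - first_b)))
        distance_sets[(file_a, file_b)] = distances
    return distance_sets
```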
4. The method according to claim 2 or 3, wherein splicing some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain the synthesized audio file, comprises:
determining a minimum value from the at least one distance set;
determining, according to the minimum value, a first candidate audio slice and a second candidate audio slice of the synthesized audio file from the at least two candidate audio slices;
determining a subsequent candidate audio slice according to a preset rule, wherein the preset rule at least comprises that the distance between the first sub-feature vector of the first sub-slice of the subsequent candidate audio slice and the second sub-feature vector of the second sub-slice of the second candidate audio slice is minimum; and
splicing the subsequent candidate audio slice, the second candidate audio slice, and the first candidate audio slice together, to obtain the synthesized audio file.
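The selection logic of claim 4 can be illustrated with a simple greedy ordering: seed the output with the pair of slices whose tail-to-head distance is the overall minimum, then repeatedly append the unused slice whose head vector is closest to the current tail vector. The `count` parameter, the concatenation order, and the Euclidean metric are assumptions made for this sketch, not details taken from the claims.

```python
import numpy as np

def greedy_splice_order(slice_ids: list, first_vecs: dict, second_vecs: dict, count: int) -> list:
    """Return an ordering of candidate slice ids: the first two come from the
    globally closest tail/head pair, and each later slice minimizes the distance
    between its head vector and the tail vector of the slice placed before it.
    (A fuller implementation would also restrict the seed pair to slices from
    different audio files, as required by claim 1.)"""
    first, second = min(
        ((i, j) for i in slice_ids for j in slice_ids if i != j),
        key=lambda pair: np.linalg.norm(second_vecs[pair[0]] - first_vecs[pair[1]]),
    )
    order = [first, second]
    used = {first, second}
    while len(order) < min(count, len(slice_ids)):
        tail = order[-1]
        nxt = min(
            (s for s in slice_ids if s not in used),
            key=lambda s: np.linalg.norm(second_vecs[tail] - first_vecs[s]),
        )
        order.append(nxt)
        used.add(nxt)
    return order
```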
5. The method according to claim 4, wherein determining the subsequent candidate audio slice according to the preset rule comprises:
determining the subsequent candidate audio slice according to the preset rule when the first candidate audio slice and the second candidate audio slice satisfy a preset synthesis strategy,
wherein the preset synthesis strategy is a strategy preset by a producer of the synthesized audio file.
6. The method according to claim 5, wherein splicing the subsequent candidate audio slice, the second candidate audio slice, and the first candidate audio slice together, to obtain the synthesized audio file, comprises:
normalizing the subsequent candidate audio slice, the second candidate audio slice, and the first candidate audio slice, to obtain a normalized subsequent candidate audio slice, a normalized second candidate audio slice, and a normalized first candidate audio slice; and
splicing the normalized subsequent candidate audio slice, the normalized second candidate audio slice, and the normalized first candidate audio slice together, to obtain the synthesized audio file.
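Claim 6 normalizes the selected slices before splicing; the exact normalization scheme is not specified, so the sketch below uses peak normalization purely as one plausible example.

```python
import numpy as np

def peak_normalize(samples: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Scale a slice so its peak amplitude is 1 (one plausible reading of the
    normalization step; the claim does not fix the scheme)."""
    return samples / (np.max(np.abs(samples)) + eps)

def splice(ordered_slices: list) -> np.ndarray:
    """Concatenate the normalized slices into a single synthesized audio track."""
    return np.concatenate([peak_normalize(s) for s in ordered_slices])
```

In practice a short cross-fade at each joint could further smooth the transitions, though the claims themselves only require splicing.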
7. The method according to any one of claims 1-6, wherein obtaining the at least two candidate audio slices comprises:
obtaining the at least two audio files; and
extracting, from the at least two audio files, the at least two candidate audio slices whose multiplicity exceeds a preset multiplicity.
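Claim 7 keeps only candidate slices whose multiplicity exceeds a preset threshold, without defining how multiplicity is measured. The variance of short-time RMS energy used below is purely an assumed proxy for that measure.

```python
import numpy as np

def multiplicity(candidate: np.ndarray, frame: int = 1024) -> float:
    """Assumed proxy for a slice's multiplicity: variance of its short-time RMS energy."""
    frames = [candidate[i:i + frame] for i in range(0, len(candidate) - frame + 1, frame)]
    if not frames:
        return 0.0
    rms = np.array([np.sqrt(np.mean(f.astype(np.float64) ** 2)) for f in frames])
    return float(np.var(rms))

def filter_candidates(candidates: list, threshold: float) -> list:
    """Keep only candidate slices whose (assumed) multiplicity exceeds the preset threshold."""
    return [c for c in candidates if multiplicity(c) > threshold]
```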
8. An audio synthesis apparatus, comprising:
a first obtaining module, configured to obtain at least two candidate audio slices, wherein the at least two candidate audio slices come from at least two audio files;
a second obtaining module, configured to obtain a first sub-slice and a second sub-slice of each candidate audio slice in the at least two candidate audio slices, wherein the first sub-slice is a slice of a preset time length starting from the start time of the corresponding candidate audio slice, and the second sub-slice is a slice of the preset time length before the end time of the corresponding candidate audio slice;
a computing module, configured to calculate, according to the first sub-slice and the second sub-slice of each candidate audio slice, a similarity between every two candidate audio slices from different audio files in the at least two candidate audio slices; and
a third obtaining module, configured to splice some or all of the at least two candidate audio slices together according to the similarity between every two candidate audio slices, to obtain a synthesized audio file.
9. An audio synthesis device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1-7.
10. A storage medium, wherein, when instructions in the storage medium are executed by a processor of an audio synthesis device, the audio synthesis device is enabled to perform the method according to any one of claims 1-7.
CN201910579288.5A 2019-06-28 2019-06-28 Audio synthesis method, device and equipment Active CN110400559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910579288.5A CN110400559B (en) 2019-06-28 2019-06-28 Audio synthesis method, device and equipment

Publications (2)

Publication Number Publication Date
CN110400559A true CN110400559A (en) 2019-11-01
CN110400559B CN110400559B (en) 2020-09-29

Family

ID=68323643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910579288.5A Active CN110400559B (en) 2019-06-28 2019-06-28 Audio synthesis method, device and equipment

Country Status (1)

Country Link
CN (1) CN110400559B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112382310A (en) * 2020-11-12 2021-02-19 北京猿力未来科技有限公司 Human voice audio recording method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090048841A1 (en) * 2007-08-14 2009-02-19 Nuance Communications, Inc. Synthesis by Generation and Concatenation of Multi-Form Segments
CN102024033A (en) * 2010-12-01 2011-04-20 北京邮电大学 Method for automatically detecting audio templates and chaptering videos
CN106652997A (en) * 2016-12-29 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal
CN108419123A (en) * 2018-03-28 2018-08-17 广州市创新互联网教育研究院 A kind of virtual sliced sheet method of instructional video
CN108831424A (en) * 2018-06-15 2018-11-16 广州酷狗计算机科技有限公司 Audio splicing method, apparatus and storage medium

Also Published As

Publication number Publication date
CN110400559B (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN101452696B (en) Signal processing device, signal processing method and program
EP2816550B1 (en) Audio signal analysis
CN104395953B (en) The assessment of bat, chord and strong beat from music audio signal
US12027165B2 (en) Computer program, server, terminal, and speech signal processing method
CN105788589A (en) Audio data processing method and device
CN107666638B (en) A kind of method and terminal device for estimating tape-delayed
CN1937462A (en) Content-preference-score determining method, content playback apparatus, and content playback method
WO2015114216A2 (en) Audio signal analysis
CN111292717B (en) Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN110675886A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
WO2012171583A1 (en) Audio tracker apparatus
WO2019162703A1 (en) Method of combining audio signals
CN110010159B (en) Sound similarity determination method and device
CN113691909B (en) Digital audio workstation with audio processing recommendations
US20240004606A1 (en) Audio playback method and apparatus, computer readable storage medium, and electronic device
CN113936629B (en) Music file processing method and device and music singing equipment
CN113781989A (en) Audio animation playing and rhythm stuck point identification method and related device
WO2016102738A1 (en) Similarity determination and selection of music
US20180173400A1 (en) Media Content Selection
CN112037739B (en) Data processing method and device and electronic equipment
CN106775567B (en) Sound effect matching method and system
CN110400559A (en) A kind of audio synthetic method, device and equipment
JP6288197B2 (en) Evaluation apparatus and program
JP6102076B2 (en) Evaluation device
CN107025902B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant