WO2017028658A1 - Method and apparatus for adjusting voice data - Google Patents

Method and apparatus for adjusting voice data

Info

Publication number
WO2017028658A1
Authority
WO
WIPO (PCT)
Prior art keywords
length
frame
adjustment
target
compression
Prior art date
Application number
PCT/CN2016/091618
Inventor
史巍
刘丹
刘建敏
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017028658A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion

Definitions

  • This document relates to, but is not limited to, the field of audio signal processing, and in particular, to a method and apparatus for adjusting voice data.
  • Time-scale modification is mainly used for variable-speed playback and voice changing. It is also suited to environments with network jitter, delay, and packet loss, where voice data needs to be patched.
  • A time-scale modification algorithm stretches or compresses the voice signal, which can effectively reduce the impact of a harsh network environment on voice quality and improve the subjective listening experience in that environment.
  • Airflow passing through the glottis causes the vocal cords to vibrate in an oscillating manner, which produces a quasi-periodic pulsed airflow.
  • This airflow excites the vocal tract to produce voiced sound, also known as voiced speech, which carries most of the energy in the voice.
  • The frequency of this vocal cord vibration is called the fundamental frequency, and the corresponding period is called the pitch period.
  • The pitch period consists of three parts: the vocal cords gradually opening to their largest area (about 50% of the pitch period), gradually closing until fully closed (about 35% of the pitch period), and remaining fully closed (about 15% of the pitch period).
  • The concept of the pitch period involves the pitch delay.
  • The pitch delay is the delay, within certain limits, at which the autocorrelation function of the residual signal reaches its maximum.
  • The pitch delay of each frame is calculated through two estimation windows. The first estimation window covers the entire current frame signal; the second estimation window covers the second half of the current frame and the lookahead (prefetch) portion. Each of the two estimation windows (prediction windows) yields one optimal delay parameter; then, according to certain decision logic, one of the two optimal delay parameters is selected as the delay parameter of the current frame, that is, the pitch period is determined.
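The autocorrelation search described above can be illustrated with a minimal Python sketch. It uses a single search window rather than the two estimation windows mentioned in the text, and the function name `estimate_pitch_lag`, the lag bounds, and the synthetic pulse-train "residual" are all assumptions made for illustration, not part of the patent.

```python
def estimate_pitch_lag(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes the autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal with itself shifted by `lag`.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test input: one pulse every 50 points stands in for the residual signal.
residual = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
pitch_period = estimate_pitch_lag(residual, min_lag=20, max_lag=120)
print(pitch_period)
```

Restricting the search to a lag range plays the role of the "certain limits" in the definition; a practical estimator would typically also normalize the autocorrelation so that longer lags are not penalized.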
  • Synchronized overlap-add (SOLA)
  • The principle of the algorithm is: the original signal is divided into frames of length N at a frame spacing S_a and is then synthesized at a frame spacing S_s; the ratio of S_a to S_s determines the stretch/compression ratio of the speech.
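To make the role of the two frame spacings concrete, here is a small Python sketch of plain overlap-add. It deliberately omits the synchronization step (the alignment search that gives SOLA its name), and the function name, the triangular window, and the normalization are illustrative assumptions rather than anything specified in the text.

```python
def overlap_add(signal, frame_len, analysis_hop, synthesis_hop):
    """Cut frames at spacing analysis_hop and overlap-add them at spacing synthesis_hop."""
    if len(signal) < frame_len:
        raise ValueError("signal must contain at least one full frame")
    n_frames = 1 + (len(signal) - frame_len) // analysis_hop
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    weight = [0.0] * out_len
    # Triangular window that is strictly positive at every sample.
    window = [1.0 - abs((2 * i + 1) / frame_len - 1.0) for i in range(frame_len)]
    for k in range(n_frames):
        src = k * analysis_hop
        dst = k * synthesis_hop
        for i in range(frame_len):
            out[dst + i] += window[i] * signal[src + i]
            weight[dst + i] += window[i]
    # Normalize by the accumulated window weight; gaps (weight 0) stay silent.
    return [o / w if w > 0.0 else 0.0 for o, w in zip(out, weight)]


stretched = overlap_add([1.0] * 400, frame_len=100, analysis_hop=50, synthesis_hop=75)
print(len(stretched))
```

With an analysis spacing of 50 and a synthesis spacing of 75, the 400-point input grows to 550 points, so the output duration follows the ratio of the two spacings, as the text states.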
  • Pitch-synchronous overlap-add (PSOLA)
  • The main principle of the algorithm is to first estimate the pitch period and pitch-mark the input waveform. The original speech signal is then multiplied by a series of pitch-synchronous window functions to obtain a series of overlapping short-time analysis signals. These short-time analysis signals are then adjusted at a fixed ratio (for example, in fundamental frequency, duration, and amplitude) to obtain a sequence of short-time synthesis signals synchronized with the target pitch contour. Finally, the short-time synthesis signals are aligned with the target pitch period and overlap-added to obtain the synthesized speech waveform.
  • In these algorithms, each frame signal is adjusted according to a fixed stretch/compression ratio, and the resulting voice waveform signal (voice data) is of poor quality. No effective solution to this defect has yet been proposed.
  • The embodiment of the invention provides a method and an apparatus for adjusting voice data, which can improve the quality of voice data.
  • A method for adjusting voice data includes: acquiring parameter information of a specified frame in the voice data to be processed and a first target stretch or compression length of the specified frame, where the parameter information of the specified frame includes a pitch period, a first frame length, and a first correction value; calculating the sum of the first target stretch or compression length and the first correction value, and taking the sum as a second target stretch or compression length; calculating an adjustment parameter according to the second target stretch or compression length and the pitch period; adjusting the length of the specified frame according to the adjustment parameter; and determining a second frame length of the adjusted specified frame and a second correction value, and updating, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  • Adjusting the length of the specified frame according to the adjustment parameter includes: adjusting the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length; subtracting the first frame length from the first subframe length to obtain a first difference; subtracting the first difference from the first target stretch or compression length to obtain a second difference; determining whether the second difference is greater than 0; and, when the second difference is less than or equal to 0, determining that the first subframe length is the second frame length.
  • The adjusting method further includes: when the second difference is greater than 0, adjusting the frame corresponding to the first subframe length according to the first subframe length and a third target stretch length to obtain the second frame length, where the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
  • Calculating the adjustment parameter according to the second target stretch or compression length and the pitch period includes: dividing the second target stretch or compression length by the pitch period to obtain a quotient; comparing the quotient with 1; if the quotient is greater than or equal to 1, taking the largest positive integer less than or equal to the quotient as the adjustment base; if the quotient is less than 1, taking 1 as the adjustment base; and setting the product of the pitch period and the adjustment base as the adjustment parameter.
  • The method further includes: comparing the adjustment parameter with the first frame length; and, if the adjustment parameter is greater than the first frame length, updating the adjustment parameter with the first frame length.
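The two rules above can be written compactly. The symbols T_2 (second target stretch or compression length), P (pitch period), L_1 (first frame length), q (quotient), k (adjustment base), and A (adjustment parameter) are notation introduced here for illustration only and do not appear in the source:

```latex
q = \frac{T_2}{P}, \qquad
k = \begin{cases} \lfloor q \rfloor, & q \ge 1 \\ 1, & q < 1 \end{cases}, \qquad
A = \min\left(kP,\; L_1\right)
```

The minimum expresses the update step: whenever kP exceeds the first frame length, the adjustment parameter is replaced by L_1.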
  • An apparatus for adjusting voice data includes: an acquiring module, configured to acquire parameter information of a specified frame in the voice data to be processed and a first target stretch or compression length of the specified frame, where the parameter information of the specified frame includes a pitch period, a first frame length, and a first correction value;
  • a first calculating module, configured to calculate the sum of the first target stretch or compression length and the first correction value, and take the sum as a second target stretch or compression length;
  • a second calculating module, configured to calculate an adjustment parameter according to the second target stretch or compression length and the pitch period;
  • a processing module, configured to adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame; and
  • an update module, configured to determine a second frame length of the adjusted specified frame and a second correction value, and to update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  • The processing module includes:
  • a first adjusting unit, configured to adjust the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length;
  • a first calculating unit, configured to subtract the first frame length from the first subframe length to obtain a first difference;
  • a judging unit, configured to subtract the first difference from the first target stretch or compression length to obtain a second difference, and to judge whether the second difference is greater than 0;
  • a determining unit, configured to determine that the first subframe length is the second frame length when the second difference is less than or equal to 0; and
  • a second adjusting unit, configured to adjust the length of the specified frame according to the determined second frame length.
  • The processing module further includes:
  • a third adjusting unit, configured to adjust, when the second difference is greater than 0, the frame corresponding to the first subframe length according to the first subframe length and a third target stretch length to obtain the second frame length, where the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
  • The second calculating module includes:
  • a second calculating unit, configured to divide the second target stretch or compression length by the pitch period to obtain a quotient;
  • a first comparison unit, configured to compare the quotient with 1;
  • a first setting unit, configured to set the largest positive integer less than or equal to the quotient as the adjustment base if the quotient is greater than or equal to 1, or to set 1 as the adjustment base if the quotient is less than 1; and
  • a second setting unit, configured to set the product of the pitch period and the adjustment base as the adjustment parameter.
  • The second calculating module further includes:
  • a second comparing unit, configured to compare the adjustment parameter with the first frame length after the product of the pitch period and the adjustment base has been set as the adjustment parameter; and
  • an updating unit, configured to update the adjustment parameter with the first frame length if the adjustment parameter is greater than the first frame length.
  • The technical solution provided by the embodiment of the present invention includes: acquiring parameter information of a specified frame in the voice data to be processed and a first target stretch or compression length of the specified frame, where the parameter information of the specified frame includes a pitch period, a first frame length, and a first correction value; calculating the sum of the first target stretch or compression length and the first correction value to obtain a second target stretch or compression length; calculating an adjustment parameter according to the second target stretch or compression length and the pitch period; and adjusting the length of the specified frame according to the adjustment parameter.
  • The embodiment of the invention implements real-time adjustment of each frame of voice data and improves the signal quality of the voice data.
  • The embodiment acquires parameter information of a specified frame in the voice data to be processed and a first target stretch or compression length of the specified frame; calculates the sum of the first target stretch or compression length and the first correction value to obtain the second target stretch or compression length; calculates the adjustment parameter according to the second target stretch or compression length and the pitch period; adjusts the length of the specified frame according to the adjustment parameter; determines the second frame length of the adjusted specified frame and the second correction value; and updates, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  • Frame-by-frame iterative adjustment is applied to every frame of the voice data to be processed, and the adjustment result of the previous frame affects the adjustment ratio of the next frame. This solves the technical problems in the related art that the stretch/compression ratio of every frame is the same and cannot be changed in real time, that the stretch/compression ratio is limited, and that the adjustment cannot be controlled as a whole. By changing the stretch/compression ratio of each frame in real time, sudden conditions in voice data transmission (for example, jitter, packet loss, and delay) are compensated, which improves overall voice quality and effectively reduces the impact of a harsh network environment on voice quality.
  • FIG. 1 is a flowchart of a method for adjusting voice data according to an embodiment of the present invention
  • FIG. 2 is a block diagram showing the structure of an apparatus for adjusting voice data according to an embodiment of the present invention
  • FIG. 3 is a block diagram 1 of an optional structure of an apparatus for adjusting voice data according to an embodiment of the present invention
  • FIG. 4 is a block diagram 2 of an optional structure of an apparatus for adjusting voice data according to an embodiment of the present invention
  • FIG. 5 is a block diagram 3 of an optional structure of an apparatus for adjusting voice data according to an embodiment of the present invention.
  • FIG. 6 is a block diagram 4 of an optional structure of an apparatus for adjusting voice data according to an embodiment of the present invention.
  • FIG. 7 is a schematic flow chart of adjusting voice data according to an optional embodiment of the present invention.
  • FIG. 8 is a flowchart of a stretching operation according to an optional embodiment of the present invention.
  • FIG. 9 is a schematic diagram of stretching according to an optional embodiment of the present invention.
  • FIG. 10 is a second schematic diagram of an optional embodiment of the present invention.
  • FIG. 11 is a flowchart of a compression operation according to an optional embodiment of the present invention.
  • FIG. 12 is a schematic diagram of three cases of compression performed at different pitch periods according to an optional embodiment of the present invention.
  • In this embodiment, a method for adjusting voice data is provided.
  • This embodiment can be applied to all areas and scenarios that require time-scale modification. For example, in a multimedia device, variable-speed playback and voice-changing functions can be implemented by stretching/compressing multimedia data.
  • In digital communication or Internet communication, properly stretching/compressing voice data, especially unvoiced frames, can effectively handle burst delay, jitter, packet loss, and similar conditions during voice transmission, thus ensuring voice quality during transmission.
  • FIG. 1 is a flowchart of a method for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • Step 102: Obtain parameter information of a specified frame in the voice data to be processed and a first target stretch or compression length of the specified frame.
  • The specified frame may be any frame in the entire voice data to be processed.
  • The specified frame may be the first frame in order in the voice data, and the parameter information of the specified frame represents the frame's own parameters.
  • The frame length, the correction value, and the pitch period are expressed in "points", a unit widely used in the art.
  • Step 104: Calculate the sum of the first target stretch or compression length and the first correction value, and take the sum as the second target stretch or compression length.
  • The second target stretch or compression length indicates the length by which the specified frame needs to be stretched or compressed after the correction value is taken into account. For example, if the first target stretch length is 100 points and the first correction value is -20 points, the second target stretch length is 80 points.
  • Because the stretch or compression length of each frame depends on the frame's own parameters and can only be adjusted in units of the pitch period, an error generally arises. Passing the error of the previous frame to the next frame through the correction value keeps the adjustment error of the entire voice data to a minimum.
  • Step 106: Calculate the adjustment parameter according to the second target stretch or compression length and the pitch period. In the embodiment of the present invention, the adjustment parameter indicates the length by which the specified frame is stretched or compressed.
  • The adjustment parameter of the specified frame may be determined according to the adjustment type of the specified frame, that is, whether the frame is stretched or compressed; it is related to the stretch or compression length and to the first frame length of the specified frame. Depending on the calculation, the specified frame may need to be adjusted only once.
  • The adjustment parameters include the number of stretching or compression operations and the length of each stretch or compression.
  • Step 108 Adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame.
  • Step 110: Determine the second frame length of the adjusted specified frame and a second correction value, and update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  • updating the correction value of the next frame of the specified frame performing the stretching or compression operation according to the second correction value may include: using the second correction value as the next frame of the specified frame performing the stretching or compression operation Correction value.
  • Compression needs only one adjustment pass to complete. When a stretching operation is performed, however, the second frame length must be compared with the sum of the first frame length and the second target stretch length. If the second frame length is greater than or equal to that sum, the stretching operation ends; otherwise a second, third, ... stretching pass is performed, yielding a third frame length, a fourth frame length, and so on, until an (N+1)th frame length that is greater than the sum is obtained. The second correction value is then determined from the difference between the sum and the (N+1)th frame length.
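The repeated stretching passes can be sketched as a loop. This is a simplified illustration under stated assumptions: each pass adds a whole number of pitch periods chosen by the same quotient rule used for the adjustment parameter, the cap at the first frame length is left out, and the sign of the correction value (goal minus final length) is an assumed convention. The function name `stretch_frame` is hypothetical.

```python
def stretch_frame(first_frame_len, second_target, pitch_period):
    """Stretch in whole pitch periods until the frame reaches the target length."""
    goal = first_frame_len + second_target
    length = first_frame_len
    passes = 0
    while length < goal:
        remaining = goal - length
        # Whole pitch periods that fit in what is left, but never fewer than one.
        base = max(1, remaining // pitch_period)
        length += base * pitch_period
        passes += 1
    # Assumed sign convention: goal minus final length, so an overshoot is negative.
    correction = goal - length
    return length, correction, passes


final_len, correction, passes = stretch_frame(320, 160, 50)
print(final_len, correction, passes)
```

For a 320-point frame, a 160-point target, and a 50-point pitch period, the first pass adds 150 points and still falls 10 points short, so a second pass adds one more pitch period; the 40-point overshoot is what would be handed to the next frame as its correction.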
  • the second frame length is the adjusted length of the specified frame
  • the second correction value indicates an adjustment error of the specified frame
  • the adjustment error of the previous frame is transmitted to the next frame by using a correction value.
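How the correction value carries the error from one frame to the next can be sketched for the compression case, where a single pass per frame suffices. The helper names, the tuple layout of `frames`, and the definition of the error as "second target minus amount actually removed" are assumptions made for this illustration.

```python
def adjustment_parameter(second_target, pitch_period, frame_len):
    """Largest whole multiple of the pitch period not exceeding the target
    (at least one period), capped at the frame length."""
    base = max(1, second_target // pitch_period)
    return min(base * pitch_period, frame_len)


def compress_frames(frames, first_target):
    """frames: list of (pitch_period, frame_len); returns per-frame (amount, correction)."""
    correction = 0  # the correction value defaults to 0 before any adjustment
    results = []
    for pitch_period, frame_len in frames:
        second_target = first_target + correction
        amount = adjustment_parameter(second_target, pitch_period, frame_len)
        # Assumed error definition: what was wanted minus what was actually removed.
        correction = second_target - amount
        results.append((amount, correction))
    return results


results = compress_frames([(50, 320)] * 5, first_target=160)
print(results)
```

In this run each of the first four frames can only be shortened by 150 points, so the shortfall grows by 10 points per frame until the fifth frame absorbs it with a 200-point compression; over the five frames the total removed is exactly 5 × 160 points.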
  • In this embodiment, the parameter information of a specified frame in the voice data to be processed and the first target stretch or compression length of the specified frame are obtained, where the parameter information of the specified frame includes a pitch period, a first frame length, and a first correction value. The sum of the first target stretch or compression length and the first correction value is then calculated to obtain a second target stretch or compression length, and an adjustment parameter is calculated according to the second target stretch or compression length and the pitch period.
  • The adjustment parameter indicates the length by which the specified frame is to be stretched or compressed, and the length of the specified frame is adjusted according to it. Once the length of the specified frame has been adjusted, the second frame length of the adjusted frame and the second correction value are determined.
  • The adjustment result of the previous frame affects the adjustment ratio of the next frame. This solves the technical problems in the related art that the stretch/compression ratio of every frame is the same and cannot be changed in real time, that the stretch/compression ratio is limited, and that the adjustment cannot be controlled as a whole.
  • By changing the stretch/compression ratio of each frame in real time, sudden conditions in voice data transmission (for example, jitter, packet loss, and delay) are compensated, which improves overall voice quality and effectively reduces the impact of a harsh network environment on voice quality.
  • adjusting the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame includes:
  • Taking stretching as an example, the first stretch length (that is, the first difference) can be calculated first, and then the difference between the first target stretch length and the first stretch length is determined. If this difference is less than or equal to 0, the first stretch length has reached or exceeded the first target stretch length; the result of the first stretch is taken as the stretch result of the specified frame, and adjustment continues with the next frame.
  • When the second difference is greater than 0, the frame corresponding to the first subframe length is adjusted according to the first subframe length and the third target stretch length to obtain the second frame length, where the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
  • In this case the first stretch has not met the stretching requirement of the specified frame and the second frame length has not yet been obtained, so stretching must continue. The target length of this further stretch is smaller than that of the first stretch: it may be the absolute value of the difference between the second difference and the pitch period. This absolute value is used as the third target stretch length, and a second stretch is performed to obtain the final second frame length of the specified frame.
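The decision between stopping after the first stretch and performing a second one can be sketched as follows. The function name `second_stretch_target` and the convention of returning `None` when no further stretch is needed are assumptions for illustration; the arithmetic follows the first difference, second difference, and third target stretch length as defined in the text.

```python
def second_stretch_target(first_frame_len, first_subframe_len, first_target, pitch_period):
    """Return None if the first stretch suffices, else the third target stretch length."""
    first_difference = first_subframe_len - first_frame_len  # amount actually stretched
    second_difference = first_target - first_difference      # amount still missing
    if second_difference <= 0:
        return None  # the first subframe length is already the second frame length
    return abs(second_difference - pitch_period)


print(second_stretch_target(320, 470, 160, 50))
print(second_stretch_target(320, 520, 160, 50))
```

In the first call the frame was stretched by 150 of the requested 160 points, so 10 points are missing and the third target stretch length is |10 - 50| = 40 points; in the second call the stretch already exceeds the target, so no further pass is required.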
  • calculating adjustment parameters based on the second target stretch or compression length and pitch period comprises:
  • Take a second target stretch length of 160 points as an example. If the pitch period is 50 points, the quotient is 3.2, which is greater than or equal to 1, so the first rule applies: the largest positive integer less than or equal to 3.2 is 3, and multiplying it by the pitch period gives an adjustment parameter of 150. If the pitch period is 200 points, the quotient is 0.8, which is less than 1, so the other rule applies: 1 is multiplied by the pitch period directly, giving an adjustment parameter of 200.
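The worked example above can be reproduced with a few lines of Python. The function name `adjustment_base` is a label chosen for this sketch; the rule it implements is the quotient comparison described in the text.

```python
import math


def adjustment_base(second_target, pitch_period):
    """Adjustment base for a given second target length and pitch period."""
    quotient = second_target / pitch_period
    # Quotient >= 1: largest integer not above it; otherwise fall back to 1.
    return math.floor(quotient) if quotient >= 1 else 1


for pitch_period in (50, 200):
    base = adjustment_base(160, pitch_period)
    print(pitch_period, base, base * pitch_period)
```

The loop prints the pitch period, the adjustment base, and their product (the adjustment parameter) for the two cases discussed in the example.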
  • The method may further include:
  • comparing the adjustment parameter with the first frame length, and, if the adjustment parameter is greater than the first frame length, updating the adjustment parameter with the first frame length.
  • Updating the adjustment parameter may include using the first frame length as the adjustment parameter.
  • If the adjustment parameter is too large, the specified frame cannot be adjusted, or the adjustment effect is poor.
  • In that case the adjustment parameter itself needs to be adjusted, which may be done according to the first frame length of the current specified frame. For example, if the adjustment parameter is 150 points and the first frame length is 120 points, the adjustment parameter is greater than the first frame length, so the adjustment parameter is updated to 120.
  • The technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method of the embodiments of the present invention.
  • an apparatus for adjusting voice data is further provided.
  • The apparatus may be disposed in a device that can process or transmit voice data, and is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments may be implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
  • The apparatus of the embodiment of the present invention may be disposed in an independent device connected to a sound collection device, or in the sound collection device itself. Regardless of the arrangement, received voice data may be processed using the method of the embodiment of the present invention.
  • The embodiment of the invention further provides a computer storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the method for adjusting voice data.
  • the apparatus includes: an obtaining module 20, a first calculating module 22, a second calculating module 24, and a processing module 26, where
  • the obtaining module 20 is configured to obtain parameter information of a specified frame in the voice data to be processed, and a first target stretch or compression length of the specified frame, where the parameter information of the specified frame includes: a pitch period, a first frame length, and a first a correction value;
  • the specified frame may be any frame in the entire to-be-processed voice data.
  • The specified frame may be the first frame in order in the voice data.
  • The parameter information of the specified frame indicates the frame's own parameters, such as the pitch period, the first frame length, and the first correction value, where the first frame length represents the frame length of the specified frame and the first correction value represents a calculable error of the specified frame length.
  • The correction value of each frame may default to 0 before adjustment and may be passed between frames of the entire voice data. The first target stretch or compression length indicates the length by which the specified frame needs to be stretched or compressed; it may be set in advance or calculated. In this embodiment, the frame length, the correction value, and the pitch period are expressed in "points", a unit widely used in the art.
  • The first calculating module 22 is coupled to the obtaining module 20 and is configured to calculate the sum of the first target stretch or compression length and the first correction value, and to take the sum as the second target stretch or compression length.
  • The second target stretch or compression length indicates the length by which the specified frame needs to be stretched or compressed after the correction value is taken into account. For example, if the first target stretch length is 100 points and the first correction value is -20 points, the second target stretch length is 80 points.
  • Because the stretch or compression length of each frame depends on the frame's own parameters and can only be adjusted in units of the pitch period, an error generally arises. Passing the error of the previous frame to the next frame through the correction value keeps the adjustment error of the entire voice data to a minimum.
  • The second calculating module 24 is coupled to the first calculating module 22 and is configured to calculate an adjustment parameter according to the second target stretch or compression length and the pitch period, where the adjustment parameter indicates the length by which the specified frame is stretched or compressed.
  • The adjustment parameter of the specified frame is determined according to the adjustment type of the specified frame, which may be stretching or compression.
  • the processing module 26 is coupled to the second computing module 24 and configured to adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame.
  • The updating module 28 is configured to determine, once the processing module 26 has adjusted the length of the specified frame, the second frame length of the adjusted specified frame and a second correction value, and to update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  • the second frame length is the adjusted length of the specified frame
  • the second correction value indicates an adjustment error of the specified frame
  • the adjustment error of the previous frame is transmitted to the next frame by using a correction value.
  • In addition to all the modules described above, the processing module 26 includes: a first adjusting unit 30, a first calculating unit 32, a judging unit 34, and a determining unit 36, where
  • the first adjusting unit 30 is configured to adjust the specified frame to the first subframe length according to the first frame length and the second target stretch or compression length;
  • the first calculating unit 32 is coupled to the first adjusting unit 30 and is configured to subtract the first frame length from the first subframe length to obtain a first difference;
  • the judging unit 34 is coupled to the first calculating unit 32 and is configured to subtract the first difference from the first target stretch or compression length to obtain a second difference, and to judge whether the second difference is greater than 0;
  • the determining unit 36 is coupled to the judging unit 34 and is configured to determine that the first subframe length is the second frame length when the judging unit 34 judges that the second difference is less than or equal to 0.
  • Taking stretching as an example, the first stretch length can be calculated first, and then the first stretch length is subtracted from the first target stretch length. If the difference is less than or equal to 0, the first stretch length has reached or exceeded the first target stretch length; the result of the first stretch is taken as the stretch result of the specified frame, and adjustment continues with the next frame.
  • the processing module 26 includes: a third adjustment unit 40, in addition to all the modules shown in FIG. And being coupled to the determining unit 34, configured to adjust the frame corresponding to the first subframe length according to the first subframe length and the third target stretch length when the second difference obtained by the determining unit 34 is greater than 0. a second frame length, wherein the third target stretch length is an absolute value of a difference between the second difference and the pitch period.
  • the stretching requirement of the specified frame is not reached by the first stretching, and the frame of the second frame length is not obtained, and the stretching needs to be continued, but the target length of the stretching is first pulled.
  • the extension is based on less, and may be the absolute value of the difference between the second difference and the pitch period, and the absolute value is used as the third target stretch length, and the second stretch is performed to obtain the final second of the specified frame. Frame length.
  • FIG. 5 is a block diagram 3 of an optional structure of a voice data adjusting apparatus according to an embodiment of the present invention.
  • in addition to all the modules shown in FIG. 2, the second calculating module 24 includes: a second calculating unit 50, configured to divide the second target stretch or compression length by the pitch period to obtain a quotient; a first comparison unit 52, configured to compare the obtained quotient with 1; a first setting unit 54, configured to set the largest positive integer less than or equal to the quotient as the adjustment base if the quotient is greater than or equal to 1, or to set 1 as the adjustment base if the quotient is less than 1; and a second setting unit 56, configured to set the product of the pitch period and the adjustment base as the adjustment parameter.
  • as an example, take a second target stretch length of 160 points. If the pitch period is 50 points, the quotient is 3.2, which is greater than or equal to 1, so the first rule applies: the largest positive integer less than or equal to 3.2 is 3, and multiplying it by the pitch period gives an adjustment parameter of 150. If the pitch period is 200 points, the quotient is 0.8, which is less than 1, so the other rule applies: 1 is multiplied directly by the pitch period, giving an adjustment parameter of 200.
  • FIG. 6 is a block diagram showing an optional structure of the apparatus for adjusting voice data according to an embodiment of the present invention.
  • the second calculating module further includes: a second comparing unit 60, configured to compare the adjustment parameter with the first frame length after the second setting unit 56 sets the product of the pitch period and the adjustment base as the adjustment parameter; and an updating unit 62, configured to update the adjustment parameter with the first frame length if the adjustment parameter is greater than the first frame length.
  • the adjustment parameter may be too large, so that the specified frame cannot be adjusted or the adjustment effect is poor.
  • in that case the adjustment parameter itself needs to be adjusted, which may be done according to the first frame length of the current specified frame. For example, if the adjustment parameter is 150 points and the first frame length is 120 points, comparison shows that the adjustment parameter is greater than the first frame length, so the adjustment parameter is updated to 120.
  • FIG. 7 is a schematic flowchart of adjusting voice data according to an optional embodiment of the present invention. As shown in FIG. 7, after the process starts, the voice data to be processed is input into a cache, and it is judged whether the voice data needs to be adjusted. If not, the process ends; if so, the pitch period and the stretching/compression parameters are calculated, stretching/compression is performed, and finally the adjusted voice data is output.
  • the need to adjust the voice data may arise in two cases: variable-speed playback of sound, and compensation for network packet loss and delay. Either case may be used to judge whether the voice data needs to be adjusted.
  • judging whether the voice data needs to be adjusted is a conventional technique for a person skilled in the art, and details are not described herein.
  • PitchTime: pitch period;
  • FrameTag: the number of target points, that is, the number of points that need to be stretched/compressed (equivalent to the first target stretch/compression length);
  • TagRES: target point correction value, passing information between frames (equivalent to the correction value);
  • OptLength: the number of points of the current stretch/compression (equivalent to the adjustment parameter);
  • DataLength: current data length (equivalent to the first frame length).
  • FIG. 8 is a flow chart of a stretching operation according to an alternative embodiment of the present invention.
  • at this time, the current data length DataLength is the data length FrameLength of the frame;
  • steps S72 to S74 can be summarized as: calculating/updating the parameters related to stretching the voice data;
  • S76: determine whether the stretching requirement is met, including: update DataLength by adding OptLength to it, and subtract OptLength from FrameTag to obtain the new FrameTag. If FrameTag is less than or equal to 0, stretching ends; otherwise operations S73 to S76 are repeated until stretching ends;
  • S77: update the voice-related information, including: correcting inter-frame information such as TagRES by using the deviation between the stretching result of the frame and the expected result.
  • FIG. 9 is a first stretching schematic of an alternative embodiment of the present invention. As shown in FIG. 9, it illustrates the case where a signal with a pitch period of 100 and a frame length of 160 is stretched by 160 points.
  • the first frame expansion is then performed on the data. Since OptLength is greater than half of the entire sequence length DataLength, two segments of 60 points each, located at the head and the tail of the source data, are smoothed. That is, points 1 to 100 of the original data s are first copied to points 1 to 100 of the stretched speech s'. Then points 1 to 60 and points 101 to 160 of the original data s are smoothed, and the result is placed at points 101 to 160 of s'. Finally, points 61 to 160 of the original data s are copied directly to points 161 to 260 of s'.
  • the second frame expansion is then performed. OptLength is now less than half of the entire sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 100 of the stretched speech s' are first copied to points 1 to 100 of the twice-stretched speech s". Then points 1 to 100 and points 101 to 200 of s' are smoothed, and the result is placed at points 101 to 200 of s". Finally, points 101 to 260 of s' are copied directly after point 200 of s".
  • the length of the final stretched sequence is 360 rather than the desired 320, that is, 40 samples more than intended, but this excess has been recorded in TagRES.
  • FIG. 10 is a second stretching schematic of an alternative embodiment of the present invention. As shown in FIG. 10, this example illustrates the case where a signal with a pitch period of 40 and a frame length of 160 is stretched by 150 points.
  • the first frame expansion is performed on the data. Because OptLength is equal to half of the entire sequence length DataLength, the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 80 of the original data s are first copied to points 1 to 80 of the stretched speech s'. Then points 1 to 80 and points 81 to 160 of s are smoothed, and the result is placed at points 81 to 160 of s'. Finally, points 81 to 160 of s are copied directly after point 160 of s'.
  • the second frame expansion is then performed. OptLength is now less than half of the entire sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 40 of the first-stretched speech s' are first copied to points 1 to 40 of the twice-stretched speech s". Then points 1 to 40 and points 41 to 80 of s' are smoothed, and the result is placed at points 41 to 80 of s". Finally, points 41 to 240 of s' are copied directly after point 80 of s".
  • this example shows the stretching of the frame immediately following the frame of stretching example 1.
  • in stretching example 1, the sequence length before stretching is 160, 160 points need to be stretched out, and the actual sequence length after stretching is 360.
  • in this example, the sequence length before stretching is 160.
  • FIG. 11 is a flow chart of a compression operation of an alternative embodiment of the present invention. As shown in Figure 11, it includes:
  • steps S82 to S84: calculating the compression-related parameters;
  • updating the voice-related information includes: correcting inter-frame information, such as TagRES, by using the deviation between the compression result of the frame and the expected result.
  • FIG. 12 shows three schematic diagrams of compression performed under different pitch periods according to an alternative embodiment of the present invention. As shown in FIG. 12, the three diagrams correspond to pitch periods of 40, 60, and 100, respectively, where in represents the original data, that is, the data before processing, and out represents the compressed data.
  • the data is then frame compressed. Since OptLength is exactly half of the entire sequence length DataLength, the first half and the second half of the source data are smoothed together. That is, points 1 to 80 and points 81 to 160 of the original data in1 are smoothed to obtain the compressed speech out1.
  • the data is then frame compressed. OptLength is less than half of the entire original sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed, and the remaining data is copied directly after the smoothed data. That is, points 1 to 60 and points 61 to 120 of the original data in2 are smoothed, and the result is placed at points 1 to 60 of the compressed speech out2. Points 121 to 160 of in2 are then copied directly after point 60 of out2.
  • the data is then frame compressed. Because OptLength is greater than half of the entire original sequence length DataLength, two segments of 60 points each, located at the head and the tail of the source data, are smoothed. That is, points 1 to 60 and points 101 to 160 of the original data in3 are smoothed, and the result is placed at points 1 to 60 of the compressed speech out3. Points 61 to 100 of in3 are discarded directly.
  • the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in multiple processors respectively.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • S1: obtain parameter information of a specified frame in the voice data to be processed, and a first target stretch or compression length of the specified frame;
  • the modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from the one given here, or they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • each module/unit in the above embodiment may be implemented in the form of hardware, for example, by using an integrated circuit to implement its corresponding function.
  • it may also be implemented in the form of a software function module, for example, by having a processor execute a program/instruction stored in a memory to implement its corresponding function.
  • the invention is not limited to any specific form of combination of hardware and software.
  • the above technical solution realizes real-time adjustment of each frame of voice data, and improves the signal quality of the voice data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

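A method and apparatus for adjusting voice data. The adjusting method comprises: acquiring parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame (102), the parameter information of the specified frame comprising a pitch period, a first frame length and a first correction value; calculating the sum of the first target stretch or compression length and the first correction value, and taking the obtained sum as a second target stretch or compression length (104); calculating an adjustment parameter according to the second target stretch or compression length and the pitch period (106); adjusting the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame (108); and determining a second frame length and a second correction value resulting from adjusting the length of the specified frame, and updating, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed (1010). Real-time adjustment is achieved for each frame of voice data, and the signal quality of the voice data is improved.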

Description

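Method and apparatus for adjusting voice data
Technical Field
This document relates to, but is not limited to, the field of audio signal processing, and in particular to a method and apparatus for adjusting voice data.
Background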
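Time-scale modification is a method of stretching and compressing speech in the time domain. For example, for a signal expressed as S(t)=sin(2t), changing the coefficient of t so that the signal becomes sin(4t) performs a time-scale modification on the signal S(t)=sin(2t). Time-scale modification is mainly used for variable-speed playback and voice changing, and is also applicable where speech needs to be repaired because of network jitter, delay and packet loss. When network jitter, delay, packet loss or similar conditions occur, stretching or compressing the speech signal with a time-scale modification algorithm can effectively reduce the impact of a poor network environment on speech quality and improve the subjective listening experience in such an environment.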
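When a person produces a voiced sound, the airflow through the glottis causes the vocal cords to vibrate in a relaxation-oscillation manner, producing a quasi-periodic pulsed airflow; this airflow excites the vocal tract to produce voiced sound, also called voiced speech, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the fundamental frequency, and the corresponding period is called the pitch period (Pitch). The pitch period consists of three parts: the vocal cords gradually opening to their maximum area (about 50% of the pitch period), gradually closing until fully closed (about 35% of the pitch period), and remaining fully closed (about 15% of the pitch period). The concept of the pitch period involves the pitch delay, which is the delay that, subject to certain constraints, maximizes the autocorrelation function of the residual signal. The pitch delay of each frame is calculated separately through two estimation windows: the first estimation window covers the entire current frame signal, and the second covers the second half of the current frame plus the lookahead (prefetch) part. After each of the two estimation windows (prediction windows) yields an optimal delay parameter, one of the two is selected, according to certain decision logic, as the delay parameter of the current frame, which determines the pitch period.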
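The idea of choosing the delay that maximizes an autocorrelation can be sketched as follows. This is a simplified illustration only: it works directly on the signal rather than on a residual signal, uses a single window instead of the two estimation windows described above, and the function name `estimate_pitch_period` and the search range are assumptions introduced here.

```python
import math

def estimate_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes the autocorrelation of x.

    Hypothetical helper: the source only states that the pitch delay maximizes
    the autocorrelation of a residual signal under certain constraints.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic test signal with a known period of 50 points.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
period = estimate_pitch_period(signal, 20, 120)
```

Restricting the search to a plausible lag range plays the role of the "certain constraints" mentioned above; a practical estimator would also need a voicing decision, which is not modelled here.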
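Among the methods for adjusting voice data in the related art, the synchronized overlap-add algorithm (Synchronization overlap-and-add, SOLA for short) has been studied extensively. Its principle is: the original signal is divided into frames with frame spacing Sa and frame length N, and is then synthesized with frame spacing Ss; the ratio of Sa to Ss thereby determines the stretch/compression ratio of the speech. Later, the pitch-synchronous overlap-add algorithm (Pitch Synchronization overlap-and-add, PSOLA for short) was proposed. Its main principle is: first, the pitch period is estimated; next, pitch marks are placed on the input waveform, and the original speech signal is multiplied by a series of pitch-synchronous window functions to obtain a series of overlapping short-time analysis signals; then, the short-time analysis signals are adjusted at a fixed ratio in, for example, fundamental frequency, duration and amplitude, to obtain a corresponding series of short-time synthesis signals synchronized with the target pitch contour; finally, the short-time synthesis signals are arranged synchronously with the target pitch period and overlap-added to obtain the synthesized speech waveform.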
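As a hedged illustration of the SOLA ratio (the symbols \alpha, L_in and L_out are assumptions introduced here and do not appear in the original text), the time-scale factor can be written as the ratio of the synthesis spacing to the analysis spacing:

```latex
\alpha = \frac{S_s}{S_a}, \qquad L_{\mathrm{out}} \approx \alpha \, L_{\mathrm{in}}
```

Under this convention, \alpha > 1 corresponds to stretching and \alpha < 1 to compression; the same factor is applied to every frame.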
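In the related art, time-scale adjustment algorithms for voice data have the following drawback: every frame of the signal is adjusted at a fixed stretch/compression ratio, and the quality of the resulting speech waveform signal (voice data) is poor. No effective solution to this drawback has yet been proposed.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present invention provide a method and apparatus for adjusting voice data, which can improve the quality of voice data.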
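According to one aspect of the embodiments of the present invention, a method for adjusting voice data is provided, including: acquiring parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame, wherein the parameter information of the specified frame includes a pitch period, a first frame length and a first correction value; calculating the sum of the first target stretch or compression length and the first correction value, and taking the obtained sum as a second target stretch or compression length; calculating the adjustment parameter according to the obtained second target stretch or compression length and the pitch period; adjusting the length of the specified frame according to the adjustment parameter; and determining a second frame length and a second correction value resulting from adjusting the length of the specified frame, and updating, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
Optionally, adjusting the length of the specified frame according to the adjustment parameter includes: adjusting the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length; subtracting the first frame length from the first subframe length to obtain a first difference; subtracting the first difference from the first target stretch or compression length to obtain a second difference, and judging whether the second difference is greater than 0; when the second difference is less than or equal to 0, determining that the first subframe length is the second frame length;
and adjusting the length of the specified frame according to the determined second frame length.
Optionally, the adjusting method further includes: when the second difference is greater than 0, adjusting the frame corresponding to the first subframe length according to the first subframe length and a third target stretch length to obtain the second frame length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
Optionally, calculating the adjustment parameter according to the second target stretch or compression length and the pitch period includes: dividing the second target stretch or compression length by the pitch period to obtain a quotient; comparing the quotient with 1; if the quotient is greater than or equal to 1, taking the largest positive integer less than or equal to the quotient as the adjustment base; if the quotient is less than 1, taking 1 as the adjustment base; and setting the product of the pitch period and the adjustment base as the adjustment parameter.
Optionally, after the product of the pitch period and the adjustment base is set as the adjustment parameter, the method further includes: comparing the adjustment parameter with the first frame length; and if the adjustment parameter is greater than the first frame length, updating the adjustment parameter with the first frame length.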
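According to another aspect of the embodiments of the present invention, an apparatus for adjusting voice data is provided, including: an acquiring module, configured to acquire parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame, wherein the parameter information of the specified frame includes a pitch period, a first frame length and a first correction value;
a first calculating module, configured to calculate the sum of the first target stretch or compression length and the first correction value, and take the obtained sum as a second target stretch or compression length;
a second calculating module, configured to calculate an adjustment parameter according to the obtained second target stretch or compression length and the pitch period;
a processing module, configured to adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame;
an updating module, configured to determine a second frame length and a second correction value resulting from adjusting the length of the specified frame, and update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
Optionally, the processing module includes:
a first adjusting unit, configured to adjust the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length;
a first calculating unit, configured to subtract the first frame length from the first subframe length to obtain a first difference;
a judging unit, configured to subtract the first difference from the first target stretch or compression length to obtain a second difference, and judge whether the second difference is greater than 0;
a determining unit, configured to determine, when the second difference is less than or equal to 0, that the first subframe length is the second frame length;
a second adjusting unit, configured to adjust the length of the specified frame according to the determined second frame length.
Optionally, the processing module further includes:
a third adjusting unit, configured to, when the second difference is greater than 0, adjust the frame corresponding to the first subframe length to the second frame length according to the first subframe length and a third target stretch length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
Optionally, the second calculating module includes:
a second calculating unit, configured to divide the second target stretch or compression length by the pitch period to obtain a quotient;
a first comparison unit, configured to compare the obtained quotient with 1;
a first setting unit, configured to set the largest positive integer less than or equal to the quotient as the adjustment base if the quotient is greater than or equal to 1, or to set 1 as the adjustment base if the quotient is less than 1;
a second setting unit, configured to set the product of the pitch period and the adjustment base as the adjustment parameter.
Optionally, the second calculating module further includes:
a second comparing unit, configured to compare the adjustment parameter with the first frame length after the product of the pitch period and the adjustment base is set as the adjustment parameter;
an updating unit, configured to update the adjustment parameter with the first frame length if the adjustment parameter is greater than the first frame length.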
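Compared with the related art, the technical solution provided by the embodiments of the present invention includes: acquiring parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame, wherein the parameter information of the specified frame includes a pitch period, a first frame length and a first correction value; calculating the sum of the first target stretch or compression length and the first correction value to obtain a second target stretch or compression length; calculating an adjustment parameter according to the second target stretch or compression length and the pitch period; and adjusting the length of the specified frame according to the adjustment parameter. The embodiments of the present invention achieve real-time adjustment of each frame of voice data and improve the signal quality of the voice data.
According to the embodiments of the present invention, parameter information of a specified frame in voice data to be processed and a first target stretch or compression length of the specified frame are acquired; the sum of the first target stretch or compression length and the first correction value is calculated to obtain a second target stretch or compression length, and an adjustment parameter is calculated according to the second target stretch or compression length and the pitch period; the length of the specified frame is adjusted according to the adjustment parameter; and a second frame length and a second correction value resulting from adjusting the length of the specified frame are determined, and the correction value of the frame following the specified frame on which the stretching or compression operation is performed is updated according to the second correction value. Every frame of the voice data to be processed is adjusted iteratively, frame by frame, with the adjustment result of the previous frame influencing the adjustment ratio of the next frame. This solves the technical problem in the related art that every frame has the same stretch/compression ratio, which cannot be changed in real time, is restricted, and cannot be controlled as a whole. By changing the stretch/compression ratio of each frame in real time, sudden conditions in the transmission of voice data (such as jitter, packet loss and delay) are compensated, the overall voice quality is improved, and the impact of a poor network environment on voice quality is effectively reduced.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.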
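Brief Description of the Drawings
FIG. 1 is a flowchart of a method for adjusting voice data according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of an apparatus for adjusting voice data according to an embodiment of the present invention;
FIG. 3 is optional structural block diagram 1 of the apparatus for adjusting voice data according to an embodiment of the present invention;
FIG. 4 is optional structural block diagram 2 of the apparatus for adjusting voice data according to an embodiment of the present invention;
FIG. 5 is optional structural block diagram 3 of the apparatus for adjusting voice data according to an embodiment of the present invention;
FIG. 6 is optional structural block diagram 4 of the apparatus for adjusting voice data according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of adjusting voice data according to an optional embodiment of the present invention;
FIG. 8 is a flowchart of a stretching operation according to an optional embodiment of the present invention;
FIG. 9 is stretching schematic diagram 1 according to an optional embodiment of the present invention;
FIG. 10 is stretching schematic diagram 2 according to an optional embodiment of the present invention;
FIG. 11 is a flowchart of a compression operation according to an optional embodiment of the present invention;
FIG. 12 shows three schematic diagrams of compression under different pitch periods according to an optional embodiment of the present invention.
Embodiments of the Invention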
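The embodiments of the present application are described in detail below with reference to the drawings. It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with one another arbitrarily.
It should be noted that the terms "first", "second" and the like in the specification, the claims and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
This embodiment provides a method for adjusting voice data. The embodiment can be applied to all fields and scenarios that require time-scale modification. For example, in multimedia devices, functions such as variable-speed playback and voice changing are achieved by stretching/compressing multimedia data; in digital communication or Internet communication, reasonably stretching/compressing voice data, especially unvoiced frames, can effectively handle sudden delay, jitter, packet loss and similar conditions during voice transmission, thereby ensuring voice quality during transmission. FIG. 1 is a flowchart of a method for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 1, the method includes: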
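Step 102: acquire parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame;
In this embodiment, the specified frame may be any frame in the entire voice data to be processed; when processing of the voice data has just begun, the specified frame may be the first frame in sequence. The parameter information of the specified frame may represent the frame's own parameters, such as the pitch period, the first frame length and the first correction value, where the first frame length is the frame length of the specified frame and the first correction value is the computable error of the frame length of the specified frame. The correction value of each frame may default to 0 before adjustment and can be passed between frames across the entire voice data. The first target stretch or compression length is the length by which the specified frame needs to be stretched or compressed, and may be preset or obtained by calculation. In this embodiment, the frame length, the correction value and the pitch period are expressed in "points", a unit widely used in the field.
Step 104: calculate the sum of the first target stretch or compression length and the first correction value, and take the obtained sum as a second target stretch or compression length;
Optionally, the second target stretch or compression length is the length by which the specified frame actually needs to be stretched or compressed once the correction value is taken into account. For example, if the first target stretch length is 100 points and the first correction value is -20 points, calculation shows that only 80 points actually need to be stretched. Because the length by which each frame is stretched or compressed depends on the frame's own parameters and can only be adjusted in units of the pitch period, an error arises in the adjustment of each frame; passing the error of the previous frame to the next frame as a correction value effectively minimizes the adjustment error over the entire voice data.
Step 106: calculate an adjustment parameter according to the second target stretch or compression length and the pitch period; in the embodiments of the present invention, the adjustment parameter indicates the length by which the specified frame is stretched or compressed;
The adjustment parameter of the specified frame may be determined according to the adjustment type of the specified frame, that is, it depends on whether stretching or compression is performed, on the stretch or compression length, and on the first frame length of the specified frame. When calculation shows that the target adjustment of the specified frame can be achieved with a single adjustment, the adjustment parameter is the length of that stretch or compression; when the specified frame needs to be stretched or compressed several times to achieve the target adjustment, the adjustment parameter includes the number of stretches or compressions and the length of each one.
Step 108: adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame.
Step 1010: determine a second frame length and a second correction value resulting from adjusting the length of the specified frame, and update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed. In the embodiments of the present invention, this update may include: taking the second correction value as the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
It should be noted that, in the embodiments of the present invention, a compression adjustment generally needs only one adjustment pass. When a stretching operation is performed, however, the second frame length must be compared with the sum of the first frame length and the second target stretch length. If the second frame length is greater than or equal to that sum, the stretching operation ends; otherwise a second, third, ... stretch is performed, yielding a third frame length, a fourth frame length, and so on, until an (N+1)-th frame length greater than the sum appears, and the second correction value is determined by subtracting the (N+1)-th frame length from the sum.
Optionally, the second frame length is the adjusted length of the specified frame, and the second correction value represents the adjustment error of the specified frame. Passing the adjustment error of the previous frame to the next frame as a correction value solves the technical problem in the related art that the error is large when adjustment is performed on the entire voice data as a unit, and reduces the adjustment error of the voice data.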
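The bookkeeping of steps 104 and 1010 can be sketched as two small functions. This is a minimal sketch for the stretching case only; the function names are hypothetical, and the formula for the second correction value follows the note above (the sum of the first frame length and the second target stretch length, minus the final frame length).

```python
def second_target_length(first_target, first_correction):
    """Step 104: apply the correction carried over from the previous frame."""
    return first_target + first_correction

def second_correction_value(first_frame_length, second_target, second_frame_length):
    """Step 1010 (stretching): expected length minus the length actually reached."""
    return (first_frame_length + second_target) - second_frame_length

# A first target of 100 points with a correction of -20 points leaves 80 points.
remaining = second_target_length(100, -20)

# A 160-point frame asked to grow by 160 points but ending at 360 points
# overshoots by 40 points, which is handed to the next frame as -40.
carry = second_correction_value(160, 160, 360)
```

A negative correction value therefore tells the next frame to stretch less than its nominal target.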
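According to the embodiments of the present invention, parameter information of a specified frame in voice data to be processed and a first target stretch or compression length of the specified frame are acquired, wherein the parameter information of the specified frame includes a pitch period, a first frame length and a first correction value; the sum of the first target stretch or compression length and the first correction value is then calculated to obtain a second target stretch or compression length; an adjustment parameter is calculated according to the second target stretch or compression length and the pitch period, wherein the adjustment parameter indicates the length by which the specified frame is stretched or compressed; the length of the specified frame is adjusted according to the adjustment parameter; and, when the length of the specified frame is adjusted, a second frame length and a second correction value resulting from the adjustment are determined, and the correction value of the frame following the specified frame on which the stretching or compression operation is performed is updated according to the second correction value. Every frame of the voice data to be processed is adjusted iteratively, frame by frame, with the adjustment result of the previous frame influencing the adjustment ratio of the next frame. This solves the technical problem in the related art that every frame has the same stretch/compression ratio, which cannot be changed in real time, is restricted, and cannot be controlled as a whole. By changing the stretch/compression ratio of each frame in real time, sudden conditions in the transmission of voice data (such as jitter, packet loss and delay) are compensated, the overall voice quality is improved, and the impact of a poor network environment on voice quality is effectively reduced.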
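Optionally, in an optional implementation of this embodiment, adjusting the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame includes:
S11: adjust the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length;
S12: subtract the first frame length from the first subframe length to obtain a first difference;
S13: subtract the first difference from the first target stretch or compression length to obtain a second difference, and judge whether the second difference is greater than 0;
S14: when the second difference is less than or equal to 0, determine that the first subframe length is the second frame length;
It should be noted that when the second difference is less than or equal to 0, the length already stretched is greater than or equal to the target stretch length; only the second correction value still needs to be calculated to complete this example of the invention.
In this embodiment, after the specified frame is adjusted for the first time according to the first frame length and the second target length to obtain the first subframe length, if the difference between the first subframe length and the length to which the frame is to be stretched is too large, stretching must be performed again. The first stretch length (that is, the first difference) can be calculated first, and then the difference obtained by subtracting it from the first target stretch length is evaluated; if that difference is less than or equal to 0, that is, if the first stretch length reaches or exceeds the first target stretch length, the result of the first stretch is taken as the stretch result of the specified frame, and adjustment continues with the next frame.
In an optional implementation of this embodiment, there is also another case: when the second difference obtained by subtracting the first difference from the first target stretch length is greater than 0, the frame corresponding to the first subframe length is adjusted according to the first subframe length and a third target stretch length to obtain the second frame length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period. In this embodiment, the first stretch does not meet the stretching requirement of the specified frame, so no frame of the second frame length is obtained and stretching needs to continue; however, the target length of this stretch is smaller than that of the first stretch. It may be the absolute value of the difference between the second difference and the pitch period; this absolute value is used as the third target stretch length, and a second stretch is performed to obtain the final second frame length of the specified frame.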
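In an optional implementation of this embodiment, calculating the adjustment parameter according to the second target stretch or compression length and the pitch period includes:
S21: divide the second target stretch or compression length by the pitch period to obtain a quotient;
S22: compare the obtained quotient with 1;
S23: if the quotient is greater than or equal to 1, take the largest positive integer less than or equal to the quotient as the adjustment base; if the quotient is less than 1, take 1 as the adjustment base;
S24: set the product of the pitch period and the adjustment base as the adjustment parameter.
In this embodiment, take a second target stretch length of 160 points as an example. If the pitch period is 50 points, the quotient is 3.2, which is greater than or equal to 1, so the first rule applies: the largest positive integer less than or equal to 3.2 is 3, and multiplying it by the pitch period gives an adjustment parameter of 150. If the pitch period is 200 points, the quotient is 0.8, which is less than 1, so the other rule applies: 1 is multiplied directly by the pitch period, giving an adjustment parameter of 200.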
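Steps S21 to S24 can be sketched as a single function. The function name is hypothetical, and a positive second target length is assumed.

```python
import math

def adjustment_parameter(second_target, pitch_period):
    """S21-S24: round the target down to a whole number of pitch periods (at least one)."""
    quotient = second_target / pitch_period                   # S21
    base = math.floor(quotient) if quotient >= 1 else 1       # S22, S23
    return pitch_period * base                                # S24

short_pitch = adjustment_parameter(160, 50)    # quotient 3.2 -> base 3
long_pitch = adjustment_parameter(160, 200)    # quotient 0.8 -> base 1
```

Because the result is always a whole number of pitch periods, it rarely equals the requested length exactly, which is why a correction value has to be carried between frames.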
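In an optional implementation of this embodiment, after the product of the pitch period and the adjustment base is set as the adjustment parameter in step 106, the method may further include:
S31: compare the adjustment parameter with the first frame length;
S32: if the adjustment parameter is greater than the first frame length, update the adjustment parameter with the first frame length. Here, updating the adjustment parameter may include taking the first frame length as the adjustment parameter.
In this embodiment, the adjustment parameter may be too large, so that the specified frame cannot be adjusted or the adjustment effect is poor. In that case the adjustment parameter needs to be adjusted, which may be done according to the first frame length of the current specified frame. For example, if the adjustment parameter is 150 points and the first frame length is 120 points, comparison shows that the adjustment parameter is greater than the first frame length, so the adjustment parameter is updated to 120.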
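The limit described in S31 and S32 amounts to a simple clamp, sketched below with a hypothetical function name.

```python
def limit_to_frame_length(adjustment_parameter, first_frame_length):
    """S31-S32: never adjust by more points than the frame contains."""
    if adjustment_parameter > first_frame_length:
        return first_frame_length
    return adjustment_parameter

limited = limit_to_frame_length(150, 120)     # too large, replaced by the frame length
unchanged = limit_to_frame_length(100, 160)   # already within the frame
```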
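From the above description of the implementations, a person skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM or RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the method of the embodiments of the present invention.
This embodiment also provides an apparatus for adjusting voice data. The apparatus may be arranged in a device that can process or transmit voice data, and is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments may be implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
It should be noted that the embodiments of the present invention may be arranged in an independent apparatus connected after a sound collection apparatus, or inside the sound collection apparatus. Whatever the arrangement, when voice data is received, it can be processed by the method of the embodiments of the present invention.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the above method for adjusting voice data.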
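FIG. 2 is a structural block diagram of an apparatus for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquiring module 20, a first calculating module 22, a second calculating module 24 and a processing module 26, wherein
the acquiring module 20 is configured to acquire parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame, wherein the parameter information of the specified frame includes a pitch period, a first frame length and a first correction value;
In this embodiment, the specified frame may be any frame in the entire voice data to be processed; when processing of the voice data has just begun, the specified frame is the first frame in sequence. The parameter information of the specified frame represents the frame's own parameters, such as the pitch period, the first frame length and the first correction value, where the first frame length is the frame length of the specified frame and the first correction value is the computable error of the frame length of the specified frame. The correction value of each frame may default to 0 before adjustment and can be passed between frames across the entire voice data. The first target stretch or compression length is the length by which the specified frame needs to be stretched or compressed, and may be preset or obtained by calculation. In this embodiment, the frame length, the correction value and the pitch period are expressed in "points", a unit widely used in the field.
the first calculating module 22 is coupled to the acquiring module 20 and configured to calculate the sum of the first target stretch or compression length and the first correction value, and take the obtained sum as a second target stretch or compression length;
Optionally, the second target stretch or compression length is the length by which the specified frame actually needs to be stretched or compressed once the correction value is taken into account. For example, if the first target stretch length is 100 points and the first correction value is -20 points, calculation shows that only 80 points actually need to be stretched. Because the length by which each frame is stretched or compressed depends on the frame's own parameters and can only be adjusted in units of the pitch period, an error arises in the adjustment of each frame; passing the error of the previous frame to the next frame as a correction value can effectively minimize the adjustment error over the entire voice data.
the second calculating module 24 is coupled to the first calculating module 22 and configured to calculate an adjustment parameter according to the obtained second target stretch or compression length and the pitch period, wherein the adjustment parameter indicates the length by which the specified frame is stretched or compressed;
The adjustment parameter of the specified frame is determined according to the adjustment type of the specified frame, and may depend on whether stretching or compression is performed, on the stretch or compression length, on the first frame length of the specified frame, and so on. When calculation shows that the target adjustment of the specified frame can be achieved with a single adjustment, the adjustment parameter is the length of that stretch or compression; when the specified frame needs to be stretched or compressed several times to achieve the target adjustment, the adjustment parameter represents the number of stretches or compressions and the length of each one.
the processing module 26 is coupled to the second calculating module 24 and configured to adjust the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame.
the updating module 28 is configured to, when the processing module 26 adjusts the length of the specified frame, determine a second frame length and a second correction value resulting from adjusting the length of the specified frame, and update, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
Optionally, the second frame length is the adjusted length of the specified frame, and the second correction value represents the adjustment error of the specified frame. Passing the adjustment error of the previous frame to the next frame as a correction value solves the technical problem in the related art that the error is large when adjustment is performed on the entire voice data as a unit.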
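FIG. 3 is optional structural block diagram 1 of the apparatus for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 3, in addition to all the modules shown in FIG. 2, the processing module 26 of the apparatus further includes: a first adjusting unit 30, a first calculating unit 32, a judging unit 34 and a determining unit 36, wherein
the first adjusting unit 30 is configured to adjust the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length;
the first calculating unit 32 is coupled to the first adjusting unit 30 and configured to subtract the first frame length from the first subframe length to obtain a first difference;
the judging unit 34 is coupled to the first calculating unit 32 and configured to subtract the first difference from the first target stretch or compression length to obtain a second difference, and judge whether the second difference is greater than 0;
the determining unit 36 is coupled to the judging unit 34 and configured to determine, when the judging unit 34 finds that the second difference is less than or equal to 0, that the first subframe length is the second frame length.
In this embodiment, after the specified frame is adjusted for the first time according to the first frame length and the second target length to obtain the first subframe length, if the difference between the first subframe length and the length to which the frame is to be stretched or compressed is too large, stretching must be performed again. The first stretch length can be calculated first, and then the difference obtained by subtracting it from the first target stretch length is evaluated; if that difference is less than or equal to 0, that is, if the first stretch length reaches or exceeds the first target stretch length, the result of the first stretch is taken as the stretch result of the specified frame, and adjustment continues with the next frame.
FIG. 4 is optional structural block diagram 2 of the apparatus for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the processing module 26 of the apparatus further includes: a third adjusting unit 40, coupled to the judging unit 34 and configured to, when the judging unit 34 finds that the second difference is greater than 0, adjust the frame corresponding to the first subframe length according to the first subframe length and a third target stretch length to obtain the second frame length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
In this embodiment, the first stretch does not meet the stretching requirement of the specified frame, so no frame of the second frame length is obtained and stretching needs to continue; however, the target length of this stretch is smaller than that of the first stretch. It may be the absolute value of the difference between the second difference and the pitch period; this absolute value is used as the third target stretch length, and a second stretch is performed to obtain the final second frame length of the specified frame.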
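FIG. 5 is optional structural block diagram 3 of the apparatus for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown in FIG. 2, the second calculating module 24 of the apparatus includes: a second calculating unit 50, configured to divide the second target stretch or compression length by the pitch period to obtain a quotient; a first comparison unit 52, configured to compare the obtained quotient with 1; a first setting unit 54, configured to set the largest positive integer less than or equal to the quotient as the adjustment base if the quotient is greater than or equal to 1, or to set 1 as the adjustment base if the quotient is less than 1; and a second setting unit 56, configured to set the product of the pitch period and the adjustment base as the adjustment parameter.
In this embodiment, take a second target stretch length of 160 points as an example. If the pitch period is 50 points, the quotient is 3.2, which is greater than or equal to 1, so the first rule applies: the largest positive integer less than or equal to 3.2 is 3, and multiplying it by the pitch period gives an adjustment parameter of 150. If the pitch period is 200 points, the quotient is 0.8, which is less than 1, so the other rule applies: 1 is multiplied directly by the pitch period, giving an adjustment parameter of 200.
FIG. 6 is optional structural block diagram 4 of the apparatus for adjusting voice data according to an embodiment of the present invention. As shown in FIG. 6, the second calculating module further includes: a second comparing unit 60, configured to compare the adjustment parameter with the first frame length after the second setting unit 56 sets the product of the pitch period and the adjustment base as the adjustment parameter; and an updating unit 62, configured to update the adjustment parameter with the first frame length if the adjustment parameter is greater than the first frame length.
In this embodiment, the adjustment parameter may be too large, so that the specified frame cannot be adjusted or the adjustment effect is poor. In that case the adjustment parameter needs to be adjusted, which may be done according to the first frame length of the current specified frame. For example, if the adjustment parameter is 150 points and the first frame length is 120 points, comparison shows that the adjustment parameter is greater than the first frame length, so the adjustment parameter is updated to 120.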
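The solution is described below with reference to optional embodiments of the present invention and different adjustment cases.
FIG. 7 is a schematic flowchart of adjusting voice data according to an optional embodiment of the present invention. As shown in FIG. 7, after the process starts, the voice data to be processed is input into a cache, and it is judged whether the voice data needs to be adjusted. If not, the process ends; if so, the pitch period and the stretching/compression parameters are calculated, stretching/compression is performed, and finally the adjusted voice data is output.
It should be noted that, in the embodiments of the present invention, the need to adjust the voice data may arise in two cases: variable-speed playback of sound, and compensation for network packet loss and delay. Either case may be used to judge whether the voice data needs to be adjusted. Judging whether voice data needs to be adjusted is a conventional technique for a person skilled in the art, and details are not described herein.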
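For ease of understanding, the optional stretching and compression examples below use terms widely used in the industry, where:
PitchTime: pitch period;
FrameTag: the number of target points, that is, the number of points that need to be stretched/compressed (equivalent to the first target stretch/compression length);
TagRES: target point correction value, passing information between frames (equivalent to the correction value);
OptLength: the number of points of the current stretch/compression (equivalent to the adjustment parameter);
DataLength: current data length (equivalent to the first frame length);
OptRatio: stretch/compression ratio (FrameTag can be calculated from this ratio).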
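FIG. 8 is a flowchart of a stretching operation according to an optional embodiment of the present invention. As shown in FIG. 8, it includes:
S71: calculate the pitch period PitchTime of the signal;
S72: calculate the target number of stretch points FrameTag of this frame according to the required stretch ratio OptRatio and the inter-frame information (such as the stretch point correction value TagRES); at this time, the current data length DataLength is the data length FrameLength of this frame;
S73: calculate the number of points of the current stretch, OptLength, according to PitchTime and FrameTag;
S74: if the calculated OptLength is too large (greater than or equal to the original data length) or too small (less than or equal to 0), correct OptLength using the pitch period PitchTime;
Steps S72 to S74 above can be summarized as: calculating/updating the parameters related to stretching the voice data;
S75: perform frame expansion on the voice data according to DataLength and OptLength;
S76: judge whether the stretching requirement is met, including: update DataLength by adding OptLength to it, and subtract OptLength from FrameTag to obtain the new FrameTag. If FrameTag is less than or equal to 0, stretching ends; otherwise operations S73 to S76 are repeated until stretching ends;
S77: update the voice-related information, including: correcting inter-frame information such as TagRES by using the deviation between the stretching result of this frame and the expected result.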
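The parameter bookkeeping of steps S72 to S77 can be sketched as a loop that tracks lengths only and does not touch samples. The function names are hypothetical. As an assumption, OptLength is limited to the current data length in the manner of S31 and S32; the PitchTime-based correction mentioned in S74 is not modelled. The test values are the figures used in the two stretching examples of this description.

```python
import math

def opt_length(frame_tag, pitch_time, data_length):
    """S73/S74 (assumed form): whole pitch periods, at least one, at most the data length."""
    base = max(1, math.floor(frame_tag / pitch_time))
    return min(pitch_time * base, data_length)

def stretch_frame_params(frame_target, tag_res, pitch_time, frame_length):
    """Return (final DataLength, new TagRES, list of OptLength values per pass)."""
    frame_tag = frame_target + tag_res        # S72: apply the carried correction
    data_length = frame_length
    passes = []
    while frame_tag > 0:                      # S76: repeat until the target is met
        opt = opt_length(frame_tag, pitch_time, data_length)
        data_length += opt                    # S75/S76: the frame grows by OptLength
        frame_tag -= opt
        passes.append(opt)
    return data_length, frame_tag, passes     # S77: the leftover FrameTag becomes TagRES

first = stretch_frame_params(160, 0, 100, 160)
second = stretch_frame_params(150, first[1], 40, 160)
```

Feeding the TagRES returned for one frame into the next call is what lets an overshoot in one frame be recovered in the following frame.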
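Stretching example 1:
FIG. 9 is stretching schematic diagram 1 according to an optional embodiment of the present invention. As shown in FIG. 9, it illustrates the case where a signal with a pitch period of 100 and a frame length of 160 is stretched by 160 points.
The voice-related information is obtained from the input speech: TagRES=0, PitchTime=100, FrameTag=160, DataLength=160.
FrameTag is updated according to TagRES, giving FrameTag=160, and OptLength=100 is then calculated from PitchTime and FrameTag.
The first frame expansion is then performed on the data. Because OptLength is now greater than half of the entire sequence length DataLength, two segments of 60 points each, located at the head and the tail of the source data, are smoothed. That is, points 1 to 100 of the original data s of the voice data are first copied to points 1 to 100 of the stretched speech s′. Then points 1 to 60 and points 101 to 160 of the original data s are smoothed, and the result is placed at points 101 to 160 of s′. Then points 61 to 160 of the original data s are copied directly to points 161 to 260 of s′.
After the first frame expansion, DataLength=260 and FrameTag=60.
Because FrameTag is greater than 0, the stretching requirement has not been met, so a second frame expansion is needed.
OptLength=100 is obtained from FrameTag and PitchTime.
The second frame expansion is then performed on the data. OptLength is now less than half of the entire sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 100 of the stretched speech s′ are first copied to points 1 to 100 of the twice-stretched speech s″. Then points 1 to 100 and points 101 to 200 of s′ are smoothed, and the result is placed at points 101 to 200 of s″. Finally, points 101 to 260 of s′ are copied directly after point 200 of s″.
After the second frame expansion, DataLength=360 and FrameTag=-40.
Because FrameTag is less than or equal to 0, no further frame expansion is needed.
Finally, TagRES is updated to TagRES=-40.
It can be seen that the length of the final stretched sequence is 360 rather than the desired 320, that is, 40 samples more than intended, but this excess has been recorded in TagRES.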
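The two frame expansions of this example can be sketched at sample level as follows. The description does not specify the smoothing function, so a linear cross-fade is assumed here, and both function names are hypothetical; only the copy/smooth/copy layout and the resulting lengths are taken from the text.

```python
def crossfade(fade_out, fade_in):
    """Assumed smoothing: linear cross-fade between two equally long segments."""
    n = len(fade_out)
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(fade_out, fade_in))]

def expand_frame(s, opt_length):
    """One frame expansion: the output is opt_length points longer than s."""
    n = len(s)
    if opt_length > n / 2:
        # Smooth a head segment and a tail segment of n - opt_length points each.
        overlap = n - opt_length
        head, tail = s[:overlap], s[opt_length:]
        return s[:opt_length] + crossfade(tail, head) + s[overlap:]
    # Smooth the two consecutive opt_length-point segments at the head.
    first, second = s[:opt_length], s[opt_length:2 * opt_length]
    return s[:opt_length] + crossfade(second, first) + s[opt_length:]

s = [float(i) for i in range(160)]   # placeholder samples, not real speech
s1 = expand_frame(s, 100)            # first expansion: 160 -> 260 points
s2 = expand_frame(s1, 100)           # second expansion: 260 -> 360 points
```

The cross-fade starts from the segment that continues the preceding copy and ends on the segment that leads into the following copy, which keeps the joins continuous; this ordering is a design choice of the sketch rather than something stated in the description.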
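Stretching example 2:
FIG. 10 is stretching schematic diagram 2 according to an optional embodiment of the present invention. As shown in FIG. 10, this example illustrates the case where a signal with a pitch period of 40 and a frame length of 160 is stretched by 150 points.
The voice-related information is obtained from the input speech: TagRES=-40, PitchTime=40, FrameTag=150, DataLength=160.
First, FrameTag is updated according to TagRES, giving FrameTag=110, and OptLength=80 is then calculated from PitchTime and FrameTag.
The first frame expansion is then performed on the data. Because OptLength is now equal to half of the entire sequence length DataLength, the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 80 of the original data s are first copied to points 1 to 80 of the stretched speech s′. Then points 1 to 80 and points 81 to 160 of s are smoothed, and the result is placed at points 81 to 160 of s′. Finally, points 81 to 160 of s are copied directly after point 160 of s′.
After the first frame expansion, DataLength=240 and FrameTag=30.
Because FrameTag is greater than 0, the stretching requirement has not been met, so a second frame expansion is needed.
FrameTag and PitchTime give OptLength=0; because OptLength must be at least PitchTime, OptLength=40.
The second frame expansion is then performed on the data. OptLength is now less than half of the entire sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed. That is, points 1 to 40 of the first-stretched speech s′ are first copied to points 1 to 40 of the twice-stretched speech s″. Then points 1 to 40 and points 41 to 80 of s′ are smoothed, and the result is placed at points 41 to 80 of s″. Finally, points 41 to 240 of s′ are copied directly after point 80 of s″.
After the second frame expansion, DataLength=280 and FrameTag=-10.
Because FrameTag is less than or equal to 0, no further frame expansion is needed.
Finally, TagRES is updated to TagRES=-10.
This example shows the stretching of the frame immediately following the frame of stretching example 1. In stretching example 1, the sequence length before stretching is 160, 160 points need to be stretched out, and the actual sequence length after stretching is 360. In this example, the sequence length before stretching is 160, 150 points need to be stretched out, and the actual sequence length after stretching is 280.
Taking the two frames together, a total of 310 points needs to be stretched out, while the actual length after stretching is 360+280=640 points, that is, 320 points were actually added, so the stretch/compression ratio is controlled as a whole.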
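The accounting across the two frames can be written out explicitly; the labels "requested", "actual" and "residual" are introduced here for illustration only:

```latex
\begin{aligned}
\text{requested} &= 160 + 150 = 310 \\
\text{actual} &= (360 - 160) + (280 - 160) = 200 + 120 = 320 \\
\text{residual} &= 310 - 320 = -10 = \mathrm{TagRES}
\end{aligned}
```

The residual of -10 points equals the TagRES value left after the second frame, so the cumulative error stays within one pitch period instead of growing with each frame.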
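FIG. 11 is a flowchart of a compression operation according to an optional embodiment of the present invention. As shown in FIG. 11, it includes:
S81: obtain the voice-related information, including: calculating the pitch period PitchTime of the signal;
Steps S82 to S84 are: calculating the compression-related parameters; wherein
S82: calculate the target number of compression points FrameTag of this frame according to the required compression ratio OptRatio and the inter-frame information (such as the compression point correction value TagRES); at this time, the current data length DataLength is the data length FrameLength of this frame;
S83: calculate the number of points of the current compression, OptLength, according to PitchTime and FrameTag.
S84: if the calculated OptLength is too large (for example, greater than or equal to the original data length) or too small (for example, less than 0), correct OptLength using PitchTime.
S85: perform frame compression on the voice data, including performing frame compression on the voice data according to DataLength and OptLength;
S86: update the voice-related information, including: correcting inter-frame information such as TagRES by using the deviation between the compression result of this frame and the expected result.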
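The length bookkeeping of steps S82 to S86 can be sketched for a single compression pass. The function name is hypothetical. As assumptions, a non-positive target skips compression, OptLength is limited to the frame length, and the case where OptLength equals the whole frame (which S84 says needs correction) is not handled. The test values are the figures used in the three compression cases of this description.

```python
import math

def compress_frame_params(frame_target, tag_res, pitch_time, frame_length):
    """Return (new DataLength, new TagRES, OptLength) for one compression pass."""
    frame_tag = frame_target + tag_res                 # S82: apply the carried correction
    if frame_tag <= 0:
        return frame_length, frame_tag, 0              # assumption: nothing left to remove
    base = max(1, math.floor(frame_tag / pitch_time))  # S83: whole pitch periods, at least one
    opt = min(pitch_time * base, frame_length)         # assumed limit, in the manner of S31-S32
    return frame_length - opt, frame_tag - opt, opt    # S85/S86: TagRES = FrameTag - OptLength

pitch_40 = compress_frame_params(80, 0, 40, 160)
pitch_60 = compress_frame_params(80, 0, 60, 160)
pitch_100 = compress_frame_params(80, 0, 100, 160)
```

A positive TagRES means too few points were removed and a negative one means too many; in both cases the next frame's target is shifted accordingly.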
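Compression example 1:
FIG. 12 shows three schematic diagrams of compression under different pitch periods according to an optional embodiment of the present invention. As shown in FIG. 12, the three diagrams correspond to pitch periods of 40, 60 and 100, respectively, where in represents the original data, that is, the data before processing, and out represents the compressed data.
The voice-related information is obtained from the input speech: TagRES=0, FrameTag=80, DataLength=160.
Optionally, when the pitch period PitchTime is 40, OptLength=80 can be calculated.
Frame compression is then performed on the data. Because OptLength is exactly half of the entire sequence length DataLength, the first half and the second half of the source data are simply smoothed together. That is, points 1 to 80 and points 81 to 160 of the original data in1 are smoothed to obtain the compressed speech out1.
After frame compression, DataLength=80 and TagRES=FrameTag-OptLength=0.
Optionally, when the pitch period PitchTime is 60, OptLength=60 can be calculated.
Frame compression is then performed on the data. OptLength is now less than half of the entire original sequence length DataLength, so the two consecutive segments of length OptLength starting from the head of the source data are smoothed, and the remaining data is copied directly after the smoothed data. That is, points 1 to 60 and points 61 to 120 of the original data in2 are smoothed, and the result is placed at points 1 to 60 of the compressed speech out2. Then points 121 to 160 of in2 are copied directly after point 60 of out2.
After frame compression, DataLength=100 and TagRES=FrameTag-OptLength=20.
Optionally, when the pitch period PitchTime is 100, OptLength=100 can be calculated.
Frame compression is then performed on the data. Because OptLength is now greater than half of the entire original sequence length DataLength, two segments of 60 points each, located at the head and the tail of the source data, are simply smoothed. That is, points 1 to 60 and points 101 to 160 of the original data in3 are smoothed, and the result is placed at points 1 to 60 of the compressed speech out3. Then points 61 to 100 of in3 are discarded directly.
After frame compression, DataLength=60 and TagRES=FrameTag-OptLength=-20.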
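The three compression cases can be sketched at sample level as follows. As with stretching, the smoothing function is not specified in the description, so a linear cross-fade is assumed, and the function names are hypothetical; the segment layout and output lengths follow the text.

```python
def crossfade(fade_out, fade_in):
    """Assumed smoothing: linear cross-fade between two equally long segments."""
    n = len(fade_out)
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(fade_out, fade_in))]

def compress_frame(s, opt_length):
    """One frame compression: the output is opt_length points shorter than s."""
    n = len(s)
    if opt_length > n / 2:
        # Smooth the head and tail segments of n - opt_length points; drop the middle.
        overlap = n - opt_length
        return crossfade(s[:overlap], s[opt_length:])
    # Smooth the two leading opt_length-point segments, then keep the rest unchanged.
    merged = crossfade(s[:opt_length], s[opt_length:2 * opt_length])
    return merged + s[2 * opt_length:]

frame = [float(i) for i in range(160)]   # placeholder samples, not real speech
out1 = compress_frame(frame, 80)         # pitch period 40
out2 = compress_frame(frame, 60)         # pitch period 60
out3 = compress_frame(frame, 100)        # pitch period 100
```

When OptLength is exactly half the frame, the second branch leaves no remainder to copy, which reproduces the first case without a separate code path.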
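It should be noted that the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in multiple processors respectively.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: acquire parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame;
S2: calculate the sum of the first target stretch or compression length and the first correction value to obtain a second target stretch or compression length;
S3: calculate an adjustment parameter according to the second target stretch or compression length and the pitch period;
S4: adjust the length of the specified frame according to the adjustment parameter.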
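Obviously, a person skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in an order different from the one given here, or they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
A person of ordinary skill in the art can also understand that all or some of the steps of the above method may be completed by a program instructing related hardware (for example, a processor), and the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, for example, by using an integrated circuit to implement its corresponding function, or in the form of a software function module, for example, by having a processor execute a program/instruction stored in a memory to implement its corresponding function. The present invention is not limited to any specific form of combination of hardware and software.
Although the implementations disclosed in the present application are as described above, the content described is only an implementation adopted to facilitate understanding of the present application and is not intended to limit it, as with the specific implementation methods in the implementations of the present invention. Any person skilled in the art to which the present application belongs may make any modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present application, but the scope of patent protection of the present application shall still be subject to the scope defined by the appended claims.
Industrial Applicability
The above technical solution achieves real-time adjustment of each frame of voice data and improves the signal quality of the voice data.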

Claims (10)

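  1. A method for adjusting voice data, comprising:
    acquiring parameter information of a specified frame in voice data to be processed, and a first target stretch or compression length of the specified frame, wherein the parameter information of the specified frame comprises: a pitch period, a first frame length and a first correction value;
    calculating the sum of the first target stretch or compression length and the first correction value, and taking the obtained sum as a second target stretch or compression length;
    calculating an adjustment parameter according to the obtained second target stretch or compression length and the pitch period;
    adjusting the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame;
    determining a second frame length and a second correction value resulting from adjusting the length of the specified frame, and updating, according to the second correction value, the correction value of the frame following the specified frame on which the stretching or compression operation is performed.
  2. The adjusting method according to claim 1, wherein adjusting the length of the specified frame according to the adjustment parameter and the parameter information of the specified frame comprises:
    adjusting the specified frame to a first subframe length according to the first frame length and the second target stretch or compression length;
    subtracting the first frame length from the first subframe length to obtain a first difference;
    subtracting the first difference from the first target stretch length to obtain a second difference, and judging whether the second difference is greater than 0;
    when the second difference is less than or equal to 0, determining that the first subframe length is the second frame length;
    adjusting the length of the specified frame according to the determined second frame length.
  3. The adjusting method according to claim 2, further comprising:
    when the second difference is greater than 0, adjusting the frame corresponding to the first subframe length to the second frame length according to the first subframe length and a third target stretch length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period.
  4. The adjusting method according to any one of claims 1 to 3, wherein calculating the adjustment parameter according to the second target stretch or compression length and the pitch period comprises:
    dividing the second target stretch or compression length by the pitch period to obtain a quotient;
    comparing the obtained quotient with 1;
    if the quotient is greater than or equal to 1, taking the largest positive integer less than or equal to the quotient as an adjustment base; if the quotient is less than 1, taking 1 as the adjustment base;
    setting the product of the pitch period and the adjustment base as the adjustment parameter.
  5. The adjusting method according to claim 4, wherein after the product of the pitch period and the adjustment base is set as the adjustment parameter, the method further comprises:
    comparing the adjustment parameter with the first frame length;
    if the adjustment parameter is greater than the first frame length, updating the adjustment parameter with the first frame length.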
  6. 一种语音数据的调整装置,所述调整装置包括:
    获取模块,设置为获取待处理的语音数据中指定帧的参数信息,以及所述指定帧的第一目标拉伸或压缩长度,其中,所述指定帧的参数信息包括:基音周期、第一帧长度、第一修正值;
    第一计算模块,设置为计算所述第一目标拉伸或压缩长度和所述第一修正值的和,将获得的和作为第二目标拉伸或压缩长度;
    第二计算模块,设置为依据获得的所述第二目标拉伸或压缩长度和所述基音周期计算得到调整参数;
    处理模块,设置为依据所述调整参数和指定帧的参数信息对所述指定帧的长度进行调整;
    更新模块,设置为确定调整所述指定帧的长度的第二帧长度和第二修正值,并根据所述第二修正值更新执行拉伸或压缩操作的所述指定帧的下一帧的修正值。
  7. 根据权利要求6所述的调整装置,其中,所述处理模块包括:
    第一调整单元,设置为根据所述第一帧长度和所述第二目标拉伸或压缩 长度对所述指定帧进行调整到第一子帧长度;
    第一计算单元,设置为计算所述第一子帧长度减去所述第一帧长度得到第一差值;
    判断单元,设置为计算所述第一目标拉伸或压缩长度减去所述第一差值得到第二差值,判断所述第二差值是否大于0;
    确定单元,设置为得到的所述第二差值小于或等于0时,确定所述第一子帧长度为所述第二帧长度;
    第二调整单元,设置为根据确定的所述第二帧长度调整所述指定帧的长度。
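  8. The adjustment apparatus according to claim 7, wherein the processing module further comprises:
    a third adjustment unit, configured to, when the second difference is greater than 0, adjust the frame corresponding to the first subframe length to the second frame length according to the first subframe length and a third target stretch length, wherein the third target stretch length is the absolute value of the difference between the second difference and the pitch period.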
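  9. The adjustment apparatus according to claim 7 or 8, wherein the second calculation module comprises:
    a second calculation unit, configured to divide the second target stretch or compression length by the pitch period to obtain a quotient;
    a first comparison unit, configured to compare the obtained quotient with 1;
    a first setting unit, configured to set the largest positive integer less than or equal to the quotient as an adjustment base if the quotient is greater than or equal to 1; or to set 1 as the adjustment base if the quotient is less than 1;
    a second setting unit, configured to set the product of the pitch period and the adjustment base as the adjustment parameter.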
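  10. The adjustment apparatus according to claim 9, wherein the second calculation module further comprises:
    a second comparison unit, configured to compare the adjustment parameter with the first frame length after the product of the pitch period and the adjustment base is set as the adjustment parameter;
    an update unit, configured to update the adjustment parameter with the first frame length if the adjustment parameter is greater than the first frame length.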
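
The adjustment-parameter computation recited in claims 4 and 5 (and mirrored in claims 9 and 10) can be sketched as follows. This is an illustration only, not the applicant's implementation: the function name `adjustment_parameter` and its argument names are hypothetical, and lengths are assumed to be non-negative integer sample counts.

```python
def adjustment_parameter(second_target: int, pitch_period: int,
                         first_frame_length: int) -> int:
    """Return the adjustment parameter for one frame.

    Assumes integer sample counts; all names here are illustrative.
    """
    if pitch_period <= 0:
        raise ValueError("pitch period must be positive")
    # Quotient >= 1 is the same as second_target >= pitch_period, so the
    # floor of the quotient can be taken with integer division.
    if second_target >= pitch_period:
        base = second_target // pitch_period  # largest integer <= quotient
    else:
        base = 1  # quotient < 1: use an adjustment base of 1
    param = pitch_period * base
    # Claim 5: cap the adjustment parameter at the first frame length.
    if param > first_frame_length:
        param = first_frame_length
    return param
```

For example, with a pitch period of 40 samples and a 160-sample frame, a second target of 130 samples gives an adjustment base of 3 and an adjustment parameter of 120, while a second target of 400 samples is capped at 160.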
PCT/CN2016/091618 2015-08-19 2016-07-25 Method and apparatus for adjusting voice data WO2017028658A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510511487.4A CN106469559B (zh) 2015-08-19 2015-08-19 Method and apparatus for adjusting voice data
CN201510511487.4 2015-08-19

Publications (1)

Publication Number Publication Date
WO2017028658A1 true WO2017028658A1 (zh) 2017-02-23

Family

ID=58050855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/091618 WO2017028658A1 (zh) 2015-08-19 2016-07-25 Method and apparatus for adjusting voice data

Country Status (2)

Country Link
CN (1) CN106469559B (zh)
WO (1) WO2017028658A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113782050A (zh) * 2021-09-08 2021-12-10 浙江大华技术股份有限公司 Sound pitch-shifting method, electronic device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314335B (zh) * 2020-02-10 2021-10-08 腾讯科技(深圳)有限公司 Data transmission method, apparatus, terminal, storage medium, and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050058145A1 (en) * 2003-09-15 2005-03-17 Microsoft Corporation System and method for real-time jitter control and packet-loss concealment in an audio signal
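CN101123571A (zh) * 2006-08-07 2008-02-13 北京三星通信技术研究有限公司 Method for adjusting a scheduling policy based on adaptive jitter buffering
CN101136234A (zh) * 2006-08-31 2008-03-05 广达电脑股份有限公司 Method and apparatus for estimating the audio length of an audio file
CN101594186A (zh) * 2008-05-28 2009-12-02 华为技术有限公司 Method and apparatus for generating a single-channel signal in dual-channel signal coding
CN102419981A (zh) * 2011-11-02 2012-04-18 展讯通信(上海)有限公司 Method and device for time-scale and frequency-scale scaling of audio signals
CN103200425A (zh) * 2013-03-29 2013-07-10 天脉聚源(北京)传媒科技有限公司 Multimedia processing apparatus and method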

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7016850B1 (en) * 2000-01-26 2006-03-21 At&T Corp. Method and apparatus for reducing access delay in discontinuous transmission packet telephony systems
JP3871657B2 (ja) * 2003-05-27 2007-01-24 株式会社東芝 Speech rate conversion apparatus, method, and program therefor
JP2006017900A (ja) * 2004-06-30 2006-01-19 Mitsubishi Electric Corp Time-stretch processing apparatus
CN100561577C (zh) * 2006-09-11 2009-11-18 北京中星微电子有限公司 Method and system for changing the speed of a sound signal
US8078456B2 (en) * 2007-06-06 2011-12-13 Broadcom Corporation Audio time scale modification algorithm for dynamic playback speed control
CN101290775B (zh) * 2008-06-25 2011-09-14 无锡中星微电子有限公司 Method for rapidly changing the speed of a voice signal
CN101719371B (zh) * 2009-11-20 2012-04-04 安凯(广州)微电子技术有限公司 Method for changing speech speed
CN102117613B (zh) * 2009-12-31 2012-12-12 展讯通信(上海)有限公司 Digital audio speed-change processing method and device
CN102855884B (zh) * 2012-09-11 2014-08-13 中国人民解放军理工大学 Speech duration adjustment method based on short-time continuous non-negative matrix factorization

Also Published As

Publication number Publication date
CN106469559B (zh) 2020-10-16
CN106469559A (zh) 2017-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16836522

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16836522

Country of ref document: EP

Kind code of ref document: A1