WO2018040576A1 - Audio editing method, apparatus, terminal, and computer storage medium - Google Patents

Audio editing method, apparatus, terminal, and computer storage medium

Info

Publication number
WO2018040576A1
Authority
WO
WIPO (PCT)
Prior art keywords
adjusted
clip
audio
point
adjustment
Prior art date
Application number
PCT/CN2017/080702
Other languages
English (en)
French (fr)
Inventor
张海婷
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2018040576A1 publication Critical patent/WO2018040576A1/zh

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02: Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10: Indexing; Addressing; Timing or synchronising; Measuring tape travel

Definitions

  • The present disclosure relates to the field of audio editing, and more particularly to an audio editing method, apparatus, terminal, and computer storage medium.
  • In existing approaches, the user selects a clip point directly on the terminal, and the audio is then clipped at that clip point.
  • The clipped audio obtained in this way is often not the audio that the user actually wants.
  • Embodiments of the present disclosure are directed to providing an audio editing method, apparatus, terminal, and computer storage medium.
  • An embodiment of the present disclosure provides an audio editing method, including:
  • determining the to-be-adjusted clip point corresponding to the audio to be clipped;
  • acquiring the adjustment audio corresponding to the to-be-adjusted clip point, and adjusting the to-be-adjusted clip point according to the adjustment audio to determine a final clip point;
  • clipping the audio to be clipped at the final clip point.
  • An embodiment of the present disclosure further provides an audio editing device, including:
  • a to-be-adjusted clip point determining module, configured to determine the to-be-adjusted clip point corresponding to the audio to be clipped;
  • a clip point adjustment module, configured to acquire the adjustment audio corresponding to the to-be-adjusted clip point, and to adjust the to-be-adjusted clip point according to the adjustment audio to determine a final clip point;
  • a clipping module, configured to clip the audio to be clipped at the final clip point.
  • An embodiment of the present disclosure further provides a terminal, including the audio editing device according to the embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a computer storage medium storing computer-executable instructions for executing the audio editing method according to the embodiment of the present disclosure.
  • With the audio editing method, device, terminal, and computer storage medium provided by the embodiments of the present disclosure, the to-be-adjusted clip point corresponding to the audio to be clipped is determined; the adjustment audio corresponding to that clip point is acquired, and the clip point is adjusted according to the adjustment audio to determine the final clip point; and the audio to be clipped is clipped at the final clip point.
  • Because the clip point is adjusted according to its adjustment audio before clipping, the clipped audio is less likely to start or end with an incomplete sentence or a silent period. This improves the quality of the clip, brings the clipped audio closer to what the user needs, and improves the user experience.
  • FIG. 1 is a flowchart of an audio editing method according to Embodiment 1 of the present disclosure.
  • FIG. 2 is a schematic diagram of an audio clip point according to Embodiment 1 of the present disclosure.
  • FIG. 3 is a schematic diagram of another audio clip point according to Embodiment 1 of the present disclosure.
  • FIG. 4 is a flowchart of an audio editing method that adjusts a clip point once, according to Embodiment 2 of the present disclosure.
  • FIG. 5 is a flowchart of an audio editing method that adjusts a clip point multiple times, according to Embodiment 2 of the present disclosure.
  • FIG. 6 is a flowchart of another audio editing method that adjusts a clip point multiple times, according to Embodiment 2 of the present disclosure.
  • FIG. 7 is a schematic diagram of an audio editing device according to Embodiment 3 of the present disclosure.
  • FIG. 8 is a schematic diagram of a terminal according to Embodiment 3 of the present disclosure.
  • This embodiment provides an audio editing method.
  • The method specifically includes:
  • Step S101: determine the to-be-adjusted clip point corresponding to the audio to be clipped;
  • Step S102: acquire the adjustment audio corresponding to the to-be-adjusted clip point, and adjust the to-be-adjusted clip point according to the adjustment audio to determine the final clip point;
  • Step S103: clip the audio to be clipped at the final clip point.
  • The to-be-adjusted clip point in this embodiment may be the initial clip point that the user sets for the audio to be clipped, or an adjusted clip point obtained by adjusting the initial clip point.
  • Determining the to-be-adjusted clip point corresponding to the audio to be clipped includes: receiving a trigger operation by which the user sets a clip point on the audio to be clipped, and determining the to-be-adjusted clip point from it; or receiving position information of the to-be-adjusted clip point input by the user, and determining the to-be-adjusted clip point from it; or automatically identifying an already adjusted clip point and, if that clip point needs further adjustment, using it as the to-be-adjusted clip point.
  • The to-be-adjusted clip point in this embodiment may be a clip start point or a clip end point.
  • Acquiring the adjustment audio corresponding to the to-be-adjusted clip point includes: acquiring the audio in a preset area in which the to-be-adjusted clip point is located; or acquiring the complete audio sentence in which the to-be-adjusted clip point is located.
  • After the to-be-adjusted clip point is determined, the audio near it can be analyzed to decide whether the clip point needs to be adjusted.
  • The audio in the vicinity of the to-be-adjusted clip point may specifically be the audio in the preset area in which the clip point is located. Referring to FIG.
  • The preset area may be the region $CD$ in which the to-be-adjusted clip point $S_0$ is located, where $CS_0$ is shorter than $AS_0$ and $S_0D$ is shorter than $S_0B$; $CS_0$ and $S_0D$ may be the same or different, and can be set as needed.
  • The adjustment audio in this embodiment may also be the complete audio sentence in which the to-be-adjusted clip point is located. For example, after the to-be-adjusted clip point is determined, speech analysis is performed on the audio to be clipped, and sentence-level segmentation determines the complete audio sentence corresponding to the clip point, which is then used as the adjustment audio of that clip point.
  • For example, if the to-be-adjusted clip point is $S_0$, the corresponding complete audio sentence may be $C_1D_1$.
  • When complete audio sentences are segmented, certain features often change significantly at sentence boundaries, so the boundary endpoints of a sentence can be determined by detecting changes in audio features in combination with the silence delay. The audio feature may specifically be feature information such as audio energy.
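  • The energy-plus-silence idea above can be sketched as follows. This is only an illustration under stated assumptions, not the segmentation procedure of the disclosure: the per-frame energy list, the `energy_threshold` and `min_silence_frames` parameters, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def segment_sentences(
    energy: List[float],
    energy_threshold: float,
    min_silence_frames: int,
) -> List[Tuple[int, int]]:
    """Split per-frame energies into (start, end) frame ranges, end exclusive.

    Assumption: a sentence ends once at least `min_silence_frames`
    consecutive frames fall below `energy_threshold`.
    """
    sentences: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the sentence being built
    silence_run = 0              # consecutive low-energy frames seen so far
    for i, e in enumerate(energy):
        if e >= energy_threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # The sentence ended at the first frame of this silent run.
                sentences.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # Audio ended inside a sentence; drop any trailing silent frames.
        sentences.append((start, len(energy) - silence_run))
    return sentences


def sentence_containing(
    sentences: List[Tuple[int, int]], frame: int
) -> Optional[Tuple[int, int]]:
    """Return the sentence range that contains `frame`, or None."""
    for start, end in sentences:
        if start <= frame < end:
            return (start, end)
    return None
```

  • A short pause inside a sentence (fewer than `min_silence_frames` quiet frames) does not split it, which is one plausible reading of combining the feature change with the silence delay.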
  • Adjusting the to-be-adjusted clip point according to the adjustment audio to determine the final clip point includes: calculating an evaluation value for the adjustment audio; comparing the evaluation value with a preset evaluation threshold; and adjusting the to-be-adjusted clip point according to the comparison result to obtain the final clip point.
  • That is, how the to-be-adjusted clip point is adjusted is determined by whether the evaluation value of its adjustment audio is greater or smaller than the preset evaluation threshold.
  • Calculating the evaluation value of the adjustment audio in this embodiment includes: calculating the evaluation value from feature values of the adjustment audio, where the feature values include at least one of a split ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
  • In other words, the evaluation value may be calculated from any one or more of these feature values of the adjustment audio.
  • The split ratio treats the to-be-adjusted clip point as the point that splits its adjustment audio, and is the proportion of the whole adjustment audio that falls inside the clipped audio, as shown in FIG.
  • For example, the portion that falls inside the clipped audio may be the latter part of the adjustment audio, so the split ratio $R_1$ satisfies $0 \le R_1 \le 1$.
  • The audio signal-to-noise ratio is the ratio of the strength of the normal sound signal to the strength of the noise signal. The higher the signal-to-noise ratio, the less noise there is and the more likely the audio is to be retained. In this embodiment, to keep the signal-to-noise ratio on the same scale as the other factors, the ratio of the useful signal power to the total audio power is used.
  • The audio signal-to-noise ratio is $R_2 = P_{\text{useful}} / P_{\text{total1}}$, where $P_{\text{useful}}$ is the useful signal power and $P_{\text{total1}}$ is the total audio power, so $0 \le R_2 \le 1$.
  • The sound intensity ratio compares the sound intensity of the adjustment audio with that of the audio to be clipped as a whole; the closer the two intensities are, the more likely the adjustment audio is to be retained.
  • The adjustment audio in this embodiment may be the audio in the preset area corresponding to the to-be-adjusted clip point, or the complete audio sentence in which the to-be-adjusted clip point is located.
  • The size of each factor parameter sets the importance of the corresponding feature value.
  • Any number of the feature values may be selected for the calculation, and the factor parameter of an unselected feature value may be set to 0.
  • Besides the four types above, the feature values of the adjustment audio may also be of other types, which can be set as needed and are not limited in this embodiment.
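  • One way to read the factor parameters is as weights in a weighted sum of the feature values. The sketch below assumes that combination rule, which the surviving text does not state explicitly; the feature names and the `evaluation_value` function are hypothetical.

```python
from typing import Mapping

# Hypothetical names for the four feature values listed in the text.
FEATURES = ("split_ratio", "snr_ratio", "silence_delay_ratio", "intensity_ratio")


def evaluation_value(
    features: Mapping[str, float],
    factors: Mapping[str, float],
) -> float:
    """Combine feature values into one evaluation value R.

    Assumption: R is the sum of factor * feature over all features.
    A feature whose factor is missing or 0 does not contribute,
    which matches "the factor parameter of an unselected feature
    value may be set to 0".
    """
    total = 0.0
    for name in FEATURES:
        total += factors.get(name, 0.0) * features.get(name, 0.0)
    return total
```

  • With factors that sum to 1 and feature values in the range 0 to 1, the resulting evaluation value also stays in that range, which makes a single threshold easy to choose.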
  • Different evaluation value calculation methods and different evaluation thresholds can be set to determine how the to-be-adjusted clip point is adjusted.
  • When the to-be-adjusted clip point is the clip start point of the audio to be clipped, adjusting it according to the comparison result to obtain the final clip point includes: when the evaluation value of the adjustment audio corresponding to the clip start point is greater than the corresponding preset evaluation threshold, using the start point of the adjustment audio as the adjusted clip start point; when that evaluation value is less than or equal to the corresponding preset evaluation threshold, using the end point of the adjustment audio as the adjusted clip start point.
  • When the to-be-adjusted clip point is the clip end point of the audio to be clipped, adjusting it according to the comparison result to obtain the final clip point includes: when the evaluation value of the adjustment audio corresponding to the clip end point is greater than the corresponding preset evaluation threshold, using the end point of the adjustment audio as the adjusted clip end point; when that evaluation value is less than or equal to the corresponding preset evaluation threshold, using the start point of the adjustment audio as the adjusted clip end point.
  • If the to-be-adjusted clip point is the clip start point $S_0$: when the evaluation value $R$ is greater than the preset evaluation threshold corresponding to the clip start point, the adjustment audio is retained (refer to FIG.); specifically, $S_0$ is adjusted to $s$, and $s$ is used as the clip start point. When $R$ is less than or equal to that threshold, the adjustment audio is discarded; specifically, $S_0$ is adjusted to $e$, and $e$ is used as the clip start point.
  • If the to-be-adjusted clip point is the clip end point $E_0$: when $R$ is greater than the preset evaluation threshold corresponding to the clip end point, the adjustment audio is retained; specifically, $E_0$ is adjusted to $m$, and $m$ is used as the clip end point. When $R$ is less than or equal to that threshold, the adjustment audio is discarded; specifically, $E_0$ is adjusted to $f$, and $f$ is used as the clip end point.
  • The preset evaluation thresholds for the clip start point and the clip end point may be the same or different; that is, the threshold for each to-be-adjusted clip point can be set as required.
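  • The retain-or-discard rule for the two kinds of clip point can be written compactly. The following is a minimal sketch of that rule as described above; the function name, the `kind` strings, and the use of seconds for positions are illustrative assumptions.

```python
def adjust_clip_point(
    kind: str,
    sentence_start: float,
    sentence_end: float,
    evaluation: float,
    threshold: float,
) -> float:
    """Move a clip point to a boundary of its adjustment audio.

    kind: "start" for a clip start point, "end" for a clip end point.
    If evaluation > threshold the adjustment audio is retained,
    otherwise (evaluation <= threshold) it is discarded.
    """
    keep = evaluation > threshold
    if kind == "start":
        # Retain: start from the beginning of the sentence (s).
        # Discard: start after the sentence (e).
        return sentence_start if keep else sentence_end
    if kind == "end":
        # Retain: end at the end of the sentence (m).
        # Discard: end before the sentence (f).
        return sentence_end if keep else sentence_start
    raise ValueError("kind must be 'start' or 'end'")
```

  • Passing a separate `threshold` per call reflects the note that the start and end points may use different preset evaluation thresholds.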
  • Adjusting the to-be-adjusted clip point according to the adjustment audio to determine the final clip point includes: determining the number of adjustments for the to-be-adjusted clip point according to the length of the audio to be clipped, and adjusting the clip point that number of times to obtain the final clip point; or acquiring the adjustment audio of the to-be-adjusted clip point, adjusting the clip point, and repeating this until the direction of the current adjustment differs from the direction of the previous adjustment, at which point the adjustment ends and the previously adjusted clip point is used as the final clip point.
  • In the first case, the to-be-adjusted clip point is adjusted N times, and the clip point obtained after the N-th adjustment is used as the final clip point, where N is the preset number of adjustments.
  • The preset number of adjustments may be set according to the duration of the audio to be clipped. For short audio, clip points are positioned relatively accurately, so the number of adjustments can be small, such as once or twice. For long audio, clip points are positioned less accurately, so more adjustments can be made so that the clipped audio better matches the user's needs.
  • Here, the to-be-adjusted clip point may specifically be the initially determined clip point.
  • The audio editing method provided in this embodiment further includes: when the ratio of the duration of the adjustment audio already adjusted to the duration of the audio to be clipped reaches a preset adjustment ratio threshold, stopping the adjustment of the clip point and using the previously determined clip point as the final clip point.
  • The adjustment audio already adjusted includes the adjustment audio already processed for the clip start point and/or for the clip end point. If the to-be-adjusted clip point is adjusted several times in a row, for example several consecutive adjustments in the same direction, the ratio of the duration of the adjustment audio already adjusted to the duration of the audio to be clipped can be calculated; this ratio is the adjustment ratio.
  • If the adjustment ratio reaches the preset adjustment ratio threshold, the adjustment of the to-be-adjusted clip point is stopped and the previously determined clip point is used as the final clip point; if it does not, the clip point continues to be adjusted in the normal way.
  • The clip point may be a clip start point or a clip end point, and the adjustment audio already adjusted may include that of the clip start point as well as that of the clip end point.
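  • The stop condition based on the adjustment ratio can be sketched as below. This assumes that "reaches" means greater than or equal to, and that the adjusted duration is the sum of the durations of all adjustment audio segments processed so far for either clip point; the function name and parameters are hypothetical.

```python
from typing import Iterable


def should_stop_adjusting(
    adjusted_durations: Iterable[float],
    clip_duration: float,
    ratio_threshold: float,
) -> bool:
    """Return True once the adjustment ratio reaches the threshold.

    adjusted_durations: durations (seconds) of the adjustment audio
        segments already processed for the start and/or end point.
    clip_duration: duration (seconds) of the audio to be clipped.
    """
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    adjustment_ratio = sum(adjusted_durations) / clip_duration
    return adjustment_ratio >= ratio_threshold
```

  • For example, with 3 s and 2 s already adjusted on a 10 s clip, the adjustment ratio is 0.5, so adjustment stops for a threshold of 0.5 and continues for a threshold of 0.6.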
  • The audio editing method provided in this embodiment determines the to-be-adjusted clip point corresponding to the audio to be clipped, acquires the adjustment audio corresponding to that clip point, adjusts the clip point according to the adjustment audio to determine the final clip point, and clips the audio to be clipped at the final clip point.
  • Compared with the prior art, the to-be-adjusted clip point is adjusted according to its adjustment audio to obtain the final clip point, and the audio is clipped at that final clip point.
  • This helps avoid incomplete sentences or silent periods at the start and end of the clipped audio, which improves the quality of the clip, brings the clipped audio closer to the user's needs, and improves the user experience.
  • The audio editing method provided in this embodiment segments the audio to be clipped using speech technology and then adjusts the clip point according to the adjustment audio obtained by the segmentation. That is, it analyzes the sentence in which the clip point is located and decides whether to retain or discard that part of the sentence, so that the clipped audio better matches the user's needs.
  • This embodiment provides an audio editing method. After a clip point is determined, the audio sentence in which the clip point is located is analyzed to decide whether the clip point needs to be adjusted, the final clip point is determined, and the audio to be clipped is clipped at the final clip point to obtain the required audio.
  • A method that adjusts the initial clip point once to obtain the final clip point includes:
  • Step S401: determine the initial clip point.
  • The initial clip point in this embodiment may be determined from the user's drag operation on the clip point in the clipping interface.
  • The initial clip points include an initial clip start point $S_0$ and an initial clip end point $E_0$.
  • Step S402: segment out the complete audio sentence corresponding to the initial clip point.
  • Speech technology is used to analyze the audio to be clipped and segment it at the sentence level, which determines the complete audio sentence in which the initial clip point is located.
  • The complete audio sentence corresponding to the initial clip start point may be $se$.
  • The complete audio sentence corresponding to the initial clip end point may be $fm$.
  • Step S403: judge whether the evaluation value is greater than the preset evaluation threshold; if it is, go to step S404; otherwise, go to step S405.
  • The audio information is evaluated to decide whether to retain the complete audio sentence. Specifically, the evaluation value for the initial clip point is calculated and compared with the corresponding preset evaluation threshold to decide whether the complete audio sentence is kept in the clipped audio. If the evaluation value is greater than the preset evaluation threshold, the process goes to step S404; if it is less than or equal to the preset evaluation threshold, the process goes to step S405.
  • Step S404: retain the complete audio sentence, and go to step S406.
  • Retaining the complete audio sentence specifically includes: if the complete audio sentence corresponding to the initial clip start point is retained, adjusting $S_0$ to $s$; if the complete audio sentence corresponding to the initial clip end point is retained, adjusting $E_0$ to $m$.
  • Step S405: discard the complete audio sentence, and go to step S406.
  • Discarding the complete audio sentence specifically includes: if the complete audio sentence corresponding to the initial clip start point is discarded, adjusting $S_0$ to $e$; if the complete audio sentence corresponding to the initial clip end point is discarded, adjusting $E_0$ to $f$.
  • Step S406: clip the audio according to the final clip point.
  • The audio to be clipped is clipped according to the final clip points.
  • The clip points in this embodiment include the clip start point and the clip end point; that is, the audio between the final clip start point and the final clip end point is extracted as the final clipped audio. In this way, the audio sentences in the clipped audio are more complete, and the result better matches the user's needs.
  • This embodiment further provides an audio editing method that adjusts the initial clip point multiple times.
  • The method specifically includes:
  • Step S501: determine the clip point;
  • Step S502: segment out the complete audio sentence corresponding to the clip point;
  • Step S503: determine whether the evaluation value is greater than the preset evaluation threshold; if it is, go to step S504; otherwise, go to step S505;
  • Step S504: retain the complete audio sentence, and go to step S502;
  • Step S505: discard the complete audio sentence, and go to step S506;
  • Step S506: clip the audio according to the final clip point.
  • After the first adjusted clip point is obtained, the complete audio sentence corresponding to the adjusted clip point may be acquired, and it is determined whether the adjusted clip point needs further adjustment.
  • The adjusted clip point in this embodiment includes at least one adjusted clip point; steps S502 to S504 may be repeated until it is determined that the adjusted clip point needs no further adjustment.
  • The most recently adjusted clip point is then used as the final clip point.
  • The audio to be clipped is clipped according to the final clip points.
  • The clip points in this embodiment include both the clip start point and the clip end point, and the remaining processing is the same as in the single-adjustment method. In this way, the audio sentences in the clipped audio are more complete, and the result better matches the user's needs.
  • When the clip point is adjusted multiple times, the way the evaluation value is calculated may differ for each to-be-adjusted clip point; that is, the evaluation threshold of each clip point and the feature values used to calculate its evaluation value can be set differently.
  • Step S601: determine the clip point;
  • Step S602: segment out the complete audio sentence corresponding to the clip point;
  • Step S603: determine whether the evaluation value of the complete audio sentence is greater than the preset evaluation threshold; if it is, go to step S604; otherwise, go to step S605;
  • Step S604: retain the complete audio sentence, and go to step S606;
  • Step S605: discard the complete audio sentence, and go to step S606;
  • Step S606: determine whether this retain-or-discard decision is the same as the previous one; if it is the same, go to step S607; if not, go to step S608;
  • Step S607: determine whether the adjustment ratio satisfies $Q > Q_{\text{threshold}}$; if yes, go to step S608; if not, go to step S602;
  • Step S608: clip the audio according to the final clip point.
  • $Q_{\text{threshold}}$ may specifically be 1.
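  • The loop in steps S601 to S608 can be sketched for a clip start point as follows. This is an interpretation under several assumptions, not the disclosed implementation: sentences are given as a pre-segmented list, retaining a sentence moves the search to the preceding sentence and discarding moves it to the following one, a decision that differs from the previous one is not applied, and $Q$ accumulates whole sentence durations. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Sentence = Tuple[float, float]  # (start_time, end_time) in seconds


def refine_start_point(
    sentences: List[Sentence],
    index: int,
    evaluate: Callable[[Sentence], float],
    threshold: float,
    clip_duration: float,
    ratio_threshold: float = 1.0,
) -> float:
    """Repeatedly adjust a clip start point; `index` is the sentence holding it."""
    if not 0 <= index < len(sentences):
        raise ValueError("index must refer to an existing sentence")
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")

    previous_keep: Optional[bool] = None
    final_point = sentences[index][0]
    adjusted = 0.0  # total duration of adjustment audio processed so far

    while 0 <= index < len(sentences):
        start, end = sentences[index]                    # S602
        keep = evaluate((start, end)) > threshold        # S603
        if previous_keep is not None and keep != previous_keep:
            break                                        # S606: direction changed
        final_point = start if keep else end             # S604 / S605
        adjusted += end - start
        previous_keep = keep
        if adjusted / clip_duration > ratio_threshold:   # S607: Q > Q_threshold
            break
        index += -1 if keep else 1                       # move to the neighbour
    return final_point                                   # used for clipping in S608
```

  • A clip end point could be handled symmetrically by swapping the two boundaries and the two search directions.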
  • The editing method of this embodiment may adjust the clip point of the audio to be clipped only once, or multiple times, so that the audio sentences at the start and end of the clipped audio are as complete as possible, which improves the user experience.
  • This embodiment provides an audio editing device.
  • The device includes: a to-be-adjusted clip point determining module 71, a clip point adjustment module 72, and a clipping module 73. The to-be-adjusted clip point determining module 71 is configured to determine the to-be-adjusted clip point corresponding to the audio to be clipped; the clip point adjustment module 72 is configured to acquire the adjustment audio corresponding to the to-be-adjusted clip point, and to adjust the clip point according to the adjustment audio to determine the final clip point; the clipping module 73 is configured to clip the audio to be clipped at the final clip point.
  • The to-be-adjusted clip point determining module 71 may determine the to-be-adjusted clip point by receiving a trigger operation by which the user sets a clip point on the audio to be clipped; or by receiving position information of the to-be-adjusted clip point input by the user; or by automatically identifying an already adjusted clip point and, if that clip point needs further adjustment, using it as the to-be-adjusted clip point.
  • The to-be-adjusted clip point in this embodiment may be a clip start point or a clip end point.
  • The clip point adjustment module 72 is configured to acquire the audio in a preset area in which the to-be-adjusted clip point is located, or to acquire the complete audio sentence in which the to-be-adjusted clip point is located.
  • The clip point adjustment module 72 is configured to calculate an evaluation value for the adjustment audio, compare the evaluation value with a preset evaluation threshold, and adjust the to-be-adjusted clip point according to the comparison result to obtain the final clip point.
  • The clip point adjustment module 72 is configured to calculate the evaluation value from feature values of the adjustment audio, where the feature values include at least one of a split ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
  • The clip point adjustment module 72 is configured to, when the to-be-adjusted clip point is the clip start point of the audio to be clipped: use the start point of the adjustment audio as the adjusted clip start point when the evaluation value of the adjustment audio corresponding to the clip start point is greater than the corresponding preset evaluation threshold; and use the end point of the adjustment audio as the adjusted clip start point when that evaluation value is less than or equal to the corresponding preset evaluation threshold.
  • The clip point adjustment module 72 is further configured to, when the to-be-adjusted clip point is the clip end point of the audio to be clipped: use the end point of the adjustment audio as the adjusted clip end point when the evaluation value of the adjustment audio corresponding to the clip end point is greater than the corresponding preset evaluation threshold; and use the start point of the adjustment audio as the adjusted clip end point when that evaluation value is less than or equal to the corresponding preset evaluation threshold.
  • The clip point adjustment module 72 is configured to determine the number of adjustments for the to-be-adjusted clip point according to the length of the audio to be clipped, and adjust the clip point that number of times to obtain the final clip point; or to acquire the adjustment audio of the to-be-adjusted clip point, adjust the clip point, and repeat this until the direction of the current adjustment differs from the direction of the previous adjustment, at which point the adjustment ends.
  • The previously adjusted clip point is then used as the final clip point.
  • The clip point adjustment module 72 is further configured to stop adjusting the clip point when the ratio of the duration of the adjustment audio already adjusted to the duration of the audio to be clipped reaches a preset adjustment ratio threshold, and to use the previously determined clip point as the final clip point.
  • The adjustment audio already adjusted includes the adjustment audio already processed for the clip start point and/or for the clip end point. If the to-be-adjusted clip point is adjusted several times in a row, for example several consecutive adjustments in the same direction, the ratio of the duration of the adjustment audio already adjusted to the duration of the audio to be clipped can be calculated; this ratio is the adjustment ratio.
  • The clip point may be a clip start point or a clip end point, and the adjustment audio already adjusted may include that of the clip start point as well as that of the clip end point.
  • The clipping module 73 clips the audio to be clipped at the final clip points, which specifically includes storing the audio between the final clip start point and the final clip end point as the final clipped audio.
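  • The clipping step itself reduces to extracting the samples between the two final clip points. A minimal sketch, assuming the audio is held as a plain list of samples at a known sample rate and that clip points are given in seconds; `clip_audio` is a hypothetical name, not part of the disclosure.

```python
from typing import List, Sequence


def clip_audio(
    samples: Sequence[float],
    sample_rate: int,
    start_s: float,
    end_s: float,
) -> List[float]:
    """Return the samples between the final clip start and end points."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if not 0 <= start_s <= end_s:
        raise ValueError("require 0 <= start_s <= end_s")
    first = int(round(start_s * sample_rate))
    # Clamp the end so a clip point past the end of the audio is tolerated.
    last = min(int(round(end_s * sample_rate)), len(samples))
    return list(samples[first:last])
```

  • Storing the result, for example by writing it to a file, is left out because the text does not specify a storage format.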
  • In practical applications, the to-be-adjusted clip point determining module 71, the clip point adjustment module 72, and the clipping module 73 in the audio editing device can be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA) in the device.
  • The audio editing device provided in this embodiment determines the to-be-adjusted clip point corresponding to the audio to be clipped, acquires the adjustment audio corresponding to that clip point, adjusts the clip point according to the adjustment audio to determine the final clip point, and clips the audio to be clipped at the final clip point.
  • Compared with the prior art, the to-be-adjusted clip point is adjusted according to its adjustment audio to obtain the final clip point, and the audio is clipped at that final clip point.
  • This helps avoid incomplete sentences or silent periods at the start and end of the clipped audio, which improves the quality of the clip, brings the clipped audio closer to the user's needs, and improves the user experience.
  • the embodiment further provides a terminal.
  • the method further includes: the foregoing audio editing device.
  • the terminal provided in this embodiment can implement the adjustment of the clip point of the audio to be edited by the above-mentioned audio clip device, and the clip can obtain more reasonable clipped audio, so that the clipped audio is more in line with the user's needs and improves the user experience.
  • the method for adjusting the clip point of the clip audio to obtain a more reasonable clipped audio does not need to set a hardware accessory on the terminal, and changes the structure of the terminal, and can be applied to all terminals. And cost Low, good results.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in an actual implementation there may be other divisions: for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
  • the units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of the embodiment.
  • the functional units in the embodiments of the present disclosure may all be integrated into one processing unit, each unit may stand alone as a unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as hardware plus software functional units.
  • the foregoing program may be stored in a computer-readable storage medium, and when executed it performs the steps of the above method embodiments; the storage medium includes any medium that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the integrated unit of the present disclosure may also be stored in a computer-readable storage medium if it is implemented as a software functional module and sold or used as a standalone product. On this understanding, the part of the technical solution of the embodiments of the present disclosure that is essential, or that contributes over the prior art, may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the methods described in the embodiments of the present disclosure.
  • the foregoing storage medium includes various media that can store program code, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • the technical solution of the embodiments of the present disclosure clips the audio to be clipped at the final clip point: the clip point to be adjusted is adjusted according to its corresponding adjustment audio to obtain the final clip point, and the audio is clipped at that point. This avoids incomplete sentences or silent periods at the start and end of the clipped audio, improves the quality of audio clipping, makes the resulting audio better match the user's needs, and improves the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

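An audio clipping method, apparatus, terminal, and computer storage medium. The clipping method includes: determining a clip point to be adjusted corresponding to audio to be clipped (S101); acquiring adjustment audio corresponding to the clip point to be adjusted, adjusting the clip point to be adjusted according to the adjustment audio, and determining a final clip point (S102); and clipping the audio to be clipped at the final clip point (S103).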

Description

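Audio clipping method, apparatus, terminal, and computer storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 201610804873.7, filed on September 5, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of audio clipping, and in particular to an audio clipping method, apparatus, terminal, and computer storage medium.
Background
At present, when audio is clipped, the user selects the clip points directly on the terminal and the audio is then clipped at those points. The clipped audio obtained in this way is often not the ideal audio the user hoped for.
Summary
Embodiments of the present disclosure are intended to provide an audio clipping method, apparatus, terminal, and computer storage medium.
An embodiment of the present disclosure provides an audio clipping method, including:
determining a clip point to be adjusted corresponding to audio to be clipped;
acquiring adjustment audio corresponding to the clip point to be adjusted, adjusting the clip point to be adjusted according to the adjustment audio, and determining a final clip point;
clipping the audio to be clipped at the final clip point.
An embodiment of the present disclosure further provides an audio clipping apparatus, including:
a to-be-adjusted clip point determination module, configured to determine a clip point to be adjusted corresponding to audio to be clipped;
a clip point adjustment module, configured to acquire adjustment audio corresponding to the clip point to be adjusted, adjust the clip point to be adjusted according to the adjustment audio, and determine a final clip point;
a clipping module, configured to clip the audio to be clipped at the final clip point.
An embodiment of the present disclosure further provides a terminal, including the audio clipping apparatus described in the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer storage medium storing computer-executable instructions for performing the audio clipping method described in the embodiments of the present disclosure.
The audio clipping method, apparatus, terminal, and computer storage medium provided by the embodiments of the present disclosure determine a clip point to be adjusted corresponding to the audio to be clipped; acquire the adjustment audio corresponding to that clip point, adjust the clip point according to the adjustment audio, and determine the final clip point; and clip the audio at the final clip point. That is, the clip point to be adjusted can be adjusted according to its corresponding adjustment audio to obtain the final clip point, and the audio is clipped at that point. This avoids incomplete sentences or silent periods at the start and end of the clipped audio, improves the quality of audio clipping, makes the resulting audio better match the user's needs, and improves the user experience.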
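Brief Description of the Drawings
FIG. 1 is a flowchart of the audio clipping method of Embodiment 1 of the present disclosure;
FIG. 2 is a schematic diagram of audio clip points in Embodiment 1 of the present disclosure;
FIG. 3 is another schematic diagram of audio clip points in Embodiment 1 of the present disclosure;
FIG. 4 is a flowchart of the audio clipping method of Embodiment 2 in which the clip points are adjusted once;
FIG. 5 is a flowchart of the audio clipping method of Embodiment 2 in which the clip points are adjusted several times;
FIG. 6 is a flowchart of another audio clipping method of Embodiment 2 in which the clip points are adjusted several times;
FIG. 7 is a schematic diagram of the audio clipping apparatus of Embodiment 3 of the present disclosure;
FIG. 8 is a schematic diagram of the terminal provided in Embodiment 3 of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described in further detail below through specific implementations with reference to the drawings.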
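Embodiment 1
This embodiment provides an audio clipping method. Referring to FIG. 1, the method includes:
Step S101: determine a clip point to be adjusted corresponding to the audio to be clipped;
Step S102: acquire adjustment audio corresponding to the clip point to be adjusted, adjust the clip point to be adjusted according to the adjustment audio, and determine a final clip point;
Step S103: clip the audio to be clipped at the final clip point.
In this embodiment, the clip point to be adjusted may be an initial clip point chosen by the user for the audio to be clipped, or a clip point obtained by adjusting that initial clip point. Step S101 may include: receiving a trigger operation by the user on a clip point of the audio to be clipped and determining the clip point to be adjusted from it; or receiving position information of the clip point to be adjusted entered by the user and determining the clip point from it; or automatically identifying an already-adjusted clip point and, if it needs further adjustment, taking it as the clip point to be adjusted. The clip point to be adjusted may be a clip start point or a clip end point.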
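As an implementation, acquiring the adjustment audio corresponding to the clip point to be adjusted includes: acquiring the audio within a preset region in which the clip point to be adjusted is located; or acquiring the complete audio sentence in which the clip point to be adjusted is located. To make the final clipped audio better match the user's needs, the audio near the clip point to be adjusted can be analyzed to decide whether the clip point needs adjusting. That nearby audio may be the audio within the preset region containing the clip point. Referring to FIG. 2, if the audio to be clipped is A-B and S0 is one of the clip points to be adjusted, the preset region may be the region C-D containing that point, where C-S0 is shorter than A-S0 and S0-D is shorter than S0-B; C-S0 and S0-D may be equal or different and can be set as required. Alternatively, the adjustment audio may be the complete audio sentence containing the clip point: after the clip point to be adjusted is determined, speech analysis is performed on the audio to be clipped, the audio is segmented at sentence level, and the complete audio sentence containing the clip point is taken as its adjustment audio. As shown in FIG. 2, if the clip point to be adjusted is S0, the corresponding complete audio sentence after speech analysis may be C1-D1. Because certain features usually change markedly at sentence boundaries, sentences can be delimited by detecting changes in audio features in combination with silence delay, which yields the boundary endpoints of each sentence; the audio feature may be, for example, audio energy.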
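The sentence-level segmentation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the audio has already been reduced to a list of per-frame energies, and the names `sentence_bounds`, `silence_thresh` and `min_silence_frames` are hypothetical. A sentence boundary is taken to be a run of low-energy frames of at least a given length.

```python
def sentence_bounds(frame_energy, clip_idx, silence_thresh, min_silence_frames):
    """Return (start, end) frame indices, end exclusive, of the voiced span
    that contains clip_idx."""
    n = len(frame_energy)

    def last_voiced(step):
        # Walk away from the clip point until a silence run of the
        # required length is met or the audio ends.
        i, run, edge = clip_idx, 0, clip_idx
        while 0 <= i < n:
            if frame_energy[i] < silence_thresh:
                run += 1
                if run >= min_silence_frames:
                    break
            else:
                run, edge = 0, i
            i += step
        return edge

    return last_voiced(-1), last_voiced(+1) + 1
```

A single low-energy frame inside a sentence (a short pause) does not end the sentence here; only a silence run of `min_silence_frames` or more does.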
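In this embodiment, adjusting the clip point to be adjusted according to the adjustment audio and determining the final clip point includes: calculating an evaluation value for the adjustment audio; and comparing the evaluation value with a preset evaluation threshold and adjusting the clip point according to the comparison result to obtain the final clip point. In other words, how the clip point is adjusted is determined by whether the evaluation value of its adjustment audio is greater than the preset evaluation threshold.
Calculating the evaluation value for the adjustment audio includes calculating it from feature values of the adjustment audio, the feature values including at least one of a segmentation ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
The segmentation ratio is the proportion of the adjustment audio that falls inside the clipped audio when the adjustment audio is split at the clip point to be adjusted. As shown in FIG. 3, the clipped audio is A-B, S0 is the clip point to be adjusted, and its adjustment audio is s-e; if the duration of s-S0 is a and that of S0-e is b, the segmentation ratio is R1 = b/(a+b). The larger R1 is, the more that portion of audio should be kept: when the clip point to be adjusted is a clip start point, the start of that audio can be taken as the adjusted clip start point, and when it is a clip end point, the end of that audio can be taken as the adjusted clip end point. In the extreme case where the clip point falls exactly between two pieces of adjustment audio, its adjustment audio may be the preceding one when it is a clip start point and the following one when it is a clip end point; the segmentation ratio therefore satisfies 0 ≤ R1 ≤ 1.
The audio signal-to-noise ratio is the ratio of the strength of the normal sound signal to the strength of the noise signal; when it is high, noise is low and the audio is more likely to be kept. To put it on the same scale as the other factors, this embodiment may use the ratio of useful signal power to total audio power: R2 = P_useful/P_total1, where P_useful is the useful signal power and P_total1 is the total audio power, 0 < R2 < 1.
The silence delay ratio relates the lengths of silence at the two ends of the current adjustment audio. As shown in FIG. 3, if L1 is much longer than L2, the adjustment audio is more likely to form a whole with the clipped audio and more likely to be kept. Again for consistency with the other factors, the silence delay ratio satisfies R3 = ((L1 - L2)/(L1 + L2) + 1)/2, 0 < R3 < 1.
The sound intensity ratio compares the sound intensity of the adjustment audio with that of the whole audio to be clipped; the closer the two intensities, the more likely the adjustment audio is to be kept. It is calculated as R4 = 1 - (P - P_total2)/P, where P_total2 is the sound intensity of the current audio to be clipped and P is the sound intensity of the adjustment audio, 0 < R4 < 1.
The adjustment audio in this embodiment may be the audio within the preset region of the clip point to be adjusted, as described above, or the complete audio sentence containing the clip point.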
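The four feature values can be written out as small functions to make the arithmetic concrete. This is only a sketch: the function and parameter names are hypothetical, and the silence delay ratio is written with L1 - L2 in the numerator, which is an assumption made so that the value varies between 0 and 1 as the text requires.

```python
def segmentation_ratio(a, b):
    """R1 = b / (a + b): share of the adjustment audio inside the clip."""
    return b / (a + b)


def snr_ratio(p_useful, p_total):
    """R2: useful signal power over total audio power."""
    return p_useful / p_total


def silence_delay_ratio(l1, l2):
    """R3 (assumed form): ((L1 - L2) / (L1 + L2) + 1) / 2."""
    return ((l1 - l2) / (l1 + l2) + 1) / 2


def intensity_ratio(p, p_total):
    """R4 = 1 - (P - P_total2) / P, as stated in the text."""
    return 1 - (p - p_total) / p
```

For instance, with a = 1 and b = 3 the clip point sits a quarter of the way into the sentence, so three quarters of the sentence already lies inside the clip and R1 = 0.75.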
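The evaluation value may be calculated from the feature values above as R = K1R1 + K2R2 + K3R3 + K4R4 + … + KnRn, where Kn is the factor parameter (weight) of the n-th feature value, Kn ≥ 0, and n is a positive integer. The Kn may be equal or different, and the importance of each feature value can be set by adjusting its Kn. Any subset of the feature values may be used in the calculation by setting the factor parameters of the unselected feature values to 0. The larger the feature values, the larger the evaluation value and the more likely the adjustment audio is to be kept. Feature values other than the four above may also be used as required; this embodiment places no limit on them. When the clip point has to be adjusted several times, different evaluation formulas and different evaluation thresholds may be set for the different rounds to decide how the clip point is adjusted.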
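The weighted sum can be illustrated with a short function. The name `evaluation_value` and the list-based interface are assumptions made for this sketch; only the formula R = K1R1 + … + KnRn and the constraint Kn ≥ 0 come from the text.

```python
def evaluation_value(features, weights):
    """R = K1*R1 + ... + Kn*Rn for feature values Ri and weights Ki >= 0."""
    if len(features) != len(weights):
        raise ValueError("one weight is needed per feature value")
    if any(k < 0 for k in weights):
        raise ValueError("weights must be non-negative")
    return sum(k * r for k, r in zip(weights, features))
```

Setting a weight to 0 drops the corresponding feature, which is how a subset of the features is selected.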
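As an implementation, when the clip point to be adjusted is the clip start point corresponding to the audio to be adjusted, adjusting the clip point according to the comparison result to obtain the final clip point includes: when the evaluation value of the adjustment audio corresponding to the clip start point is greater than its preset evaluation threshold, taking the start of the adjustment audio as the adjusted clip start point; when that evaluation value is less than or equal to the threshold, taking the end of the adjustment audio as the adjusted clip start point. When the clip point to be adjusted is the clip end point corresponding to the audio to be adjusted, the adjustment includes: when the evaluation value of the adjustment audio corresponding to the clip end point is greater than its preset evaluation threshold, taking the end of the adjustment audio as the adjusted clip end point; when that evaluation value is less than or equal to the threshold, taking the start of the adjustment audio as the adjusted clip end point. For example, referring to FIG. 3, if the clip point to be adjusted is the clip start point S0: when R > λ the adjustment audio is kept, that is, S0 is moved to s and s becomes the clip start point; when R ≤ λ the adjustment audio is discarded, that is, S0 is moved to e and e becomes the clip start point, where λ is the preset evaluation threshold for the clip start point. If the clip point to be adjusted is the clip end point E0: when R > λ the adjustment audio is kept, that is, E0 is moved to m and m becomes the clip end point; when R ≤ λ the adjustment audio is discarded, that is, E0 is moved to f and f becomes the clip end point, where λ is the preset evaluation threshold for the clip end point. The preset evaluation thresholds for the clip start point and the clip end point may be the same or different; the threshold of each clip point to be adjusted can be set as required.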
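The keep/discard rule for both kinds of clip point can be sketched in a few lines. The function name and the `"start"`/`"end"` strings are hypothetical; `seg_start` and `seg_end` stand for the two ends of the adjustment audio (s and e for the start point, f and m for the end point in FIG. 3).

```python
def adjust_clip_point(kind, r, threshold, seg_start, seg_end):
    """Move a clip point to one end of its adjustment audio.

    kind is "start" or "end"; the adjustment audio is kept when r > threshold.
    """
    keep = r > threshold
    if kind == "start":
        # Keeping the sentence pulls the start back to its beginning;
        # discarding pushes the start past its end.
        return seg_start if keep else seg_end
    if kind == "end":
        # Keeping the sentence extends the end to its end;
        # discarding pulls the end back to its beginning.
        return seg_end if keep else seg_start
    raise ValueError("kind must be 'start' or 'end'")
```

Note that an evaluation value exactly equal to the threshold counts as a discard, matching the "less than or equal to" wording above.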
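In this embodiment, adjusting the clip point to be adjusted according to the adjustment audio and determining the final clip point includes: determining, from the length of the audio to be clipped, a number of adjustments for the clip point to be adjusted, and adjusting the clip point that number of times to obtain the final clip point; or acquiring the adjustment audio corresponding to the clip point obtained after adjustment and adjusting that clip point in turn, until the adjustment direction for the current clip point differs from that of the previous adjustment, at which point adjustment ends and the previously adjusted clip point is taken as the final clip point. In the first case the clip point is adjusted N times and the clip point obtained after N adjustments is the final clip point, N being a preset number of adjustments. N may be set according to the size of the audio to be clipped: for short audio the clip points are located relatively accurately, so N can be small, for example one or two; when the audio to be clipped is long, the clip points are located less accurately, so several more adjustments can be made so that the clipped audio better meets the user's needs. In the second case, whether to go on adjusting is decided from the adjustment direction: when a clip point is adjusted again, if the direction just determined matches that of the previous adjustment, the clip point is adjusted; if not, adjustment ends and the previously adjusted clip point is taken as the final clip point. The adjustment direction here refers to whether the adjustment audio is kept or discarded. The clip point to be adjusted in this embodiment may specifically be the initially determined clip point.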
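The first alternative, a preset number of adjustments chosen from the audio length, might look like the sketch below. The text gives no concrete mapping from duration to N, so the 60-second limit and the counts 1 and 3 are purely illustrative assumptions, as are the function names; `adjust_once` stands for any single-step adjustment routine.

```python
def adjustment_count(duration_s, short_limit_s=60.0, short_count=1, long_count=3):
    """Pick N from the clip duration (all thresholds here are hypothetical)."""
    return short_count if duration_s <= short_limit_s else long_count


def adjust_n_times(point, adjust_once, n):
    """Apply a single-step adjustment function n times in a row."""
    for _ in range(n):
        point = adjust_once(point)
    return point
```

Short audio gets fewer rounds because its clip points are assumed to be placed more accurately by the user.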
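The audio clipping method of this embodiment further includes: when the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped reaches a preset adjustment ratio threshold, stopping adjustment of the clip point and taking the previously determined clip point as the final clip point; the already-adjusted adjustment audio includes that corresponding to the clip start point and/or that corresponding to the clip end point. If the clip point to be adjusted has been adjusted several times in succession, for example several consecutive adjustments in the same direction, the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped can be calculated; this ratio is the adjustment ratio. If the adjustment ratio reaches the preset adjustment ratio threshold, adjustment of the clip point can stop and the previously determined clip point is taken as the final clip point; if it does not, adjustment can continue in the normal way. The clip point may be a clip start point or a clip end point, and the already-adjusted adjustment audio may include that corresponding to the clip start point and may also include that corresponding to the clip end point. The adjustment ratio is calculated as Q = θ(T1 + … + Tp)/T, where T is the total duration of the audio to be clipped, Tp is the duration of audio by which SnEn was adjusted relative to Sn-1En-1, p is any positive integer, and θ is a parameter related to the total audio duration. If audio keeps being kept or keeps being discarded, it must be checked whether the adjustment ratio Q has reached its threshold Q_threshold: if not, adjustment continues; if so, adjustment of the clip point stops.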
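As a worked illustration of the adjustment ratio, the formula Q = θ(T1 + … + Tp)/T can be evaluated directly. The function name and argument layout are assumptions for this sketch.

```python
def adjustment_ratio(adjusted_durations, total_duration, theta):
    """Q = theta * (T1 + ... + Tp) / T."""
    return theta * sum(adjusted_durations) / total_duration
```

With adjusted durations of 2 s and 3 s, a 10 s clip and θ = 0.5, the ratio is 0.5 × 5 / 10 = 0.25; adjustment would continue as long as this stays below the chosen threshold.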
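In the prior art the user typically previews the audio, drags the trim positions on a touchscreen to set the clip points, and the audio is then clipped directly at those points. With this approach the audio at the clip points generally does not meet the user's needs, for example because the sentence containing a clip point is cut off partway through; the error is more pronounced when the audio to be clipped is long. The audio clipping method of this embodiment determines a clip point to be adjusted corresponding to the audio to be clipped; acquires the adjustment audio corresponding to that clip point, adjusts the clip point according to the adjustment audio, and determines the final clip point; and clips the audio at the final clip point. Compared with the prior art, the clip point can thus be adjusted according to its adjustment audio before clipping, which avoids incomplete sentences or silent periods at the start and end of the clipped audio, improves the quality of audio clipping, makes the resulting audio better match the user's needs, and improves the user experience. In addition, the method of this embodiment segments the audio to be clipped using speech technology and adjusts the clip points according to the resulting adjustment audio; that is, it analyzes the sentence containing each clip point and decides whether to keep or discard that sentence, so that the clipped audio better meets the user's needs.
Embodiment 2
This embodiment provides an audio clipping method in which, after the clip points are determined, the audio sentences containing the clip points are analyzed to decide whether the clip points need adjusting; the final clip points are determined and the audio to be clipped is clipped at them to obtain the desired audio.
This embodiment provides a method in which the initial clip points are adjusted once to obtain the final clip points. As shown in FIG. 4, it includes:
Step S401: determine the initial clip points.
The initial clip points may be determined from the user's drag operations on the clip points in the clipping interface. Referring to FIG. 3, the initial clip points include an initial clip start point S0 and an initial clip end point E0.
Step S402: segment out the complete audio sentence corresponding to each initial clip point.
Speech technology is used to analyze and segment the audio to be clipped at sentence level and determine the complete audio sentence containing each initial clip point. As shown in FIG. 3, the complete audio sentence for the initial clip start point may be s-e, and that for the initial clip end point may be f-m.
Step S403: determine whether the evaluation value is greater than the preset evaluation threshold; if so, go to step S404; otherwise go to step S405.
Once the complete audio sentence containing an initial clip point has been obtained, the audio information is evaluated to decide whether to keep that sentence: the evaluation value for the initial clip point is calculated and compared with its preset evaluation threshold to determine whether the sentence is kept in the clipped audio. If the evaluation value is greater than the preset evaluation threshold, go to step S404; if it is less than or equal to the threshold, go to step S405.
Step S404: keep the complete audio sentence, then go to step S406.
Referring to FIG. 3, keeping the complete audio sentence means: if the sentence for the initial clip start point is kept, S0 is moved to s; if the sentence for the initial clip end point is kept, E0 is moved to m.
Step S405: discard the complete audio sentence, then go to step S406.
Referring to FIG. 3, discarding the complete audio sentence means: if the sentence for the initial clip start point is discarded, S0 is moved to e; if the sentence for the initial clip end point is discarded, E0 is moved to f.
Step S406: clip the audio at the final clip points.
After the final clip points are determined, the audio to be clipped is clipped at them. The clip points in this embodiment include both a clip start point and a clip end point; the audio between the final clip start point and the final clip end point is cut out as the final clipped audio. Audio obtained in this way has relatively complete sentences and better meets the user's needs.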
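The single-adjustment flow of FIG. 4 (steps S403 to S406) can be sketched end to end on a plain list of samples. This is a simplified illustration under stated assumptions: sentence boundaries and evaluation values are passed in already computed, positions are sample indices, one threshold is shared by both clip points, and the function name `single_pass_clip` is hypothetical.

```python
def single_pass_clip(samples, start_sentence, end_sentence, r_start, r_end, threshold):
    """Adjust S0 and E0 once, then cut the audio between the final points.

    start_sentence is (s, e), the sentence containing the initial start S0;
    end_sentence is (f, m), the sentence containing the initial end E0.
    """
    s, e = start_sentence
    f, m = end_sentence
    final_start = s if r_start > threshold else e  # S404 keep / S405 discard
    final_end = m if r_end > threshold else f      # S404 keep / S405 discard
    return samples[final_start:final_end]          # S406
```

In the test case used to check this sketch, the sentence around the start point is kept (the clip starts at s) while the sentence around the end point is discarded (the clip ends at f).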
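This embodiment also provides a method in which the initial clip points are adjusted several times before clipping. Referring to FIG. 5, it includes:
Step S501: determine the clip point;
Step S502: segment out the complete audio sentence corresponding to the clip point;
Step S503: determine whether the evaluation value is greater than the preset evaluation threshold; if so, go to step S504; otherwise go to step S505;
Step S504: keep the complete audio sentence, then go to step S502;
Step S505: discard the complete audio sentence, then go to step S506;
Step S506: clip the audio at the final clip points.
In this embodiment, if the initial clip point needs to be adjusted several times to obtain the final clip point, then after the first adjusted clip point is obtained, the complete audio sentence corresponding to it is acquired and it is determined whether that clip point needs further adjustment. The adjusted clip point here is a clip point that has been adjusted at least once. Steps S502 to S504 may be repeated until it is determined that the adjusted clip point needs no further adjustment, and the most recently adjusted clip point is then taken as the adjusted clip point. Specifically, adjustment may end as soon as the complete audio sentence corresponding to a clip point to be adjusted is discarded. After the final clip points are determined, the audio to be clipped is clipped at them. The clip points in this embodiment include both a clip start point and a clip end point, and everything else is handled in the same way as in the single-adjustment method. Audio obtained in this way has relatively complete sentences and better meets the user's needs.
In addition, the evaluation value may be calculated differently for each round of adjustment; that is, the feature values used for setting the preset evaluation threshold and for calculating the evaluation value may be chosen differently for each clip point. For example, the first adjustment may use the formula above, R = K1R1 + K2R2 + K3R3 + K4R4. After the first adjustment, when the subsequent clip point to be adjusted lies outside the initially determined clip audio S0E0, the segmentation ratio can be set to zero, and the second adjustment may use R = K(K2R2 + K3R3 + K4R4), where K is an adjustment parameter introduced for the second adjustment, 0 < K < 1. In this embodiment the first-round evaluation value may be determined from the segmentation ratio, audio signal-to-noise ratio, silence delay ratio, and sound intensity ratio; as an implementation, R = (R1 + R2 + R3 + R4)/4, that is, K1, K2, K3 and K4 are all set to 1/4. The complete audio sentence is discarded when R ≤ 0.5 and kept when R > 0.5. For the second adjustment the evaluation value is determined from the audio signal-to-noise ratio, the silence delay ratio and the sound intensity ratio, and may be R = 0.9(R2 + R3 + R4)/3, where 1/3 is the common factor parameter and 0.9 is the adjustment parameter introduced for the second adjustment. Later adjustments may continue to use the second-round formula or set another one, until R ≤ 0.5, when the complete audio sentence is discarded and adjustment ends.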
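The round-dependent evaluation described above can be written as a short sketch using the example values from the text (equal weights of 1/4 in the first round, a factor of 0.9 and equal weights of 1/3 afterwards, and a keep threshold of 0.5). The function names and the `pass_index` argument are assumptions made for illustration.

```python
def evaluation_for_pass(pass_index, r1, r2, r3, r4, k=0.9):
    """First round: mean of all four features. Later rounds: the
    segmentation ratio R1 is dropped and the result is scaled by k."""
    if pass_index == 1:
        return (r1 + r2 + r3 + r4) / 4
    return k * (r2 + r3 + r4) / 3


def keep_sentence(r, threshold=0.5):
    """The sentence is kept only when R is strictly above the threshold."""
    return r > threshold
```

Because k is below 1, later rounds need stronger feature values than the first round to keep extending the clip, which makes the adjustment converge.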
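In addition, when the initial clip point is adjusted several times as above and every adjustment is in the same direction, for example sentences are kept every time or discarded every time, whether to end adjustment can be decided from the duration of the audio already adjusted and the duration of the audio to be adjusted. The adjustment ratio is calculated as before: Q = θ(T1 + … + Tp)/T. The process of adjusting the clip point according to this adjustment ratio and clipping the audio is shown in FIG. 6 and includes:
Step S601: determine the clip point;
Step S602: segment out the complete audio sentence corresponding to the clip point;
Step S603: determine whether the evaluation value of the complete audio sentence is greater than the preset evaluation threshold; if so, go to step S604; otherwise go to step S605;
Step S604: keep the complete audio sentence, then go to step S606;
Step S605: discard the complete audio sentence, then go to step S606;
Step S606: determine whether the keep/discard decision is the same as the previous one; if it is, go to step S607; if not, go to step S608;
Step S607: determine whether the adjustment ratio satisfies Q > Q_threshold; if so, go to step S608; if not, go to step S602;
Step S608: clip the audio at the final clip points.
That is, compared with the multi-adjustment method described earlier, this embodiment adds a check on the adjustment direction and a check on the adjustment ratio to decide whether clip point adjustment should end. For example, if the first and second adjustments are in the same direction and adjust durations T1 and T2 respectively, then with θ = 0.05 the adjustment ratio is Q = 0.05(T1 + T2)/T; if Q is less than Q_threshold, a third adjustment is made, and so on until Q is greater than Q_threshold, when adjustment ends. Q_threshold may specifically be 1.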
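The stopping logic of FIG. 6 can be sketched as a loop over adjustment rounds. Several things here are assumptions rather than part of the disclosure: each round is supplied as a precomputed tuple (decision, adjusted duration, new clip point) instead of being derived from real audio, the first round is treated as having no previous direction to compare against, and an adjustment that would push Q past the threshold is not applied. The function name is hypothetical.

```python
def adjust_with_ratio_limit(initial_point, rounds, total_duration,
                            theta=0.05, q_threshold=1.0):
    """Apply adjustment rounds until the direction changes (S606)
    or the adjustment ratio Q exceeds its threshold (S607)."""
    point = initial_point
    prev_decision = None
    adjusted = 0.0
    for decision, duration, new_point in rounds:
        if prev_decision is not None and decision != prev_decision:
            break  # S606: keep/discard differs from last time -> S608
        if theta * (adjusted + duration) / total_duration > q_threshold:
            break  # S607: Q > Q_threshold -> S608
        adjusted += duration
        point, prev_decision = new_point, decision
    return point
```

The returned value is the last clip point that was accepted, which is then used for the actual cut in step S608.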
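With the clipping approach of this embodiment, the audio to be clipped may be processed in a single pass or in several passes, so that the audio sentences at the start and end of the clipped audio are as complete as possible, which improves the user experience.
Embodiment 3
This embodiment provides an audio clipping apparatus. Referring to FIG. 7, it includes a to-be-adjusted clip point determination module 71, a clip point adjustment module 72, and a clipping module 73. The to-be-adjusted clip point determination module 71 is configured to determine a clip point to be adjusted corresponding to the audio to be clipped; the clip point adjustment module 72 is configured to acquire adjustment audio corresponding to the clip point to be adjusted, adjust the clip point according to the adjustment audio, and determine a final clip point; the clipping module 73 is configured to clip the audio to be clipped at the final clip point.
In this embodiment, the to-be-adjusted clip point determination module 71 may determine the clip point to be adjusted by receiving a trigger operation by the user on a clip point of the audio to be clipped; or by receiving position information of the clip point to be adjusted entered by the user; or by automatically identifying an already-adjusted clip point and, if it needs further adjustment, taking it as the clip point to be adjusted. The clip point to be adjusted may be a clip start point or a clip end point.
In this embodiment, the clip point adjustment module 72 is configured to acquire the audio within the preset region in which the clip point to be adjusted is located, or to acquire the complete audio sentence in which the clip point to be adjusted is located. The clip point adjustment module is configured to calculate an evaluation value for the adjustment audio, compare the evaluation value with a preset evaluation threshold, and adjust the clip point according to the comparison result to obtain the final clip point. The clip point adjustment module is configured to calculate the evaluation value from feature values of the adjustment audio, the feature values including at least one of a segmentation ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
As an implementation, the clip point adjustment module 72 is configured to, when the clip point to be adjusted is the clip start point corresponding to the audio to be adjusted: take the start of the adjustment audio as the adjusted clip start point when the evaluation value of the adjustment audio corresponding to the clip start point is greater than its preset evaluation threshold; and take the end of the adjustment audio as the adjusted clip start point when that evaluation value is less than or equal to the threshold. The clip point adjustment module 72 is further configured to, when the clip point to be adjusted is the clip end point corresponding to the audio to be adjusted: take the end of the adjustment audio as the adjusted clip end point when the evaluation value of the adjustment audio corresponding to the clip end point is greater than its preset evaluation threshold; and take the start of the adjustment audio as the adjusted clip end point when that evaluation value is less than or equal to the threshold.
In this embodiment, the clip point adjustment module 72 is configured to determine, from the length of the audio to be clipped, a number of adjustments for the clip point to be adjusted and adjust the clip point that number of times to obtain the final clip point; or to acquire the adjustment audio corresponding to the clip point obtained after adjustment and adjust that clip point in turn, until the adjustment direction for the current clip point differs from that of the previous adjustment, at which point adjustment ends and the previously adjusted clip point is taken as the final clip point.
As an implementation, the clip point adjustment module 72 is further configured to stop adjusting the clip point and take the previously determined clip point as the final clip point when the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped reaches a preset adjustment ratio threshold; the already-adjusted adjustment audio includes that corresponding to the clip start point and/or that corresponding to the clip end point. If the clip point to be adjusted has been adjusted several times in succession, for example several consecutive adjustments in the same direction, the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped can be calculated; this ratio is the adjustment ratio. If the adjustment ratio reaches the preset adjustment ratio threshold, adjustment of the clip point can stop and the previously determined clip point is taken as the final clip point; if it does not, adjustment can continue in the normal way. The clip point may be a clip start point or a clip end point, and the already-adjusted adjustment audio may include that corresponding to the clip start point and may also include that corresponding to the clip end point.
In this embodiment, the clipping module 73 clips the audio to be clipped at the final clip points, which specifically includes cutting out and storing the audio between the final clip start point and the final clip end point as the resulting clipped audio.
In the embodiments of the present disclosure, the to-be-adjusted clip point determination module 71, the clip point adjustment module 72 and the clipping module 73 of the audio clipping apparatus can each be implemented in practice by a central processing unit (CPU), digital signal processor (DSP), microcontroller unit (MCU) or field-programmable gate array (FPGA) in the apparatus.
The audio clipping apparatus of this embodiment determines a clip point to be adjusted corresponding to the audio to be clipped; acquires the adjustment audio corresponding to that clip point, adjusts the clip point according to the adjustment audio, and determines the final clip point; and clips the audio at the final clip point. Compared with the prior art, the clip point can thus be adjusted according to its adjustment audio before clipping, which avoids incomplete sentences or silent periods at the start and end of the clipped audio, improves the quality of audio clipping, makes the resulting audio better match the user's needs, and improves the user experience.
This embodiment further provides a terminal. Referring to FIG. 8, it includes the foregoing audio clipping apparatus. The terminal can adjust the clip points of the audio to be clipped by means of the above apparatus and obtain more reasonable clipped audio, so that the clipped audio better matches the user's needs and the user experience is improved. In addition, adjusting the clip points in this way requires no additional hardware on the terminal and no change to the terminal's structure, so it can be applied to all terminals at low cost and with good results.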
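In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and in an actual implementation there may be other divisions: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may all be integrated into one processing unit, each unit may stand alone as a unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as hardware plus software functional units.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the storage medium includes any medium that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present disclosure may also be stored in a computer-readable storage medium if it is implemented as a software functional module and sold or used as a standalone product. On this understanding, the part of the technical solution of the embodiments of the present disclosure that is essential, or that contributes over the prior art, may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the methods described in the embodiments of the present disclosure. The storage medium includes any medium that can store program code, such as a mobile storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
The above is only a specific implementation of the present disclosure, and the scope of protection of the present disclosure is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be determined by the scope of the claims.
Industrial Applicability
The technical solution of the embodiments of the present disclosure clips the audio to be clipped at the final clip point: the clip point to be adjusted is adjusted according to its corresponding adjustment audio to obtain the final clip point, and the audio is clipped at that point. This avoids incomplete sentences or silent periods at the start and end of the clipped audio, improves the quality of audio clipping, makes the resulting audio better match the user's needs, and improves the user experience.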

Claims (18)

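  1. An audio clipping method, comprising:
    determining a clip point to be adjusted corresponding to audio to be clipped;
    acquiring adjustment audio corresponding to the clip point to be adjusted, adjusting the clip point to be adjusted according to the adjustment audio, and determining a final clip point;
    clipping the audio to be clipped at the final clip point.
  2. The audio clipping method of claim 1, wherein acquiring the adjustment audio corresponding to the clip point to be adjusted comprises:
    acquiring audio within a preset region in which the clip point to be adjusted is located; or
    acquiring the complete audio sentence in which the clip point to be adjusted is located.
  3. The audio clipping method of claim 1, wherein adjusting the clip point to be adjusted according to the adjustment audio and determining the final clip point comprises:
    calculating an evaluation value corresponding to the adjustment audio;
    comparing the evaluation value with a preset evaluation threshold, and adjusting the clip point to be adjusted according to the comparison result to obtain the final clip point.
  4. The audio clipping method of claim 3, wherein calculating the evaluation value corresponding to the adjustment audio comprises:
    calculating the evaluation value from feature values of the adjustment audio, the feature values including at least one of a segmentation ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
  5. The audio clipping method of claim 3, wherein, when the clip point to be adjusted is a clip start point corresponding to the audio to be adjusted, adjusting the clip point to be adjusted according to the comparison result to obtain the final clip point comprises:
    when the evaluation value of the adjustment audio corresponding to the clip start point is greater than its corresponding preset evaluation threshold, taking the start of the adjustment audio as the adjusted clip start point;
    when the evaluation value of the adjustment audio corresponding to the clip start point is less than or equal to its corresponding preset evaluation threshold, taking the end of the adjustment audio as the adjusted clip start point.
  6. The audio clipping method of claim 3, wherein, when the clip point to be adjusted is a clip end point corresponding to the audio to be adjusted, adjusting the clip point to be adjusted according to the comparison result to obtain the final clip point comprises:
    when the evaluation value of the adjustment audio corresponding to the clip end point is greater than its corresponding preset evaluation threshold, taking the end of the adjustment audio as the adjusted clip end point;
    when the evaluation value of the adjustment audio corresponding to the clip end point is less than or equal to its corresponding preset evaluation threshold, taking the start of the adjustment audio as the adjusted clip end point.
  7. The audio clipping method of any one of claims 1 to 6, wherein adjusting the clip point to be adjusted according to the adjustment audio and determining the final clip point comprises:
    determining, from the length of the audio to be clipped, a number of adjustments corresponding to the clip point to be adjusted, and adjusting the clip point to be adjusted that number of times to obtain the final clip point; or
    acquiring the adjustment audio corresponding to the clip point obtained by adjusting the clip point to be adjusted, and adjusting the adjusted clip point, until the adjustment direction of the clip point currently to be adjusted differs from that of the previously adjusted clip point, whereupon clip point adjustment ends and the previously adjusted clip point is taken as the final clip point.
  8. The audio clipping method of claim 7, further comprising: when the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped reaches a preset adjustment ratio threshold, stopping adjustment of the clip point and taking the previously determined clip point as the final clip point; the already-adjusted adjustment audio including the already-adjusted adjustment audio corresponding to the clip start point and/or the already-adjusted adjustment audio corresponding to the clip end point.
  9. An audio clipping apparatus, comprising:
    a to-be-adjusted clip point determination module, configured to determine a clip point to be adjusted corresponding to audio to be clipped;
    a clip point adjustment module, configured to acquire adjustment audio corresponding to the clip point to be adjusted, adjust the clip point to be adjusted according to the adjustment audio, and determine a final clip point;
    a clipping module, configured to clip the audio to be clipped at the final clip point.
  10. The audio clipping apparatus of claim 9, wherein the clip point adjustment module is configured to acquire audio within a preset region in which the clip point to be adjusted is located, or to acquire the complete audio sentence in which the clip point to be adjusted is located.
  11. The audio clipping apparatus of claim 9, wherein the clip point adjustment module is configured to calculate an evaluation value corresponding to the adjustment audio, compare the evaluation value with a preset evaluation threshold, and adjust the clip point to be adjusted according to the comparison result to obtain the final clip point.
  12. The audio clipping apparatus of claim 11, wherein the clip point adjustment module is configured to calculate the evaluation value from feature values of the adjustment audio, the feature values including at least one of a segmentation ratio, an audio signal-to-noise ratio, a silence delay ratio, and a sound intensity ratio.
  13. The audio clipping apparatus of claim 11, wherein the clip point adjustment module is configured to, when the clip point to be adjusted is a clip start point corresponding to the audio to be adjusted: take the start of the adjustment audio as the adjusted clip start point when the evaluation value of the adjustment audio corresponding to the clip start point is greater than its corresponding preset evaluation threshold; and take the end of the adjustment audio as the adjusted clip start point when that evaluation value is less than or equal to its corresponding preset evaluation threshold.
  14. The audio clipping apparatus of claim 11, wherein the clip point adjustment module is configured to, when the clip point to be adjusted is a clip end point corresponding to the audio to be adjusted: take the end of the adjustment audio as the adjusted clip end point when the evaluation value of the adjustment audio corresponding to the clip end point is greater than its corresponding preset evaluation threshold; and take the start of the adjustment audio as the adjusted clip end point when that evaluation value is less than or equal to its corresponding preset evaluation threshold.
  15. The audio clipping apparatus of any one of claims 9 to 14, wherein the clip point adjustment module is configured to determine, from the length of the audio to be clipped, a number of adjustments corresponding to the clip point to be adjusted and adjust the clip point to be adjusted that number of times to obtain the final clip point; or to acquire the adjustment audio corresponding to the clip point obtained by adjusting the clip point to be adjusted and adjust the adjusted clip point, until the adjustment direction of the clip point currently to be adjusted differs from that of the previously adjusted clip point, whereupon clip point adjustment ends and the previously adjusted clip point is taken as the final clip point.
  16. The audio clipping apparatus of claim 15, wherein the clip point adjustment module is further configured to stop adjusting the clip point and take the previously determined clip point as the final clip point when the ratio of the duration of the already-adjusted adjustment audio to the duration of the audio to be clipped reaches a preset adjustment ratio threshold; the already-adjusted adjustment audio including the already-adjusted adjustment audio corresponding to the clip start point and/or the already-adjusted adjustment audio corresponding to the clip end point.
  17. A terminal, comprising the audio clipping apparatus of any one of claims 9 to 16.
  18. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the audio clipping method of any one of claims 1 to 8.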
PCT/CN2017/080702 2016-09-05 2017-04-17 Audio clipping method, apparatus, terminal, and computer storage medium WO2018040576A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610804873.7A CN107799132A (zh) 2016-09-05 2016-09-05 Audio clipping method and apparatus, and terminal
CN201610804873.7 2016-09-05

Publications (1)

Publication Number Publication Date
WO2018040576A1 true WO2018040576A1 (zh) 2018-03-08

Family

ID=61299941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/080702 WO2018040576A1 (zh) 2016-09-05 2017-04-17 Audio clipping method, apparatus, terminal, and computer storage medium

Country Status (2)

Country Link
CN (1) CN107799132A (zh)
WO (1) WO2018040576A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6771285B1 (en) * 1999-11-26 2004-08-03 Sony United Kingdom Limited Editing device and method
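CN102414755A (zh) * 2009-03-16 2012-04-11 苹果公司 Device, method, and graphical user interface for editing an audio or video attachment in an electronic message
CN103931199A (zh) * 2011-11-14 2014-07-16 苹果公司 Generation of multimedia clips
CN104361897A (zh) * 2014-11-21 2015-02-18 网易(杭州)网络有限公司 Method and device for making a ringtone
CN105323371A (zh) * 2015-02-13 2016-02-10 维沃移动通信有限公司 Audio clipping method and mobile terminal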

Also Published As

Publication number Publication date
CN107799132A (zh) 2018-03-13

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17844884; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17844884; Country of ref document: EP; Kind code of ref document: A1)