WO2021179991A1 - Audio processing method and electronic device - Google Patents

Audio processing method and electronic device

Info

Publication number
WO2021179991A1
Authority
WO
WIPO (PCT)
Prior art keywords
processed
field
audio
input
text
Prior art date
Application number
PCT/CN2021/079144
Other languages
English (en)
French (fr)
Inventor
Hu Jixiang (胡吉祥)
Original Assignee
Vivo Mobile Communication Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Priority to KR1020227033855A (published as KR20220149570A)
Priority to EP21767696.4A (published as EP4120268A4)
Publication of WO2021179991A1
Priority to US17/940,057 (published as US20230005506A1)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/16 Sound input; Sound output
              • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
              • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
          • G06F40/00 Handling natural language data
            • G06F40/10 Text processing
              • G06F40/166 Editing, e.g. inserting or deleting
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/26 Speech to text systems
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/03 characterised by the type of extracted parameters
            • G10L25/48 specially adapted for particular use
              • G10L25/51 for comparison or discrimination
                • G10L25/57 for processing of video signals
      • G11 INFORMATION STORAGE
        • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
          • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
            • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
              • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
            • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
              • G11B27/34 Indicating arrangements

Definitions

  • the present invention relates to the field of communication technology, in particular to an audio processing method and electronic equipment.
  • in the traditional technology, the user often manually adjusts the audio progress bar to find the playback period of the audio segment that needs to be modified, and then modifies the audio segment at that playback period. During this operation, the user often needs to repeatedly adjust the progress bar to accurately locate the playback period of the audio clip to be modified. The entire operation process is cumbersome and audio processing efficiency is low.
  • the embodiments of the present invention provide an audio processing method and an electronic device to solve the problem of complicated operation process and low audio processing efficiency when modifying audio content.
  • the present invention is implemented as follows:
  • an embodiment of the present invention provides an audio processing method applied to an electronic device, and the method includes:
  • acquiring text information corresponding to the audio to be processed, the text information including the text to be processed and the play time period corresponding to each field in the text to be processed; receiving a first input for the text to be processed; in response to the first input, determining the field to be processed in the text to be processed according to the field indicated by the first input; receiving a second input for the field to be processed; in response to the second input, acquiring a target audio segment; and, according to the target audio segment, modifying the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
  • an embodiment of the present invention also provides an electronic device, including:
  • the first acquisition module is configured to acquire text information corresponding to the to-be-processed audio, the text information includes the to-be-processed text and the play time period corresponding to each field in the to-be-processed text;
  • the first receiving module is configured to receive the first input for the to-be-processed text
  • the first determining module is configured to determine the field to be processed in the text to be processed according to the field indicated by the first input in response to the first input;
  • the second receiving module is configured to receive a second input for the field to be processed
  • the second acquisition module is configured to acquire the target audio segment according to the second input
  • the second determining module is configured to modify the audio segment at the playback period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
  • an embodiment of the present invention provides an electronic device, including a processor, a memory, and an audio processing program stored on the memory and executable on the processor, where the audio processing program, when executed by the processor, implements the steps of the audio processing method described in the first aspect.
  • an embodiment of the present invention provides a computer-readable storage medium storing an audio processing program, where the audio processing program, when executed by a processor, implements the steps of the audio processing method described in the first aspect.
  • the audio processing method and electronic device provided by the embodiments of the present invention first obtain the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the playback period corresponding to each field in the text to be processed; then receive a first input for the text to be processed and, in response to the first input, determine the field indicated by the first input in the text to be processed as the field to be processed; then receive a second input for the field to be processed and, in response to the second input, obtain the target audio segment; and finally, according to the target audio segment, modify the audio segment at the playback period corresponding to the field to be processed to obtain the target audio. In this way, the audio can be modified without manually adjusting the progress bar, so audio processing efficiency can be improved.
  • Figure 1 shows a flowchart of the steps of an embodiment of the audio processing method of the present invention;
  • Figure 2-1 shows a flowchart of the steps of another embodiment of the audio processing method of the present invention;
  • Figure 2-2 shows a schematic diagram of an example of displaying text to be processed according to an embodiment of the present invention;
  • Figure 2-3 shows a schematic diagram of another example of displaying text to be processed according to an embodiment of the present invention;
  • Figure 2-4 shows a schematic diagram of an example of editing text to be processed according to an embodiment of the present invention;
  • Figure 2-5 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention;
  • Figure 2-6 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention;
  • Figure 3 shows a flowchart of the steps of another embodiment of the audio processing method of the present invention;
  • Figure 4 shows a structural block diagram of an embodiment of the electronic device of the present invention;
  • Figure 5 shows a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
  • FIG. 1 shows a step flow chart of an embodiment of the audio processing method of the present invention.
  • the method may be applied to an electronic device. As shown in FIG. 1, the method may include steps 101 to 106.
  • Step 101 Acquire text information corresponding to a to-be-processed audio, where the text information includes the to-be-processed text and a play time period corresponding to each field in the to-be-processed text.
  • the audio to be processed can be audio stored locally, or audio that needs to be modified downloaded from the Internet.
  • the audio to be processed can be obtained directly through audio recording, or recorded during video recording; that is, the audio to be processed can be audio extracted from a video.
  • the text to be processed may be text corresponding to the audio to be processed, and the corresponding text may be obtained by converting the audio to be processed according to an audio-to-text method.
  • the play time period corresponding to each field in the text to be processed may be the play time period of the audio corresponding to the field in the audio to be processed.
  • for example, if the audio corresponding to the field "good mood" is played from 5.1 seconds to 5.9 seconds in the audio to be processed, the playback period "5.1 seconds to 5.9 seconds" can be determined as the playing time period corresponding to the field "good mood".
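The correspondence described above can be sketched as a simple lookup structure. This is an illustrative assumption about the data layout only; the field text and timestamps below reuse the example from the description and are not part of the claimed method:

```python
def play_period(text_info, field):
    """Return the playback period (start_s, end_s) for a field, or None."""
    return text_info.get(field)

# text information for the audio to be processed: field -> (start s, end s)
text_info = {
    "good mood": (5.1, 5.9),  # plays from 5.1 seconds to 5.9 seconds
}

print(play_period(text_info, "good mood"))  # (5.1, 5.9)
```

A field absent from the text information simply has no playback period, which is why the lookup returns `None` rather than raising.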
  • Step 102 Receive a first input for the to-be-processed text.
  • the first input for the text to be processed may be an operation of selecting a field in the text to be processed that needs to be modified on an interface displaying the text to be processed.
  • the operation can be a single click, a double click, and so on.
  • Step 103 In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
  • the field indicated by the first input refers to the field selected by the user through the first input, that is, the field corresponding to the audio that the user needs to modify; therefore, the field to be processed can be determined according to the field indicated by the first input. When determining the field to be processed in the text to be processed, the field indicated by the first input in the text to be processed may be used as the field to be processed.
  • Step 104 Receive a second input for the field to be processed.
  • the second input for the field to be processed may be performed on an interface displaying the text to be processed, and the second input may be performed by the user according to the modification requirements of the audio segment corresponding to the field to be processed.
  • the second input may be a delete operation for the field to be processed, an input operation for replacing the field to be processed, an operation for inputting a field to be added, or an operation for inputting an audio segment that replaces the audio segment corresponding to the field to be processed.
  • Step 105 In response to the second input, obtain a target audio segment.
  • the target audio segment may be the audio segment ultimately desired by the user.
  • the target audio segment may be directly input by the user, or it may be obtained by the electronic device by editing the field to be processed.
  • the specific method for editing the field to be processed may be determined according to the second input. For example, when the second input is an operation of inputting a field to be added, a new field can be added to the field to be processed. When the second input is a delete operation for the field to be processed, delete the field to be processed, and so on. Since the second input is performed by the user according to the modification requirements of the audio segment corresponding to the field to be processed, by editing the field to be processed, it can be ensured that the acquired target audio segment is the field corresponding to the audio ultimately desired by the user.
  • Step 106 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
  • when modifying according to the target audio segment, the play time period corresponding to the field to be processed can be read from the play time periods corresponding to the fields contained in the text information, and then the audio segment at that play time period can be modified into the target audio segment, thereby realizing the modification of the audio to be processed.
  • in summary, the audio processing method first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the playback period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed according to the field indicated by the first input; then receives a second input for the field to be processed and, in response to the second input, acquires the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the playback period corresponding to the field to be processed to obtain the target audio.
  • the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
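The core of step 106 — swapping the samples that fall in the field's playback period for the target audio segment — can be sketched on a raw sample sequence. The sample rate and sample values here are illustrative assumptions:

```python
def replace_segment(samples, sample_rate, start_s, end_s, target_segment):
    """Replace the samples in [start_s, end_s) with the target audio segment.

    The target segment may be shorter or longer than the span it replaces,
    so the overall length of the result can change."""
    i = int(start_s * sample_rate)
    j = int(end_s * sample_rate)
    return samples[:i] + target_segment + samples[j:]

# toy example with a 1 Hz "sample rate" so that indices equal seconds
audio = [0, 1, 2, 3, 4, 5]
modified = replace_segment(audio, sample_rate=1, start_s=2, end_s=4,
                           target_segment=[9, 9, 9])
print(modified)  # [0, 1, 9, 9, 9, 4, 5]
```

Because the edit is addressed by the field's playback period, the user never has to locate the span with a progress bar, which is the efficiency gain the description claims.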
  • FIG. 2-1 shows a step flow chart of another embodiment of the audio processing method of the present invention.
  • the method may be applied to an electronic device. As shown in FIG. 2-1, the method may include step 201 to step 207.
  • Step 201 Acquire text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed.
  • the electronic device may obtain the text information corresponding to the audio to be processed through the following steps 2011 to 2013:
  • Step 2011 Detect whether there is a subtitle file matching the audio to be processed, the subtitle file including the subtitle text and the play time period corresponding to each field in the subtitle text.
  • the audio to be processed may be audio in a video
  • the subtitle file may be a subtitle file matching the video.
  • the audio to be processed may also be an independent audio, such as a song, etc.
  • the subtitle file may be a lyric file matching the song. To detect whether there is a subtitle file matching the audio to be processed, a matching subtitle file can be searched for online, or searched for locally.
  • Step 2012 If there is a subtitle file matching the audio to be processed, use the subtitle file as text information corresponding to the audio to be processed.
  • when the subtitle file is used as the text information corresponding to the audio to be processed, the subtitle text contained in the subtitle file can be used as the text to be processed, and the play period corresponding to each field in the subtitle text can be taken as the playing period of that field in the audio to be processed.
  • Step 2013 If there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the play time period corresponding to each field in the text according to the time information of the audio segments in the audio to be processed, and use the text and the play time period corresponding to each field in the text as the text information corresponding to the audio to be processed.
  • converting the to-be-processed audio into text may be realized by using the method of converting speech to text.
  • specifically, the audio can be processed first to remove noise and avoid interference in the conversion process; then the feature values in the audio are extracted and the audio is divided into smaller audio segments, so that each audio segment contains one or more feature values; the feature value of each audio segment is matched against the model feature values in an audio model library, and the text corresponding to the matched model feature value is determined as the text corresponding to that audio segment.
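The matching step above can be sketched as a nearest-neighbour lookup against a model library. The scalar feature values and the library entries here are purely hypothetical stand-ins for real acoustic features and models:

```python
# Hypothetical audio model library: model feature value -> text.
MODEL_LIBRARY = {
    0.2: "good",
    0.7: "mood",
}

def match_segment(feature_value, library=MODEL_LIBRARY):
    """Return the text of the model whose feature value is closest to the
    segment's feature value (a stand-in for real acoustic matching)."""
    best = min(library, key=lambda model: abs(model - feature_value))
    return library[best]

def audio_to_text(segment_features):
    """Convert a sequence of per-segment feature values into text."""
    return " ".join(match_segment(f) for f in segment_features)

print(audio_to_text([0.25, 0.68]))  # "good mood"
```

In a real converter the feature would be a vector (e.g. spectral coefficients) and the distance acoustic, but the reuse-the-closest-model control flow is the same.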
  • Step 202 Receive a first input for the to-be-processed text.
  • before receiving the first input for the text to be processed, the text to be processed may be displayed through the following steps:
  • a preset picture may be set according to actual conditions.
  • the preset picture may be a picture associated with the audio to be processed; for example, it may be the video cover of the video to which the audio to be processed belongs.
  • by displaying all of the text to be processed in the preset picture, the user can visually see the complete text to be processed; at the same time, the user's viewing experience can be improved by using a preset picture related to the text to be processed.
  • Figure 2-2 shows a schematic diagram of an example of displaying text to be processed according to an embodiment of the present invention.
  • it should be noted that in actual application scenarios, the amount of text to be processed may be large, and due to the limitation of the screen size of electronic devices, the complete text to be processed may not be displayed at one time. Therefore, the text to be processed can be scrolled to ensure complete display.
  • a video screen can also be displayed, and the corresponding to-be-processed text can be displayed in the video screen.
  • the to-be-processed text corresponding to the video screen may be text whose playback period is the same as that of the video screen. Since the content of the video screen and the corresponding to-be-processed text often have a strong correlation, displaying them together in the video screen makes it convenient for the user to observe the video content and the text content at the same time, thereby facilitating selection.
  • a text display box can be generated on the video screen, and the text to be processed is displayed in the text display box.
  • the specific form of the display box can be preset according to actual conditions.
  • Figure 2-3 shows a schematic diagram of another example of displaying text to be processed according to an embodiment of the present invention.
  • the corresponding text to be processed is displayed in the video screen, namely "It's not just last night's wine that makes me shed tears".
  • the electronic device can receive the first input by receiving the selection input of the displayed text to be processed.
  • the user can be provided with a visual selection scene and rich information, so that the user can easily select the to-be-processed text and improve the selection efficiency.
  • Step 203 In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
  • when determining the field indicated by the first input in the text to be processed as the field to be processed, all fields in the text to be processed that match the field indicated by the first input may be searched for, and the searched fields may be determined as fields to be processed; the field indicated by the first input may be the field selected by the user through the selection input on the displayed text to be processed.
  • the first input may be performed through a preset search area, and the field indicated by the first input may be input through the search area.
  • the electronic device may display the search area before this step; and then receive the first input performed by the user through the search area. In this way, the user only needs to select once to realize the control electronic device to modify all the same fields, thereby improving the selection efficiency.
  • after step 203, the audio volume can be adjusted through the following steps A to C.
  • Step A Receive a third input for the to-be-processed text.
  • the third input for the text to be processed may be performed on an interface displaying the text to be processed, and the third input may be an operation to adjust the font of the text to be processed.
  • the user can perform a third input when the font of the text to be processed needs to be adjusted, and accordingly, the electronic device can receive the third input.
  • Step B In response to the third input, adjust the font size of the field to be adjusted indicated by the third input to obtain the adjusted field.
  • adjusting the font size of the field to be adjusted may be done by enlarging or reducing the font according to the adjustment operation indicated by the third input, to obtain the adjusted field.
  • Step C Adjust the volume of the audio corresponding to the field to be adjusted according to the adjusted font size of the field, where the larger the adjusted font of the field to be adjusted, the louder the volume of the audio corresponding to that field.
  • in one implementation, the adjusted font size of the field to be adjusted may be determined first, and then the volume corresponding to that font size may be determined according to a preset correspondence between font size and volume.
  • the volume of the audio corresponding to the field to be adjusted is then set to the determined volume, thereby realizing volume adjustment.
  • in the preset correspondence between font size and volume, the larger the font, the greater the volume.
  • for example, if the adjusted font size corresponds to 60 decibels in the preset correspondence, the volume of the audio corresponding to the field to be adjusted can be set to 60 decibels.
  • in this way, the user only needs to adjust the text font size to correspondingly control the volume of the corresponding audio, making audio volume adjustment easier and thereby improving adjustment efficiency.
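Steps A to C imply some monotone correspondence between font size and volume. A minimal sketch of one such correspondence follows; the base font size, base volume, and decibels-per-point factor are assumed values for illustration, not taken from the embodiment:

```python
def volume_for_font(font_pt, base_font_pt=12, base_db=40.0, db_per_pt=2.0):
    """Map an adjusted font size to a volume in decibels.

    Larger fonts map to louder volumes, matching the stated rule that the
    bigger the adjusted font, the louder the corresponding audio."""
    return base_db + (font_pt - base_font_pt) * db_per_pt

# enlarging the field's font from 12 pt to 22 pt raises its audio to 60 dB
print(volume_for_font(22))  # 60.0
```

Any monotonically increasing mapping (or a lookup table, as the preset correspondence suggests) would satisfy the same property; linearity is just the simplest choice for the sketch.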
  • a curve for adjusting the font size may be preset.
  • the user can select, from the text to be processed, the field whose font size needs to be adjusted, and then adjust the shape of the curve to perform the input.
  • the size of each word contained in the field to be adjusted can be adjusted in turn according to the height of each segment of the adjusted curve, where the height of a segment can be directly or inversely proportional to the size of the word. In this way, the user only needs to adjust the shape of the curve to adjust the volume of the corresponding audio segment.
  • in this way, the volume of the audio segment corresponding to the field to be adjusted can be varied in many ways.
  • for example, the user can adjust the curve into a wave shape to control the volume of the field to be adjusted to rise and fall, which can make the audio more interesting.
  • Step 204 Receive a second input for the field to be processed.
  • for this step, reference may be made to the foregoing step 104, and details are not repeated here.
  • Step 205 Edit the field to be processed according to the second input to obtain the target field.
  • when the second input is a delete input, it can be considered that the user needs to delete the field to be processed; therefore, the field to be processed can be deleted to obtain the target field.
  • when the second input is a replacement input, the field to be replaced corresponding to the second input can be obtained; the field to be processed is deleted, and the field to be replaced is added at the display position of the field to be processed, to obtain the target field.
  • obtaining the field to be replaced corresponding to the second input may be done by extracting the field contained in the second input and using it as the field to be replaced, or by extracting the voice contained in the second input, converting the voice into text, and using the obtained text as the field to be replaced.
  • when the second input is an addition input, it can be considered that the user needs to add a new field to the field to be processed; therefore, the field to be added corresponding to the second input can be obtained, and the field to be added is added at the position indicated for the field to be processed, to obtain the target field.
  • obtaining the field to be added corresponding to the second input may be done by extracting the field contained in the second input and using it as the field to be added, or by extracting the voice contained in the second input, converting the voice into text, and using the obtained text as the field to be added.
  • corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
  • a preset mark can also be added to the displayed field to be processed, and the field to be replaced or the field to be added is displayed according to the display position corresponding to the field to be processed.
  • the preset mark may be a mark that reflects a specific editing operation performed on the field to be processed, and different editing operations correspond to different preset marks. For example, if the editing operation is to delete the field to be processed, the preset mark may be a strikethrough added on the field to be processed, or a text mark indicating that the field is deleted is added to the field to be processed.
  • if the editing operation is to replace the field to be processed, the preset mark can be an underline added on the field to be processed, or a text mark indicating that the field is replaced, and the field to be replaced is displayed next to the field to be processed.
  • the specific display position can be set according to the actual situation.
  • the preset mark may be to add a field mark at the position corresponding to the field to be processed, such as an arrow, to indicate that the field is added at the position.
  • the added fields to be added can be displayed to facilitate the user to know what fields have been added.
  • the specific marking method may be various, which is not limited in the embodiment of the present invention. By adding a preset mark on the field to be processed, the user can more clearly know the location of the modified field to be processed and the specific editing operation performed on it.
  • the display position may be preset according to actual needs.
  • the display position may be below the field to be deleted. In this way, by displaying the to-be-replaced field or the to-be-added field in the display position corresponding to the field to be deleted, it is convenient for the user to quickly learn the content of the specific modification, thereby facilitating the user to check later.
  • Figure 2-4 shows a schematic diagram of an example of editing the text to be processed provided by an embodiment of the present invention.
  • as shown in Figure 2-4, the field to be processed is "let me shed tears" and the second input is a delete input, so the field to be processed is deleted, that is, a strikethrough is added on "let me shed tears" to mark it as deleted.
  • Figure 2-5 shows a schematic diagram of another example of editing the text to be processed according to an embodiment of the present invention.
  • as shown in Figure 2-5, the field to be processed is "tears" and the second input is a replacement input, so the field to be processed is deleted and the field to be replaced is displayed, that is, a strikethrough is added on "tears", and the "saliva" below the field to be processed is the field to be replaced.
  • Figure 2-6 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention. As shown in Figure 2-6, the position indicated for the field to be processed is between "I" and "Drop", and the second input is an addition input, so an arrow is used to indicate the position, and the "today" below the arrow is the field to be added.
  • Step 206 Determine the audio corresponding to the target field as the target audio segment.
  • the target field can be linguistically analyzed to segment it into words; then, based on the words obtained by segmentation, the audio waveform segments corresponding to the matching words can be extracted from a speech synthesis database and synthesized to obtain the audio segment corresponding to the text. Alternatively, the audio to be processed may be searched for a field that is the same as the target field; if such a field exists, the audio segment corresponding to it is extracted as the audio corresponding to the target field, thereby obtaining the target audio segment.
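The two alternatives in step 206 — reuse a matching segment already present in the audio, otherwise synthesize one — amount to a simple fallback. In this sketch, `synthesize` is a placeholder for a real text-to-speech call, and the example data are invented:

```python
def target_audio_segment(target_field, field_audio, synthesize):
    """Return the audio for the target field.

    field_audio maps fields already present in the audio to be processed
    to their audio segments; synthesize is a stand-in text-to-speech
    function used only when no matching field exists."""
    if target_field in field_audio:
        return field_audio[target_field]  # reuse the existing segment
    return synthesize(target_field)       # fall back to synthesis

existing = {"saliva": [3, 1, 4]}
fake_tts = lambda text: [0] * len(text)   # placeholder synthesizer

print(target_audio_segment("saliva", existing, fake_tts))  # [3, 1, 4]
print(target_audio_segment("today", existing, fake_tts))   # [0, 0, 0, 0, 0]
```

Preferring reuse keeps the replacement in the original speaker's voice; synthesis is only needed for words the recording never contained.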
  • Step 207 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
  • Specifically, the play period corresponding to the field to be processed may be obtained from the play periods corresponding to the fields, and then the audio waveform diagram corresponding to the audio to be processed is obtained. Finally, the band in the audio waveform diagram corresponding to the play period of the field to be processed is modified into the audio band corresponding to the target audio segment, to obtain the target audio.
  • Optionally, when obtaining the play period corresponding to the field to be processed, the field to be processed may be searched for among the fields, and then the play period corresponding to the field to be processed is read.
  • Optionally, the audio waveform diagram corresponding to the audio to be processed may be obtained by extracting features contained in the audio, such as the vibration frequency, and processing the features, for example by normalization, to obtain a waveform that shows the audio features over the playback time.
  • Optionally, when the target field is a blank field, the band in the audio waveform diagram corresponding to the play period of the field to be processed may be replaced with a blank band to implement the modification.
  • Alternatively, the corresponding band may be deleted directly to implement the modification. It should be noted that, when deleting, the waveform display of the corresponding band can be removed and changed to a straight line to show that the sound is deleted.
  • Optionally, when the target field is a field to be replaced, the audio band corresponding to the target audio segment may directly replace the corresponding band; alternatively, the corresponding band may be deleted first, and then the audio band corresponding to the field to be replaced in the target audio segment is added at the deleted position.
  • Optionally, when the target field is a field to be added, the audio band corresponding to the target audio segment may directly replace the corresponding band; alternatively, the audio band corresponding to the field to be added in the target audio segment may be added, according to the play period corresponding to the field to be processed, at the position of the corresponding band in the audio waveform diagram, and the synthesized audio band is used as the target audio. In this way, by correspondingly modifying the band of the audio to be processed in the audio waveform diagram, the modification of the audio to be processed can be realized, which makes the modification process more precise and can thus improve the accuracy of the modification.
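At the sample level, the delete, replace, and add cases above all reduce to one splice over the play period of the field to be processed. A minimal sketch, assuming plain sample lists and an illustrative fixed sample rate:

```python
# Sketch: splice the band at [start_s, end_s) with a new segment.
# An empty new segment deletes the band; start_s == end_s inserts.

SAMPLE_RATE = 16000  # assumed

def splice(samples, start_s, end_s, new_segment):
    """Replace the band covering [start_s, end_s) seconds with new_segment
    and return the resulting sample list."""
    lo = int(start_s * SAMPLE_RATE)
    hi = int(end_s * SAMPLE_RATE)
    return samples[:lo] + list(new_segment) + samples[hi:]
```

Passing a run of zeros as `new_segment` would realize the "blank band" variant, keeping the overall duration unchanged.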
  • Optionally, the electronic device may also perform the following operations after obtaining the audio waveform diagram:
  • The mark may be implemented by filling the corresponding band with a different color, or may be added at the position of the corresponding band; the specific form of the mark is not limited in the embodiment of the present invention. In this way, by displaying the audio waveform diagram corresponding to the audio to be processed and marking the band corresponding to the field to be processed in the audio waveform diagram, it is convenient for the user to view the modified audio band.
  • In summary, the audio processing method first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; next, receives a second input for the field to be processed and, according to the second input, edits the field to be processed to obtain the target field, and determines the audio corresponding to the target field as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain the target audio.
  • corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
  • the user can modify the audio without manually adjusting the progress bar, so the audio processing efficiency can be improved.
  • FIG. 3 shows a flow chart of the steps of another embodiment of the audio processing method of the present invention.
  • the method may be applied to an electronic device. As shown in FIG. 3, the method may include steps 301 to 307.
  • Step 301: Obtain text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed.
  • For this step, reference may be made to the foregoing step 201; details are not repeated here in the embodiment of the present invention.
  • Step 302: Receive a first input for the text to be processed.
  • For this step, reference may be made to the foregoing step 202; details are not repeated here in the embodiment of the present invention.
  • Step 303: In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
  • For this step, reference may be made to the foregoing step 203; details are not repeated here in the embodiment of the present invention.
  • Step 304: Receive a second input for the field to be processed.
  • For this step, reference may be made to the foregoing step 104; details are not repeated here in the embodiment of the present invention.
  • Step 305 Extract the audio segment carried in the second input.
  • the second input may be an audio recording operation. Accordingly, the audio segment carried in the second input may be a voice segment recorded by the user.
  • Optionally, the second input may also be an audio upload operation. Accordingly, the audio segment carried in the second input may also be an audio segment that the user selects to upload.
  • Step 306 Determine the audio segment as the target audio segment.
  • In this embodiment, the audio segment carried in the second input is the audio segment ultimately desired by the user; therefore, the audio segment can be directly determined as the target audio segment.
  • Optionally, before the input audio segment is determined as the target audio segment, the user may be prompted as to whether to process the input audio segment; if so, the input audio segment is intercepted according to the user's operation, and the intercepted audio segment is used as the target audio segment. In this way, by prompting the user whether to process the input audio segment, the quality of the target audio segment can be further improved.
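The optional intercept step above can be sketched as a simple trim over the recorded or uploaded segment. The function name and the fixed sample rate are illustrative assumptions:

```python
# Sketch: keep only the portion of the input segment the user selects,
# as in the optional "intercept" processing described above.

def trim_segment(samples, keep_start_s, keep_end_s, sample_rate=16000):
    """Return the samples covering [keep_start_s, keep_end_s) seconds."""
    lo = int(keep_start_s * sample_rate)
    hi = int(keep_end_s * sample_rate)
    return samples[lo:hi]
```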
  • Step 307 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
  • For this step, reference may be made to the foregoing step 207; details are not repeated here in the embodiment of the present invention.
  • In summary, the audio processing method first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; next, receives a second input for the field to be processed, extracts the audio segment carried in the second input, and determines that audio segment as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain the target audio.
  • the target audio segment can be easily obtained, and therefore, the processing efficiency can be improved.
  • the user can modify the audio without manually adjusting the progress bar, which can further improve the audio processing efficiency.
  • an embodiment of the present invention also provides an electronic device.
  • the electronic device 40 may include:
  • the first obtaining module 401 is configured to obtain text information corresponding to the to-be-processed audio, the text information including the to-be-processed text and the play time period corresponding to each field in the to-be-processed text;
  • the first receiving module 402 is configured to receive the first input for the to-be-processed text
  • the first determining module 403 is configured to determine the field to be processed in the text to be processed according to the field indicated by the first input in response to the first input;
  • the second receiving module 404 is configured to receive a second input for the field to be processed
  • the second acquiring module 405 is configured to acquire the target audio segment according to the second input
  • the second determining module 406 is configured to modify the audio segment in the play period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
  • The electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; next, receives a second input for the field to be processed and, in response to the second input, obtains the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain the target audio.
  • the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
  • the second obtaining module 405 is configured to:
  • edit the field to be processed according to the second input to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or,
  • extract the audio segment carried in the second input, and determine the audio segment as the target audio segment.
  • the second obtaining module 405 is further configured to:
  • if the second input is a delete input, delete the field to be processed, and determine the blank field obtained after the deletion as the target field;
  • if the second input is a replacement input, obtain the field to be replaced corresponding to the second input, delete the field to be processed, and add the field to be replaced at the position of the field to be processed to obtain the target field; and
  • if the second input is an addition input, obtain the field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed to obtain the target field.
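The three edit types (delete, replace, add) can be sketched as one text-editing helper. The character-offset interface is an illustrative assumption — the embodiment does not specify how field positions are represented:

```python
# Sketch: apply a delete/replace/add edit to the span
# [field_start, field_end) of the text to be processed.

def edit_field(text, field_start, field_end, op, payload=""):
    """Return the edited text: 'delete' removes the span, 'replace'
    substitutes payload for it, and 'add' inserts payload at field_start
    without removing anything."""
    if op == "delete":
        return text[:field_start] + text[field_end:]
    if op == "replace":
        return text[:field_start] + payload + text[field_end:]
    if op == "add":
        return text[:field_start] + payload + text[field_start:]
    raise ValueError(f"unknown edit type: {op}")
```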
  • the electronic device 40 further includes:
  • The first display module is configured to display a preset picture and display all of the text to be processed in the preset picture; or display each video picture of the video to be processed, and display the text to be processed corresponding to the video picture.
  • the first receiving module 402 is further configured to:
  • the electronic device 40 further includes:
  • the second display module is configured to add a preset mark to the displayed field to be processed, and display the field to be replaced or the field to be added according to the display position corresponding to the field to be processed.
  • the second determining module 406 is configured to:
  • the electronic device 40 further includes:
  • the third receiving module is configured to receive the third input for the to-be-processed text
  • the first adjustment module is configured to adjust the font size of the field to be adjusted indicated by the third input in response to the third input to obtain the adjusted field;
  • the second adjustment module is configured to adjust the volume size of the audio corresponding to the field to be adjusted according to the font size of the field to be adjusted after adjustment.
  • Wherein, the larger the adjusted font size of the field to be adjusted, the greater the volume of the audio corresponding to the field to be adjusted.
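The font-size-to-volume rule can be sketched with a linear gain. The linear mapping is an assumption — the embodiment only requires that a larger font yields a greater volume:

```python
# Sketch: scale sample amplitudes in proportion to a font-size change,
# so enlarging a field's font makes its audio louder.

def scale_volume(samples, base_font_pt, new_font_pt):
    """Return the samples scaled by new_font_pt / base_font_pt."""
    gain = new_font_pt / base_font_pt
    return [s * gain for s in samples]
```

Any monotonically increasing mapping from font size to gain would satisfy the stated relationship equally well.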
  • Optionally, the first obtaining module 401 is configured to:
  • obtain a subtitle file matching the audio to be processed, where the subtitle file includes a subtitle text and a play period corresponding to each field in the subtitle text; and
  • if there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the play period corresponding to each field of the text according to the time information of the audio segments in the audio to be processed, and use the text and the play period corresponding to each field in the text as the text information corresponding to the audio to be processed.
  • The electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; next, receives a second input for the field to be processed and, according to the second input, edits the field to be processed to obtain the target field, and determines the audio corresponding to the target field as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain the target audio.
  • corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
  • the user can modify the audio without manually adjusting the progress bar, so the audio processing efficiency can be improved.
  • FIG. 5 shows a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
  • The electronic device 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, a power supply 511, and other components.
  • Those skilled in the art can understand that the structure of the electronic device shown in FIG. 5 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than those shown in the figure, or combine certain components, or have a different component arrangement.
  • In the embodiment of the present invention, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, and the like.
  • the processor 510 is configured to obtain text information corresponding to the audio to be processed, and the text information includes the text to be processed and a play time period corresponding to each field in the text to be processed.
  • the processor 510 is configured to receive a first input for the to-be-processed text.
  • the processor 510 is configured to determine a field to be processed in the text to be processed according to the field indicated by the first input in response to the first input.
  • the processor 510 is configured to receive a second input for the field to be processed.
  • the processor 510 is configured to obtain a target audio segment in response to the second input.
  • the processor 510 is configured to modify the audio segment in the play period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
  • The electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; next, receives a second input for the field to be processed and, in response to the second input, obtains the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain the target audio.
  • the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
  • processor 510 is used to:
  • extract the audio segment carried in the second input, and determine the audio segment as the target audio segment.
  • processor 510 is further configured to:
  • if the second input is a delete input, delete the field to be processed, and determine the blank field obtained after the deletion as the target field;
  • if the second input is a replacement input, obtain the field to be replaced corresponding to the second input; delete the field to be processed, and add the field to be replaced at the position of the field to be processed to obtain the target field; and
  • if the second input is an addition input, obtain the field to be added corresponding to the second input; at the position of the field to be processed, add the field to be added to obtain the target field.
  • the display unit 506 is used to:
  • the user input unit 507 is used to receive a selection input of the displayed text to be processed.
  • processor 510 is used to:
  • the user input unit 507 is used to:
  • the processor 510 is used to:
  • According to the adjusted font size of the field to be adjusted, adjust the volume of the audio corresponding to the field to be adjusted; wherein, the larger the adjusted font size of the field to be adjusted, the greater the volume of the audio corresponding to the field to be adjusted.
  • processor 510 is used to:
  • the subtitle file includes a subtitle text and a play period corresponding to each field in the subtitle text;
  • If there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, and generate the play period corresponding to each field of the text according to the time information of the audio segments in the audio to be processed; use the text and the play period corresponding to each field in the text as the text information corresponding to the audio to be processed.
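The subtitle-first, recognition-fallback logic above can be sketched as follows; `subtitle_lookup` and `speech_to_text` are hypothetical callbacks standing in for whatever subtitle store and speech recognizer are actually available:

```python
# Sketch: obtain (text, per-field play periods) for an audio item,
# preferring a matching subtitle file and falling back to recognition.

def get_text_info(audio_id, subtitle_lookup, speech_to_text):
    """Return (text, {field: (start_s, end_s)}) for the given audio.

    subtitle_lookup(audio_id) returns that pair or None; speech_to_text
    is only invoked when no matching subtitle file exists."""
    subtitle = subtitle_lookup(audio_id)
    if subtitle is not None:
        return subtitle
    return speech_to_text(audio_id)
```

In practice the recognizer would need to emit word-level timestamps so that each field's play period can be derived from the time information of its audio segment.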
  • The radio frequency unit 501 can be used to receive and send signals in the process of sending and receiving information or during a call. Specifically, after receiving downlink data from the base station, the radio frequency unit 501 delivers it to the processor 510 for processing; in addition, it sends uplink data to the base station.
  • the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit 501 can also communicate with the network and other devices through a wireless communication system.
  • the electronic device provides users with wireless broadband Internet access through the network module 502, such as helping users to send and receive emails, browse web pages, and access streaming media.
  • the audio output unit 503 can convert the audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output it as sound. Moreover, the audio output unit 503 may also provide audio output related to a specific function performed by the electronic device 500 (for example, call signal reception sound, message reception sound, etc.).
  • the audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
  • the input unit 504 is used to receive audio or video signals.
  • the input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042.
  • The graphics processor 5041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the processed image frame may be displayed on the display unit 506.
  • the image frame processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or sent via the radio frequency unit 501 or the network module 502.
  • the microphone 5042 can receive sound, and can process such sound into audio data.
  • In the case of a telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 501 for output.
  • the electronic device 500 further includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 5061 according to the brightness of the ambient light.
  • The proximity sensor can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear.
  • The accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), and can detect the magnitude and direction of gravity when stationary; it can be used to identify the posture of the electronic device (such as horizontal/vertical screen switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). The sensor 505 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which will not be repeated here.
  • the display unit 506 is used to display information input by the user or information provided to the user.
  • The display unit 506 may include a display panel 5061, and the display panel 5061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the user input unit 507 can be used to receive inputted numeric or character information, and generate key signal input related to user settings and function control of the electronic device.
  • the user input unit 507 includes a touch panel 5071 and other input devices 5072.
  • The touch panel 5071, also known as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 5071 using a finger, a stylus, or any other suitable object or accessory).
  • the touch panel 5071 may include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 510, and receives and executes commands sent by the processor 510.
  • the touch panel 5071 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the user input unit 507 may also include other input devices 5072.
  • other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick, which will not be repeated here.
  • the touch panel 5071 can be covered on the display panel 5061.
  • When the touch panel 5071 detects a touch operation on or near it, the operation is transmitted to the processor 510 to determine the type of the touch event; the processor 510 then provides corresponding visual output on the display panel 5061 according to the type of the touch event.
  • Although in FIG. 5 the touch panel 5071 and the display panel 5061 are used as two independent components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the electronic device, which is not specifically limited here.
  • the interface unit 508 is an interface for connecting an external device and the electronic device 500.
  • the external device may include a wired or wireless headset port, an external power source (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, audio input/output (I/O) port, video I/O port, headphone port, etc.
  • The interface unit 508 can be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements in the electronic device 500, or can be used to transfer data between the electronic device 500 and an external device.
  • the memory 509 can be used to store software programs and various data.
  • the memory 509 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created according to the use of the mobile phone (such as audio data or a phone book), and the like.
  • In addition, the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The processor 510 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, so as to monitor the electronic device as a whole.
  • The processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 510.
  • The electronic device 500 may also include a power source 511 (such as a battery) for supplying power to the various components. Preferably, the power source 511 may be logically connected to the processor 510 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • In addition, the electronic device 500 includes some functional modules that are not shown, which will not be repeated here.
  • The embodiment of the present invention further provides an electronic device, including a processor 510, a memory 509, and an audio processing program stored in the memory 509 and executable on the processor 510. When the audio processing program is executed by the processor 510, each process of the foregoing audio processing method embodiment is implemented, and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium on which an audio processing program is stored.
  • When the audio processing program is executed by a processor, each process of the foregoing audio processing method embodiment is implemented, and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • The computer-readable storage medium includes non-transitory computer-readable storage media, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
  • Such a processor may be, but is not limited to, a general-purpose processor, a dedicated processor, a special application processor, or a field programmable logic circuit. It can also be understood that each block in the block diagram and/or flowchart, and a combination of blocks in the block diagram and/or flowchart, can be implemented by dedicated hardware that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • The technical solution of the present invention, essentially or for the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the method described in each embodiment of the present invention.


Abstract

An embodiment of the present invention provides an audio processing method and an electronic device. The method first obtains text information corresponding to audio to be processed, where the text information includes text to be processed and a play period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field indicated by the first input in the text to be processed as the field to be processed; next, receives a second input for the field to be processed and, in response to the second input, obtains a target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play period corresponding to the field to be processed to obtain target audio.

Description

Audio processing method and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202010167788.0, filed in China on March 11, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of communication technologies, and in particular, to an audio processing method and an electronic device.
Background
People often record audio in daily life, but during recording, problems such as misspoken words and repeatedly occurring verbal tics often arise, which requires the audio content to be modified to remove the segments the user does not want.
In the conventional technology, the user usually adjusts the progress bar of the audio manually to find the play period of the audio segment that needs to be modified, and then modifies the audio segment at that play period. During this operation, the user often needs to adjust the progress bar repeatedly to accurately locate the play period of the audio segment to be modified; the whole operation process is cumbersome, and the audio processing efficiency is low.
Summary
Embodiments of the present invention provide an audio processing method and an electronic device, to solve the problem that, when audio content is modified, the operation process is cumbersome and the audio processing efficiency is low.
To solve the foregoing technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides an audio processing method, applied to an electronic device, the method including:
obtaining text information corresponding to audio to be processed, the text information including text to be processed and a play period corresponding to each field in the text to be processed;
receiving a first input for the text to be processed;
in response to the first input, determining a field to be processed in the text to be processed according to the field indicated by the first input;
receiving a second input for the field to be processed;
obtaining a target audio segment according to the second input; and
modifying, according to the target audio segment, an audio segment at the play period corresponding to the field to be processed, to obtain target audio.
In a second aspect, an embodiment of the present invention further provides an electronic device, comprising:
a first acquisition module, configured to acquire text information corresponding to audio to be processed, the text information comprising text to be processed and a playback period corresponding to each field in the text to be processed;
a first receiving module, configured to receive a first input directed at the text to be processed;
a first determination module, configured to determine, in response to the first input, a field to be processed in the text to be processed according to the field indicated by the first input;
a second receiving module, configured to receive a second input directed at the field to be processed;
a second acquisition module, configured to acquire a target audio segment according to the second input;
a second determination module, configured to modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a memory, and an audio processing program stored in the memory and executable on the processor, wherein the audio processing program, when executed by the processor, implements the steps of the audio processing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing an audio processing program which, when executed by a processor, implements the steps of the audio processing method according to the first aspect.
In summary, with the audio processing method and electronic device provided by the embodiments of the present invention, text information corresponding to audio to be processed is first acquired, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; a first input directed at the text to be processed is received and, in response to the first input, the field indicated by the first input in the text to be processed is determined as the field to be processed; a second input directed at the field to be processed is then received and, in response to the second input, a target audio segment is acquired; finally, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed is modified, to obtain target audio. In this way, the audio can be modified without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the steps of an embodiment of the audio processing method of the present invention;
Fig. 2-1 is a flowchart of the steps of another embodiment of the audio processing method of the present invention;
Fig. 2-2 is a schematic diagram of an example of displaying text to be processed according to an embodiment of the present invention;
Fig. 2-3 is a schematic diagram of another example of displaying text to be processed according to an embodiment of the present invention;
Fig. 2-4 is a schematic diagram of an example of editing text to be processed according to an embodiment of the present invention;
Fig. 2-5 is a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention;
Fig. 2-6 is a schematic diagram of yet another example of editing text to be processed according to an embodiment of the present invention;
Fig. 3 is a flowchart of the steps of yet another embodiment of the audio processing method of the present invention;
Fig. 4 is a structural block diagram of an embodiment of the electronic device of the present invention;
Fig. 5 is a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of the steps of an embodiment of the audio processing method of the present invention. The method can be applied to an electronic device and, as shown in Fig. 1, may include steps 101 to 106.
Step 101: acquire text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed.
In the embodiments of the present invention, the audio to be processed may be audio stored locally, or audio downloaded from the Internet that needs modification; it may be obtained directly by audio recording, or recorded in the course of shooting a video, i.e., the audio to be processed may be audio extracted from a video. Further, the text to be processed may be the text corresponding to the audio to be processed, which may be obtained by converting the audio to be processed with a speech-to-text method. The playback period corresponding to each field in the text to be processed may be the playback period, within the audio to be processed, of the audio corresponding to that field. For example, assuming the audio corresponding to the field '好心情' ('good mood') plays from second 5.1 to second 5.9 of the audio to be processed, the playback period 'second 5.1 to second 5.9' may be determined as the playback period corresponding to that field.
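Purely as an illustration of the field-to-playback-period correspondence described above (not part of the claimed embodiments; all names and timings are hypothetical), the text information can be held as a simple mapping from each field to its playback period:

```python
# Hypothetical text information for a piece of audio to be processed:
# each field of the text maps to its playback period
# (start_seconds, end_seconds) within the audio.
text_info = {
    "good mood": (5.1, 5.9),
    "yesterday": (6.0, 6.4),
}

def playback_period(field):
    """Return the playback period of a field, or None if the field is absent."""
    return text_info.get(field)
```

With this structure, locating the audio segment for a selected field is a single lookup rather than a manual search along the progress bar.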
Step 102: receive a first input directed at the text to be processed.
In the embodiments of the present invention, the first input directed at the text to be processed may be an operation, performed on the interface displaying the text to be processed, of selecting the field that needs modification in the text to be processed. The operation may be a single tap, a double tap, and so on.
Step 103: in response to the first input, determine a field to be processed in the text to be processed according to the field indicated by the first input.
In the embodiments of the present invention, the field indicated by the first input refers to the field selected by the user through the first input, i.e., the field corresponding to the audio the user needs to modify; the field to be processed can therefore be determined according to the field indicated by the first input. When determining the field to be processed in the text to be processed according to the field indicated by the first input, the field indicated by the first input in the text to be processed may be taken as the field to be processed.
Step 104: receive a second input directed at the field to be processed.
In the embodiments of the present invention, the second input directed at the field to be processed may be performed on the interface displaying the text to be processed, and may be performed by the user according to how the audio segment corresponding to the field to be processed needs to be modified. For example, the second input may be a deletion operation on the field to be processed, an operation of entering a field used to replace the field to be processed, an operation of entering a field to be added, or an operation of entering an audio segment used to replace the audio segment corresponding to the field to be processed.
Step 105: in response to the second input, acquire a target audio segment.
In the embodiments of the present invention, the target audio segment may be the audio segment the user ultimately wants. It may be input directly by the user, or obtained by the electronic device by editing the field to be processed, where the specific way of editing may be decided by the second input. For example, when the second input is an operation of entering a field to be added, a new field may be added to the field to be processed; when the second input is a deletion operation on the field to be processed, the field to be processed is deleted; and so on. Since the second input is performed by the user according to the required modification of the audio segment corresponding to the field to be processed, editing the field to be processed ensures that the acquired target audio segment corresponds to the audio the user ultimately wants.
Step 106: modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
In the embodiments of the present invention, when modifying according to the target audio segment, the playback period corresponding to the field to be processed may be read from the playback periods, contained in the text information, corresponding to the respective fields, and the audio segment at that playback period may then be changed to the target audio segment, thereby modifying the audio to be processed.
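The modification in step 106 can be sketched, purely for illustration (not the patented implementation; the helper and sample rate are hypothetical), by treating the audio as a flat list of PCM samples and splicing the target segment in at the field's playback period:

```python
def replace_period(samples, rate, period, new_segment):
    """Replace the samples inside `period` (start_s, end_s) with `new_segment`.

    `samples` is a flat list of PCM samples, `rate` the sample rate in Hz.
    """
    start, end = (int(t * rate) for t in period)
    return samples[:start] + list(new_segment) + samples[end:]

# Example: 1 second of "audio" at 4 Hz, replacing the second half second.
audio = [0, 0, 1, 1]
target = [9, 9]
edited = replace_period(audio, 4, (0.5, 1.0), target)
```

The new segment need not have the same length as the band it replaces, which also covers the pure-deletion case (an empty `new_segment`).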
In summary, with the audio processing method provided by this embodiment of the present invention, text information corresponding to audio to be processed is first acquired, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; a first input directed at the text to be processed is received and, in response to the first input, a field to be processed in the text to be processed is determined according to the field indicated by the first input; a second input directed at the field to be processed is then received and, in response to the second input, a target audio segment is acquired; finally, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed is modified, to obtain target audio. In this way, the audio can be modified without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Fig. 2-1 is a flowchart of the steps of another embodiment of the audio processing method of the present invention. The method can be applied to an electronic device and, as shown in Fig. 2-1, may include steps 201 to 207.
Step 201: acquire text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed.
In the embodiments of the present invention, the electronic device may acquire the text information corresponding to the audio to be processed through the following steps 2011 to 2013.
Step 2011: detect whether there is a subtitle file matching the audio to be processed, the subtitle file including subtitle text and a playback period corresponding to each field in the subtitle text.
In the embodiments of the present invention, the audio to be processed may be audio in a video, in which case the subtitle file may be a subtitle file matching the video. The audio to be processed may also be standalone audio, such as a song, in which case the subtitle file may be a lyrics file matching the song. Detecting whether a subtitle file matching the audio to be processed exists may be searching the Internet, or searching locally, for a matching subtitle file.
Step 2012: if there is a subtitle file matching the audio to be processed, use the subtitle file as the text information corresponding to the audio to be processed.
In the embodiments of the present invention, using the subtitle file as the text information corresponding to the audio to be processed may be taking the subtitle text contained in the subtitle file as the text to be processed, and taking the playback period corresponding to each field in the subtitle text as the playback period of that field in the audio to be processed. By detecting whether a matching subtitle file exists and, when one does, using it as the text information, the step of generating text from the audio can be skipped, saving audio processing time to some extent.
Step 2013: if there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the playback period corresponding to each field in the text according to the time information of the playback of the audio segments in the audio to be processed, and use the text and the playback period corresponding to each field in the text as the text information corresponding to the audio to be processed.
In the embodiments of the present invention, converting the audio to be processed into text may be implemented with a speech-to-text method. Specifically, the audio may first be processed to remove noise that would interfere with the conversion; feature values are then extracted from the audio, and the audio is divided into smaller audio segments, each containing one or more feature values; the feature values of each audio segment are matched against the model feature values in an audio model library, and the text corresponding to the matched model feature values is determined as the text corresponding to that audio segment. Generating the playback period corresponding to each field in the text may be done by reading, during the conversion, the playback period corresponding to each divided audio segment and taking it as the playback period of the corresponding field. In this way, when no subtitle file exists, generating the corresponding text from the audio yields text content matching the audio to be processed, ensuring that accurate text information is available for the subsequent steps.
Step 202: receive a first input directed at the text to be processed.
In the embodiments of the present invention, before receiving the first input directed at the text to be processed, the text to be processed may be displayed through the following step:
displaying a preset picture and displaying all of the text to be processed in the preset picture; or, displaying each video frame of the video to be processed and displaying, in each video frame, the text to be processed corresponding to that video frame.
Specifically, the preset picture may be preset according to actual circumstances. For example, it may be a picture associated with the audio to be processed, such as the cover of the video to which the audio to be processed belongs, the cover of the album to which the audio to be processed belongs, or a photograph of the singer of the audio to be processed, which is not limited in the embodiments of the present invention. Further, displaying all of the text to be processed in the preset picture lets the user see the complete text at a glance, and using a preset picture related to the text improves the viewing experience. As an example, Fig. 2-2 is a schematic diagram of an example of displaying the text to be processed: as shown in Fig. 2-2, all of the text to be processed is displayed in a picture of a singer who has performed the audio to be processed. It should be noted that, in practical scenarios, the text to be processed may be long and, limited by the screen size of the electronic device, it may be impossible to display it all at once; the text to be processed may therefore be displayed with scrolling, to ensure it can be shown completely.
Further, the video frames may also be displayed with the corresponding text to be processed shown in them, where the text corresponding to a video frame may be the text whose playback period coincides with that of the frame. Since the content of a video frame is usually strongly related to its corresponding text, displaying them together lets the user observe the frame content and the text content at the same time, which facilitates selection. For the display itself, a text box may be generated on the video frame and the text to be processed shown in the box; the specific form of the box may be preset according to actual circumstances. As an example, Fig. 2-3 is a schematic diagram of another example of displaying the text to be processed: as shown in Fig. 2-3, the corresponding text, '让我掉下眼泪的不止昨夜的酒' ('it was not only last night's wine that made me shed tears'), is displayed in the video frame.
Correspondingly, the electronic device may receive the first input by receiving a selection input on the displayed text to be processed. In this way, displaying the text to be processed in the preset picture or the video frames provides the user with a visual selection scene and rich information, allowing the text to be selected conveniently and improving selection efficiency.
Step 203: in response to the first input, determine a field to be processed in the text to be processed according to the field indicated by the first input.
Specifically, when determining the field indicated by the first input in the text to be processed as the field to be processed, all occurrences of the field indicated by the first input may be searched for in the whole text to be processed, and the fields found may be determined as fields to be processed, where the field indicated by the first input may be the field selected by the user's selection input on the displayed text. Alternatively, the first input may be performed through a preset search area, with the field indicated by the first input entered through that search area; correspondingly, before this step, the electronic device may display the search area and then receive the first input performed by the user through it. In this way, the user only needs to make one selection for the electronic device to modify all identical fields, which improves selection efficiency.
Further, after step 203 is performed, the audio volume may be adjusted through the following steps A to C.
Step A: receive a third input directed at the text to be processed.
In the embodiments of the present invention, the third input directed at the text to be processed may be performed on the interface displaying the text to be processed, and may be an operation of adjusting the font of the text to be processed. The user may perform the third input when the font of the text to be processed needs adjusting, and the electronic device receives the third input accordingly.
Step B: in response to the third input, adjust the font size of the field to be adjusted indicated by the third input, to obtain an adjusted field to be adjusted.
In the embodiments of the present invention, adjusting the font size of the field to be adjusted indicated by the third input may be enlarging or shrinking the font of the field according to the adjustment operation indicated by the third input, to obtain the adjusted field to be adjusted.
Step C: adjust the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field to be adjusted, where the larger the font of the adjusted field, the louder the audio corresponding to the field.
In the embodiments of the present invention, when adjusting the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field, the font size of the adjusted field may first be determined; the volume corresponding to that font size is then determined according to a preset correspondence between font sizes and volumes; finally, the volume of the audio corresponding to the field to be adjusted is set to that volume, thereby adjusting the volume. In the preset correspondence, the larger the font, the louder the volume.
For example, assuming the font size of the adjusted field is 'size four' and that a size-four font corresponds to 60 decibels, the volume of the audio corresponding to the field may accordingly be set to 60 decibels. In this way, the user only needs to adjust the font size of the text to control the volume of the corresponding audio, which makes volume adjustment simpler and thus improves adjustment efficiency.
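The font-size-to-volume correspondence of step C can be sketched, purely as an illustration (the table values and names are invented for this sketch, not taken from the embodiment), as a lookup table:

```python
# Hypothetical preset correspondence: larger font -> louder playback (dB).
FONT_TO_VOLUME_DB = {
    "small": 40,
    "size_four": 60,   # the 'size four' font of the example above
    "large": 80,
}

def volume_for_font(font_size):
    """Return the preset volume, in dB, for a given font size."""
    return FONT_TO_VOLUME_DB[font_size]
```

The table encodes the monotonic rule stated in step C: any entry with a larger font maps to a larger volume.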
Further, in the embodiments of the present invention, a curve used to adjust font size may also be preset. Correspondingly, the user may select, from the text to be processed, the field whose font size needs adjusting, and then adjust the shape of the curve, thereby performing the input. Further, when adjusting the font size, the size of each character in the field to be adjusted may be adjusted in turn according to the height of each segment of the adjusted curve, where the height of a segment may be directly or inversely proportional to the size of the character. In this way, the user only needs to adjust the shape of the curve to set the volume of the corresponding audio segment. Meanwhile, the variety of possible curve shapes gives the volume of the audio segment corresponding to the field many possibilities; for example, the user may adjust the curve into a wave shape so that the volume of the corresponding field rises and falls, making the audio more interesting.
Step 204: receive a second input directed at the field to be processed.
Specifically, for the implementation of this step, reference may be made to step 104 above, which is not limited in the embodiments of the present invention.
Step 205: edit the field to be processed according to the second input, to obtain a target field.
In the embodiments of the present invention, if the second input is a deletion input, the user may be considered to want to delete the field to be processed; the field to be processed may therefore be deleted, and the blank field obtained after deletion is determined as the target field.
Further, if the second input is a replacement input, the user may be considered to want to replace the field to be processed; the replacement field corresponding to the second input may therefore be acquired, the field to be processed deleted, and the replacement field added at the position of the field to be processed, to obtain the target field. Acquiring the replacement field corresponding to the second input may be extracting a field contained in the second input and using it as the replacement field, or extracting speech contained in the second input, obtaining the corresponding text with a speech-to-text method, and using that text as the replacement field.
Further, if the second input is an addition input, the user may be considered to want to add a new field to the field to be processed; the field to be added corresponding to the second input may therefore be acquired and added at the position of the field to be processed, to obtain the target field. In the embodiments of the present invention, acquiring the field to be added corresponding to the second input may likewise be extracting a field contained in the second input and using it as the field to be added, or extracting speech contained in the second input, obtaining the corresponding text with a speech-to-text method, and using that text as the field to be added. In the embodiments of the present invention, performing the corresponding editing operation according to different second inputs can satisfy the user's various modification needs and improve the audio modification effect.
It should be noted that a preset mark may also be added to the displayed field to be processed, and the replacement field or the field to be added may be displayed at a display position corresponding to the field to be processed. Specifically, the preset mark may be a mark reflecting the specific editing operation performed on the field to be processed, with different editing operations corresponding to different preset marks. For example, if the editing operation deletes the field to be processed, the preset mark may be a strikethrough added to the field, or a text label added to the field indicating that it has been deleted. If the editing operation replaces the field to be processed, the preset mark may be an underline added to the field, or a text label indicating that the field has been replaced, with the replacement field displayed next to the field to be processed at a position that may be set according to actual circumstances. If the editing operation adds a field at the position of the field to be processed, the preset mark may be an addition mark, such as an arrow, placed at the corresponding position to indicate where a field is added; the added field may also be displayed, so that the user knows exactly what was added. The specific marking can take many forms, which is not limited in the embodiments of the present invention. Adding a preset mark to the field to be processed lets the user see more clearly where the modified field is located and which editing operation was performed on it.
Further, the display position may be preset according to actual needs; for example, it may be below the field to be deleted. Displaying the replacement field or the field to be added at the display position corresponding to the deleted field lets the user quickly see what was modified, which facilitates later review.
As an example, Fig. 2-4 is a schematic diagram of an example of editing the text to be processed. As shown in Fig. 2-4, the field to be processed is '让我掉下眼泪的' ('what made me shed tears'), and the second input is a deletion input, so the field is deleted, i.e., a strikethrough is added to it and it is removed.
Fig. 2-5 is a schematic diagram of another example of editing the text to be processed. As shown in Fig. 2-5, the field to be processed is '眼泪' ('tears'), and the second input is a replacement input, so the field is deleted and the replacement field displayed, i.e., a strikethrough is added to '眼泪', and '口水' ('saliva') below the field is the replacement field.
Fig. 2-6 is a schematic diagram of yet another example of editing the text to be processed. As shown in Fig. 2-6, the position indicated by the field to be processed is between '我' ('I') and '掉' ('shed'), and the second input is an addition input, so an arrow indicates the position of the field to be processed, and '今天' ('today') below the arrow is the field to be added.
Step 206: determine the audio corresponding to the target field as the target audio segment.
Specifically, the text may first undergo linguistic analysis to segment the target field into words; audio waveform segments matching the segmented words are then extracted from a speech synthesis library, and the waveform segments corresponding to the respective words are synthesized into the audio segment corresponding to the text. Alternatively, the audio to be processed may be searched for a field identical to the target field; if one exists, the audio segment corresponding to that identical field is extracted as the audio corresponding to the target field, thereby obtaining the target audio segment.
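The second strategy of step 206, reusing an existing occurrence of the same field instead of synthesizing it, can be sketched as a lookup over the field-to-period mapping followed by a slice of the sample buffer (a purely illustrative sketch with hypothetical names; the speech-synthesis fallback is only marked, not implemented):

```python
def segment_for_field(text_info, samples, rate, field):
    """If `field` already occurs in the transcript, return its audio samples."""
    period = text_info.get(field)
    if period is None:
        return None  # caller would fall back to speech synthesis here
    start, end = (int(t * rate) for t in period)
    return samples[start:end]

samples = [0, 1, 2, 3, 4, 5, 6, 7]      # 2 s of "audio" at 4 Hz
info = {"tears": (0.5, 1.0)}            # field occurs at samples 2..4
reused = segment_for_field(info, samples, 4, "tears")
```

Reusing the speaker's own recording keeps the voice timbre consistent, which is why the search is tried before synthesis in this sketch.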
Step 207: modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
Specifically, in this step, the playback period corresponding to the field to be processed may first be acquired from the playback periods corresponding to the respective fields; the audio waveform diagram corresponding to the audio to be processed is then acquired; finally, the band in the waveform diagram corresponding to the playback period of the field to be processed is changed to the audio band corresponding to the target audio segment, obtaining the target audio.
When acquiring the playback period corresponding to the field to be processed, the field to be processed may be looked up among the respective fields and its playback period read. When acquiring the waveform diagram corresponding to the audio to be processed, features contained in the audio, such as vibration frequency, may be extracted and processed, for example normalized, to obtain a waveform diagram showing the audio features against playback time.
Further, when changing the band in the waveform diagram corresponding to the playback period of the field to be processed into the blank band corresponding to the target audio segment, the blank band may be used to replace the corresponding band; alternatively, the corresponding band may be deleted directly. It should be noted that, when deleting, the waveform display of the band may be removed and replaced by a straight line, to show that the sound has been deleted.
If the target field is a replacement field, the audio band corresponding to the target audio segment may directly replace the corresponding band; alternatively, the corresponding band may first be deleted, and the audio band, within the audio band corresponding to the target audio segment, that corresponds to the replacement field may then be added at the deleted position. If the target field is a field to be added, the audio band corresponding to the target audio segment may directly replace the corresponding band; alternatively, the audio band corresponding to the field to be added may be inserted according to the position, in the waveform diagram, of the band corresponding to the playback period of the field to be processed, and the synthesized audio band is used as the target audio. Modifying the band of the audio to be processed correspondingly in the waveform diagram makes the modification process more precise and thus improves modification accuracy.
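The two deletion variants just described, dropping the band outright versus silencing it (shown in the waveform as a straight line), can be sketched over a sample buffer (illustrative only; sample indices are hypothetical):

```python
def delete_band(samples, start, end, keep_length=False):
    """Remove samples[start:end]; if keep_length, silence them instead.

    Silencing preserves overall timing (a flat line in the waveform),
    while removal shortens the audio.
    """
    if keep_length:
        return samples[:start] + [0] * (end - start) + samples[end:]
    return samples[:start] + samples[end:]

wave = [3, 3, 7, 7, 3, 3]
```

Which variant to use depends on whether later fields' playback periods should shift (removal) or stay fixed (silencing).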
It should be noted that, after acquiring the waveform diagram, the electronic device may further perform the following operations:
displaying the audio waveform diagram corresponding to the audio to be processed, and marking the band in the waveform diagram corresponding to the playback period of the field to be processed. The marking may be filling the corresponding band with a different color, or adding an annotation at the position of the band; the specific form of marking is not limited in the embodiments of the present invention. In this way, displaying the waveform diagram corresponding to the audio to be processed and marking in it the band corresponding to the field to be processed lets the user conveniently view the modified audio band.
It should be noted that, before modifying the audio band, the audio to be processed may also be processed to separate the human voice from the background sound; the human voice in the audio to be processed is then extracted and the audio band corresponding to the human voice modified accordingly; finally, the modified human voice is synthesized with the background sound, to obtain the target audio. Modifying only the human voice while preserving the background sound in the audio greatly reduces the degree of modification, making the modified audio more natural and coherent.
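The vocal-only editing just described can be sketched by assuming the mix has already been separated into a vocal track and a background track (the separation itself is out of scope here; all values are hypothetical): only the vocal track is edited, and the two tracks are then remixed by summation.

```python
def edit_vocals_only(vocals, background, start, end, new_vocals):
    """Modify only the vocal track over [start, end), then remix by summing."""
    edited = vocals[:start] + list(new_vocals) + vocals[end:]
    assert len(edited) == len(background), "tracks must stay time-aligned"
    return [v + b for v, b in zip(edited, background)]

vocals = [1, 1, 5, 5]
background = [2, 2, 2, 2]
remixed = edit_vocals_only(vocals, background, 2, 4, [0, 0])
```

Because the background track is untouched, music or ambience continues smoothly across the edited region, which is the naturalness benefit claimed above.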
In summary, with the audio processing method provided by this embodiment of the present invention, text information corresponding to audio to be processed is first acquired, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; a first input directed at the text to be processed is received and, in response to the first input, a field to be processed in the text to be processed is determined according to the field indicated by the first input; a second input directed at the field to be processed is then received, the field to be processed is edited according to the second input to obtain a target field, and the audio corresponding to the target field is determined as the target audio segment; finally, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed is modified, to obtain target audio. In this way, performing the corresponding editing operation according to different second inputs can satisfy the user's various modification needs and improve the audio modification effect. Meanwhile, the user can modify the audio without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Fig. 3 is a flowchart of the steps of yet another embodiment of the audio processing method of the present invention. The method can be applied to an electronic device and, as shown in Fig. 3, may include steps 301 to 307.
Step 301: acquire text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed.
Specifically, for the implementation of this step, reference may be made to step 201 above, which is not limited in the embodiments of the present invention.
Step 302: receive a first input directed at the text to be processed.
Specifically, for the implementation of this step, reference may be made to step 202 above, which is not limited in the embodiments of the present invention.
Step 303: in response to the first input, determine a field to be processed in the text to be processed according to the field indicated by the first input.
Specifically, for the implementation of this step, reference may be made to step 203 above, which is not limited in the embodiments of the present invention.
Step 304: receive a second input directed at the field to be processed.
Specifically, for the implementation of this step, reference may be made to step 104 above, which is not limited in the embodiments of the present invention.
Step 305: extract the audio segment carried in the second input.
In the embodiments of the present invention, the second input may be an audio recording operation, in which case the audio segment carried in the second input may be a speech segment recorded by the user. The second input may also be an audio upload operation, in which case the audio segment carried in the second input may be the audio segment the user chooses to upload.
Step 306: determine the audio segment as the target audio segment.
In the embodiments of the present invention, since the second input is usually performed according to the user's required modification of the audio segment corresponding to the field to be processed, the audio segment carried in the second input is exactly the audio segment the user ultimately wants; it may therefore be determined directly as the target audio segment.
It should be noted that, in the embodiments of the present invention, before the input audio segment is determined as the target audio segment, the user may also be prompted as to whether the input audio segment should be processed; if so, the input audio segment is trimmed according to the user's operation, and the trimmed audio segment is used as the target audio segment. Prompting the user on whether to process the input audio segment can further improve the quality of the target audio segment.
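The optional trimming step can be sketched, purely for illustration (helper name, sample rate, and timings are hypothetical), as cutting the user's recorded segment down to the confirmed time range:

```python
def trim_segment(samples, rate, start_s, end_s):
    """Return the sub-segment between start_s and end_s (in seconds)."""
    return samples[int(start_s * rate):int(end_s * rate)]

recording = list(range(8))                      # 2 s of "audio" at 4 Hz
target_segment = trim_segment(recording, 4, 0.25, 1.75)
```

Trimming lets the user cut off silence or mistakes at the start and end of a freshly recorded segment before it is spliced into the audio to be processed.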
Step 307: modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
Specifically, for the implementation of this step, reference may be made to step 207 above, which is not limited in the embodiments of the present invention.
In summary, with the audio processing method provided by this embodiment of the present invention, text information corresponding to audio to be processed is first acquired, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; a first input directed at the text to be processed is received and, in response to the first input, a field to be processed in the text to be processed is determined according to the field indicated by the first input; a second input directed at the field to be processed is then received, the audio segment carried in the second input is extracted, and that audio segment is determined as the target audio segment; finally, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed is modified, to obtain target audio. In this way, directly extracting the audio segment carried in the second input yields the target audio segment conveniently, which improves processing efficiency. Meanwhile, the user can modify the audio without manually adjusting the progress bar, further improving audio processing efficiency.
The audio processing method provided by the embodiments of the present invention has been described above; the electronic device provided by the embodiments of the present invention is described below with reference to the drawings.
Referring to Fig. 4, an embodiment of the present invention further provides an electronic device. As shown in Fig. 4, the electronic device 40 may include:
a first acquisition module 401, configured to acquire text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed;
a first receiving module 402, configured to receive a first input directed at the text to be processed;
a first determination module 403, configured to determine, in response to the first input, a field to be processed in the text to be processed according to the field indicated by the first input;
a second receiving module 404, configured to receive a second input directed at the field to be processed;
a second acquisition module 405, configured to acquire a target audio segment according to the second input;
a second determination module 406, configured to modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
In summary, the electronic device provided by this embodiment of the present invention first acquires text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; it then receives a first input directed at the text to be processed and, in response to the first input, determines a field to be processed in the text to be processed according to the field indicated by the first input; it then receives a second input directed at the field to be processed and, in response to the second input, acquires a target audio segment; finally, according to the target audio segment, it modifies the audio segment at the playback period corresponding to the field to be processed, to obtain target audio. In this way, the audio can be modified without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Optionally, the second acquisition module 405 is configured to:
edit the field to be processed according to the second input to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or,
extract an audio segment carried in the second input, and determine the audio segment as the target audio segment.
Optionally, the second acquisition module 405 is further configured to:
if the second input is a deletion input, delete the field to be processed, and determine the blank field obtained after deletion as the target field;
if the second input is a replacement input, acquire the replacement field corresponding to the second input, delete the field to be processed, and add the replacement field at the position of the field to be processed, to obtain the target field;
if the second input is an addition input, acquire the field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed, to obtain the target field.
Optionally, the electronic device 40 further includes:
a first display module, configured to display a preset picture and display all of the text to be processed in the preset picture; or, display each video frame of the video to be processed and display, in the video frame, the text to be processed corresponding to the video frame.
The first receiving module 402 is further configured to:
receive a selection input on the displayed text to be processed.
Optionally, the electronic device 40 further includes:
a second display module, configured to add a preset mark to the displayed field to be processed, and display the replacement field or the field to be added according to the display position corresponding to the field to be processed.
Optionally, the second determination module 406 is configured to:
acquire the playback period corresponding to the field to be processed from the playback periods corresponding to the respective fields;
acquire the audio waveform diagram corresponding to the audio to be processed;
change the band in the audio waveform diagram corresponding to the playback period of the field to be processed into the audio band corresponding to the target audio segment, to obtain the target audio.
Optionally, the electronic device 40 further includes:
a third receiving module, configured to receive a third input directed at the text to be processed;
a first adjustment module, configured to adjust, in response to the third input, the font size of the field to be adjusted indicated by the third input, to obtain an adjusted field to be adjusted;
a second adjustment module, configured to adjust the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field to be adjusted, where the larger the font of the adjusted field, the louder the audio corresponding to the field.
Optionally, the first acquisition module 401 is configured to:
detect whether there is a subtitle file matching the audio to be processed, the subtitle file including subtitle text and a playback period corresponding to each field in the subtitle text;
if there is a subtitle file matching the audio to be processed, use the subtitle file as the text information corresponding to the audio to be processed;
if there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the playback period corresponding to each field in the text according to the time information of the playback of the audio segments in the audio to be processed, and use the text and the playback period corresponding to each field in the text as the text information corresponding to the audio to be processed.
In summary, the electronic device provided by this embodiment of the present invention first acquires text information corresponding to audio to be processed; it then receives a first input directed at the text to be processed and, in response to the first input, determines a field to be processed according to the field indicated by the first input; it then receives a second input directed at the field to be processed, edits the field to be processed according to the second input to obtain a target field, and determines the audio corresponding to the target field as the target audio segment; finally, according to the target audio segment, it modifies the audio segment at the playback period corresponding to the field to be processed, to obtain target audio. In this way, performing the corresponding editing operation according to different second inputs can satisfy the user's various modification needs and improve the audio modification effect. Meanwhile, the user can modify the audio without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Fig. 5 is a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
The electronic device 500 includes, but is not limited to, components such as a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will understand that the electronic device structure shown in Fig. 5 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently. In the embodiments of the present invention, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, palmtop computers, in-vehicle terminals, wearable devices, pedometers, and the like.
The processor 510 is configured to acquire text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed.
The processor 510 is configured to receive a first input directed at the text to be processed.
The processor 510 is configured to determine, in response to the first input, a field to be processed in the text to be processed according to the field indicated by the first input.
The processor 510 is configured to receive a second input directed at the field to be processed.
The processor 510 is configured to acquire, in response to the second input, a target audio segment.
The processor 510 is configured to modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
In summary, the electronic device provided by this embodiment of the present invention first acquires text information corresponding to audio to be processed, the text information including text to be processed and a playback period corresponding to each field in the text to be processed; it then receives a first input directed at the text to be processed and, in response to the first input, determines a field to be processed according to the field indicated by the first input; it then receives a second input directed at the field to be processed and, in response to the second input, acquires a target audio segment; finally, according to the target audio segment, it modifies the audio segment at the playback period corresponding to the field to be processed, to obtain target audio. In this way, the audio can be modified without manually adjusting the progress bar, and audio processing efficiency can therefore be improved.
Optionally, the processor 510 is configured to:
edit the field to be processed according to the second input to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or,
extract an audio segment carried in the second input, and determine the audio segment as the target audio segment.
Optionally, the processor 510 is further configured to:
if the second input is a deletion input, delete the field to be processed, and determine the blank field obtained after deletion as the target field;
if the second input is a replacement input, acquire the replacement field corresponding to the second input, delete the field to be processed, and add the replacement field at the position of the field to be processed, to obtain the target field;
if the second input is an addition input, acquire the field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed, to obtain the target field.
Optionally, the display unit 506 is configured to:
display a preset picture and display all of the text to be processed in the preset picture; or, display each video frame of the video to be processed and display, in the video frame, the text to be processed corresponding to the video frame.
Correspondingly, the user input unit 507 is configured to receive a selection input on the displayed text to be processed.
Optionally, the processor 510 is configured to:
acquire the playback period corresponding to the field to be processed from the playback periods corresponding to the respective fields;
acquire the audio waveform diagram corresponding to the audio to be processed;
change the band in the audio waveform diagram corresponding to the playback period of the field to be processed into the audio band corresponding to the target audio segment, to obtain the target audio.
Optionally, the user input unit 507 is configured to:
receive a third input directed at the text to be processed.
The processor 510 is configured to:
adjust, in response to the third input, the font size of the field to be adjusted indicated by the third input, to obtain an adjusted field to be adjusted;
adjust the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field to be adjusted, where the larger the font of the adjusted field, the louder the audio corresponding to the field.
Optionally, the processor 510 is configured to:
detect whether there is a subtitle file matching the audio to be processed, the subtitle file including subtitle text and a playback period corresponding to each field in the subtitle text;
if there is a subtitle file matching the audio to be processed, use the subtitle file as the text information corresponding to the audio to be processed;
if there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the playback period corresponding to each field in the text according to the time information of the playback of the audio segments in the audio to be processed, and use the text and the playback period corresponding to each field in the text as the text information corresponding to the audio to be processed.
It should be understood that, in the embodiments of the present invention, the radio frequency unit 501 may be used to receive and send signals during information transmission and reception or during a call; specifically, downlink data from a base station is received and passed to the processor 510 for processing, and uplink data is sent to the base station. Generally, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 may also communicate with a network and other devices through a wireless communication system.
Through the network module 502, the electronic device provides the user with wireless broadband Internet access, for example helping the user send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502, or stored in the memory 509, into an audio signal and output it as sound. Moreover, the audio output unit 503 may also provide audio output related to a specific function performed by the electronic device 500 (for example, a call signal reception sound or a message reception sound). The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used to receive audio or video signals. The input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042. The graphics processor 5041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or another storage medium) or transmitted via the radio frequency unit 501 or the network module 502. The microphone 5042 can receive sound and process it into audio data. In telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 501, and output.
The electronic device 500 further includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 5061 according to the ambient light, and the proximity sensor can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear. As a kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and, at rest, the magnitude and direction of gravity; it can be used to recognize the posture of the electronic device (such as portrait/landscape switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer or tapping). The sensor 505 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and so on, which are not described in detail here.
The display unit 506 is used to display information entered by the user or information provided to the user. The display unit 506 may include a display panel 5061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The user input unit 507 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. The touch panel 5071, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the touch panel 5071). The touch panel 5071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and sends the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 5071, the user input unit 507 may also include other input devices 5072, which may specifically include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here.
Further, the touch panel 5071 may cover the display panel 5061. When the touch panel 5071 detects a touch operation on or near it, the operation is passed to the processor 510 to determine the type of the touch event, after which the processor 510 provides corresponding visual output on the display panel 5061 according to the type of the touch event. Although in Fig. 5 the touch panel 5071 and the display panel 5061 are implemented as two separate components to realize the input and output functions of the electronic device, in some embodiments the touch panel 5071 and the display panel 5061 may be integrated to realize the input and output functions of the electronic device, which is not specifically limited here.
The interface unit 508 is an interface for connecting an external device to the electronic device 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The interface unit 508 may be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements within the electronic device 500, or may be used to transmit data between the electronic device 500 and an external device.
The memory 509 may be used to store software programs and various data. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required for at least one function (such as a sound playback function or an image playback function), and so on, and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The processor 510 is the control center of the electronic device; it connects the various parts of the whole electronic device using various interfaces and lines and, by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, performs the various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole. The processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor, which mainly handles the operating system, the user interface, application programs, and so on, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 510.
The electronic device 500 may further include a power supply 511 (such as a battery) for supplying power to the components; preferably, the power supply 511 may be logically connected to the processor 510 through a power management system, so that functions such as charging, discharging, and power consumption management are realized through the power management system.
In addition, the electronic device 500 includes some functional modules that are not shown, which are not described in detail here.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 510, a memory 509, and an audio processing program stored in the memory 509 and executable on the processor 510; when executed by the processor 510, the audio processing program implements each process of the above audio processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition.
An embodiment of the present invention further provides a computer-readable storage medium storing an audio processing program; when executed by a processor, the audio processing program implements each process of the above audio processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition. Examples of the computer-readable storage medium include non-transitory computer-readable storage media such as read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs.
Aspects of the present invention are described above with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus enable the implementation of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. Such a processor may be, but is not limited to, a general-purpose processor, a special-purpose processor, an application-specific processor, or a field-programmable logic circuit. It can also be understood that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can also be implemented by special-purpose hardware that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.
It should be noted that, herein, the terms 'comprise', 'include', or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase 'comprising a ...' does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific implementations described above; the specific implementations described above are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (18)

  1. An audio processing method, applied to an electronic device, the method comprising:
    acquiring text information corresponding to audio to be processed, the text information comprising text to be processed and a playback period corresponding to each field in the text to be processed;
    receiving a first input directed at the text to be processed;
    in response to the first input, determining a field to be processed in the text to be processed according to the field indicated by the first input;
    receiving a second input directed at the field to be processed;
    acquiring a target audio segment according to the second input;
    modifying, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
  2. The method according to claim 1, wherein the acquiring a target audio segment according to the second input comprises:
    editing the field to be processed according to the second input to obtain a target field, and determining the audio corresponding to the target field as the target audio segment; or,
    extracting an audio segment carried in the second input, and determining the audio segment as the target audio segment.
  3. The method according to claim 2, wherein the editing the field to be processed according to the second input to obtain a target field comprises:
    if the second input is a deletion input, deleting the field to be processed, and determining the blank field obtained after deletion as the target field;
    if the second input is a replacement input, acquiring a replacement field corresponding to the second input, deleting the field to be processed, and adding the replacement field at the position of the field to be processed, to obtain the target field;
    if the second input is an addition input, acquiring a field to be added corresponding to the second input, and adding the field to be added at the position of the field to be processed, to obtain the target field.
  4. The method according to claim 3, wherein the audio to be processed is audio contained in a video to be processed;
    before the receiving a first input directed at the text to be processed, the method further comprises:
    displaying a preset picture, and displaying all of the text to be processed in the preset picture; or, displaying each video frame of the video to be processed, and displaying, in the video frame, the text to be processed corresponding to the video frame;
    the receiving a first input directed at the text to be processed comprises:
    receiving a selection input on the displayed text to be processed.
  5. The method according to any one of claims 1 to 4, wherein the modifying, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio comprises:
    acquiring the playback period corresponding to the field to be processed from the playback periods corresponding to the respective fields;
    acquiring an audio waveform diagram corresponding to the audio to be processed;
    changing the band in the audio waveform diagram corresponding to the playback period of the field to be processed into the audio band corresponding to the target audio segment, to obtain the target audio.
  6. The method according to claim 1, wherein, after the determining a field to be processed in the text to be processed according to the field indicated by the first input, the method further comprises:
    receiving a third input directed at the text to be processed;
    in response to the third input, adjusting the font size of the field to be adjusted indicated by the third input, to obtain an adjusted field to be adjusted;
    adjusting the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field to be adjusted, wherein the larger the font of the adjusted field to be adjusted, the louder the volume of the audio corresponding to the field to be adjusted.
  7. The method according to claim 1, wherein the acquiring text information corresponding to audio to be processed comprises:
    detecting whether there is a subtitle file matching the audio to be processed, the subtitle file comprising subtitle text and a playback period corresponding to each field in the subtitle text;
    if there is a subtitle file matching the audio to be processed, using the subtitle file as the text information corresponding to the audio to be processed;
    if there is no subtitle file matching the audio to be processed, converting the audio contained in the audio to be processed into text, generating the playback period corresponding to each field in the text according to the time information of the playback of the audio segments in the audio to be processed, and using the text and the playback period corresponding to each field in the text as the text information corresponding to the audio to be processed.
  8. An electronic device, comprising:
    a first acquisition module, configured to acquire text information corresponding to audio to be processed, the text information comprising text to be processed and a playback period corresponding to each field in the text to be processed;
    a first receiving module, configured to receive a first input directed at the text to be processed;
    a first determination module, configured to determine, in response to the first input, a field to be processed in the text to be processed according to the field indicated by the first input;
    a second receiving module, configured to receive a second input directed at the field to be processed;
    a second acquisition module, configured to acquire a target audio segment according to the second input;
    a second determination module, configured to modify, according to the target audio segment, the audio segment at the playback period corresponding to the field to be processed, to obtain target audio.
  9. The electronic device according to claim 8, wherein the second acquisition module is configured to:
    edit the field to be processed according to the second input to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or,
    extract an audio segment carried in the second input, and determine the audio segment as the target audio segment.
  10. The electronic device according to claim 9, wherein the second acquisition module is further configured to:
    if the second input is a deletion input, delete the field to be processed, and determine the blank field obtained after deletion as the target field;
    if the second input is a replacement input, acquire a replacement field corresponding to the second input, delete the field to be processed, and add the replacement field at the position of the field to be processed, to obtain the target field;
    if the second input is an addition input, acquire a field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed, to obtain the target field.
  11. The electronic device according to claim 10, further comprising:
    a first display module, configured to display a preset picture and display all of the text to be processed in the preset picture; or, display each video frame of the video to be processed and display, in the video frame, the text to be processed corresponding to the video frame;
    the first receiving module is further configured to:
    receive a selection input on the displayed text to be processed.
  12. The electronic device according to any one of claims 8 to 11, wherein the second determination module is configured to:
    acquire the playback period corresponding to the field to be processed from the playback periods corresponding to the respective fields;
    acquire an audio waveform diagram corresponding to the audio to be processed;
    change the band in the audio waveform diagram corresponding to the playback period of the field to be processed into the audio band corresponding to the target audio segment, to obtain the target audio.
  13. The electronic device according to claim 8, further comprising:
    a third receiving module, configured to receive a third input directed at the text to be processed;
    a first adjustment module, configured to adjust, in response to the third input, the font size of the field to be adjusted indicated by the third input, to obtain an adjusted field to be adjusted;
    a second adjustment module, configured to adjust the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field to be adjusted, wherein the larger the font of the adjusted field to be adjusted, the louder the volume of the audio corresponding to the field to be adjusted.
  14. The electronic device according to claim 8, wherein the first acquisition module is configured to:
    detect whether there is a subtitle file matching the audio to be processed, the subtitle file comprising subtitle text and a playback period corresponding to each field in the subtitle text;
    if there is a subtitle file matching the audio to be processed, use the subtitle file as the text information corresponding to the audio to be processed;
    if there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, generate the playback period corresponding to each field in the text according to the time information of the playback of the audio segments in the audio to be processed, and use the text and the playback period corresponding to each field in the text as the text information corresponding to the audio to be processed.
  15. An electronic device, comprising a processor, a memory, and an audio processing program stored in the memory and executable on the processor, wherein the audio processing program, when executed by the processor, implements the steps of the audio processing method according to any one of claims 1 to 7.
  16. An electronic device, configured to perform the steps of the audio processing method according to any one of claims 1 to 7.
  17. A computer-readable storage medium, storing an audio processing program, wherein the audio processing program, when executed by a processor, implements the steps of the audio processing method according to any one of claims 1 to 7.
  18. A computer program product, executable by a processor to implement the steps of the audio processing method according to any one of claims 1 to 7.
PCT/CN2021/079144 2020-03-11 2021-03-04 Audio processing method and electronic device WO2021179991A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020227033855A KR20220149570A (ko) 2020-03-11 2021-03-04 오디오 처리 방법 및 전자 기기
EP21767696.4A EP4120268A4 (en) 2020-03-11 2021-03-04 SOUND PROCESSING METHOD AND ELECTRONIC DEVICE
US17/940,057 US20230005506A1 (en) 2020-03-11 2022-09-08 Audio processing method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010167788.0A CN111445927B (zh) 2020-03-11 2022-04-26 Audio processing method and electronic device
CN202010167788.0 2020-03-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/940,057 Continuation US20230005506A1 (en) 2020-03-11 2022-09-08 Audio processing method and electronic device

Publications (1)

Publication Number Publication Date
WO2021179991A1 true WO2021179991A1 (zh) 2021-09-16

Family

ID=71627433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079144 WO2021179991A1 (zh) 2020-03-11 2021-03-04 音频处理方法及电子设备

Country Status (5)

Country Link
US (1) US20230005506A1 (zh)
EP (1) EP4120268A4 (zh)
KR (1) KR20220149570A (zh)
CN (1) CN111445927B (zh)
WO (1) WO2021179991A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445927B (zh) * 2020-03-11 2022-04-26 维沃软件技术有限公司 Audio processing method and electronic device
CN112102841A (zh) * 2020-09-14 2020-12-18 北京搜狗科技发展有限公司 Audio editing method and apparatus, and apparatus for audio editing
CN112669885B (zh) * 2020-12-31 2023-04-28 咪咕文化科技有限公司 Audio clipping method, electronic device, and storage medium
CN114915836A (zh) * 2022-05-06 2022-08-16 北京字节跳动网络技术有限公司 Method, apparatus, device, and storage medium for editing audio
CN115695848A (zh) * 2022-10-28 2023-02-03 杭州遥望网络科技有限公司 Livestream data processing method, apparatus, device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080177536A1 (en) * 2007-01-24 2008-07-24 Microsoft Corporation A/v content editing
CN202502737U (zh) * 2012-03-12 2012-10-24 中国人民解放军济南军区司令部第二部 Intelligent editing system for video and audio information
CN104135628A (zh) * 2013-05-03 2014-11-05 安凯（广州）微电子技术有限公司 Video editing method and terminal
US9185225B1 (en) * 2011-06-08 2015-11-10 Cellco Partnership Method and apparatus for modifying digital messages containing at least audio
CN108984788A (zh) * 2018-07-30 2018-12-11 珠海格力电器股份有限公司 Recording file organizing and classifying system, control method thereof, and recording device
CN111445927A (zh) * 2020-03-11 2020-07-24 维沃软件技术有限公司 Audio processing method and electronic device
CN112102841A (zh) * 2020-09-14 2020-12-18 北京搜狗科技发展有限公司 Audio editing method and apparatus, and apparatus for audio editing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6785649B1 (en) * 1999-12-29 2004-08-31 International Business Machines Corporation Text formatting from speech
WO2005116992A1 (en) * 2004-05-27 2005-12-08 Koninklijke Philips Electronics N.V. Method of and system for modifying messages
US10445052B2 (en) * 2016-10-04 2019-10-15 Descript, Inc. Platform for producing and delivering media content
CN107633850A (zh) * 2017-10-10 2018-01-26 维沃移动通信有限公司 Volume adjustment method and electronic device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4120268A4 *

Also Published As

Publication number Publication date
KR20220149570A (ko) 2022-11-08
CN111445927B (zh) 2022-04-26
CN111445927A (zh) 2020-07-24
EP4120268A1 (en) 2023-01-18
EP4120268A4 (en) 2023-06-21
US20230005506A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
WO2021179991A1 (zh) Audio processing method and electronic device
CN109819313B (zh) Video processing method and apparatus, and storage medium
CN110381371B (zh) Video clipping method and electronic device
WO2021078116A1 (zh) Video processing method and electronic device
CN111010610B (zh) Video screenshot method and electronic device
US20220284928A1 (en) Video display method, electronic device and medium
CN110557565B (zh) Video processing method and mobile terminal
WO2021233293A1 (zh) Note-taking method and electronic device
CN111050070B (zh) Video shooting method and apparatus, electronic device, and medium
WO2021104160A1 (zh) Editing method and electronic device
WO2021073478A1 (zh) Bullet-screen comment information recognition method and display method, server, and electronic device
WO2021036553A1 (zh) Icon display method and electronic device
CN111010608B (zh) Video playback method and electronic device
CN109819167B (zh) Image processing method and apparatus, and mobile terminal
CN110830368B (zh) Instant messaging message sending method and electronic device
CN111491211B (zh) Video processing method, video processing apparatus, and electronic device
WO2021036659A1 (zh) Video recording method and electronic device
CN110719527A (zh) Video processing method, electronic device, and mobile terminal
WO2021238837A1 (zh) Information display method and apparatus, electronic device, medium, and program product
CN109391842B (zh) Dubbing method and mobile terminal
CN110808019A (zh) Song generation method and electronic device
CN111601174A (zh) Subtitle adding method and apparatus
CN108763475B (zh) Recording method, recording apparatus, and terminal device
WO2019076377A1 (zh) Image viewing method and mobile terminal
WO2021083090A1 (zh) Message sending method and mobile terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21767696

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20227033855

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021767696

Country of ref document: EP

Effective date: 20221011