WO2021179991A1 - Audio processing method and electronic device - Google Patents
- Publication number
- WO2021179991A1 (PCT/CN2021/079144)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processed
- field
- audio
- input
- text
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
Definitions
- the present invention relates to the field of communication technology, in particular to an audio processing method and electronic equipment.
- In the traditional technology, the user often manually adjusts the audio progress bar to find the playback period of the audio segment that needs to be modified, and then modifies the audio segment at that playback period. During the operation, the user often needs to adjust the progress bar repeatedly to accurately locate the playback period of the audio segment that needs to be modified. The entire operation process is cumbersome, and the audio processing efficiency is low.
- the embodiments of the present invention provide an audio processing method and an electronic device to solve the problem of complicated operation process and low audio processing efficiency when modifying audio content.
- the present invention is implemented as follows:
- an embodiment of the present invention provides an audio processing method applied to an electronic device, and the method includes:
- the audio segment at the play time period corresponding to the field to be processed is modified to obtain the target audio.
- an embodiment of the present invention also provides an electronic device, including:
- the first acquisition module is configured to acquire text information corresponding to the to-be-processed audio, the text information includes the to-be-processed text and the play time period corresponding to each field in the to-be-processed text;
- the first receiving module is configured to receive the first input for the to-be-processed text
- the first determining module is configured to determine the field to be processed in the text to be processed according to the field indicated by the first input in response to the first input;
- the second receiving module is configured to receive a second input for the field to be processed
- the second acquisition module is configured to acquire the target audio segment according to the second input
- the second determining module is configured to modify the audio segment at the playback period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
- An embodiment of the present invention provides an electronic device, including a processor, a memory, and an audio processing program stored on the memory and runnable on the processor. When the audio processing program is executed by the processor, the steps of the audio processing method described in the first aspect are implemented.
- An embodiment of the present invention provides a computer-readable storage medium on which an audio processing program is stored. When the audio processing program is executed by a processor, the steps of the audio processing method described in the first aspect are implemented.
- The audio processing method and electronic device provided by the embodiments of the present invention first obtain the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed. They then receive a first input for the text to be processed and, in response to the first input, determine the field indicated by the first input in the text to be processed as the field to be processed. Next, they receive a second input for the field to be processed and, in response to the second input, obtain the target audio segment. Finally, according to the target audio segment, the audio segment at the play time period corresponding to the field to be processed is modified to obtain the target audio. In this way, the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
- FIG. 1 shows a flowchart of the steps of an embodiment of the audio processing method of the present invention.
- FIG. 2-1 shows a flowchart of another embodiment of the audio processing method of the present invention.
- FIG. 2-2 shows a schematic diagram of an example of displaying text to be processed according to an embodiment of the present invention.
- FIG. 2-3 shows a schematic diagram of another example of displaying text to be processed according to an embodiment of the present invention.
- FIG. 2-4 shows a schematic diagram of an example of editing text to be processed according to an embodiment of the present invention.
- FIG. 2-5 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention.
- FIG. 2-6 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention.
- FIG. 3 shows a flowchart of the steps of another embodiment of the audio processing method of the present invention.
- FIG. 4 shows a structural block diagram of an embodiment of the electronic device of the present invention.
- FIG. 5 shows a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
- FIG. 1 shows a step flow chart of an embodiment of the audio processing method of the present invention.
- the method may be applied to an electronic device. As shown in FIG. 1, the method may include steps 101 to 106.
- Step 101 Acquire text information corresponding to a to-be-processed audio, where the text information includes the to-be-processed text and a play time period corresponding to each field in the to-be-processed text.
- the audio to be processed can be the audio stored locally, or the audio that needs to be modified downloaded from the Internet.
- The audio to be processed can be obtained directly through audio recording, or recorded during video recording; that is, the audio to be processed can be audio extracted from a video.
- the text to be processed may be text corresponding to the audio to be processed, and the corresponding text may be obtained by converting the audio to be processed according to an audio-to-text method.
- the play time period corresponding to each field in the text to be processed may be the play time period of the audio corresponding to the field in the audio to be processed.
- For example, the playback period "5.1 seconds to 5.9 seconds" can be determined as the play time period corresponding to the field "good mood".
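The text information described in step 101 can be pictured as a list of fields, each carrying its play time period. The following is a minimal sketch, not the method claimed in the patent: the `Field` class, the `play_period` helper, and the first sample entry (its text and timings) are hypothetical, and only the "good mood" entry with its 5.1 to 5.9 second period comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Field:
    """One field of the text to be processed (hypothetical structure)."""
    text: str     # the field as it appears in the text to be processed
    start: float  # start of the play time period, in seconds
    end: float    # end of the play time period, in seconds


def play_period(fields: List[Field], wanted: str) -> Optional[Tuple[float, float]]:
    """Return the play time period of the first field whose text matches."""
    for field in fields:
        if field.text == wanted:
            return (field.start, field.end)
    return None


# Hypothetical text information; only the "good mood" entry is from the text.
text_info = [
    Field("today I am in a", 3.0, 5.1),
    Field("good mood", 5.1, 5.9),
]

print(play_period(text_info, "good mood"))
```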
- Step 102 Receive a first input for the to-be-processed text.
- the first input for the text to be processed may be an operation of selecting a field in the text to be processed that needs to be modified on an interface displaying the text to be processed.
- the operation can be a single click, a double click, and so on.
- Step 103 In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
- The field indicated by the first input is the field selected by the user through the first input, that is, the field corresponding to the audio that the user needs to modify. Therefore, the field to be processed can be determined according to the field indicated by the first input: the field indicated by the first input in the text to be processed may be used as the field to be processed.
- Step 104 Receive a second input for the field to be processed.
- the second input for the field to be processed may be performed on an interface displaying the text to be processed, and the second input may be performed by the user according to the modification requirements of the audio segment corresponding to the field to be processed.
- The second input may be a delete operation for the field to be processed, an input operation for replacing the field to be processed, an operation of inputting a field to be added, or an operation of inputting an audio segment to replace the audio segment corresponding to the field to be processed.
- Step 105 In response to the second input, obtain a target audio segment.
- the target audio segment may be the audio segment ultimately desired by the user.
- the target audio segment may be directly input by the user, or it may be obtained by the electronic device by editing the field to be processed.
- The specific method for editing the field to be processed may be determined according to the second input. For example, when the second input is an operation of inputting a field to be added, a new field can be added to the field to be processed; when the second input is a delete operation for the field to be processed, the field to be processed is deleted, and so on. Since the second input is performed by the user according to the modification requirements of the audio segment corresponding to the field to be processed, editing the field to be processed ensures that the acquired target audio segment corresponds to the audio ultimately desired by the user.
- Step 106 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- When modifying according to the target audio segment, the play time period corresponding to the field to be processed can be read from the play time periods corresponding to the fields contained in the text information, and the audio segment at that play time period can then be modified to the target audio segment, thereby realizing the modification of the audio to be processed.
- In summary, the audio processing method first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed. It then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input. Next, it receives a second input for the field to be processed and, in response to the second input, acquires the target audio segment. Finally, according to the target audio segment, it modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- In this way, the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
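Step 106 amounts to swapping the samples that fall inside a play time period for the target audio segment. The sketch below illustrates this under stated assumptions: the audio is modeled as a plain list of samples at a known sample rate, and `replace_segment` is a hypothetical helper, since the patent does not specify a representation. An empty target segment models deletion.

```python
from typing import List


def replace_segment(samples: List[float], sample_rate: int,
                    start_s: float, end_s: float,
                    target: List[float]) -> List[float]:
    """Return a copy of samples in which the play time period
    [start_s, end_s) is replaced by the target segment."""
    if not 0 <= start_s <= end_s:
        raise ValueError("invalid play time period")
    # Convert the play time period from seconds to sample indices,
    # clamping to the length of the audio.
    end = min(round(end_s * sample_rate), len(samples))
    start = min(round(start_s * sample_rate), end)
    return samples[:start] + target + samples[end:]


# Ten seconds of silence at a toy rate of 10 samples per second.
audio = [0.0] * 100
# Replace the period 5.1 s to 5.9 s with eight new samples.
replaced = replace_segment(audio, 10, 5.1, 5.9, [1.0] * 8)
# Delete the same period by passing an empty target segment.
deleted = replace_segment(audio, 10, 5.1, 5.9, [])
```

Because the target segment may be longer or shorter than the period it replaces, the total duration can change; a fuller implementation would also need to shift the play time periods of the fields that follow.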
- FIG. 2-1 shows a step flow chart of another embodiment of the audio processing method of the present invention.
- the method may be applied to an electronic device. As shown in FIG. 2-1, the method may include step 201 to step 207.
- Step 201 Acquire text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed.
- the electronic device may obtain the text information corresponding to the audio to be processed through the following steps 2011 to 2013:
- Step 2011 Detect whether there is a subtitle file matching the audio to be processed, the subtitle file including the subtitle text and the play time period corresponding to each field in the subtitle text.
- the audio to be processed may be audio in a video
- the subtitle file may be a subtitle file matching the video.
- the audio to be processed may also be an independent audio, such as a song, etc.
- The subtitle file may be a lyric file matching the song. To detect whether there is a subtitle file matching the audio to be processed, a matching subtitle file can be searched for online or locally.
- Step 2012 If there is a subtitle file matching the audio to be processed, use the subtitle file as text information corresponding to the audio to be processed.
- When the subtitle file is used as the text information corresponding to the audio to be processed, the subtitle text contained in the subtitle file can be used as the text to be processed, and the play time period corresponding to each field in the subtitle text is taken as the play time period of that field in the audio to be processed.
- Step 2013 If there is no subtitle file matching the audio to be processed, convert the audio to be processed into text, generate the play time period corresponding to each field in the text according to the time information of the audio segments in the audio to be processed, and use the text and the play time period corresponding to each field in the text as the text information corresponding to the audio to be processed.
- converting the to-be-processed audio into text may be realized by using the method of converting speech to text.
- The audio can first be processed to remove noise and avoid interference in the conversion process. The feature values in the audio are then extracted, and the audio is divided into smaller audio segments so that each audio segment contains one or more feature values. The feature values of each audio segment are matched against the model feature values in the audio model library, and the text corresponding to the matched model feature values is determined as the text corresponding to the audio segment.
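Steps 2011 to 2013 describe a simple fallback: use a matching subtitle file when one exists, otherwise transcribe the audio. A minimal sketch of that control flow follows. The subtitle lookup and the transcriber are passed in as callables because the patent does not name a concrete subtitle format or speech-to-text engine; `get_text_info`, the stub functions, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (field, start seconds, end seconds)
TimedField = Tuple[str, float, float]


def get_text_info(audio_id: str,
                  find_subtitles: Callable[[str], Optional[List[TimedField]]],
                  transcribe: Callable[[str], List[TimedField]]) -> List[TimedField]:
    """Prefer a matching subtitle file; fall back to speech-to-text."""
    subtitles = find_subtitles(audio_id)   # step 2011: detect a subtitle file
    if subtitles is not None:
        return subtitles                   # step 2012: use it as the text information
    return transcribe(audio_id)            # step 2013: convert the audio to text


# Hypothetical stand-ins for a subtitle store and a speech-to-text engine.
SUBTITLE_STORE: Dict[str, List[TimedField]] = {
    "song-1": [("good mood", 5.1, 5.9)],
}


def lookup_subtitles(audio_id: str) -> Optional[List[TimedField]]:
    return SUBTITLE_STORE.get(audio_id)


def stub_transcribe(audio_id: str) -> List[TimedField]:
    return [("transcribed text", 0.0, 1.0)]


from_subtitles = get_text_info("song-1", lookup_subtitles, stub_transcribe)
from_speech = get_text_info("memo-7", lookup_subtitles, stub_transcribe)
```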
- Step 202 Receive a first input for the to-be-processed text.
- Before receiving the first input for the text to be processed, the text to be processed may be displayed through the following steps:
- the preset picture may be preset according to actual conditions.
- the preset picture may be a picture associated with the audio to be processed.
- it may be the video cover of the video to which the audio to be processed belongs, or the audio to be processed.
- By displaying all of the text to be processed in the preset picture, the user can conveniently see the complete text to be processed, and using a preset picture related to the text to be processed can also improve the user's viewing experience.
- Figure 2-2 shows a schematic diagram of an example of displaying text to be processed according to an embodiment of the present invention.
- It should be noted that in actual application scenarios the amount of text to be processed may be large, and because of the limited screen size of electronic devices, the complete text to be processed may not be displayable at one time. Therefore, the text to be processed can be displayed by scrolling to ensure that it is shown in full.
- a video screen can also be displayed, and the corresponding to-be-processed text can be displayed in the video screen.
- The to-be-processed text corresponding to the video screen may be the text whose playback period is the same as that of the video screen. Since the content of the video screen and its corresponding text to be processed are often strongly correlated, displaying the text within the video screen lets the user view the content of the video screen and the text content at the same time, which makes selection easier.
- a text display box can be generated on the video screen, and the text to be processed is displayed in the text display box.
- the specific form of the display box can be preset according to actual conditions.
- Figure 2-3 shows a schematic diagram of another example of displaying text to be processed according to an embodiment of the present invention.
- the corresponding text to be processed is displayed in the video screen, that is, "Let me drop It’s not just last night’s wine that shed tears.”
- the electronic device can receive the first input by receiving the selection input of the displayed text to be processed.
- the user can be provided with a visual selection scene and rich information, so that the user can easily select the to-be-processed text and improve the selection efficiency.
- Step 203 In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
- When the field indicated by the first input in the text to be processed is determined as the field to be processed, all fields in the text to be processed that match the field indicated by the first input may be searched for, and the fields found are then determined as the fields to be processed. The field indicated by the first input may be the field the user selected through a selection input on the displayed text to be processed.
- the first input may be performed through a preset search area, and the field indicated by the first input may be input through the search area.
- The electronic device may display the search area before this step, and then receive the first input performed by the user through the search area. In this way, the user only needs to make one selection to have the electronic device modify all identical fields, thereby improving the selection efficiency.
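Selecting every occurrence of a field from a single search input can be sketched as a filter over the timed fields. This is an illustrative assumption about the data layout rather than the patent's implementation: `find_all` is a hypothetical helper, and the timings in the sample data are invented.

```python
from typing import List, Tuple

# (field, start seconds, end seconds)
TimedField = Tuple[str, float, float]


def find_all(fields: List[TimedField], query: str) -> List[Tuple[float, float]]:
    """Return the play time periods of every field equal to the query."""
    return [(start, end) for text, start, end in fields if text == query]


# Hypothetical text information in which the same field occurs twice.
fields = [
    ("tears", 1.0, 1.5),
    ("wine", 2.0, 2.4),
    ("tears", 6.0, 6.5),
]

periods = find_all(fields, "tears")
```

Each returned period can then be modified in turn, which is what lets one search input drive the modification of all identical fields.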
- step 203 the audio volume can be adjusted through the following steps A to C.
- Step A Receive a third input for the to-be-processed text.
- the third input for the text to be processed may be performed on an interface displaying the text to be processed, and the third input may be an operation to adjust the font of the text to be processed.
- the user can perform a third input when the font of the text to be processed needs to be adjusted, and accordingly, the electronic device can receive the third input.
- Step B In response to the third input, adjust the font size of the field to be adjusted indicated by the third input to obtain the adjusted field.
- Adjusting the font size of the field to be adjusted indicated by the third input may mean enlarging or reducing the font size of the field to be adjusted according to the adjustment operation indicated by the third input, to obtain the adjusted field.
- Step C Adjust the volume of the audio corresponding to the field to be adjusted according to the font size of the adjusted field, wherein the larger the font of the adjusted field, the louder the volume of the corresponding audio.
- The font size of the adjusted field may be determined first, and the volume corresponding to that font size is then determined according to a preset correspondence between font size and volume. The volume of the audio corresponding to the field to be adjusted is then set to that volume, thereby realizing volume adjustment. In the preset correspondence between font size and volume, the larger the font, the greater the volume.
- For example, the volume of the audio corresponding to the field to be adjusted can be set to 60 decibels.
- the user only needs to adjust the text font size, and can correspondingly control and adjust the volume of the corresponding audio, making the process of audio volume adjustment easier, and thereby improving the adjustment efficiency.
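Steps A to C can be illustrated with a lookup table from font size to volume followed by a gain applied to the samples. This is a sketch under assumptions: the table values other than the 60 decibel figure are invented, the nearest-entry lookup is one possible policy, and the decibel values are treated as relative levels so that a difference of `d` dB scales the amplitude by `10 ** (d / 20)`.

```python
from typing import Dict, List

# Hypothetical preset correspondence: larger font, greater volume (in dB).
FONT_TO_VOLUME_DB: Dict[int, int] = {12: 40, 16: 50, 20: 60}


def volume_for_font(font_size: int) -> int:
    """Return the preset volume of the table entry nearest to font_size."""
    nearest = min(FONT_TO_VOLUME_DB, key=lambda size: abs(size - font_size))
    return FONT_TO_VOLUME_DB[nearest]


def gain_factor(current_db: float, target_db: float) -> float:
    """Amplitude factor that moves a signal from current_db to target_db."""
    return 10 ** ((target_db - current_db) / 20)


def set_volume(samples: List[float], current_db: float, target_db: float) -> List[float]:
    """Scale the samples of one field's audio segment to the target volume."""
    factor = gain_factor(current_db, target_db)
    return [sample * factor for sample in samples]


# Enlarging a field to font size 20 maps to 60 dB in this table.
target_db = volume_for_font(20)
louder = set_volume([0.01, -0.02], current_db=40, target_db=target_db)
```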
- a curve for adjusting the font size may be preset.
- the user can select the to-be-adjusted field whose font size needs to be adjusted from the to-be-processed text, and then adjust the shape of the curve, so as to input the second input.
- The size of each word contained in the field to be adjusted can be adjusted in turn according to the height of each segment of the adjusted curve, where the height of a segment can be proportional or inversely proportional to the size of the word. In this way, the user only needs to adjust the shape of the curve to adjust the volume of the corresponding audio segment.
- This gives the volume of the audio segment corresponding to the field to be processed many possibilities.
- For example, the user can adjust the curve to a wave shape so that the volume of the field to be adjusted rises and falls, which can make the audio more fun.
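The curve-based adjustment can be sketched as mapping one curve height per word to a font size, either proportionally or inversely. The function name, the base font size, and the sample heights are assumptions made for illustration; the patent only states that segment height and word size are proportional or inversely proportional.

```python
from typing import List


def sizes_from_curve(heights: List[float], base_size: float = 16.0,
                     proportional: bool = True) -> List[float]:
    """Map the height of each curve segment to the font size of one word."""
    if any(height <= 0 for height in heights):
        raise ValueError("curve heights must be positive")
    if proportional:
        return [base_size * height for height in heights]
    return [base_size / height for height in heights]


# One hypothetical curve height per word of a three-word field.
heights = [0.5, 1.0, 2.0]
rising = sizes_from_curve(heights)                       # sizes grow with the curve
falling = sizes_from_curve(heights, proportional=False)  # sizes shrink as it rises
```

The resulting font sizes can then be converted to volumes through the preset correspondence between font size and volume.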
- Step 204 Receive a second input for the field to be processed.
- this step can refer to the foregoing step 104, which is not limited in the embodiment of the present invention.
- Step 205 Edit the field to be processed according to the second input to obtain the target field.
- If the second input is a delete input, the field to be processed can be deleted.
- If the second input is a replacement input, the field to be replaced corresponding to the second input can be obtained; the field to be processed is deleted, and the field to be replaced is added at the position of the field to be processed to obtain the target field.
- The field to be replaced corresponding to the second input can be obtained by extracting the field contained in the second input and using it as the field to be replaced, or by extracting the voice contained in the second input, converting the voice to text with a speech-to-text method, and using the resulting text as the field to be replaced.
- If the second input is an addition input, it can be considered that the user needs to add a new field to the field to be processed. Therefore, the field to be added corresponding to the second input can be obtained, and the field to be added is added at the position of the field to be processed to obtain the target field.
- The field to be added corresponding to the second input can be obtained by extracting the field contained in the second input and using it as the field to be added, or by extracting the voice contained in the second input, converting the voice to text with a speech-to-text method, and using the resulting text as the field to be added.
- corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
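The three kinds of second input described for step 205 can be sketched as one dispatch function over a list of fields. This is an illustrative model, not the patent's implementation: `edit_field`, the string names of the operations, and the way the lyric is split into fields are assumptions, while "tears", "saliva", and "today" are the example words used in the figures.

```python
from typing import List, Optional


def edit_field(fields: List[str], index: int, kind: str,
               new_field: Optional[str] = None) -> List[str]:
    """Apply a delete, replace, or add edit at the field to be processed."""
    result = list(fields)  # leave the original text untouched
    if kind == "delete":
        del result[index]
    elif kind in ("replace", "add"):
        if new_field is None:
            raise ValueError("a new field is required for " + kind)
        if kind == "replace":
            result[index] = new_field      # delete, then add at the same position
        else:
            result.insert(index, new_field)  # add at the indicated position
    else:
        raise ValueError("unknown edit kind: " + kind)
    return result


# Hypothetical split of the example lyric into fields.
text = ["let me", "shed", "tears"]
after_delete = edit_field(text, 2, "delete")
after_replace = edit_field(text, 2, "replace", "saliva")
after_add = edit_field(text, 1, "add", "today")
```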
- a preset mark can also be added to the displayed field to be processed, and the field to be replaced or the field to be added is displayed according to the display position corresponding to the field to be processed.
- the preset mark may be a mark that reflects a specific editing operation performed on the field to be processed, and different editing operations correspond to different preset marks. For example, if the editing operation is to delete the field to be processed, the preset mark may be a strikethrough added on the field to be processed, or a text mark indicating that the field is deleted is added to the field to be processed.
- If the editing operation is to replace the field to be processed, the preset mark can be an underline added on the field to be processed, or a text mark added on the field to be processed indicating that the field is replaced, and the field to be replaced is displayed next to the field to be processed.
- the specific display position can be set according to the actual situation.
- the preset mark may be to add a field mark at the position corresponding to the field to be processed, such as an arrow, to indicate that the field is added at the position.
- the added fields to be added can be displayed to facilitate the user to know what fields have been added.
- the specific marking method may be various, which is not limited in the embodiment of the present invention. By adding a preset mark on the field to be processed, the user can more clearly know the location of the modified field to be processed and the specific editing operation performed on it.
- the display position may be preset according to actual needs.
- the display position may be below the field to be deleted. In this way, by displaying the to-be-replaced field or the to-be-added field in the display position corresponding to the field to be deleted, it is convenient for the user to quickly learn the content of the specific modification, thereby facilitating the user to check later.
- Figure 2-4 shows a schematic diagram of an example of editing the text to be processed provided by an embodiment of the present invention.
- The field to be processed is "let me cry" and the second input is a delete input, so the field to be processed is deleted; that is, a strikethrough is added on "Let me shed tears".
- Figure 2-5 shows a schematic diagram of another example of editing the text to be processed according to an embodiment of the present invention.
- As shown in Figure 2-5, the field to be processed is "tears" and the second input is a replacement input, so the field to be processed is deleted and the field to be replaced is displayed; that is, a strikethrough is added to "tears", and the "saliva" displayed below the field to be processed is the field to be replaced.
- Figure 2-6 shows a schematic diagram of another example of editing text to be processed according to an embodiment of the present invention. As shown in Figure 2-6, the position indicated by the field to be processed is between "I" and "drop", and the second input is an add input, so an arrow is used to indicate the position of the field to be processed, and the "today" below the arrow is the field to be added.
- Step 206 Determine the audio corresponding to the target field as the target audio segment.
- the target field can be linguistically analyzed and segmented into words; then, based on the words obtained by segmentation, the audio waveform segment matching each word can be extracted from a speech synthesis database, and the audio waveform segments of the words can be synthesized to obtain the audio segment corresponding to the target field. Alternatively, the audio to be processed may be searched for a field identical to the target field; if one exists, the audio segment corresponding to that field is extracted as the audio corresponding to the target field, which yields the target audio segment.
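The second option above, reusing audio that already exists in the audio to be processed, can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `TimedField` record, the `find_existing_audio` helper, and the representation of audio as a list of samples at a fixed sample rate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedField:
    """Hypothetical record: one field of the text and its play period in seconds."""
    text: str
    start: float
    end: float


def find_existing_audio(target: str, fields: List[TimedField],
                        samples: List[float], rate: int) -> Optional[List[float]]:
    """Return the samples of the first field whose text equals `target`, or None."""
    for f in fields:
        if f.text == target:
            # Convert the play period (seconds) to sample indices.
            lo, hi = round(f.start * rate), round(f.end * rate)
            return samples[lo:hi]
    return None  # a caller would fall back to speech synthesis here


fields = [TimedField("let", 0.0, 0.5), TimedField("me", 0.5, 1.0)]
samples = [0.1, 0.2, 0.3, 0.4]  # toy audio: 4 samples at 4 Hz = 1 second
segment = find_existing_audio("me", fields, samples, rate=4)
```

Returning `None` when no identical field exists keeps the fallback to the speech synthesis database explicit for the caller.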
- Step 207 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- the play time period corresponding to the field to be processed can be obtained from the play time periods corresponding to the fields, and the audio waveform diagram corresponding to the audio to be processed can then be obtained. Finally, the band in the audio waveform diagram that corresponds to the play time period of the field to be processed is modified to the audio band corresponding to the target audio segment to obtain the target audio.
- when obtaining the play time period corresponding to the field to be processed, the field to be processed can be looked up among the fields, and the play time period recorded for it can then be read.
- when obtaining the audio waveform diagram corresponding to the audio to be processed, features contained in the audio, such as vibration frequency, may be extracted and processed, for example by normalization, to obtain a waveform diagram that shows the audio features over the playback time.
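As a rough illustration of the feature extraction and normalization described above, the sketch below builds a simple display waveform. It assumes the feature is the peak amplitude per fixed-size window (the text mentions vibration frequency as one possible feature and does not fix a choice); the function name and window size are hypothetical.

```python
from typing import List


def normalized_envelope(samples: List[float], window: int) -> List[float]:
    """Peak amplitude per window, scaled so the loudest window equals 1.0."""
    peaks = [max(abs(s) for s in samples[i:i + window])
             for i in range(0, len(samples), window)]
    top = max(peaks, default=0.0)
    # Avoid dividing by zero for silent or empty audio.
    return [p / top if top else 0.0 for p in peaks]


envelope = normalized_envelope([0.1, -0.4, 0.2, 0.2, -0.8, 0.3], window=2)
```

Each value in `envelope` corresponds to one window of playback time, so the list can be drawn left to right as the waveform diagram.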
- If the target field is a blank field, the audio band corresponding to the target audio segment is a blank band, and the blank band can be used to replace the corresponding band to implement the modification.
- Alternatively, the corresponding band may be deleted directly to implement the modification. It should be noted that, when deleting, the waveform display of the corresponding band can be removed and changed to a straight line to show that the sound has been deleted.
- If the target field is a field to be replaced, the audio band corresponding to the target audio segment can be used directly to replace the corresponding band; alternatively, the corresponding band can be deleted first, and the audio band corresponding to the field to be replaced can then be added at the deleted position.
- If the target field is a field to be added, the audio band corresponding to the target audio segment can be used directly to replace the corresponding band; alternatively, according to the play time period corresponding to the field to be processed, the audio band corresponding to the field to be added can be added at the position of the corresponding band in the audio waveform diagram, and the synthesized audio band is used as the target audio.
- In this way, by modifying the corresponding band of the audio to be processed in the audio waveform diagram, the audio to be processed is modified, which can make the modification more accurate.
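The three kinds of band modification described for step 207 can be sketched on a plain list of samples. This is only an illustration under stated assumptions: the `splice` function and its `mode` strings are hypothetical, the blank band is modeled as zero-valued samples of equal length, and play periods are given in seconds.

```python
from typing import List


def splice(samples: List[float], rate: int, start: float, end: float,
           target: List[float], mode: str) -> List[float]:
    """Modify the band for the play period [start, end) of the field to be processed.

    "delete":  replace the band with a blank band (silence of equal length).
    "replace": swap the band for the target audio segment.
    "add":     insert the target audio segment at `start`, keeping the band.
    """
    lo, hi = round(start * rate), round(end * rate)
    if mode == "delete":
        return samples[:lo] + [0.0] * (hi - lo) + samples[hi:]
    if mode == "replace":
        return samples[:lo] + target + samples[hi:]
    if mode == "add":
        return samples[:lo] + target + samples[lo:]
    raise ValueError(f"unknown mode: {mode}")


audio = [1.0, 2.0, 3.0, 4.0]  # toy waveform at 2 Hz
deleted = splice(audio, 2, 0.5, 1.5, [], "delete")
replaced = splice(audio, 2, 0.5, 1.5, [9.0], "replace")
added = splice(audio, 2, 0.5, 0.5, [9.0], "add")
```

Note that a replacement or addition changes the total length of the audio, so the play periods of later fields would need to be shifted accordingly; that bookkeeping is omitted here.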
- the electronic device can also perform the following operations after obtaining the audio waveform:
- the mark may be filled with different colors for the corresponding waveband, or may be added at the position of the corresponding waveband, and the specific form of the mark is not limited in the embodiment of the present invention. In this way, by displaying the audio waveform diagram corresponding to the audio to be processed, and marking the band corresponding to the field to be processed in the audio waveform diagram, it is convenient for the user to view the modified audio band.
- the audio processing method provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; then receives a second input for the field to be processed and, according to the second input, edits the field to be processed to obtain the target field; determines the audio corresponding to the target field as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
- the user can modify the audio without manually adjusting the progress bar, so the audio processing efficiency can be improved.
- FIG. 3 shows a step flow chart of another embodiment of the audio processing method of the present invention.
- the method may be applied to an electronic device. As shown in FIG. 3, the method may include steps 301 to 307.
- Step 301 Acquire text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed.
- this step can refer to the foregoing step 201, which is not limited in the embodiment of the present invention.
- Step 302 Receive a first input for the to-be-processed text.
- this step can refer to the foregoing step 202, which is not limited in the embodiment of the present invention.
- Step 303 In response to the first input, determine the field to be processed in the text to be processed according to the field indicated by the first input.
- this step can refer to the foregoing step 203, which is not limited in the embodiment of the present invention.
- Step 304 Receive a second input for the field to be processed.
- this step can refer to the foregoing step 104, which is not limited in the embodiment of the present invention.
- Step 305 Extract the audio segment carried in the second input.
- the second input may be an audio recording operation. Accordingly, the audio segment carried in the second input may be a voice segment recorded by the user.
- the second input may also be an audio upload operation.
- the audio segment carried in the second input may also be an audio segment selected by the user to upload.
- Step 306 Determine the audio segment as the target audio segment.
- the audio segment carried in the second input is the audio segment ultimately desired by the user. Therefore, the audio segment can be directly determined as the target audio segment.
- before the input audio segment is determined as the target audio segment, the user may be prompted whether to process the input audio segment; if so, the input audio segment is intercepted according to the user operation, and the intercepted audio segment is used as the target audio segment. In this way, by prompting the user whether to process the input audio segment, the quality of the target audio segment can be further improved.
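Intercepting the input audio segment can be as simple as keeping the portion between two user-chosen times. The sketch below assumes the audio is a list of samples at a known sample rate and that the user operation yields a start and end time in seconds; `intercept` and its parameters are hypothetical names.

```python
from typing import List


def intercept(samples: List[float], rate: int,
              keep_from: float, keep_to: float) -> List[float]:
    """Keep only the part of the input audio between keep_from and keep_to seconds."""
    if not 0 <= keep_from <= keep_to:
        raise ValueError("expected 0 <= keep_from <= keep_to")
    return samples[round(keep_from * rate):round(keep_to * rate)]


recorded = [0.0, 0.5, 0.6, 0.7, 0.0, 0.0]  # toy recording at 2 Hz
target_segment = intercept(recorded, 2, 0.5, 2.0)
```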
- Step 307 According to the target audio segment, modify the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- this step can refer to the foregoing step 207, which is not limited in the embodiment of the present invention.
- the audio processing method provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; then receives a second input for the field to be processed, extracts the audio segment carried in the second input, and determines that audio segment as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- the target audio segment can be easily obtained, and therefore, the processing efficiency can be improved.
- the user can modify the audio without manually adjusting the progress bar, which can further improve the audio processing efficiency.
- an embodiment of the present invention also provides an electronic device.
- the electronic device 40 may include:
- the first obtaining module 401 is configured to obtain text information corresponding to the to-be-processed audio, the text information including the to-be-processed text and the play time period corresponding to each field in the to-be-processed text;
- the first receiving module 402 is configured to receive the first input for the to-be-processed text
- the first determining module 403 is configured to determine the field to be processed in the text to be processed according to the field indicated by the first input in response to the first input;
- the second receiving module 404 is configured to receive a second input for the field to be processed
- the second acquiring module 405 is configured to acquire the target audio segment according to the second input
- the second determining module 406 is configured to modify the audio segment in the play period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
- the electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; then receives a second input for the field to be processed and, in response to the second input, obtains the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
- the second obtaining module 405 is configured to:
- according to the second input, edit the field to be processed to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or
- extract the audio segment carried in the second input, and determine the audio segment as the target audio segment.
- the second obtaining module 405 is further configured to:
- if the second input is a delete input, delete the field to be processed, and determine the blank field obtained after deletion as the target field;
- if the second input is a replacement input, obtain the field to be replaced corresponding to the second input, delete the field to be processed, and add the field to be replaced at the position of the field to be processed to obtain the target field;
- if the second input is an add input, obtain the field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed to obtain the target field.
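The three editing branches handled by the second obtaining module can be sketched on a list of fields. This is an illustrative sketch only: `edit_fields`, the `kind` strings, and the use of an empty string for the blank field are assumptions, and the example words follow the figures described earlier ("tears" replaced by "saliva", "today" added after "I").

```python
from typing import List


def edit_fields(fields: List[str], index: int, kind: str,
                new_field: str = "") -> List[str]:
    """Apply a delete, replace, or add input at the position of the field to be processed."""
    if kind == "delete":
        # The deleted field becomes a blank field (modeled as an empty string).
        return fields[:index] + [""] + fields[index + 1:]
    if kind == "replace":
        return fields[:index] + [new_field] + fields[index + 1:]
    if kind == "add":
        return fields[:index] + [new_field] + fields[index:]
    raise ValueError(f"unknown input kind: {kind}")


text = ["I", "drop", "tears"]
after_delete = edit_fields(text, 2, "delete")
after_replace = edit_fields(text, 2, "replace", "saliva")
after_add = edit_fields(text, 1, "add", "today")
```

Each call returns a new list, so the original text stays available for display alongside the preset marks.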
- the electronic device 40 further includes:
- the first display module is configured to display a preset picture and display all of the to-be-processed text in the preset picture; or to display each video picture of the to-be-processed video and display, in the video picture, the to-be-processed text corresponding to the video picture.
- the first receiving module 402 is further configured to:
- the electronic device 40 further includes:
- the second display module is configured to add a preset mark to the displayed field to be processed, and display the field to be replaced or the field to be added according to the display position corresponding to the field to be processed.
- the second determining module 406 is configured to:
- the electronic device 40 further includes:
- the third receiving module is configured to receive the third input for the to-be-processed text
- the first adjustment module is configured to adjust the font size of the field to be adjusted indicated by the third input in response to the third input to obtain the adjusted field;
- the second adjustment module is configured to adjust the volume size of the audio corresponding to the field to be adjusted according to the font size of the field to be adjusted after adjustment.
- the larger the font of the field to be adjusted after adjustment, the greater the volume of the audio corresponding to the field to be adjusted.
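The font-to-volume relation only requires that a larger font yields a greater volume. One possible mapping, shown below purely as an assumption, scales the samples by the ratio of the new font size to the old one and clamps the result to the range [-1.0, 1.0]; the function name and the linear rule are not taken from the source.

```python
from typing import List


def adjust_volume(samples: List[float], old_size: float, new_size: float) -> List[float]:
    """Scale the audio of the field to be adjusted in proportion to its font size change."""
    if old_size <= 0 or new_size <= 0:
        raise ValueError("font sizes must be positive")
    gain = new_size / old_size  # assumed linear mapping: bigger font, louder audio
    # Clamp to the normalized sample range to avoid overflow.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


louder = adjust_volume([0.5, -0.25], old_size=12, new_size=24)
quieter = adjust_volume([0.5, -0.25], old_size=24, new_size=12)
```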
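- the first obtaining module 401 is configured to:
- detect whether there is a subtitle file matching the audio to be processed, where the subtitle file includes a subtitle text and a play period corresponding to each field in the subtitle text; if there is a matching subtitle file, use the subtitle file as the text information corresponding to the audio to be processed;
- if there is no matching subtitle file, convert the audio contained in the audio to be processed into text, generate the play time period corresponding to each field in the text according to the time information of the audio segments in the audio to be processed, and use the text and the play time period corresponding to each field in the text as the text information corresponding to the audio to be processed.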
- the electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; then receives a second input for the field to be processed and, according to the second input, edits the field to be processed to obtain the target field; determines the audio corresponding to the target field as the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- corresponding editing operations can be performed according to different second inputs, thereby satisfying various modification requirements of users and improving audio modification effects.
- the user can modify the audio without manually adjusting the progress bar, so the audio processing efficiency can be improved.
- FIG. 5 shows a schematic diagram of the hardware structure of an embodiment of the electronic device of the present invention.
- the electronic device 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and Power supply 511 and other components.
- Those skilled in the art can understand that the structure of the electronic device shown in FIG. 5 does not constitute a limitation on the electronic device.
- the electronic device may include more or fewer components than those shown in the figure, or a combination of certain components, or a different layout of components.
- electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers
- the processor 510 is configured to obtain text information corresponding to the audio to be processed, and the text information includes the text to be processed and a play time period corresponding to each field in the text to be processed.
- the processor 510 is configured to receive a first input for the to-be-processed text.
- the processor 510 is configured to determine a field to be processed in the text to be processed according to the field indicated by the first input in response to the first input.
- the processor 510 is configured to receive a second input for the field to be processed.
- the processor 510 is configured to obtain a target audio segment in response to the second input.
- the processor 510 is configured to modify the audio segment in the play period corresponding to the field to be processed according to the target audio segment to obtain the target audio.
- the electronic device provided by the embodiment of the present invention first obtains the text information corresponding to the audio to be processed, where the text information includes the text to be processed and the play time period corresponding to each field in the text to be processed; then receives a first input for the text to be processed and, in response to the first input, determines the field to be processed in the text to be processed according to the field indicated by the first input; then receives a second input for the field to be processed and, in response to the second input, obtains the target audio segment; and finally, according to the target audio segment, modifies the audio segment at the play time period corresponding to the field to be processed to obtain the target audio.
- the audio can be modified without manually adjusting the progress bar, so the audio processing efficiency can be improved.
- processor 510 is used to:
- Extract the audio segment carried in the second input, and determine the audio segment as the target audio segment.
- processor 510 is further configured to:
- if the second input is a delete input, delete the field to be processed, and determine the blank field obtained after deletion as the target field;
- if the second input is a replacement input, obtain the field to be replaced corresponding to the second input, delete the field to be processed, and add the field to be replaced at the position of the field to be processed to obtain the target field;
- if the second input is an add input, obtain the field to be added corresponding to the second input, and add the field to be added at the position of the field to be processed to obtain the target field.
- the display unit 506 is used to:
- the user input unit 507 is used to receive a selection input of the displayed text to be processed.
- processor 510 is used to:
- the user input unit 507 is used to:
- the processor 510 is used to:
- according to the adjusted font size of the field to be adjusted, adjust the volume of the audio corresponding to the field to be adjusted; wherein, the larger the font of the field to be adjusted after adjustment, the greater the volume of the audio corresponding to the field to be adjusted.
- processor 510 is used to:
- the subtitle file includes a subtitle text and a play period corresponding to each field in the subtitle text;
- If there is no subtitle file matching the audio to be processed, convert the audio contained in the audio to be processed into text, and generate the play time period corresponding to each field in the text according to the time information of the audio segments in the audio to be processed; use the text and the play time period corresponding to each field in the text as the text information corresponding to the audio to be processed.
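The subtitle-first, transcription-fallback logic can be sketched as below. The source does not name a subtitle format, so the sketch assumes SRT-style cues with millisecond timestamps (`HH:MM:SS,mmm --> HH:MM:SS,mmm` followed by one line of text); `parse_subtitles`, `text_information`, and the `transcribe` callback standing in for speech recognition are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

Timed = Tuple[str, float, float]  # (field text, start seconds, end seconds)

# One SRT-style cue: a timestamp line followed by a single line of text.
_CUE = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+) --> (\d+):(\d+):(\d+)[,.](\d+)\n(.+)")


def parse_subtitles(srt: str) -> List[Timed]:
    """Turn subtitle cues into fields with their play periods."""
    out = []
    for m in _CUE.finditer(srt):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in m.groups()[:8])
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        out.append((m.group(9).strip(), start, end))
    return out


def text_information(subtitle: Optional[str],
                     transcribe: Callable[[], List[Timed]]) -> List[Timed]:
    """Use the matching subtitle file if one exists; otherwise transcribe the audio."""
    if subtitle is not None:
        return parse_subtitles(subtitle)
    return transcribe()


srt = "1\n00:00:01,000 --> 00:00:02,500\nlet me cry\n"
from_subtitles = text_information(srt, lambda: [])
from_speech = text_information(None, lambda: [("hi", 0.0, 1.0)])
```

Passing the recognizer in as a callback keeps the expensive speech-to-text step from running when a subtitle file is available.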
- the radio frequency unit 501 can be used for receiving and sending signals in the process of sending and receiving information or during a call. Specifically, after receiving downlink data from the base station, the radio frequency unit 501 passes it to the processor 510 for processing; in addition, it sends uplink data to the base station.
- the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
- the radio frequency unit 501 can also communicate with the network and other devices through a wireless communication system.
- the electronic device provides users with wireless broadband Internet access through the network module 502, such as helping users to send and receive emails, browse web pages, and access streaming media.
- the audio output unit 503 can convert the audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output it as sound. Moreover, the audio output unit 503 may also provide audio output related to a specific function performed by the electronic device 500 (for example, call signal reception sound, message reception sound, etc.).
- the audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
- the input unit 504 is used to receive audio or video signals.
- the input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042.
- the graphics processor 5041 is configured to process image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
- the processed image frame may be displayed on the display unit 506.
- the image frame processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or sent via the radio frequency unit 501 or the network module 502.
- the microphone 5042 can receive sound, and can process such sound into audio data.
- the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 501 for output in the case of a telephone call mode.
- the electronic device 500 further includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors.
- the light sensor includes an ambient light sensor and a proximity sensor.
- the ambient light sensor can adjust the brightness of the display panel 5061 according to the brightness of the ambient light.
- the proximity sensor can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear.
- the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as horizontal and vertical screen switching, related games, and magnetometer posture calibration) and for vibration recognition related functions (such as pedometer and percussion); the sensor 505 can also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which will not be repeated here.
- the display unit 506 is used to display information input by the user or information provided to the user.
- the display unit 506 may include a display panel 5061, and the display panel 5061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
- the user input unit 507 can be used to receive inputted numeric or character information, and generate key signal input related to user settings and function control of the electronic device.
- the user input unit 507 includes a touch panel 5071 and other input devices 5072.
- the touch panel 5071, also known as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 5071 using a finger, a stylus, or any other suitable object or accessory).
- the touch panel 5071 may include two parts: a touch detection device and a touch controller.
- the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 510, and receives and executes commands sent by the processor 510.
- the touch panel 5071 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
- the user input unit 507 may also include other input devices 5072.
- other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick, which will not be repeated here.
- the touch panel 5071 can be covered on the display panel 5061.
- when the touch panel 5071 detects a touch operation on or near it, the operation is transmitted to the processor 510 to determine the type of the touch event, and the processor 510 then provides corresponding visual output on the display panel 5061 according to the type of the touch event.
- although the touch panel 5071 and the display panel 5061 are used as two independent components to implement the input and output functions of the electronic device, in some embodiments the touch panel 5071 and the display panel 5061 can be integrated to implement the input and output functions of the electronic device, which is not specifically limited here.
- the interface unit 508 is an interface for connecting an external device and the electronic device 500.
- the external device may include a wired or wireless headset port, an external power source (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, audio input/output (I/O) port, video I/O port, headphone port, etc.
- the interface unit 508 can be used to receive input (for example, data information, power, etc.) from an external device and transmit the received input to one or more elements in the electronic device 500, or can be used to transfer data between the electronic device 500 and an external device.
- the memory 509 can be used to store software programs and various data.
- the memory 509 may mainly include a storage program area and a storage data area.
- the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data created by the use of the mobile phone (such as audio data, phone book, etc.), etc.
- the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
- the processor 510 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and, by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, performs the various functions of the electronic device and processes data, so as to monitor the electronic device as a whole.
- the processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, application programs, etc., and the modem The processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 510.
- the electronic device 500 may also include a power source 511 (such as a battery) for supplying power to various components.
- the power source 511 may be logically connected to the processor 510 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
- the electronic device 500 includes some functional modules not shown, which will not be repeated here.
- the embodiment of the present invention also provides an electronic device, including a processor 510, a memory 509, and an audio processing program stored in the memory 509 and executable on the processor 510. When the audio processing program is executed by the processor 510, each process of the foregoing audio processing method embodiment is realized, and the same technical effect can be achieved. In order to avoid repetition, the details are not repeated here.
- the embodiment of the present invention also provides a computer-readable storage medium on which an audio processing program is stored.
- when the audio processing program is executed by a processor, each process of the above-mentioned audio processing method embodiment is implemented, and the same technical effect can be achieved. In order to avoid repetition, the details are not repeated here.
- examples of the computer-readable storage medium include non-transitory computer-readable storage media, such as read-only memory (Read-Only Memory, ROM for short), Random Access Memory (RAM for short), and a magnetic disk or optical disc.
- Such a processor can be, but is not limited to, a general-purpose processor, a dedicated processor, a special application processor, or a field programmable logic circuit. It can also be understood that each block in the block diagram and/or flowchart, and any combination of blocks in the block diagram and/or flowchart, can be implemented by dedicated hardware that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
- the technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the method described in each embodiment of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
- Document Processing Apparatus (AREA)
Claims (18)
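- An audio processing method, applied to an electronic device, the method comprising: obtaining text information corresponding to to-be-processed audio, the text information comprising to-be-processed text and a playback period corresponding to each field in the to-be-processed text; receiving a first input for the to-be-processed text; in response to the first input, determining a to-be-processed field in the to-be-processed text according to the field indicated by the first input; receiving a second input for the to-be-processed field; obtaining a target audio segment according to the second input; and modifying, according to the target audio segment, the audio segment at the playback period corresponding to the to-be-processed field, to obtain target audio.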
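- The method according to claim 1, wherein the obtaining a target audio segment according to the second input comprises: editing the to-be-processed field according to the second input to obtain a target field, and determining the audio corresponding to the target field as the target audio segment; or extracting an audio segment carried in the second input, and determining that audio segment as the target audio segment.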
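- The method according to claim 2, wherein the editing the to-be-processed field according to the second input to obtain a target field comprises: if the second input is a deletion input, deleting the to-be-processed field, and determining the blank field obtained after the deletion as the target field; if the second input is a replacement input, obtaining a replacement field corresponding to the second input, deleting the to-be-processed field, and adding the replacement field at the position of the to-be-processed field to obtain the target field; and if the second input is an addition input, obtaining a to-be-added field corresponding to the second input, and adding the to-be-added field at the position of the to-be-processed field to obtain the target field.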
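- The method according to claim 3, wherein the to-be-processed audio is audio contained in a to-be-processed video; before the receiving a first input for the to-be-processed text, the method further comprises: displaying a preset screen, and displaying all of the to-be-processed text in the preset screen; or displaying each video frame of the to-be-processed video, and displaying, in the video frame, the to-be-processed text corresponding to that video frame; and the receiving a first input for the to-be-processed text comprises: receiving a selection input on the displayed to-be-processed text.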
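- The method according to any one of claims 1 to 4, wherein the modifying, according to the target audio segment, the audio segment at the playback period corresponding to the to-be-processed field, to obtain target audio comprises: obtaining the playback period corresponding to the to-be-processed field from the playback periods corresponding to the respective fields; obtaining an audio waveform diagram corresponding to the to-be-processed audio; and modifying the waveband in the audio waveform diagram that corresponds to the playback period of the to-be-processed field into the audio waveband corresponding to the target audio segment, to obtain the target audio.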
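- The method according to claim 1, wherein after the determining a to-be-processed field in the to-be-processed text according to the field indicated by the first input, the method further comprises: receiving a third input for the to-be-processed text; in response to the third input, adjusting the font size of a to-be-adjusted field indicated by the third input, to obtain an adjusted to-be-adjusted field; and adjusting, according to the font size of the adjusted to-be-adjusted field, the volume of the audio corresponding to the to-be-adjusted field, wherein the larger the font of the adjusted to-be-adjusted field, the higher the volume of the audio corresponding to the to-be-adjusted field.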
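- The method according to claim 1, wherein the obtaining text information corresponding to to-be-processed audio comprises: detecting whether a subtitle file matching the to-be-processed audio exists, the subtitle file comprising subtitle text and a playback period corresponding to each field in the subtitle text; if a subtitle file matching the to-be-processed audio exists, using the subtitle file as the text information corresponding to the to-be-processed audio; and if no subtitle file matching the to-be-processed audio exists, converting the audio contained in the to-be-processed audio into text, generating a playback period corresponding to each field in the text according to time information of the playback of audio segments in the to-be-processed audio, and using the text and the playback periods corresponding to the respective fields in the text as the text information corresponding to the to-be-processed audio.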
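- An electronic device, comprising: a first obtaining module, configured to obtain text information corresponding to to-be-processed audio, the text information comprising to-be-processed text and a playback period corresponding to each field in the to-be-processed text; a first receiving module, configured to receive a first input for the to-be-processed text; a first determining module, configured to determine, in response to the first input, a to-be-processed field in the to-be-processed text according to the field indicated by the first input; a second receiving module, configured to receive a second input for the to-be-processed field; a second obtaining module, configured to obtain a target audio segment according to the second input; and a second determining module, configured to modify, according to the target audio segment, the audio segment at the playback period corresponding to the to-be-processed field, to obtain target audio.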
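- The electronic device according to claim 8, wherein the second obtaining module is configured to: edit the to-be-processed field according to the second input to obtain a target field, and determine the audio corresponding to the target field as the target audio segment; or extract an audio segment carried in the second input, and determine that audio segment as the target audio segment.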
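- The electronic device according to claim 9, wherein the second obtaining module is further configured to: if the second input is a deletion input, delete the to-be-processed field, and determine the blank field obtained after the deletion as the target field; if the second input is a replacement input, obtain a replacement field corresponding to the second input, delete the to-be-processed field, and add the replacement field at the position of the to-be-processed field to obtain the target field; and if the second input is an addition input, obtain a to-be-added field corresponding to the second input, and add the to-be-added field at the position of the to-be-processed field to obtain the target field.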
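- The electronic device according to claim 10, further comprising: a first display module, configured to display a preset screen and display all of the to-be-processed text in the preset screen; or to display each video frame of the to-be-processed video and display, in the video frame, the to-be-processed text corresponding to that video frame; wherein the first receiving module is further configured to receive a selection input on the displayed to-be-processed text.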
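- The electronic device according to any one of claims 8 to 11, wherein the second determining module is configured to: obtain the playback period corresponding to the to-be-processed field from the playback periods corresponding to the respective fields; obtain an audio waveform diagram corresponding to the to-be-processed audio; and modify the waveband in the audio waveform diagram that corresponds to the playback period of the to-be-processed field into the audio waveband corresponding to the target audio segment, to obtain the target audio.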
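- The electronic device according to claim 8, further comprising: a third receiving module, configured to receive a third input for the to-be-processed text; a first adjusting module, configured to adjust, in response to the third input, the font size of a to-be-adjusted field indicated by the third input, to obtain an adjusted to-be-adjusted field; and a second adjusting module, configured to adjust, according to the font size of the adjusted to-be-adjusted field, the volume of the audio corresponding to the to-be-adjusted field; wherein the larger the font of the adjusted to-be-adjusted field, the higher the volume of the audio corresponding to the to-be-adjusted field.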
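- The electronic device according to claim 8, wherein the first obtaining module is configured to: detect whether a subtitle file matching the to-be-processed audio exists, the subtitle file comprising subtitle text and a playback period corresponding to each field in the subtitle text; if a subtitle file matching the to-be-processed audio exists, use the subtitle file as the text information corresponding to the to-be-processed audio; and if no subtitle file matching the to-be-processed audio exists, convert the audio contained in the to-be-processed audio into text, generate a playback period corresponding to each field in the text according to time information of the playback of audio segments in the to-be-processed audio, and use the text and the playback periods corresponding to the respective fields in the text as the text information corresponding to the to-be-processed audio.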
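- An electronic device, comprising a processor, a memory, and an audio processing program stored in the memory and executable on the processor, wherein the audio processing program, when executed by the processor, implements the steps of the audio processing method according to any one of claims 1 to 7.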
- An electronic device, configured to perform the steps of the audio processing method according to any one of claims 1 to 7.
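- A computer-readable storage medium storing an audio processing program, wherein the audio processing program, when executed by a processor, implements the steps of the audio processing method according to any one of claims 1 to 7.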
- A computer program product, executable by a processor to implement the steps of the audio processing method according to any one of claims 1 to 7.
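The core of the claimed method (claims 1, 3, and 5) is replacing the audio in one field's playback period with a target audio segment, and claim 6 ties a field's font size to its volume. The following is a minimal Python sketch of those two steps, not the applicant's implementation. It assumes the audio is a plain list of float samples; `SAMPLE_RATE`, `Field`, `splice_field`, and `scale_field_volume` are hypothetical names, and the linear gain is an assumption (the claims only state that a larger font gives a higher volume).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SAMPLE_RATE = 1000  # hypothetical sample rate (samples per second), kept small for readability


@dataclass
class Field:
    """One field of the to-be-processed text and its playback period."""
    text: str
    start: float  # playback period start, in seconds
    end: float    # playback period end, in seconds


def _bounds(field: Field) -> Tuple[int, int]:
    """Convert a playback period in seconds to sample indices."""
    return round(field.start * SAMPLE_RATE), round(field.end * SAMPLE_RATE)


def splice_field(samples: List[float], field: Field,
                 target: Optional[List[float]] = None) -> List[float]:
    """Replace the samples in the field's playback period with `target`.

    Passing no target models a deletion input (the "blank" target field).
    """
    lo, hi = _bounds(field)
    return samples[:lo] + list(target or []) + samples[hi:]


def scale_field_volume(samples: List[float], field: Field,
                       old_font_size: float, new_font_size: float) -> List[float]:
    """Scale the field's samples by new/old font size (assumed linear mapping)."""
    if old_font_size <= 0 or new_font_size <= 0:
        raise ValueError("font sizes must be positive")
    gain = new_font_size / old_font_size
    lo, hi = _bounds(field)
    return samples[:lo] + [s * gain for s in samples[lo:hi]] + samples[hi:]


# Usage: three one-second fields over three seconds of constant-amplitude audio.
audio = [0.1] * 3000
fields = [Field("hello", 0.0, 1.0), Field("bad", 1.0, 2.0), Field("world", 2.0, 3.0)]

deleted = splice_field(audio, fields[1])                 # deletion input
replaced = splice_field(audio, fields[1], [0.5] * 500)   # replacement input, shorter segment
louder = scale_field_volume(audio, fields[2], old_font_size=12, new_font_size=24)
```

Because a splice can change the total length, the playback periods of later fields would need to be shifted after each edit; the sketch leaves that bookkeeping out.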
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227033855A KR20220149570A (ko) | 2020-03-11 | 2021-03-04 | Audio processing method and electronic device |
EP21767696.4A EP4120268A4 (en) | 2020-03-11 | 2021-03-04 | SOUND PROCESSING METHOD AND ELECTRONIC DEVICE |
US17/940,057 US20230005506A1 (en) | 2020-03-11 | 2022-09-08 | Audio processing method and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010167788.0 | 2020-03-11 | ||
CN202010167788.0A CN111445927B (zh) | 2020-03-11 | 2020-03-11 | Audio processing method and electronic device |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/940,057 Continuation US20230005506A1 (en) | 2020-03-11 | 2022-09-08 | Audio processing method and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021179991A1 (zh) | 2021-09-16 |
Family ID: 71627433
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/079144 WO2021179991A1 (zh) | 2020-03-11 | 2021-03-04 | Audio processing method and electronic device |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230005506A1 (zh) |
EP (1) | EP4120268A4 (zh) |
KR (1) | KR20220149570A (zh) |
CN (1) | CN111445927B (zh) |
WO (1) | WO2021179991A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
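CN111445927B (zh) * | 2020-03-11 | 2022-04-26 | 维沃软件技术有限公司 | Audio processing method and electronic device |
CN112102841B (zh) * | 2020-09-14 | 2024-08-30 | 北京搜狗科技发展有限公司 | Audio editing method and apparatus, and apparatus for audio editing |
CN112669885B (zh) * | 2020-12-31 | 2023-04-28 | 咪咕文化科技有限公司 | Audio clipping method, electronic device, and storage medium |
CN114915836A (zh) * | 2022-05-06 | 2022-08-16 | 北京字节跳动网络技术有限公司 | Method, apparatus, device, and storage medium for editing audio |
CN115695848A (zh) * | 2022-10-28 | 2023-02-03 | 杭州遥望网络科技有限公司 | Live-streaming data processing method, apparatus, device, and storage medium |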
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080177536A1 (en) * | 2007-01-24 | 2008-07-24 | Microsoft Corporation | A/v content editing |
CN202502737U (zh) * | 2012-03-12 | 2012-10-24 | 中国人民解放军济南军区司令部第二部 | Intelligent editing system for video and audio information |
CN104135628A (zh) * | 2013-05-03 | 2014-11-05 | 安凯(广州)微电子技术有限公司 | Video editing method and terminal |
US9185225B1 (en) * | 2011-06-08 | 2015-11-10 | Cellco Partnership | Method and apparatus for modifying digital messages containing at least audio |
CN108984788A (zh) * | 2018-07-30 | 2018-12-11 | 珠海格力电器股份有限公司 | Recording file organizing and classifying system, control method thereof, and recording device |
CN111445927A (zh) * | 2020-03-11 | 2020-07-24 | 维沃软件技术有限公司 | Audio processing method and electronic device |
CN112102841A (zh) * | 2020-09-14 | 2020-12-18 | 北京搜狗科技发展有限公司 | Audio editing method and apparatus, and apparatus for audio editing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6785649B1 (en) * | 1999-12-29 | 2004-08-31 | International Business Machines Corporation | Text formatting from speech |
JP2008500573A (ja) * | 2004-05-27 | 2008-01-10 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Method and system for modifying a message |
US10445052B2 (en) * | 2016-10-04 | 2019-10-15 | Descript, Inc. | Platform for producing and delivering media content |
CN107633850A (zh) * | 2017-10-10 | 2018-01-26 | 维沃移动通信有限公司 | Volume adjustment method and electronic device |
2020
- 2020-03-11 CN CN202010167788.0A patent/CN111445927B/zh active Active
2021
- 2021-03-04 KR KR1020227033855A patent/KR20220149570A/ko active Search and Examination
- 2021-03-04 WO PCT/CN2021/079144 patent/WO2021179991A1/zh active Application Filing
- 2021-03-04 EP EP21767696.4A patent/EP4120268A4/en active Pending
2022
- 2022-09-08 US US17/940,057 patent/US20230005506A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4120268A4 * |
Also Published As
Publication number | Publication date |
---|---|
KR20220149570A (ko) | 2022-11-08 |
CN111445927A (zh) | 2020-07-24 |
EP4120268A4 (en) | 2023-06-21 |
US20230005506A1 (en) | 2023-01-05 |
EP4120268A1 (en) | 2023-01-18 |
CN111445927B (zh) | 2022-04-26 |
Similar Documents
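Publication | Title | Publication Date |
---|---|---|
WO2021179991A1 (zh) | Audio processing method and electronic device | |
CN109819313B (zh) | Video processing method, apparatus, and storage medium | |
CN110381371B (zh) | Video clipping method and electronic device | |
WO2021078116A1 (zh) | Video processing method and electronic device | |
CN111010610B (zh) | Video screenshot method and electronic device | |
US20220284928A1 (en) | Video display method, electronic device and medium | |
WO2021233293A1 (zh) | Note-taking method and electronic device | |
CN110557565B (zh) | Video processing method and mobile terminal | |
WO2021073478A1 (zh) | Bullet-screen comment information recognition method, display method, server, and electronic device | |
CN111050070B (zh) | Video shooting method, apparatus, electronic device, and medium | |
WO2021104160A1 (zh) | Editing method and electronic device | |
CN111491211B (zh) | Video processing method, video processing apparatus, and electronic device | |
CN111010608B (zh) | Video playback method and electronic device | |
WO2021036659A1 (zh) | Video recording method and electronic device | |
CN111601174A (zh) | Subtitle adding method and apparatus | |
CN109819167B (zh) | Image processing method, apparatus, and mobile terminal | |
CN110719527A (zh) | Video processing method, electronic device, and mobile terminal | |
CN110830368B (zh) | Instant messaging message sending method and electronic device | |
CN110568926A (zh) | Sound signal processing method and terminal device | |
WO2021238837A1 (zh) | Information display method, apparatus, electronic device, medium, and program product | |
CN109391842B (zh) | Dubbing method and mobile terminal | |
CN110808019A (zh) | Song generation method and electronic device | |
CN108763475B (zh) | Recording method, recording apparatus, and terminal device | |
WO2021104175A1 (zh) | Information processing method and apparatus | |
CN111372029A (zh) | Video display method, apparatus, and electronic device | |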
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21767696; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 202217054441; Country of ref document: IN |
 | ENP | Entry into the national phase | Ref document number: 20227033855; Country of ref document: KR; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2021767696; Country of ref document: EP; Effective date: 20221011 |