WO2022022536A1 - Audio playback method, audio playback apparatus, and electronic device

Info

Publication number: WO2022022536A1
Application number: PCT/CN2021/108757
Authority: WO
Inventors: 史建兴
Original assignee: 维沃移动通信有限公司
Priority date: 2020-07-30
Filing date: 2021-07-27
Publication date: 2022-02-03
Also published as: CN111986689A

Abstract

Disclosed are an audio playback method and apparatus, and an electronic device. The method comprises: when an audio file in a multimedia file is played back, determining the target noise characteristic; determining a first audio in the audio file according to the target noise characteristic; and extracting a target audio in the audio file according to the first audio, and playing back the target audio, the target audio being the first audio or a second audio, and the second audio being an audio other than the first audio in the audio file.

Description

Audio playback method, audio playback device and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202010749736.4, filed with the State Intellectual Property Office on July 30, 2020 and entitled "Audio playback method, audio playback device and electronic device", the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The embodiments of the present application relate to the field of communication technologies, and in particular, to an audio playback method, an audio playback device, and an electronic device.

BACKGROUND

At present, the shooting functions of electronic devices are becoming more and more powerful, and users can shoot videos to record life through the shooting functions of the electronic devices. For example, a user may shoot a video of a concert, or a video of other users' speeches, etc. through the electronic device.

When an electronic device shoots a video, for example a video of a concert, the electronic device collects all the audio within its collection range at the concert site. However, the environment at a concert site is usually noisy, so the electronic device collects not only the singer's voice and the background music but also the noise in the environment. When the electronic device subsequently plays the video, it plays the noise together with the singer's voice and the background music, which results in a poor audio playback effect.

SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to provide an audio playback method, an audio playback device and an electronic device, which can solve the problem of poor playback effect in the audio playback process.

In order to solve the above technical problems, this application is implemented as follows:

In a first aspect, an embodiment of the present application provides an audio playback method. The method includes: in the case of playing an audio file in a multimedia file, determining a target noise feature; determining a first audio in the audio file according to the target noise feature; and extracting a target audio in the audio file according to the first audio, and playing the target audio; wherein the target audio is the first audio or a second audio, and the second audio is the audio other than the first audio in the audio file.

In a second aspect, an embodiment of the present application provides an audio playback device. The audio playback device includes a determination module, a noise reduction module, and a playback module. The determination module is configured to determine a target noise feature when an audio file in a multimedia file is played. The noise reduction module is configured to determine a first audio in the audio file according to the target noise feature determined by the determination module, and to extract a target audio in the audio file according to the first audio. The playback module is configured to play the target audio extracted by the noise reduction module; wherein the target audio is the first audio or a second audio, and the second audio is the audio other than the first audio in the audio file.

In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or an instruction is stored, where the program or instruction, when executed by a processor, implements the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the method according to the first aspect.

In the embodiment of the present application, the audio playback device may determine the target noise feature when playing the audio file in the multimedia file. Then, the audio playback device may determine the first audio in the audio file according to the target noise feature. Then, the audio playback device can extract the target audio in the audio file according to the first audio, and play the target audio; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file. Through the above solution, first, when the audio playback device plays the audio file, it can determine the target noise feature corresponding to the audio file, and once the target noise feature is determined, it can accurately determine the first audio in the audio file according to that feature. Second, the audio playback device can extract the target audio in the audio file according to the first audio. Because the first audio is determined more accurately, the target audio is also extracted more accurately, so that the audio playback device can accurately suppress the noise in the audio file and obtain the playback effect that the user requires. In this way, the playback effect of the audio file is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of an audio playback method provided by an embodiment of the present application;

FIG. 2 is a first schematic diagram of an interface to which an audio playback method provided by an embodiment of the present application is applied;

FIG. 3 is a second schematic diagram of an interface to which an audio playback method provided by an embodiment of the present application is applied;

FIG. 4 is a third schematic diagram of an interface to which an audio playback method provided by an embodiment of the present application is applied;

FIG. 5 is a fourth schematic diagram of an interface to which an audio playback method provided by an embodiment of the present application is applied;

FIG. 6 is a schematic structural diagram of an audio playback device provided by an embodiment of the present application;

FIG. 7 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application;

FIG. 8 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

The terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that embodiments of the application can be practiced in sequences other than those illustrated or described herein. In addition, the objects distinguished by "first", "second", etc. are usually one type, and the number of objects is not limited. For example, the first object may be one or more than one. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.

The audio playback method provided by the embodiments of the present application will be described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.

The audio playback method in the embodiments of the present application may be applied in various scenarios, for example, playing a concert video, a video of a child's performance, a lecture video, ocean audio, a home video, an animal video, or a song.

Taking the scenario of playing a concert video as an example, when a user plays a concert video through an electronic device and finds that the audio contains a lot of noise (that is, sounds other than the singer's voice and the background music), the user can click on the screen of the electronic device. At this time, the electronic device can determine the noise features of the audio file in the concert video, and determine that what the user wants to listen to is the singer's voice and the background music in the concert video. Then, the electronic device can extract the singer's voice and the background music from the concert video and play them. Therefore, the user can hear the singer's voice and the background music with reduced noise, which improves the playback effect of the audio file in the concert video.

FIG. 1 is a schematic flowchart of an audio playback method provided by an embodiment of the present application, including steps 201 to 203:

Step 201: When the audio playback device plays the audio file in the multimedia file, the audio playback device determines the target noise feature.

In this embodiment of the present application, the multimedia file may be a multimedia file collected by the audio playback device, a multimedia file downloaded by the audio playback device, or a multimedia file played online by the audio playback device, which is not limited in this embodiment of the present application.

In the embodiment of the present application, the above-mentioned multimedia file may be a video file or an audio file, which is not limited in the embodiment of the present application.

It can be understood that when the multimedia file is a video file, the audio file in the multimedia file is the audio (for example, background music or vocals) contained in the video file; when the multimedia file is an audio file, the audio file in the multimedia file is that audio file itself.

In this embodiment of the present application, the target noise feature may include: white noise, Gaussian noise, impulse noise, human voice, or other noise, which is not limited in this embodiment of the present application.

It should be noted that, the noise features in the embodiments of the present application may be understood as noise types.

Optionally, in this embodiment of the present application, the target noise feature may be a noise feature corresponding to a shooting scene or a noise feature corresponding to a noise reduction degree, which is not limited in this embodiment of the present application.

Wherein, the above shooting scene may include any one of the following: outdoor, seaside, bus, concert or home. It should be noted that the shooting scenes in the embodiments of the present application include but are not limited to the aforementioned five scenes, which may be specifically limited according to actual needs, which are not limited in the embodiments of the present application.

Exemplarily, when the above-mentioned shooting scene is a seaside, the target noise feature may be the sound of ocean waves, or sounds other than the sound of ocean waves; when the above-mentioned shooting scene is a concert, the target noise feature may be background music, or sounds other than the background music and the singer's voice. This may be set according to actual needs and is not limited in this embodiment of the present application.

In the embodiments of the present application, the above-mentioned noise reduction degree refers to the degree of noise suppression, that is, whether all or only part of the noise is suppressed. For example, the noise reduction degree can be classified as high, medium or low: when the audio playback device determines that the noise reduction degree is high, all of the noise can be suppressed; when it determines that the noise reduction degree is medium, 50% of the noise can be suppressed; and when it determines that the noise reduction degree is low, 10% of the noise can be suppressed. The specific values can be set according to actual requirements, which is not limited in this embodiment of the present application.
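The three-level example above can be sketched as a simple lookup. The fractions (100%, 50%, 10%) come from the example in the text; the names `SUPPRESSION_FRACTION` and `residual_noise` are hypothetical, and modelling suppression as a linear gain on an already separated noise signal is an assumption made only for illustration.

```python
# Share of the noise removed for each noise reduction degree.
# The values follow the example in the text; a real device may use others.
SUPPRESSION_FRACTION = {"high": 1.0, "medium": 0.5, "low": 0.1}


def residual_noise(noise, degree):
    """Return the noise samples that remain after suppression.

    Assumption: suppression acts as a linear gain on the noise samples.
    """
    if degree not in SUPPRESSION_FRACTION:
        raise ValueError(f"unknown noise reduction degree: {degree!r}")
    keep = 1.0 - SUPPRESSION_FRACTION[degree]  # share of the noise left in place
    return [keep * sample for sample in noise]


if __name__ == "__main__":
    noise = [0.5, -1.0, 2.0]
    for degree in ("high", "medium", "low"):
        print(degree, residual_noise(noise, degree))
```

Keeping the degree-to-fraction table separate from the signal processing makes it easy to change the specific values, which the text leaves open.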

In an example, when the audio playback device plays an audio file in a multimedia file, in addition to automatically determining the target noise feature, the user can also trigger the audio playback device to determine the target noise feature.

Step 202: The audio playback device determines the first audio in the audio file according to the above target noise feature.

Optionally, in this embodiment of the present application, the above-mentioned first audio may be: a target human voice or an ambient sound.

Exemplarily, the above-mentioned ambient sounds may include: background music, ocean waves, whistling sounds, or other human voices other than the target human voice in the audio file, and the like.

Example 1, the audio file includes ocean wave sound and human voice, and the audio playback device determines that the target noise feature is ocean wave sound, then the audio playback device can determine the first audio is human voice according to the ocean wave sound.

Example 2, the audio file includes ocean wave sound and human voice, and the audio playback device determines that the target noise feature is ocean wave sound, then the audio playback device can determine the first audio as ocean wave sound according to the ocean wave sound.

Example 3, the audio file includes ocean waves and human voices, and the audio playback device determines that the target noise feature is human voices, then the audio playback device can determine the first audio is ocean waves according to the human voice.

Example 4, the audio file includes the sound of ocean waves and human voice, and the audio playback device determines that the target noise feature is human voice, then the audio playback device can determine the first audio is human voice according to the human voice.

Step 203: The audio playing device extracts the target audio in the audio file according to the above-mentioned first audio, and plays the target audio.

Wherein, the above-mentioned target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file.
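The relationship between the target noise feature, the first audio, the second audio and the target audio can be sketched at the level of sound labels. This is a minimal sketch, not the patented implementation: the function names and the `first_is_noise` and `play_first` flags are hypothetical, and real audio would be signals rather than labels.

```python
def determine_first_audio(components, noise_feature, first_is_noise):
    """Pick the first audio among the labelled sounds of an audio file.

    components: labels of the sounds in the file,
        e.g. ["ocean waves", "human voice"].
    noise_feature: the target noise feature (one of the labels).
    first_is_noise: True if the first audio is the noise itself
        (Examples 2 and 4 of Step 202), False if it is the remaining
        sound (Examples 1 and 3 of Step 202).
    """
    if noise_feature not in components:
        raise ValueError(f"{noise_feature!r} is not in the audio file")
    if first_is_noise:
        return [noise_feature]
    return [c for c in components if c != noise_feature]


def select_target_audio(components, first_audio, play_first):
    """The target audio is the first audio, or the second audio,
    i.e. everything in the file other than the first audio."""
    if play_first:
        return list(first_audio)
    return [c for c in components if c not in first_audio]


if __name__ == "__main__":
    sounds = ["ocean waves", "human voice"]
    first = determine_first_audio(sounds, "ocean waves", first_is_noise=True)
    print(select_target_audio(sounds, first, play_first=False))
```

In the usage shown, the noise feature is the sound of ocean waves, the first audio is the ocean waves, and the target audio is the second audio (the human voice).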

It should be noted that there is no fixed order between extracting the target audio from the audio file and playing the target audio; for example, the audio playback device may extract the target audio while playing it, which is not limited in this embodiment of the present application.

Exemplarily, the second audio may be the target human voice or an ambient sound. Specifically, when the second audio is the target human voice, the first audio can be the ambient sound; when the second audio is the ambient sound, the first audio can be the target human voice. This can be set according to actual needs and is not limited in the embodiments of the present application.

Example 1, the audio playback device extracts the first audio in the audio file as the target audio.

For example, the audio file includes the sound of ocean waves and human voice, and the user needs audio that contains only the human voice (that is, the above-mentioned target audio). The user can trigger the audio playback device to determine that the target noise feature is the sound of ocean waves; the audio playback device can then determine, according to the sound of ocean waves, that the first audio is the human voice, and extract the human voice.

Example 2, the audio playback apparatus extracts the second audio in the audio file as the target audio.

For example, the audio file includes the sound of ocean waves and human voice, and the user needs audio that contains only the human voice (that is, the above-mentioned target audio). The user can trigger the audio playback device to determine that the target noise feature is the sound of ocean waves; the audio playback device can then determine, according to the sound of ocean waves, that the first audio is the sound of ocean waves, and extract the human voice (that is, the above-mentioned second audio) from the audio file according to the sound of ocean waves.

In an example, the audio playback device may extract the target audio in the audio file according to the AI noise reduction model.

The audio playback device can train the AI noise reduction model with training samples from a training sample library, where the training sample library includes at least one training sample. If the target audio to be extracted from the audio file is human voice, each of the at least one training sample includes human voice; if the target audio to be extracted is ambient sound, each of the at least one training sample includes ambient sound. For example, if the target audio to be extracted is the sound of ocean waves, each training sample contains the sound of ocean waves; if the target audio to be extracted is a whistle sound, each training sample contains a whistle sound.
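The constraint on the training sample library (every sample contains the sound that is to be extracted) can be sketched as a filter over labelled samples. The `TrainingSample` type, its fields, and `samples_for_target` are hypothetical names introduced for illustration; the text does not describe how samples are stored or labelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    name: str          # hypothetical identifier of the recording
    labels: frozenset  # sounds known to be present in the recording


def samples_for_target(library, target_label):
    """Keep only the samples that contain the sound to be extracted."""
    selected = [s for s in library if target_label in s.labels]
    if not selected:
        raise ValueError(f"no training sample contains {target_label!r}")
    return selected


if __name__ == "__main__":
    library = [
        TrainingSample("beach_01", frozenset({"ocean waves", "human voice"})),
        TrainingSample("street_01", frozenset({"whistle", "human voice"})),
        TrainingSample("beach_02", frozenset({"ocean waves"})),
    ]
    for sample in samples_for_target(library, "ocean waves"):
        print(sample.name)
```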

Exemplarily, when the target audio is the second audio, the audio playback device may extract the target audio in the audio file according to the above-mentioned first audio in at least two possible ways.

Example 5, the audio playback device determines the second audio according to the above-mentioned first audio, and directly extracts the second audio from the audio file. For example, the audio playback device may extract the second audio through the first AI noise reduction model.

Example 6, the audio playback device extracts the first audio in the audio file according to the above-mentioned first audio, and then filters the first audio from the audio file to extract the second audio. For example, the audio playback device may extract the first audio through the second AI noise reduction model, and filter the first audio from the audio file to extract the second audio.
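The two routes in Examples 5 and 6 can be contrasted in a short sketch. The AI noise reduction models are represented by stand-in callables, since the text does not specify their interface; treating the audio file as a sample-wise sum of the first and second audio, so that filtering reduces to subtraction, is likewise an assumption.

```python
def extract_second_direct(mixture, second_model):
    """Example 5: a model that outputs the second audio directly."""
    return second_model(mixture)


def extract_second_by_filtering(mixture, first_model):
    """Example 6: estimate the first audio, then filter it out of the file.

    Assumption: the mixture is the sample-wise sum of its components.
    """
    first = first_model(mixture)
    if len(first) != len(mixture):
        raise ValueError("first audio and mixture must have equal length")
    return [m - f for m, f in zip(mixture, first)]


if __name__ == "__main__":
    voice = [1.0, 2.0, 3.0]
    waves = [0.5, 0.5, 0.5]
    mixture = [v + w for v, w in zip(voice, waves)]

    # Stand-in "models" that already know the answer, for illustration only.
    print(extract_second_direct(mixture, lambda mix: voice))
    print(extract_second_by_filtering(mixture, lambda mix: waves))
```

With ideal stand-in models both routes recover the same second audio; with real models the two routes would generally differ in quality and cost.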

It should be noted that after the audio playback device extracts the target audio in the audio file, the target audio can be saved and used for audio synthesis in subsequent shooting, so as to achieve a better shooting effect.

In the audio playback method provided by the embodiment of the present application, the audio playback device can determine the target noise feature when playing the audio file in the multimedia file. Then, the audio playback device may determine the first audio in the audio file according to the target noise feature. Then, the audio playback device can extract the target audio in the audio file according to the first audio, and play the target audio; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file. Through the above solution, first, when the audio playback device plays the audio file, it can determine the target noise feature corresponding to the audio file, and once the target noise feature is determined, it can accurately determine the first audio in the audio file according to that feature. Second, the audio playback device can extract the target audio in the audio file according to the first audio. Because the first audio is determined more accurately, the target audio is also extracted more accurately, so that the audio playback device can accurately suppress the noise in the audio file and obtain the playback effect that the user requires. In this way, the playback effect of the audio file is improved.

Optionally, in this embodiment of the present application, the audio playback device may correspond to a variety of noise features, and the user selects the noise feature to be determined by the audio playback device.

Exemplarily, the above-mentioned step 201 may specifically include the following steps 201a to 201d:

Step 201a: In the case of playing the audio file in the multimedia file, the audio playback device receives the first input from the user.

Wherein, the above-mentioned first input may be a click input by the user on the screen of the audio playback device, a voice command input by the user, or a specific gesture input by the user, which can be determined according to actual use requirements and is not limited in this embodiment of the present application.

The specific gesture in the embodiment of the present application may be any one of a single-click gesture, a sliding gesture, a drag gesture, a pressure recognition gesture, a long-press gesture, an area change gesture, a double-press gesture, and a double-click gesture. The click input in the embodiment of the present application may be a single-click input, a double-click input, or a click input of any number of times, and may also be a long-press input or a short-press input.

In an example, the user's click input on the screen of the audio playback device may specifically be: the user's click input on a target control on the screen.

It should be noted that the above-mentioned target control may be an existing control or a newly added control, which is not limited in this embodiment of the present application. The target control may include at least one of the following: a physical key (also called a hardware key or a mechanical key), and a virtual key.

Specifically, the first input may be an input in which the user presses the target control for a first preset duration.

For example, the first input may be that the user presses the power button, the volume button, the newly added artificial intelligence (Artificial Intelligence, AI) button, and the like. Specifically, the first input may be that the user presses the power button and the AI button for 3 seconds, or the first input may be that the user presses the volume key "+" and the volume key "-" once respectively.

Step 201b: In response to the above-mentioned first input, the audio playback device displays M options.

The above M options correspond to M noise reduction models, and each noise reduction model corresponds to a different noise feature.

Exemplarily, the audio playback apparatus may display M options in the above-mentioned multimedia playback interface, or may display M options in the first interface, which is not limited in this embodiment of the present application. The first interface may be an existing interface or a newly added interface, which is not limited in this embodiment of the present application.

Exemplarily, the above-mentioned first interface may be a window or a menu bar, which is not limited in this embodiment of the present application.

Exemplarily, the above-mentioned M noise reduction models may correspond to the same function, or may correspond to different functions, which are not limited in this embodiment of the present application.

For example, the functions corresponding to the M noise reduction models are all vocal enhancement; or the functions corresponding to the M noise reduction models are all vocal suppression; or the function corresponding to at least one of the M noise reduction models is vocal enhancement, and the function corresponding to the remaining noise reduction models among the M noise reduction models is vocal suppression.

Step 201c: The audio playback device receives a second input from the user on the target option in the above-mentioned M options.

Exemplarily, the above-mentioned second input may be a user's click input on the target option, a voice command input by the user, or a specific gesture input by the user, which is not limited in this embodiment of the present application.

Step 201d: In response to the second input, the audio playback device determines a target noise reduction model corresponding to the target option, and determines the target noise feature according to the target noise reduction model.

Exemplarily, there is a one-to-one correspondence between the aforementioned noise reduction model and the aforementioned noise feature. After the audio playback device determines the target noise reduction model, the target noise feature may be determined through the correspondence.

It should be noted that the above-mentioned corresponding relationship may be preset by the system or set by the user, which is not limited in this embodiment of the present application.
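Steps 201c and 201d amount to two lookups: from the selected option to a noise reduction model, and from that model to its noise feature through the one-to-one correspondence. The sketch below assumes that both correspondences are stored as dictionaries. The "seaside" entry follows the example in the text (AI noise reduction model A, sound of ocean waves); the "concert" entry, the model name "AI noise reduction model B", the "crowd noise" feature, and all function names are hypothetical placeholders.

```python
OPTION_TO_MODEL = {
    "seaside": "AI noise reduction model A",  # from the example in the text
    "concert": "AI noise reduction model B",  # hypothetical placeholder
}

MODEL_TO_NOISE_FEATURE = {
    "AI noise reduction model A": "ocean waves",  # from the example in the text
    "AI noise reduction model B": "crowd noise",  # hypothetical placeholder
}


def is_one_to_one(mapping):
    """True if no two keys share the same value."""
    return len(set(mapping.values())) == len(mapping)


def target_noise_feature(option):
    """Step 201d: option -> target noise reduction model -> target noise feature."""
    if option not in OPTION_TO_MODEL:
        raise ValueError(f"unknown option: {option!r}")
    model = OPTION_TO_MODEL[option]
    return model, MODEL_TO_NOISE_FEATURE[model]


if __name__ == "__main__":
    assert is_one_to_one(MODEL_TO_NOISE_FEATURE)
    print(target_noise_feature("seaside"))
```

Because the correspondence may be preset by the system or set by the user, keeping it as data rather than code makes either source easy to support.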

For example, take the audio playback device being a mobile phone and video 1 being a video, shot at the seaside, that contains user A's singing and the sound of ocean waves, as shown in FIG. In this case, if the mobile phone user wants to obtain audio that only includes the singing voice of user A, first, the mobile phone user can click the "AI noise reduction" control 32 (ie, the above-mentioned first input). At this time, as shown in FIG. 3 , a window 33 (ie, the above-mentioned first interface) is superimposed and displayed on the playing interface 31 of the mobile phone. The window 33 displays 5 options corresponding to 5 shooting scenes, namely the “outdoor” option, the “concert” option, the “seaside” option 34 , the “transit” option and the “home” option. Then, the mobile phone user can click on the "seaside" option 34 (ie, the above-mentioned second input). Next, the mobile phone determines AI noise reduction model A, whose corresponding target noise feature is the sound of ocean waves. The mobile phone can then determine, according to the sound of ocean waves, that the first audio is the singing voice of user A, and extract user A's singing voice by using AI noise reduction model A.

It should be noted that when the audio playback device launches a specific application, the audio playback device can automatically display the above-mentioned "AI noise reduction" control so that the user can choose whether to enable the noise reduction function. After the "AI noise reduction" control has been displayed for a second preset duration (for example, 3 seconds), the audio playback device may cancel the display of the control. When the user needs the audio playback device to display the "AI noise reduction" control, the user can trigger its display through a sliding input on the screen of the audio playback device.

The specific application may include a "camera" application, a "recording" application, a chat application with a video shooting function, a shopping application with a video playback function, and the like. The above-mentioned sliding input on the screen may be the user sliding left, right, up, or down on the screen, which is not limited in this embodiment of the present application.

The audio playback method provided by the embodiment of the present application can be applied to a scenario in which the noise feature is determined by selecting an option. When a user wants to improve the playback effect of an audio file, the user can, while the audio playback device is playing the audio file, trigger the audio playback device to display M options and select the corresponding noise reduction model as required. The audio playback device can then determine the target noise feature according to the target noise reduction model, which improves not only the accuracy with which the audio playback device suppresses noise in the audio file, but also the flexibility of the audio playback device in audio noise reduction.

Further, in the embodiment of the present application, when the M options correspond to the same function, the audio playback device can also display at least two function options before displaying the M options, so that the user first selects the function to be implemented and then selects the corresponding noise reduction model.

In an example, the above-mentioned step 201b may specifically include the following steps 201b1 to 201b3:

Step 201b1: In response to the above-mentioned first input, the audio playback device displays N function options.

Among them, each function option corresponds to a different function, and N is a positive integer.

Step 201b2: The audio playback device receives a third input from the user on the target function option among the above N function options.

Exemplarily, the above-mentioned third input may be a user's click input on the target function option, a voice command input by the user, or a specific gesture input by the user, which can be determined according to actual use requirements and is not limited in this embodiment of the present application.

Step 201b3: In response to the third input, the audio playback device displays M options.

Example 7, in conjunction with FIG. 2 , after the mobile phone user clicks the “AI noise reduction” control 32 (ie, the above-mentioned first input), as shown in FIG. 4 , two function options are displayed on the screen of the mobile phone, namely, a “voice enhancement” function option 41 and a “voice suppression” function option 42 . If the mobile phone user wants to obtain audio that only contains the singing voice of user A, the user can click the “voice enhancement” function option 41. At this time, as shown in FIG. 3, the mobile phone displays, in the window 33, 5 options corresponding to 5 shooting scenes, namely the "outdoor" option, the "concert" option, the "seaside" option 34, the "transit" option and the "home" option.

Example 8, in conjunction with FIG. 2 , after the mobile phone user clicks the “AI noise reduction” control 32 (ie, the above-mentioned first input), as shown in FIG. 4 , two function options are displayed on the screen of the mobile phone, namely, a “voice enhancement” function option 41 and a “voice suppression” function option 42 . If the mobile phone user wants to obtain audio that only contains the sound of ocean waves, the mobile phone user can click the "voice suppression" function option 42. At this time, as shown in FIG. 5, the mobile phone displays, in the window 51, 3 options corresponding to 3 noise reduction degrees, namely the "high" option 52, the "medium" option and the "low" option 53. If the mobile phone user wants to suppress the singing voice of user A slightly, the mobile phone user can click the "low" option 53; if the mobile phone user wants to completely suppress the singing voice of user A, the mobile phone user can click the "high" option 52, and the mobile phone retains only the sound of ocean waves in video 1.
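The two-level selection in steps 201b1 to 201b3 can be sketched as a nested menu. The option labels are taken from Examples 7 and 8; the names `FUNCTION_MENU`, `options_for` and `choose` are hypothetical, and the sketch models only which options are offered, not how they are displayed.

```python
# Function options (step 201b1) and the M options shown for each (step 201b3).
FUNCTION_MENU = {
    "voice enhancement": ("outdoor", "concert", "seaside", "transit", "home"),
    "voice suppression": ("high", "medium", "low"),
}


def options_for(function_option):
    """Return the options displayed after the third input selects a function."""
    if function_option not in FUNCTION_MENU:
        raise ValueError(f"unknown function option: {function_option!r}")
    return list(FUNCTION_MENU[function_option])


def choose(function_option, option):
    """Validate a (third input, second input) pair of selections."""
    if option not in options_for(function_option):
        raise ValueError(f"{option!r} is not offered for {function_option!r}")
    return function_option, option


if __name__ == "__main__":
    print(options_for("voice suppression"))
    print(choose("voice enhancement", "seaside"))
```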

The audio playback method provided by the embodiment of the present application can be applied to a scenario where a function is determined by selecting a function option. The audio playback device can display N function options; the user can select a function according to requirements and then select the corresponding noise reduction model from the M options corresponding to that function. In this way, not only can the accuracy with which the user selects the noise reduction model be improved, but the flexibility of the audio playback device in performing audio noise reduction can also be improved.
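The two-level selection described above can be sketched as a simple lookup. This is only an illustrative sketch: the option names follow Examples 7 and 8 (with “voice suppression” options in the order high, medium, low), while the model identifiers and the function names `list_options` and `select_model` are hypothetical and do not appear in the application.

```python
from typing import Dict, List

# Hypothetical catalogue: each function option maps to its M options,
# and each option maps to an identifier of a noise reduction model.
FUNCTION_OPTIONS: Dict[str, Dict[str, str]] = {
    "voice enhancement": {
        "outdoor": "model_outdoor",
        "concert": "model_concert",
        "seaside": "model_seaside",
        "transit": "model_transit",
        "home": "model_home",
    },
    "voice suppression": {
        "high": "model_suppress_high",
        "medium": "model_suppress_medium",
        "low": "model_suppress_low",
    },
}


def list_options(function_option: str) -> List[str]:
    """Return the M options to display for the selected function option."""
    return list(FUNCTION_OPTIONS[function_option])


def select_model(function_option: str, target_option: str) -> str:
    """Return the (hypothetical) noise reduction model for the target option."""
    return FUNCTION_OPTIONS[function_option][target_option]
```

For instance, selecting “voice enhancement” and then “seaside” would resolve to the assumed identifier `model_seaside`.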

Optionally, in this embodiment of the present application, when the target noise feature is a noise feature corresponding to the noise reduction degree, the audio playback apparatus may extract the target audio by using the determined noise reduction degree.

In an example, in the above step 203, the target audio in the audio file is extracted according to the first audio, which may specifically include the following step 203a:

Step 203a: The audio playback device filters out part or all of the first audio from the audio file to obtain the second audio in the audio file.

Example 1: when the first audio is the target human voice, the audio playback device filters out part or all of the target human voice from the audio file to obtain the second audio in the audio file, that is, vocal suppression is performed on the audio file.

Example 2: when the first audio is ambient sound, the audio playback device filters out part or all of the ambient sound from the audio file to obtain the second audio in the audio file, that is, vocal enhancement is performed on the audio file.

In another example, the audio playback apparatus extracts all or part of the first audio from the audio file.

The audio playback method in the embodiment of the present application can be applied to the scenario of acquiring target audio with different noise reduction degrees. The audio playback device can flexibly obtain target audio with different noise reduction degrees according to the user's needs, which improves the flexibility of audio playback.
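As a rough illustration of filtering out “part or all” of the first audio, the following sketch subtracts a fraction of the first audio from the audio file, sample by sample. It rests on several assumptions that the application does not state: the first audio is already available as a waveform aligned with the audio file, time-domain subtraction is an acceptable stand-in for the actual filtering, and the fractions assigned to each noise reduction degree are invented for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from noise reduction degree to the fraction of the
# first audio that is removed (1.0 means the first audio is removed entirely).
DEGREE_TO_FRACTION: Dict[str, float] = {"low": 0.25, "medium": 0.5, "high": 1.0}


def filter_first_audio(
    audio_file: Sequence[float],
    first_audio: Sequence[float],
    degree: str,
) -> List[float]:
    """Remove part or all of the first audio, returning the second audio."""
    if len(audio_file) != len(first_audio):
        raise ValueError("audio_file and first_audio must have the same length")
    fraction = DEGREE_TO_FRACTION[degree]
    # Sample-wise subtraction of the scaled first audio from the mixture.
    return [m - fraction * f for m, f in zip(audio_file, first_audio)]
```

With the “high” degree the first audio is removed completely, which corresponds to the user in Example 8 keeping only the sound of the waves; with “low” most of the first audio remains.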

It should be noted that, in the audio playback method provided by the embodiments of the present application, the execution body may be an audio playback device, or a control module in the audio playback device for executing the audio playback method. In the embodiments of the present application, an audio playing method performed by an audio playing device is used as an example to describe the audio playing device provided by the embodiments of the present application.

FIG. 6 is a schematic diagram of a possible structure of an audio playback device provided by an embodiment of the present application. As shown in FIG. 6, the audio playback device 600 includes a determination module 601, a noise reduction module 602, and a playback module 603, wherein: the determination module 601 is configured to determine the target noise feature in the case of playing the audio file in the multimedia file; the noise reduction module 602 is configured to determine the first audio in the audio file according to the target noise feature determined by the determination module 601, and to extract the target audio in the audio file according to the first audio; and the playback module 603 is configured to play the target audio extracted by the noise reduction module 602; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file.
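The division of work among the three modules can be sketched as follows. This is a minimal sketch, not the structure of an actual implementation: the class name, the callable-based wiring, the `keep_first` flag, and the representation of audio as a list of samples are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Audio = List[float]  # assumed representation: a list of samples


@dataclass
class AudioPlaybackDevice:
    # Determination module 601: yields the target noise feature.
    determine_feature: Callable[[], str]
    # Noise reduction module 602: splits the audio file into
    # (first audio, second audio) according to the noise feature.
    separate: Callable[[Audio, str], Tuple[Audio, Audio]]
    # Playback module 603: plays the target audio.
    play: Callable[[Audio], None]

    def play_file(self, audio_file: Audio, keep_first: bool) -> Audio:
        feature = self.determine_feature()
        first, second = self.separate(audio_file, feature)
        # The target audio is either the first audio or the second audio.
        target = first if keep_first else second
        self.play(target)
        return target
```

Passing the modules in as callables keeps the sketch independent of any particular noise reduction model; a real device would supply its own separation routine and audio output.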

Optionally, as shown in FIG. 6, the audio playback device 600 further includes a receiving module 604 and a display module 605. The receiving module 604 is configured to receive the first input of the user in the case of playing the audio file in the multimedia file. The display module 605 is configured to display M options in response to the first input received by the receiving module 604, where the M options correspond to M noise reduction models, the noise characteristics of each noise reduction model are different, and M is a positive integer. The receiving module 604 is further configured to receive the second input of the user to the target option in the M options. The determination module 601 is specifically configured to, in response to the second input received by the receiving module 604, determine the target noise reduction model corresponding to the target option, and determine the target noise feature according to the target noise reduction model.

Optionally, the noise reduction module 602 is specifically configured to filter out part or all of the first audio from the audio file to obtain the second audio in the audio file.

Optionally, the target noise feature is a noise feature corresponding to a shooting scene, or a noise feature corresponding to a noise reduction degree.

Optionally, the noise reduction module 602 is specifically configured to: in the case that the first audio is the target human voice, filter out part or all of the target human voice from the audio file to obtain the second audio in the audio file, so as to perform vocal suppression on the audio file; or, in the case that the first audio is ambient sound, filter out part or all of the ambient sound from the audio file to obtain the second audio in the audio file, so as to perform vocal enhancement on the audio file.
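The two branches above amount to a mapping from the kind of first audio to the resulting effect on the audio file. The sketch below makes that mapping explicit; the enum and function names are hypothetical and serve only to illustrate the branching.

```python
from enum import Enum


class FirstAudioKind(Enum):
    TARGET_HUMAN_VOICE = "target human voice"
    AMBIENT_SOUND = "ambient sound"


def effect_of_filtering(kind: FirstAudioKind) -> str:
    """Name the effect of filtering the given kind of first audio out of the file."""
    if kind is FirstAudioKind.TARGET_HUMAN_VOICE:
        # Removing the voice leaves the ambient sound.
        return "vocal suppression"
    if kind is FirstAudioKind.AMBIENT_SOUND:
        # Removing the ambient sound leaves the voice.
        return "vocal enhancement"
    raise ValueError(f"unsupported first audio kind: {kind}")
```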

It should be noted that, as shown in FIG. 6, the modules that must be included in the audio playback device 600 are indicated by solid-line frames, such as the determination module 601; the modules that may or may not be included in the audio playback device 600 are indicated by dotted-line frames, such as the display module 605.

In the audio playback device provided by the embodiment of the present application, the audio playback device can determine the target noise feature in the case of playing the audio file in the multimedia file. Then, the audio playback device may determine the first audio in the audio file according to the target noise feature. Then, the audio playback device can extract the target audio in the audio file according to the first audio, and play the target audio; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file. Through the above solution, first, in the case that the audio playback device plays the audio file, the audio playback device can determine the target noise feature corresponding to the audio file. Once the audio playback device has determined the target noise feature, it can accurately determine the first audio in the audio file according to the target noise feature. Secondly, the audio playback device can extract the target audio in the audio file according to the first audio. Because the accuracy of determining the first audio is improved, the accuracy of extracting the target audio in the audio file is also improved, so that the audio playback device can accurately suppress the noise in the audio file, thereby obtaining the playback effect of the audio file required by the user. In this way, the purpose of improving the playback effect of the audio file is achieved.

The audio playback device in this embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. Exemplarily, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), etc.; the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, etc., which are not specifically limited in the embodiments of the present application.

The audio playback device in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.

The audio playback device provided in the embodiment of the present application can implement each process implemented by the method embodiments in FIG. 1 to FIG. 5 , and to avoid repetition, details are not described here.

Optionally, as shown in FIG. 7, an embodiment of the present application further provides an electronic device 700, including a processor 701, a memory 702, and a program or instruction stored in the memory 702 and executable on the processor 701. When the program or instruction is executed by the processor 701, each process of the above-mentioned audio playback method embodiments can be implemented, and the same technical effect can be achieved. To avoid repetition, details are not described here.

It should be noted that the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.

FIG. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 100 includes but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and other components.

Those skilled in the art can understand that the electronic device 100 may also include a power source (such as a battery) for supplying power to the various components, and the power source may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than those shown in the figure, or combine some components, or arrange the components differently, which will not be repeated here.

Wherein, the processor 110 is used to determine the target noise feature in the case of playing the audio file in the multimedia file; to determine the first audio in the audio file according to the target noise feature; and to extract the target audio in the audio file according to the first audio. The audio output unit 103 is used for playing the target audio extracted by the processor 110; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file.

Optionally, the user input unit 107 is used for receiving the first input of the user in the case of playing the audio file in the multimedia file; the display unit 106 is used for displaying M options in response to the first input received by the user input unit 107, where the M options correspond to M noise reduction models, each noise reduction model has different noise characteristics, and M is a positive integer; the user input unit 107 is further configured to receive the user's second input to the target option in the M options; the processor 110 is specifically configured to, in response to the second input received by the user input unit 107, determine a target noise reduction model corresponding to the target option, where the target noise reduction model corresponds to the target noise feature.

Optionally, the processor 110 is specifically configured to filter out part or all of the first audio from the audio file to obtain the second audio in the audio file.

Optionally, the processor 110 is specifically configured to: in the case that the first audio is a target human voice, filter out part or all of the target human voice from the audio file to obtain the second audio in the audio file, so as to perform vocal suppression on the audio file; or, in the case that the first audio is ambient sound, filter out part or all of the ambient sound from the audio file to obtain the second audio in the audio file, so as to perform vocal enhancement on the audio file.

In the electronic device provided by the embodiment of the present application, the electronic device can determine the target noise feature in the case of playing the audio file in the multimedia file. Then, the electronic device may determine the first audio in the audio file according to the target noise feature. Then, the electronic device can extract the target audio in the audio file according to the first audio, and play the target audio; wherein the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file. Through the above solution, firstly, when the electronic device plays the audio file, the electronic device can determine the target noise feature corresponding to the audio file. Once the electronic device has determined the target noise feature, it can accurately determine the first audio in the audio file according to the target noise feature. Secondly, the electronic device can extract the target audio in the audio file according to the first audio. Because the accuracy of determining the first audio is improved, the accuracy of extracting the target audio in the audio file is also improved, so that the electronic device can accurately suppress the noise in the audio file, thereby obtaining the playback effect of the audio file required by the user. In this way, the purpose of improving the playback effect of the audio file is achieved.

It should be understood that, in this embodiment of the present application, the input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042; the graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera). The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also called a touch screen. The touch panel 1071 may include two parts, a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here. The memory 109 may be used to store software programs as well as various data, including, but not limited to, application programs and operating systems. The processor 110 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the above-mentioned modem processor may alternatively not be integrated into the processor 110.

The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium. When the program or instruction is executed by a processor, each process of the above-mentioned audio playback method embodiment can be implemented, and the same technical effect can be achieved. To avoid repetition, details are not described here.

Wherein, the processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.

An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above audio playback method embodiments, and the same technical effect can be achieved. To avoid repetition, details are not described here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, or the like.

It should be noted that, herein, the terms "comprising", "including", or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to some examples may be combined in other examples.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions to make a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) execute the methods described in the various embodiments of this application.

The embodiments of the present application have been described above in conjunction with the accompanying drawings, but the present application is not limited to the above-mentioned specific embodiments, which are merely illustrative rather than restrictive. Inspired by this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope of protection of the claims, all of which fall within the protection of this application.

Claims

A method for playing audio, the method comprising:

In the case of playing the audio file in the multimedia file, determine the target noise feature;

determining the first audio in the audio file according to the target noise feature;

According to the first audio, extract the target audio in the audio file, and play the target audio;

Wherein, the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file.
The method according to claim 1, wherein, in the case of playing an audio file in a multimedia file, determining the target noise feature comprises:

In the case of playing the audio file in the multimedia file, receiving the first input of the user;

In response to the first input, displaying M options, the M options correspond to M noise reduction models, each noise reduction model has different noise characteristics, and M is a positive integer;

receiving a second input from the user for a target option in the M options;

In response to the second input, a target noise reduction model corresponding to the target option is determined, and the target noise feature is determined according to the target noise reduction model.
The method according to claim 1, wherein the extracting the target audio in the audio file according to the first audio comprises:

Filter out part or all of the first audio from the audio file to obtain the second audio in the audio file.
The method according to any one of claims 1 to 3, wherein the target noise feature is a noise feature corresponding to a shooting scene, or a noise feature corresponding to a noise reduction degree.
The method according to any one of claims 1 to 3, wherein the extracting the target audio in the audio file according to the first audio comprises:

In the case where the first audio is the target human voice, filter out part or all of the target human voice from the audio file to obtain the second audio in the audio file, and perform vocal suppression on the audio file; or,

In the case where the first audio is ambient sound, filter out part or all of the ambient sound from the audio file to obtain the second audio in the audio file, and perform vocal enhancement on the audio file.
An audio playback device comprising: a determination module, a noise reduction module and a playback module;

The determining module is used to determine the target noise feature in the case of playing the audio file in the multimedia file;

The noise reduction module is configured to determine the first audio in the audio file according to the target noise feature determined by the determining module; and extract the target audio in the audio file according to the first audio;

The playing module is used to play the target audio extracted by the noise reduction module;

Wherein, the target audio is the first audio or the second audio, and the second audio is the audio other than the first audio in the audio file.
The audio playback device according to claim 6, wherein the audio playback device further comprises: a receiving module and a display module;

The receiving module is used to receive the first input of the user in the case of playing the audio file in the multimedia file;

The display module is configured to display M options in response to the first input received by the receiving module, the M options correspond to M noise reduction models, and the noise characteristics of each noise reduction model are different, and M is a positive integer;

The receiving module is further configured to receive a second input from the user to the target option in the M options;

The determining module is specifically configured to, in response to the second input received by the receiving module, determine a target noise reduction model corresponding to the target option, and determine the target noise feature according to the target noise reduction model.
The audio playback device according to claim 6, wherein the noise reduction module is specifically configured to filter out part or all of the first audio from the audio file to obtain the second audio in the audio file.
The audio playback device according to any one of claims 6 to 8, wherein the target noise feature is a noise feature corresponding to a shooting scene, or a noise feature corresponding to a noise reduction degree.
The audio playback device according to any one of claims 6 to 8, wherein the noise reduction module is specifically used for:

In the case where the first audio is the target human voice, filter out part or all of the target human voice from the audio file to obtain the second audio in the audio file, and perform vocal suppression on the audio file; or,

In the case where the first audio is ambient sound, filter out part or all of the ambient sound from the audio file to obtain the second audio in the audio file, and perform vocal enhancement on the audio file.
An electronic device, comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the audio playback method according to any one of claims 1 to 5.
A readable storage medium on which programs or instructions are stored, and when the programs or instructions are executed by a processor, implement the steps of the audio playback method according to any one of claims 1 to 5.
A computer program product executed by at least one processor to implement the audio playback method of any one of claims 1 to 5.
An electronic device, wherein the electronic device is configured to perform the audio playback method of any one of claims 1 to 5.