CN114121050A - Audio playing method and device, electronic equipment and storage medium

Info

Publication number: CN114121050A
Application number: CN202111451000.XA
Authority: CN
Inventors: 马晨光; 陈吉胜
Original assignee: Unisound Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-03-01
Anticipated expiration: 2041-11-30
Also published as: CN114121050B

Abstract

The embodiment of the application discloses an audio playing method and device, electronic equipment and a storage medium. One embodiment of the method comprises: acquiring audio to be played; determining whether the audio to be played comprises an abnormal segment; in response to the fact that the audio to be played comprises the abnormal segment, optimizing the abnormal segment in the audio to be played to obtain an optimized audio; and playing the optimized audio. The method and the device optimize the abnormal segments in the audio to be played and improve the user experience.

Description

Audio playing method and device, electronic equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to an audio playing method, an audio playing device, electronic equipment and a storage medium.

Background

With the development of intelligent terminals, their entertainment functions have become increasingly rich, and a user can use an intelligent terminal for entertainment activities such as listening to music, watching videos, or playing games. During these activities, however, the user is often disturbed by abnormal sound segments, such as noise segments, which makes for a very poor experience.

Disclosure of Invention

The embodiment of the application provides an audio playing method, an audio playing device, electronic equipment and a storage medium.

In a first aspect, some embodiments of the present application provide an audio playing method, including: acquiring audio to be played; determining whether the audio to be played comprises an abnormal segment; in response to the fact that the audio to be played comprises the abnormal segment, optimizing the abnormal segment in the audio to be played to obtain an optimized audio; and playing the optimized audio.

In some embodiments, determining whether the audio to be played includes an abnormal segment includes: determining whether the audio to be played will be played through an earphone; and in response to determining that the audio to be played will be played through the earphone, determining whether the audio to be played includes an abnormal segment.

In some embodiments, determining whether the audio to be played includes an abnormal segment includes: acquiring a voiceprint of the audio to be played; and determining, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint includes an abnormal voiceprint.

In some embodiments, the abnormal voiceprint recognition library includes a recognition library established via the following step: compiling statistics on and storing voiceprints of common noises and preset abnormal sounds.

In some embodiments, optimizing the abnormal segment included in the audio to be played to obtain an optimized audio includes: reducing the volume of the abnormal segment included in the audio to be played and/or replacing the abnormal segment with preset audio content.

In a second aspect, some embodiments of the present application provide an audio playing apparatus, including: an acquisition unit configured to acquire audio to be played; a determination unit configured to determine whether the audio to be played includes an abnormal segment; an optimization unit configured to, in response to determining that the audio to be played includes the abnormal segment, optimize the abnormal segment included in the audio to be played to obtain an optimized audio; and a playback unit configured to play the optimized audio.

In some embodiments, the determination unit includes: a first determining subunit configured to determine whether the audio to be played will be played through an earphone; and a second determining subunit configured to determine whether the audio to be played includes an abnormal segment in response to determining that the audio to be played will be played through the earphone.

In some embodiments, the determination unit includes: an obtaining subunit configured to obtain a voiceprint of the audio to be played; and an identifying subunit configured to determine, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint includes an abnormal voiceprint.

In some embodiments, the apparatus further includes an abnormal voiceprint recognition library establishing unit configured to: compile statistics on and store voiceprints of common noises and preset abnormal sounds.

In some embodiments, the optimization unit is further configured to: reduce the volume of the abnormal segment included in the audio to be played and/or replace the abnormal segment with preset audio content.

In a third aspect, some embodiments of the present application provide an apparatus comprising: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described above in the first aspect.

In a fourth aspect, some embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method as described above in the first aspect.

According to the audio playing method, the audio playing device, the electronic equipment and the storage medium provided by the embodiments of the application, audio to be played is acquired; whether the audio to be played includes an abnormal segment is determined; in response to determining that the audio to be played includes the abnormal segment, the abnormal segment is optimized to obtain an optimized audio; and the optimized audio is played. The abnormal segment in the audio to be played is thereby optimized, and the user experience is improved.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a diagram of an exemplary system architecture to which some embodiments of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of an audio playback method according to the present application;

FIG. 3 is a schematic block diagram of an embodiment of an audio playback device according to the present application;

FIG. 4 is a block diagram of a computer system suitable for use in implementing a server or terminal of some embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not restrict it. It should be noted that, for convenience of description, only the parts relevant to the invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the audio playback method or audio playback apparatus of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as music playing applications, video playing applications, e-commerce applications, game applications, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with an audio playing function, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server providing various services, for example, a background server providing support for applications installed on the terminal devices 101, 102, 103. The server 105 may acquire audio to be played; determine whether the audio to be played includes an abnormal segment; in response to determining that the audio to be played includes the abnormal segment, optimize the abnormal segment in the audio to be played to obtain an optimized audio; and play the optimized audio.

It should be noted that the audio playing method provided in the embodiments of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, 103. Accordingly, the audio playing apparatus may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, 103.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of an audio playback method according to the present application is shown. The audio playing method comprises the following steps:

Step 201, acquiring audio to be played.

In this embodiment, the execution subject of the audio playing method (for example, the server or the terminal shown in fig. 1) may acquire the audio to be played in response to receiving an audio playing instruction or in response to a playing setting. The audio to be played may include audio that is to be played while the user is engaged in an entertainment activity such as listening to music, watching video, or playing a game.

Step 202, determining whether the audio to be played includes an abnormal segment.

In this embodiment, the execution subject may determine whether the audio to be played acquired in step 201 includes an abnormal segment. An abnormal segment is a segment of abnormal sound and may include various discordant sounds, for example, segments whose volume is too high and segments of annoying sound such as piercing noise or unintelligible speech. Abnormal segments can be detected with a preset abnormal voiceprint recognition library or a pre-trained abnormal voiceprint recognition model. Since a voiceprint can be treated as an image, such a model can be trained with the training methods used for common image recognition models. In addition, since a voiceprint can be regarded as a waveform, abnormal voiceprints can also be identified by, for example, setting upper and lower limits on the waveform.
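The waveform-limit idea mentioned above can be illustrated with a minimal Python sketch. The frame size, the limit values, and the function name `find_abnormal_frames` are assumptions chosen for illustration; the application does not specify them. The sketch flags every frame whose samples leave the range between the lower and upper limits and merges adjacent flagged frames into one segment.

```python
from typing import List, Tuple


def find_abnormal_frames(
    samples: List[float],
    frame_size: int = 4,      # assumed frame length, in samples
    lower: float = -0.8,      # assumed lower waveform limit
    upper: float = 0.8,       # assumed upper waveform limit
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, end exclusive, that leave the limits."""
    segments: List[Tuple[int, int]] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if max(frame) > upper or min(frame) < lower:
            end = start + len(frame)
            if segments and segments[-1][1] == start:
                # Adjacent flagged frames are merged into one segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    audio = [0.1, -0.1, 0.2, 0.0, 0.9, -0.95, 0.9, -0.9,
             0.95, 0.1, 0.0, 0.1, 0.0, 0.1, -0.1, 0.0]
    print(find_abnormal_frames(audio))
```

Working on frames rather than single samples keeps a loud passage from being split at every zero crossing of the waveform.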

In some optional implementations of this embodiment, determining whether the audio to be played includes an abnormal segment includes: determining whether the audio to be played will be played through an earphone; and in response to determining that the audio to be played will be played through the earphone, determining whether the audio to be played includes an abnormal segment. In this implementation, the execution subject may determine whether the audio to be played will be played through the earphone by detecting whether the terminal is connected to an earphone and/or by determining, through hardware such as the earphone itself, whether the user is wearing the earphone.

Compared with playing audio through a loudspeaker, hearing an abnormal segment while wearing an earphone affects the user more directly and does greater harm to the user's hearing or experience. Therefore, in this implementation, before determining whether the audio to be played includes an abnormal segment, it is first determined whether the audio will be played through the earphone, and abnormal segments are checked for only after determining that it will. This makes the detection of abnormal segments more accurate and further improves audio playing efficiency.
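The earphone check can be sketched as a gate placed in front of the abnormal-segment detector. In the Python sketch below, `OutputState`, its fields, and the `detector` callable are hypothetical names; how a real terminal reports earphone connection or wear status is platform-specific and not described in the application. The sketch requires both conditions, which is only one of the and/or combinations the text allows.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OutputState:
    """Hypothetical snapshot of the terminal's audio output state."""
    headphones_connected: bool
    headphones_worn: bool = True  # assumed wear-sensor reading, if available


def will_play_through_headphones(state: OutputState) -> bool:
    # This sketch requires both signals; a device without a wear sensor
    # could rely on the connection check alone.
    return state.headphones_connected and state.headphones_worn


def detect_if_headphones(
    samples: List[float],
    state: OutputState,
    detector: Callable[[List[float]], bool],
) -> Optional[bool]:
    """Run the detector only for earphone playback; None means 'skipped'."""
    if not will_play_through_headphones(state):
        return None
    return detector(samples)
```

Returning `None` when detection is skipped lets a caller distinguish "not checked" from "checked and clean".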

In some optional implementations of this embodiment, determining whether the audio to be played includes an abnormal segment includes: acquiring a voiceprint of the audio to be played; and determining, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint includes an abnormal voiceprint. A voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by an electro-acoustic instrument. In this implementation, the abnormal voiceprint recognition library may store information from which an abnormal voiceprint can be determined, such as the abnormal voiceprints themselves and/or features of the abnormal voiceprints, and the execution subject may determine whether any voiceprint fragment in the voiceprint of the audio to be played matches the information in the pre-established abnormal voiceprint recognition library. Because the abnormal voiceprint recognition library is established in advance, this implementation can determine whether the acquired voiceprint includes an abnormal voiceprint more comprehensively and in a more personalized way.

In some optional implementations of this embodiment, the abnormal voiceprint recognition library includes a recognition library established via the following step: compiling statistics on and storing voiceprints of common noises and preset abnormal sounds. In this implementation, the voiceprints of the common noises and the preset abnormal sounds can be stored directly, and the common features of those voiceprints can also be computed and stored, to obtain the abnormal voiceprint recognition library. The preset abnormal sounds may include abnormal sounds set by a user and/or an application provider.
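Matching voiceprint fragments against the library can be sketched as a nearest-reference lookup. The Python sketch below assumes each voiceprint fragment has already been reduced to a fixed-length feature vector (for example, coarse spectral band energies) and uses cosine similarity with an assumed threshold of 0.9; the application does not prescribe a feature representation, a similarity measure, or a threshold.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_abnormal(
    feature: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed similarity threshold
) -> Optional[str]:
    """Return the label of the best library match at or above the threshold."""
    best_label, best_score = None, threshold
    for label, reference in library.items():
        score = cosine_similarity(feature, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


def abnormal_fragments(
    fragments: List[Sequence[float]],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, str]]:
    """List (fragment index, matched label) for every matching fragment."""
    matches = []
    for index, fragment in enumerate(fragments):
        label = match_abnormal(fragment, library, threshold)
        if label is not None:
            matches.append((index, label))
    return matches
```

For instance, with a library holding two hypothetical entries, `"hiss"` and `"hum"`, only fragments whose feature vectors point in nearly the same direction as one of those references are reported.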
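Establishing the library can be sketched as follows. This Python sketch assumes each voiceprint is a fixed-length feature vector and takes the per-dimension mean as the "common feature" of a class of sounds; both choices, along with the `build_library` name and the dictionary layout, are illustrative assumptions rather than the method of the application.

```python
from typing import Dict, List


def build_library(
    labelled_voiceprints: Dict[str, List[List[float]]],
) -> Dict[str, Dict[str, object]]:
    """Store raw voiceprints per label together with a simple common feature."""
    library: Dict[str, Dict[str, object]] = {}
    for label, voiceprints in labelled_voiceprints.items():
        if not voiceprints:
            continue  # nothing to store for this label
        length = len(voiceprints[0])
        count = len(voiceprints)
        # Common feature: per-dimension mean over all examples (an assumption).
        mean = [sum(vp[i] for vp in voiceprints) / count for i in range(length)]
        library[label] = {
            "examples": [list(vp) for vp in voiceprints],  # stored directly
            "mean": mean,                                   # stored statistic
        }
    return library


if __name__ == "__main__":
    lib = build_library({
        "common_noise": [[0.0, 1.0], [0.2, 0.8]],
        "user_defined": [[1.0, 0.0]],  # e.g. a sound the user marked as abnormal
    })
    print(lib["common_noise"]["mean"])
```

Keeping both the raw examples and the statistic mirrors the two storage options described above: direct storage and storage of common features.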

Step 203, in response to determining that the audio to be played includes the abnormal segment, optimizing the abnormal segment included in the audio to be played to obtain an optimized audio.

In this embodiment, the execution subject may, in response to determining in step 202 that the audio to be played includes an abnormal segment, optimize the abnormal segment included in the audio to be played to obtain an optimized audio. The optimization may include volume reduction and/or replacement with preset audio content, or it may be tailored to the type of abnormal segment or to the individual user. For example, volume reduction is applied to an abnormal segment whose volume is too high, pitch lowering is applied to a piercing abnormal segment, deletion is applied to an abnormal segment that repeats many times, and replacement is applied to an abnormal segment of unintelligible speech.

In some optional implementations of this embodiment, optimizing the abnormal segment included in the audio to be played to obtain an optimized audio includes: reducing the volume of the abnormal segment included in the audio to be played and/or replacing the abnormal segment with preset audio content. The execution subject can reduce the volume to a range comfortable for human ears. The preset audio content can include a common sound such as a "beep" tone, or audio content set by the user. With this implementation, the abnormal segment can be optimized quickly by reducing the volume and/or substituting the preset audio content, which further improves audio playing efficiency.

Step 204, playing the optimized audio.
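The two optimizations named above, volume reduction and replacement with preset audio content, can be sketched on a list of normalized samples. The function names, the target peak of 0.3, and the choice to loop a short preset clip across the segment are assumptions made for illustration.

```python
from typing import List


def reduce_volume(
    samples: List[float], start: int, end: int, max_peak: float = 0.3
) -> List[float]:
    """Scale samples[start:end] so the segment's peak does not exceed max_peak."""
    segment = samples[start:end]
    peak = max((abs(x) for x in segment), default=0.0)
    if peak <= max_peak:
        return list(samples)  # already within the comfortable range
    gain = max_peak / peak
    return samples[:start] + [x * gain for x in segment] + samples[end:]


def replace_segment(
    samples: List[float], start: int, end: int, preset: List[float]
) -> List[float]:
    """Overwrite samples[start:end] with the preset clip, looped to fit."""
    length = end - start
    if preset:
        filler = [preset[i % len(preset)] for i in range(length)]
    else:
        filler = [0.0] * length  # no preset given: fall back to silence
    return samples[:start] + filler + samples[end:]


if __name__ == "__main__":
    audio = [0.1, 0.9, -0.6, 0.1]
    print(reduce_volume(audio, 1, 3))
    print(replace_segment(audio, 1, 3, preset=[0.05]))
```

Both functions return a new list of the same length as the input, so segment indices found during detection remain valid if several segments are processed in turn.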

In this embodiment, the execution subject may play the audio obtained by optimizing the abnormal segment included in the audio to be played in step 203.

In the method provided by the above embodiment of the present application, audio to be played is acquired; whether the audio to be played includes an abnormal segment is determined; in response to determining that the audio to be played includes the abnormal segment, the abnormal segment is optimized to obtain an optimized audio; and the optimized audio is played. The abnormal segment in the audio to be played is thereby optimized, and the user experience is improved.
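Flow 200 as a whole can be summarized in a short Python sketch. The four callables passed to `run_playback` are hypothetical stand-ins for steps 201 to 204; the sketch only fixes the order of the steps and the condition under which optimization runs, and it assumes the optimizer preserves the length of the audio so that segment indices stay valid.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


def run_playback(
    acquire: Callable[[], List[float]],
    detect: Callable[[List[float]], List[Segment]],
    optimize: Callable[[List[float], int, int], List[float]],
    play: Callable[[List[float]], None],
) -> List[float]:
    audio = acquire()                       # step 201: acquire audio to be played
    segments = detect(audio)                # step 202: look for abnormal segments
    for start, end in segments:             # step 203: runs only if any were found
        audio = optimize(audio, start, end)
    play(audio)                             # step 204: play the (optimized) audio
    return audio
```

A usage example with toy stand-ins: a detector that flags single samples above 0.8 and an optimizer that halves them would turn `[0.1, 0.9, 0.2]` into `[0.1, 0.45, 0.2]` before playback.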

With further reference to fig. 3, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an audio playing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.

As shown in fig. 3, the audio playing device 300 of the present embodiment includes: an acquisition unit 301, a determination unit 302, an optimization unit 303, and a playback unit 304. The acquisition unit is configured to acquire audio to be played; the determination unit is configured to determine whether the audio to be played includes an abnormal segment; the optimization unit is configured to, in response to determining that the audio to be played includes the abnormal segment, optimize the abnormal segment included in the audio to be played to obtain an optimized audio; and the playback unit is configured to play the optimized audio.

In this embodiment, the specific processing of the acquiring unit 301, the determining unit 302, the optimizing unit 303 and the playing unit 304 of the audio playing apparatus 300 may refer to step 201, step 202, step 203 and step 204 in the corresponding embodiment of fig. 2.

In some optional implementations of this embodiment, the determination unit includes: a first determining subunit configured to determine whether the audio to be played will be played through an earphone; and a second determining subunit configured to determine whether the audio to be played includes an abnormal segment in response to determining that the audio to be played will be played through the earphone.

In some optional implementations of this embodiment, the determination unit includes: an obtaining subunit configured to obtain a voiceprint of the audio to be played; and an identifying subunit configured to determine, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint includes an abnormal voiceprint.

In some optional implementations of this embodiment, the apparatus further includes an abnormal voiceprint recognition library establishing unit configured to: compile statistics on and store voiceprints of common noises and preset abnormal sounds.

In some optional implementations of this embodiment, the optimization unit is further configured to: reduce the volume of the abnormal segment included in the audio to be played and/or replace the abnormal segment with preset audio content.

The device provided by the above embodiment of the present application acquires audio to be played; determines whether the audio to be played includes an abnormal segment; in response to determining that the audio to be played includes the abnormal segment, optimizes the abnormal segment to obtain an optimized audio; and plays the optimized audio. The abnormal segment in the audio to be played is thereby optimized, and the user experience is improved.
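The unit structure of the audio playing device 300 can be sketched as a small Python class that composes four units. The class name and the representation of each unit as a plain callable are assumptions for illustration; as noted elsewhere in the description, the units may be implemented in software or hardware.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


@dataclass
class AudioPlaybackDevice:
    """Hypothetical composition of units 301 to 304."""
    acquisition_unit: Callable[[], List[float]]                       # unit 301
    determination_unit: Callable[[List[float]], List[Segment]]        # unit 302
    optimization_unit: Callable[[List[float], List[Segment]], List[float]]  # unit 303
    playback_unit: Callable[[List[float]], None]                      # unit 304

    def run(self) -> List[float]:
        audio = self.acquisition_unit()
        segments = self.determination_unit(audio)
        if segments:
            # The optimization unit acts only when abnormal segments were found.
            audio = self.optimization_unit(audio, segments)
        self.playback_unit(audio)
        return audio
```

Treating each unit as a replaceable callable makes it straightforward to swap, for example, a library-based determination unit for a model-based one without touching the other units.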

Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use in implementing a server or terminal of an embodiment of the present application is shown. The server or the terminal shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the system 400. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components may be connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read from it can be installed into the storage section 408 as necessary.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When executed by the Central Processing Unit (CPU) 401, the computer program performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in one or more programming languages or any combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a determination unit, an optimization unit, and a playback unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit configured to acquire audio to be played".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire audio to be played; determine whether the audio to be played includes an abnormal segment; in response to determining that the audio to be played includes the abnormal segment, optimize the abnormal segment in the audio to be played to obtain an optimized audio; and play the optimized audio.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. An audio playback method, comprising:

acquiring to-be-played audio;

determining whether the to-be-played audio comprises an abnormal segment;

in response to the fact that the abnormal segment is included in the to-be-played audio, optimizing the abnormal segment included in the to-be-played audio to obtain an optimized audio;

and playing the optimized audio.

2. The method of claim 1, wherein the determining whether the to-be-played audio comprises an abnormal segment comprises:

determining whether the to-be-played audio will be played through an earphone;

and in response to determining that the to-be-played audio will be played through the earphone, determining whether the to-be-played audio comprises an abnormal segment.

3. The method of claim 1, wherein the determining whether the to-be-played audio comprises an abnormal segment comprises:

acquiring a voiceprint of the to-be-played audio;

and determining, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint comprises an abnormal voiceprint.

4. The method of claim 1, wherein the abnormal voiceprint recognition library comprises a recognition library established via:

compiling statistics on and storing voiceprints of common noises and preset abnormal sounds.

5. The method according to any one of claims 1 to 4, wherein the optimizing the abnormal segment included in the to-be-played audio to obtain an optimized audio includes:

reducing the volume of the abnormal segment included in the to-be-played audio and/or replacing the abnormal segment with preset audio content.

6. An audio playback apparatus comprising:

an acquisition unit configured to acquire to-be-played audio;

a determining unit configured to determine whether the to-be-played audio comprises an abnormal segment;

an optimizing unit configured to, in response to determining that the to-be-played audio comprises the abnormal segment, optimize the abnormal segment included in the to-be-played audio to obtain an optimized audio;

a playback unit configured to play the optimized audio.

7. The apparatus of claim 6, wherein the determining unit comprises:

a first determining subunit configured to determine whether the to-be-played audio will be played through an earphone;

and a second determining subunit configured to determine whether the to-be-played audio comprises an abnormal segment in response to determining that the to-be-played audio will be played through the earphone.

8. The apparatus of claim 6, wherein the determining unit comprises:

an obtaining subunit configured to obtain a voiceprint of the to-be-played audio;

and an identifying subunit configured to determine, through a pre-established abnormal voiceprint recognition library, whether the acquired voiceprint comprises an abnormal voiceprint.

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.

10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.