CN112307161A - Method and apparatus for playing audio

Info

Publication number: CN112307161A
Authority: CN (China)
Prior art keywords: voice input, audio, frequency band, indication information, input indication
Prior art date
Legal status: Granted
Application number: CN202010120432.1A
Other languages: Chinese (zh)
Other versions: CN112307161B
Inventor: Not disclosed (不公告发明人)
Current Assignee: Beijing ByteDance Network Technology Co Ltd
Original Assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2021-02-02
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010120432.1A
Publication of CN112307161A
Application granted
Publication of CN112307161B
Legal status: Active

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3343 Query execution using phonetics
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the application disclose a method and an apparatus for playing audio. One embodiment of the method comprises: acquiring audio to be played; acquiring voice input indication information of a target device, wherein the voice input indication information indicates whether the target device is allowed to receive voice input in a preset time period; determining a frequency band matching the voice input indication information; and playing the audio to be played in the determined frequency band. This implementation raises the audio output frequency band when human-computer interaction is expected, thereby reducing interference with the human voice.

Description

Method and apparatus for playing audio
Technical Field
Embodiments of the present application relate to the field of computer technologies, and in particular, to a method and an apparatus for playing audio.
Background
With the development of computer technology, intelligent voice interaction devices are becoming more and more widely used.
To reduce noise in the speech recognition process, a filter circuit is usually built into the smart device to filter out noise in frequency bands close to the human voice.
Disclosure of Invention
The embodiments of the application provide a method and an apparatus for playing audio.
In a first aspect, an embodiment of the present application provides a method for playing audio, the method including: acquiring audio to be played; acquiring voice input indication information of a target device, wherein the voice input indication information indicates whether the target device is allowed to receive voice input in a preset time period; determining a frequency band matching the voice input indication information; and playing the audio to be played in the determined frequency band.
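To make the four steps of the first aspect concrete, the following sketch walks through them end to end. It is an illustration only, not the patented implementation: every function is a hypothetical stub, and the band values (20 Hz to 20 kHz as the normal band, 4 kHz to 20 kHz as the raised band) are assumptions chosen for the example.

```python
# Minimal sketch of the four-step flow; all names and band values are
# illustrative assumptions, not taken from the patent text.

NORMAL_BAND = (20.0, 20_000.0)     # full audible range: no voice input expected
RAISED_BAND = (4_000.0, 20_000.0)  # above the typical human-voice band

def get_audio_to_play() -> str:
    return "prompt.wav"            # step 1: stub audio source

def get_voice_input_indication(device_id: str) -> bool:
    return True                    # step 2: stub; True = voice input expected soon

def determine_band(voice_input_expected: bool) -> tuple:
    # step 3: pick the frequency band matching the indication information
    return RAISED_BAND if voice_input_expected else NORMAL_BAND

def play(audio: str, band: tuple) -> None:
    # step 4: stub playback in the determined band
    print(f"playing {audio} in band {band[0]:.0f}-{band[1]:.0f} Hz")

play(get_audio_to_play(), determine_band(get_voice_input_indication("target-device")))
```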
In some embodiments, determining the frequency band matching the voice input indication information includes: in response to determining that the voice input indication information indicates that the target device is allowed to receive voice input in the preset time period, selecting a target number of frequency bands from a first preset correspondence table in order of frequency from high to low; and determining, from the target number of frequency bands, a frequency band matching the voice input indication information.
In some embodiments, determining the frequency band matching the voice input indication information includes: in response to determining that the voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, selecting, from a second preset correspondence table, a frequency band consistent with the frequency band of the audio to be played as the frequency band matching the voice input indication information.
In some embodiments, acquiring the voice input indication information of the target device includes: in response to receiving an instruction for playing audio, generating voice input indication information indicating that the target device is not allowed to receive voice input in the preset time period.
In some embodiments, playing the audio to be played in the determined frequency band includes: performing sound wave transformation on the audio to be played to generate a target audio, wherein the frequency band of the target audio is a subset of the determined frequency band; and playing the target audio.
In a second aspect, an embodiment of the present application provides an apparatus for playing audio, the apparatus including: a first acquisition unit configured to acquire audio to be played; a second acquisition unit configured to acquire voice input indication information of the target device, wherein the voice input indication information is used for indicating whether the target device is allowed to receive voice input in a preset time period; a determination unit configured to determine a frequency band matching the voice input indication information; and the playing unit is configured to play the audio to be played according to the determined frequency band.
In some embodiments, the determining unit includes: a selecting module configured to select, in response to determining that the voice input indication information indicates that the target device is allowed to receive voice input in the preset time period, a target number of frequency bands from a first preset correspondence table in order of frequency from high to low; and a determining module configured to determine, from the target number of frequency bands, a frequency band matching the voice input indication information.
In some embodiments, the determining unit is further configured to: in response to determining that the voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, select, from a second preset correspondence table, a frequency band consistent with the frequency band of the audio to be played as the frequency band matching the voice input indication information.
In some embodiments, the second acquisition unit is further configured to: in response to receiving an instruction for playing audio, generate voice input indication information indicating that the target device is not allowed to receive voice input in the preset time period.
In some embodiments, the playing unit includes: the generating module is configured to perform sound wave transformation on the audio to be played to generate target audio, wherein the frequency band of the target audio is a subset of the determined frequency band; a playing module configured to play the target audio.
In a third aspect, an embodiment of the present application provides a terminal, including: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, which when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the method and apparatus for playing audio provided by the embodiments of the application, the audio to be played is first acquired; voice input indication information of the target device is then acquired, the information indicating whether the target device is allowed to receive voice input in a preset time period; a frequency band matching the voice input indication information is then determined; and finally the audio to be played is played in the determined frequency band. The audio output frequency band is thus raised when human-computer interaction is expected, reducing interference with the human voice. Moreover, when no human-computer interaction is expected, the audio is played normally, so that the user's listening experience is affected as little as possible.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for playing audio according to the present application;
FIG. 3 is a schematic diagram of one application scenario of a method for playing audio according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for playing audio according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for playing audio according to the present application;
FIG. 6 is a schematic block diagram of an electronic device suitable for use in implementing embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary architecture 100 to which the method for playing audio or the apparatus for playing audio of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is the medium used to provide a communication link between the terminal device 101 and the server 103, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal apparatus 101 interacts with the server 103 through the network 102 to receive or transmit messages and the like. The terminal device 101 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, a voice interaction application, and the like.
The terminal apparatus 101 may be hardware or software. When the terminal device 101 is hardware, it may be various electronic devices having a speaker and supporting human-computer interaction, including but not limited to a smart phone, a tablet computer, a smart speaker, a laptop portable computer, a desktop computer, and the like. When the terminal apparatus 101 is software, it can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 103 may be a server providing various services, such as a background server supporting the playing of audio on the terminal device 101. The background server may analyze and process the received voice input indication information, generate a processing result (for example, a frequency band matching the voice input indication information), and may feed the generated processing result back to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the terminal device 101 may also analyze the voice input indication information and generate the processing result directly; in that case, the network 102 and the server 103 may be absent. The method for playing audio provided by the embodiments of the present application is generally executed by the terminal device 101, and accordingly the apparatus for playing audio is generally disposed in the terminal device 101. Alternatively, the method may be executed by the server 103, in which case the apparatus may be disposed in the server 103.
it should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for playing audio in accordance with the present application is shown. The method for playing audio includes the steps of:
step 201, acquiring an audio to be played.
In this embodiment, the execution body of the method for playing audio (such as the terminal device 101 shown in fig. 1) may acquire the audio to be played through a wired or wireless connection. As an example, the execution body may acquire pre-stored audio to be played locally. As another example, the execution body may acquire the audio to be played from a communicatively connected electronic device. The audio to be played may include any playable audio. As an example, it may include an audio file, such as song audio. As another example, it may include audio generated by speech synthesis, such as an "I'm listening" speech file.
Step 202, acquiring voice input indication information of the target device.
In this embodiment, the execution body may acquire the voice input indication information of the target device through a wired or wireless connection. The voice input indication information indicates whether the target device is allowed to receive voice input in a preset time period. Voice input generally refers to the execution body receiving a user's speech for voice interaction. In practice, whether voice input is allowed may be embodied as whether a microphone is turned on, or as whether collected speech is transmitted to a processor in the execution body, which is not limited herein. The preset time period is generally a time period counted from now or from a later moment (e.g., after 1 minute). Optionally, the preset time period may also be related to the time at which the audio is played; for example, if the audio is played after 2 s, the preset time period may be the period from 2 s to 1 min. The target device may be any device specified in advance according to actual application requirements, or a device determined according to rules, for example, a device that is controlled to play the audio. When the execution body is a terminal, the target device may be the execution body itself.
As an example, in response to receiving an instruction for switching to a voice interaction mode, the execution body may generate voice input indication information indicating that the target device is allowed to receive voice input in the preset time period. The instruction for switching to the voice interaction mode may include a preset wake-up word.
In this embodiment, the execution body may acquire the voice input indication information from the target device. As an example, the voice input indication information may be preset, for example, indication information indicating by default that the target device is not allowed to receive voice input in the preset time period. As another example, the indication of whether the target device is allowed to receive voice input in the preset time period may be switched at preset times.
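For illustration, voice input indication information might be generated and checked as in the following sketch. The dictionary layout, the wake-up word, and the 60-second window are assumptions made for the example; the wake-up word matches the fig. 3 scenario described later.

```python
import time

WAKE_WORD = "small A"  # the wake-up word used in the fig. 3 scenario below

def make_indication(utterance: str, window_seconds: float = 60.0) -> dict:
    """Generate voice input indication information: whether the target device may
    receive voice input, and for which wall-clock window (an assumed layout)."""
    now = time.time()
    return {
        "allowed": utterance.strip() == WAKE_WORD,  # wake-up word enables voice input
        "window": (now, now + window_seconds),      # the preset time period
    }

def allows_voice_input(indication: dict) -> bool:
    start, end = indication["window"]
    return indication["allowed"] and start <= time.time() <= end

print(allows_voice_input(make_indication("small A")))      # True within the window
print(allows_voice_input(make_indication("play a song")))  # False
```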
Step 203, determining a frequency band matched with the voice input indication information.
In this embodiment, the execution body may determine the frequency band matching the voice input indication information acquired in step 202 in various ways. As an example, in response to determining that the acquired voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, the execution body may determine the frequency band of the audio to be played as the frequency band matching the voice input indication information.
As another example, in response to determining that the acquired voice input indication information indicates that the target device is allowed to receive voice input in the preset time period, the execution body may determine the matching frequency band as follows. First, the frequency band of the audio to be played is determined. Then, the intersection between that frequency band and a preset frequency band is determined, where the preset frequency band may be the band in which a person speaks in a speech recognition scenario. Then, in response to determining that the intersection satisfies a preset playing condition, the execution body may determine the frequency band of the audio to be played as the frequency band matching the voice input indication information; in response to determining that the intersection does not satisfy the preset playing condition, the execution body may determine, as the matching frequency band, a band higher than the frequency band of the audio to be played. The preset playing condition may include, but is not limited to, at least one of the following: the audio duration falling within the intersection is less than a preset duration threshold; the minimum frequency of the intersection is greater than a preset frequency threshold.
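The intersection test above can be sketched as follows. The human-voice band of 85-3400 Hz, both threshold values, and the assumption that the overlap duration is estimated elsewhere (e.g., from a short-time spectral analysis) are illustrative choices, not values fixed by this embodiment.

```python
VOICE_BAND = (85.0, 3_400.0)  # assumed speaking band for a speech recognition scenario

def band_intersection(a: tuple, b: tuple):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def matching_band(audio_band: tuple, overlap_duration_s: float,
                  duration_threshold_s: float = 1.0,
                  min_freq_threshold_hz: float = 3_000.0) -> tuple:
    """Pick the band to play in when voice input is expected.

    overlap_duration_s is assumed to be estimated elsewhere, e.g. via an STFT:
    how long the audio's energy actually sits inside the intersection band.
    """
    inter = band_intersection(audio_band, VOICE_BAND)
    # Preset playing condition: the overlap is brief, or it starts above the
    # frequency threshold -> the audio may be played in its own band.
    if (inter is None
            or overlap_duration_s < duration_threshold_s
            or inter[0] > min_freq_threshold_hz):
        return audio_band
    # Otherwise answer with a band higher than the audio's current band.
    width = audio_band[1] - audio_band[0]
    return (audio_band[1], audio_band[1] + width)

print(matching_band((100.0, 3_000.0), overlap_duration_s=5.0))  # raised band
print(matching_band((100.0, 3_000.0), overlap_duration_s=0.2))  # unchanged
```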
In some optional implementations of this embodiment, in response to determining that the voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, the execution body may further select, from the second preset correspondence table, a frequency band consistent with the frequency band of the audio to be played. The second preset correspondence table may represent a correspondence between frequency bands and voice input indication information. For example, indication information indicating that the target device is not allowed to receive voice input in the preset time period corresponds to a first frequency band, and indication information indicating that it is allowed corresponds to a second frequency band, where the second frequency band may be a subset of the first.
And step 204, playing the audio to be played according to the determined frequency band.
In this embodiment, the execution body may play the audio to be played in the determined frequency band in various ways. As an example, in response to determining that the determined frequency band is consistent with the frequency band of the audio to be played, the execution body may play the audio to be played directly. As another example, the execution body may first transform the audio to be played into the determined frequency band and then play it.
In some optional implementations of this embodiment, building on the optional implementation above, the execution body may play the audio to be played through the following steps (a sketch follows them):
First, sound wave transformation is performed on the audio to be played to generate a target audio.
In these implementations, the execution body may perform sound wave transformation on the audio to be played in various ways to generate the target audio, whose frequency band is usually a subset of the determined frequency band. The sound wave transformation maps the frequency band of the audio to be played into the determined frequency band, and may include an amplification operation on each frame frequency of the audio, which may include, but is not limited to, at least one of: multiplying by a preset value greater than 1, adding a preset value greater than 0, and so on.
Second, the target audio is played.
In these implementations, the execution body plays the target audio generated in the first step.
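A toy frequency-domain version of this sound wave transformation is sketched below: each spectral component is moved upward by a constant factor, i.e., multiplied by a preset value greater than 1. This is a simplified illustration that ignores the phase coherence and windowing a production pitch shifter would need; numpy and the factor of 2 are assumptions.

```python
import numpy as np

def shift_band(samples: np.ndarray, factor: float) -> np.ndarray:
    """Map each frequency component upward by `factor` (> 1 raises the band)."""
    spectrum = np.fft.rfft(samples)
    shifted = np.zeros_like(spectrum)
    for src in range(len(spectrum)):
        dst = int(src * factor)  # frame frequency multiplied by a preset value > 1
        if dst < len(spectrum):
            shifted[dst] += spectrum[src]
    return np.fft.irfft(shifted, n=len(samples))

# Example: a 440 Hz tone comes out at 880 Hz.
sr = 16_000
t = np.arange(sr) / sr
target_audio = shift_band(np.sin(2 * np.pi * 440.0 * t), factor=2.0)
```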
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for playing audio according to an embodiment of the present application. In the application scenario of fig. 3, the user 301 says the wake-up word "small A" 303 to the terminal device 302. In response to the wake-up word 303, the terminal device 302 locally acquires the audio to be played, "please talk …, I am listening" 304. Based on the received wake-up word 303, the terminal device 302 acquires voice input indication information 305 indicating that the terminal device is allowed to receive voice input in a preset time period. The terminal device 302 may then determine that the frequency band matching the voice input indication information 305 is a band 306 higher than that of the audio to be played 304. Finally, the terminal device 302 may play the audio "please talk …, I am listening" 307 in the determined higher frequency band 306.
At present, one prior-art approach builds a filter circuit into the intelligent device to filter out noise close to the human voice band, which increases hardware manufacturing and maintenance costs. The method provided by the embodiments of the application instead determines the frequency band for playing audio according to the voice input indication information, so that the audio output band is raised when human-computer interaction is expected, reducing interference with the human voice. Moreover, when no human-computer interaction is expected, the audio is played normally, so that the user's listening experience is affected as little as possible.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for playing audio is shown. The process 400 of the method for playing audio includes the following steps:
step 401, obtaining an audio to be played.
Step 402, acquiring voice input indication information of a target device.
In some optional implementations of this embodiment, in response to receiving an instruction for playing audio, the execution body of the method for playing audio (e.g., the terminal device 101 shown in fig. 1) may generate voice input indication information indicating that the target device is not allowed to receive voice input in the preset time period. The instruction for playing audio may be, for example, "play a song" or "play a funny video" received during voice interaction.
It should be noted that, since most video files have corresponding audio integrated therein, the instruction for playing audio may also include an instruction for playing a video file with audio.
Step 403, in response to determining that the voice input indication information indicates that the target device is allowed to receive voice input in a preset time period, selecting a target number of frequency bands from the first preset correspondence table in order of frequency from high to low.
In this embodiment, in response to determining that the voice input indication information indicates that the target device is allowed to receive voice input in the preset time period, the execution body may select the target number of frequency bands from the first preset correspondence table in order of frequency from high to low. The first preset correspondence table may represent a correspondence between frequency bands and voice input indication information, and may be the same as or different from the second preset correspondence table, which is not limited herein. As an example, the frequency range audible to the human ear is 20 Hz to 20 kHz, and the first preset correspondence table may divide this range into a preset number of frequency bands. Optionally, to enrich the frequency range of the played audio, the frequency bands may intersect. The target number may be any value specified in advance (e.g., 2 or 3), or a value determined according to actual application requirements (e.g., 50% of the preset number).
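The table construction and the high-to-low selection described above can be sketched as follows; the band count of 8, the 10% overlap between neighbouring bands, and the target number of 3 are illustrative assumptions.

```python
def build_band_table(low_hz: float = 20.0, high_hz: float = 20_000.0,
                     count: int = 8, overlap: float = 0.1) -> list:
    """Divide the audible range into `count` bands; neighbouring bands
    intersect by `overlap` of a step, to enrich the playable range."""
    step = (high_hz - low_hz) / count
    table = []
    for i in range(count):
        lo = low_hz + i * step
        table.append((lo, min(lo + step * (1.0 + overlap), high_hz)))
    return table

def select_highest(table: list, target_number: int) -> list:
    # Select the target number of bands in order of frequency from high to low.
    return sorted(table, key=lambda band: band[0], reverse=True)[:target_number]

candidates = select_highest(build_band_table(), target_number=3)
print(candidates)  # the three highest bands of the table
```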
In step 404, a frequency band matching the voice input indication information is determined from the target number of frequency bands.
In this embodiment, the execution body may determine, by various methods, a frequency band matching the voice input indication information from the target number of frequency bands selected in step 403. As an example, the execution body may randomly select one of the target number of frequency bands as the matching band. As another example, the execution body may first determine a statistic of the frequency band of the audio to be played, where the statistic may include, but is not limited to, at least one of: maximum, minimum, mean, variance, median. The execution body may then select, from the target number of frequency bands, the band whose statistic is closest to that of the audio to be played as the frequency band matching the voice input indication information.
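As a sketch of the statistic-based choice, the example below uses the energy-weighted mean frequency of the audio's spectrum as the statistic and picks the candidate band whose centre is closest to it. Both the choice of statistic and the FFT-based estimate are assumptions made for illustration.

```python
import numpy as np

def mean_frequency(samples: np.ndarray, sample_rate: int) -> float:
    """Energy-weighted mean frequency of the audio, one possible band statistic."""
    magnitude = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))

def closest_band(candidates: list, samples: np.ndarray, sample_rate: int) -> tuple:
    target = mean_frequency(samples, sample_rate)
    # Band whose centre is closest to the audio's statistic.
    return min(candidates, key=lambda band: abs((band[0] + band[1]) / 2.0 - target))

sr = 16_000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)
print(closest_band([(20.0, 2_000.0), (2_000.0, 8_000.0)], audio, sr))  # (20.0, 2000.0)
```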
In some optional implementations of this embodiment, in response to determining that the voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, the execution body may further select, from the second preset correspondence table, a frequency band consistent with the frequency band of the audio to be played as the frequency band matching the voice input indication information. The second preset correspondence table may represent a correspondence between frequency bands and voice input indication information. "Consistent" here may mean identical, or that the proportion of the overlapping portion of the two bands exceeds a preset threshold, which is not limited herein.
And 405, playing the audio to be played according to the determined frequency band.
Steps 401, 402, and 405 correspond to steps 201, 202, and 204 and their optional implementations in the foregoing embodiment; the descriptions of steps 201, 202, and 204 above also apply to steps 401, 402, and 405 and are not repeated here.
As can be seen from fig. 4, the flow 400 of the method for playing audio in this embodiment refines the step of determining the frequency band matching the voice input indication information. The scheme described in this embodiment can therefore set an appropriate output frequency band when human-computer interaction is expected, reducing interference with the user's voice input while keeping the discomfort that raising the audio frequency causes to the listening experience to a minimum.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for playing audio, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for playing audio provided by the present embodiment includes a first acquiring unit 501, a second acquiring unit 502, a determining unit 503, and a playing unit 504. The first obtaining unit 501 is configured to obtain audio to be played; a second obtaining unit 502 configured to obtain voice input indication information of the target device, wherein the voice input indication information is used for indicating whether the target device is allowed to receive voice input in a preset time period; a determination unit 503 configured to determine a frequency band matching the voice input indication information; and a playing unit 504 configured to play the audio to be played according to the determined frequency band.
In the present embodiment, in the apparatus 500 for playing audio: the specific processing of the first obtaining unit 501, the second obtaining unit 502, the determining unit 503 and the playing unit 504 and the technical effects thereof can refer to the related descriptions of step 201, step 202, step 203 and step 204 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of the present embodiment, the determining unit 503 may include a selecting module (not shown in the figure) and a determining module (not shown in the figure). The selecting module may be configured to select, in response to determining that the voice input indication information indicates that the target device is allowed to receive the voice input in the preset time period, the target number of frequency bands from the first preset correspondence table in an order from high to low. The determining module may be configured to determine a frequency band matching the voice input indication information from the target number of frequency bands.
In some optional implementations of this embodiment, the determining unit 503 may be further configured to: in response to determining that the voice input indication information indicates that the target device is not allowed to receive voice input in the preset time period, select, from the second preset correspondence table, a frequency band consistent with the frequency band of the audio to be played as the frequency band matching the voice input indication information.
In some optional implementations of the present embodiment, the second obtaining unit 502 may be further configured to: in response to receiving an instruction characterizing playing of audio, generating voice input indication information for indicating that the target device is not allowed to receive voice input for a preset time period.
In some optional implementations of this embodiment, the playing unit 504 may include: a generating module (not shown in the figure) and a playing module (not shown in the figure). The generating module may be configured to perform sound wave transformation on the audio to be played to generate the target audio. Wherein the frequency bands of the target audio may be a subset of the determined frequency bands. The playing module may be configured to play the target audio.
The apparatus provided by the foregoing embodiment of the present application first acquires the audio to be played through the first acquiring unit 501. The second acquiring unit 502 then acquires the voice input indication information of the target device, the information indicating whether the target device is allowed to receive voice input in a preset time period. The determining unit 503 then determines a frequency band matching the voice input indication information, and the playing unit 504 plays the audio to be played in the determined frequency band. The audio output frequency band is thus raised when human-computer interaction is expected, reducing interference with the human voice. Moreover, when no human-computer interaction is expected, the audio is played normally, so that the user's listening experience is affected as little as possible.
Referring now to fig. 6, shown is a schematic diagram of an electronic device (e.g., the terminal device in fig. 1) 600 suitable for implementing embodiments of the present application. The terminal device in the embodiments of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a smart speaker, and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 600 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage apparatus 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 608 including, for example, a flash memory; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present application.
It should be noted that the computer readable medium described in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: acquiring audio to be played; acquiring voice input indication information of target equipment, wherein the voice input indication information is used for indicating whether the target equipment allows receiving voice input in a preset time period; determining a frequency band matched with the voice input indication information; and playing the audio to be played according to the determined frequency band.
Computer program code for carrying out operations of embodiments of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises a first acquisition unit, a second acquisition unit, a determination unit and a playing unit. Where the names of the units do not in some cases constitute a limitation on the unit itself, for example, the first acquisition unit may also be described as a "unit that acquires audio to be played".
The above description is only a preferred embodiment of the application and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present application is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, a solution in which the above features are replaced by technical features of similar function disclosed in the embodiments of the present application.

Claims (12)

1. A method for playing audio, comprising:
acquiring audio to be played;
acquiring voice input indication information of a target device, wherein the voice input indication information is used for indicating whether the target device is allowed to receive voice input in a preset time period;
determining a frequency band matched with the voice input indication information;
and playing the audio to be played according to the determined frequency band.
2. The method of claim 1, wherein the determining a frequency band matching the voice input indication information comprises:
in response to determining that the voice input indication information is used for indicating that the target device is allowed to receive voice input in a preset time period, selecting a target number of frequency bands from a first preset correspondence table in order of the frequency bands from high to low;
and determining, from the target number of frequency bands, a frequency band matching the voice input indication information.
3. The method of claim 1 or 2, wherein the determining a frequency band matching the voice input indication information comprises:
and in response to determining that the voice input indication information is used for indicating that the target device is not allowed to receive voice input in a preset time period, selecting, from a second preset correspondence table, a frequency band consistent with the frequency band corresponding to the audio to be played as the frequency band matching the voice input indication information.
4. The method of claim 3, wherein the obtaining of the voice input indication information of the target device comprises:
in response to receiving an instruction for playing audio, generating voice input indication information for indicating that the target device is not allowed to receive voice input for a preset time period.
5. The method of claim 2, wherein the playing the audio to be played according to the determined frequency band comprises:
performing sound wave transformation on the audio to be played to generate a target audio, wherein the frequency band of the target audio is a subset of the determined frequency band;
and playing the target audio.
6. An apparatus for playing audio, comprising:
a first acquisition unit configured to acquire audio to be played;
a second acquisition unit configured to acquire voice input indication information of a target device, wherein the voice input indication information is used for indicating whether the target device is allowed to receive voice input in a preset time period;
a determination unit configured to determine a frequency band matching the voice input indication information;
and the playing unit is configured to play the audio to be played according to the determined frequency band.
7. The apparatus of claim 6, wherein the determining unit comprises:
a selecting module configured to select, in response to determining that the voice input indication information is used for indicating that the target device is allowed to receive voice input in a preset time period, a target number of frequency bands from a first preset correspondence table in order of the frequency bands from high to low;
a determining module configured to determine a frequency band matching the voice input indication information from the target number of frequency bands.
8. The apparatus of claim 6 or 7, wherein the determining unit is further configured to:
in response to determining that the voice input indication information is used for indicating that the target device is not allowed to receive voice input in a preset time period, select, from a second preset correspondence table, a frequency band consistent with the frequency band corresponding to the audio to be played as the frequency band matching the voice input indication information.
9. The apparatus of claim 8, wherein the second obtaining unit is further configured to:
in response to receiving an instruction for playing audio, generating voice input indication information for indicating that the target device is not allowed to receive voice input for a preset time period.
10. The apparatus of claim 7, wherein the play unit comprises:
a generating module configured to perform sound wave transformation on the audio to be played to generate a target audio, wherein a frequency band of the target audio is a subset of the determined frequency band;
a playback module configured to play the target audio.
11. A terminal, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
Priority Applications (1)

Application Number: CN202010120432.1A (granted as CN112307161B)
Priority Date / Filing Date: 2020-02-26
Title: Method and apparatus for playing audio

Publications (2)

CN112307161A: published 2021-02-02
CN112307161B: published 2022-11-22

Family

ID: 74336686

Family Applications (1)

CN202010120432.1A (Active, granted as CN112307161B): priority/filing date 2020-02-26, Method and apparatus for playing audio

Country Status (1)

CN: CN112307161B

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010156738A (en) * 2008-12-26 2010-07-15 Pioneer Electronic Corp Sound volume adjusting device, sound volume adjustment method, sound volume adjustment program, and recording medium storing the sound volume adjustment program
US9087520B1 (en) * 2012-12-13 2015-07-21 Rawles Llc Altering audio based on non-speech commands
US9830924B1 (en) * 2013-12-04 2017-11-28 Amazon Technologies, Inc. Matching output volume to a command volume
US20180091913A1 (en) * 2016-09-27 2018-03-29 Sonos, Inc. Audio Playback Settings for Voice Interaction
CN108091330A (en) * 2017-12-13 2018-05-29 北京小米移动软件有限公司 Output sound intensity adjusting method, device, electronic equipment and storage medium
CN108369805A (en) * 2017-12-27 2018-08-03 深圳前海达闼云端智能科技有限公司 Voice interaction method and device and intelligent terminal
CN108307022A (en) * 2018-01-23 2018-07-20 青岛海信移动通信技术股份有限公司 Method for controlling volume and device
US20190237070A1 (en) * 2018-01-31 2019-08-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice interaction method, device, apparatus and server
CN109671429A (en) * 2018-12-02 2019-04-23 腾讯科技(深圳)有限公司 Voice interactive method and equipment
CN110211599A (en) * 2019-06-03 2019-09-06 Oppo广东移动通信有限公司 Using awakening method, device, storage medium and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793621A (en) * 2021-09-22 2021-12-14 Oppo广东移动通信有限公司 Audio playing method and device, electronic equipment and computer readable storage medium

Also Published As

CN112307161B: published 2022-11-22


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant