CN113808611A - Audio playing method and device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number
CN113808611A
CN113808611A (application CN202111095336.7A)
Authority
CN
China
Prior art keywords
audio
sound
playing
zone
audio playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111095336.7A
Other languages
Chinese (zh)
Inventor
刘松
朱长宝
牛建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN202111095336.7A
Publication of CN113808611A
Priority to PCT/CN2022/118396 (WO2023040820A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The embodiments of the present disclosure disclose an audio playing method and apparatus, a computer-readable storage medium, and an electronic device. The method comprises the following steps: determining an audio playing sound zone from a preset number of sound zones in a target space; acquiring at least one original audio signal collected by a preset microphone array; performing signal separation on the at least one original audio signal to obtain at least one separated audio signal; determining the sound zone corresponding to each separated audio signal; and controlling an audio playing device in the target space to play the separated audio signal corresponding to the audio playing sound zone. With the disclosed embodiments, no separate microphone needs to be provided to collect audio signals, and a user can have audio collected and played without holding a microphone or moving to the position of a separately installed one. This saves hardware resources and simplifies user operation, while audio signals collected from zones other than the audio playing sound zone are masked during playback, improving the quality of audio playback.

Description

Audio playing method and device, computer readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an audio playing method and apparatus, a computer-readable storage medium, and an electronic device.
Background
At present, in some spaces occupied by multiple people, it is necessary to collect and play back the sounds emitted by certain people or from certain areas. The current mainstream scheme is to set up a separate microphone that is held or worn by the user. For example, in a scenario where a user sings in a vehicle, an additional microphone device must be provided in the vehicle as a sound pickup terminal, and parameters of that terminal such as microphone sensitivity and directivity must be designed so that the sound it collects masks the acoustic feedback played by the loudspeaker. Alternatively, a mobile phone connected to the in-vehicle system is used as the sound pickup terminal.
Disclosure of Invention
The embodiment of the disclosure provides an audio playing method and device, a computer readable storage medium and an electronic device.
An embodiment of the present disclosure provides an audio playing method, including: determining audio playing sound zones from a preset number of sound zones in a target space; acquiring at least one path of original audio signal acquired by a preset microphone array; performing signal separation on at least one path of original audio signal to obtain at least one path of separated audio signal; determining sound areas corresponding to at least one path of separated audio signals respectively; and controlling the audio playing equipment in the target space to play the separated audio signals corresponding to the audio playing sound zone.
According to another aspect of the embodiments of the present disclosure, there is provided an audio playing apparatus, including: the first determining module is used for determining audio playing sound zones from a preset number of sound zones in a target space; the first acquisition module is used for acquiring at least one path of original audio signals acquired by a preset microphone array; the separation module is used for carrying out signal separation on at least one path of original audio signals to obtain at least one path of separated audio signals; the second determining module is used for determining the sound zone corresponding to each of the at least one path of separated audio signals; and the control module is used for controlling the audio playing equipment in the target space to play the separated audio signals corresponding to the audio playing sound zone.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described audio playing method.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; and the processor is used for reading the executable instructions from the memory and executing the instructions to realize the audio playing method.
Based on the audio playing method and apparatus, computer-readable storage medium, and electronic device provided by the above embodiments of the present disclosure, an audio playing sound zone is determined from a preset number of sound zones in a target space; at least one original audio signal collected by a microphone array is then obtained and separated into at least one separated audio signal; the sound zone corresponding to each separated audio signal is determined; and finally the audio playing device is controlled to play the separated audio signal corresponding to the audio playing sound zone. In this way, a fixedly installed microphone array is used to collect and play the audio emitted in a given sound zone: no separate microphone needs to be provided, and the user can have audio collected and played without holding a microphone or moving to the position of a separately installed one. Hardware resources are saved and user operation is simplified, while audio signals collected from other, non-playing sound zones are masked during playback, improving the quality of audio playback.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a system diagram to which the present disclosure is applicable.
Fig. 2 is a flowchart illustrating an audio playing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram of an application scenario of an audio playing method according to an embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating an audio playing method according to another exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating an audio playing method according to another exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating an audio playing method according to still another exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of an audio playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of an audio playing apparatus according to another exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association between associated objects and covers three possible relationships; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" in the present disclosure generally indicates an "or" relationship between the preceding and following associated objects.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In existing audio collection and playback schemes, generally only one or two microphones are provided in the target space, so they cannot be shared by multiple people. The microphones are also easily lost, and suffer from problems such as running out of power or needing to be re-paired. A user holding a microphone can use it while the others cannot; and when one user sings alone, the voice is interfered with by other users, since there is no function for masking their speech. When a mobile phone is used as the sound pickup terminal, each different phone must be reconnected to the audio playing device, which is time-consuming, and the problems of multi-person participation and masking interference from others remain unsolved.
Exemplary System
Fig. 1 illustrates an exemplary system architecture 100 of an audio playback method or audio playback apparatus to which embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, server 103, microphone array 104, and audio playback device 105. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The microphone array 104 may capture audio signals emitted within the target space. The audio playback device 105 may play back audio signals collected by the microphone array.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. The terminal device 101 may have various communication client applications installed thereon, such as a multimedia application, a search-type application, a web browser application, a shopping-type application, an instant messaging tool, and the like.
The terminal device 101 may be various electronic devices including, but not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc. The terminal device 101 may control a voice interaction device (which may be the terminal device 101 itself or another device connected to the terminal device 101) to perform voice interaction.
The server 103 may be a server that provides various services, such as a background server that processes audio signals uploaded by the terminal device 101. The background server may separate the received original audio signals, determine the corresponding sound zones, and perform other processing to obtain a processing result (e.g., the audio signal corresponding to the audio playing sound zone).
It should be noted that the audio playing method provided by the embodiments of the present disclosure may be executed by the server 103 or by the terminal device 101; accordingly, the audio playing apparatus may be disposed in the server 103 or in the terminal device 101.
It should be understood that the number of terminal devices 101, networks 102, servers 103, microphone arrays 104 and audio playback devices 105 in fig. 1 is merely illustrative. There may be any number of terminal devices 101, networks 102, servers 103, microphone arrays 104, and audio playback devices 105, as desired for an implementation. For example, in the case where the audio signal does not require remote processing, the system architecture may not include a network and a server, and only includes a microphone array, a terminal device, and an audio playback device.
Exemplary method
Fig. 2 is a flowchart illustrating an audio playing method according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 2, the method includes the following steps:
Step 201, determining audio playing sound zones from a preset number of sound zones in a target space.
In this embodiment, the electronic device may determine the audio playing sound zone from a preset number of sound zones within the target space. The target space may be any of various spaces, such as the interior of a vehicle or a room. The sound zones may be regions into which the target space has been divided. For example, when the target space is a vehicle interior, the sound zones may be the spaces occupied by the driver's seat, the passenger seat, and the seats on either side of the rear row. As shown in fig. 3, the spaces where the four seats are located may be divided into corresponding sound zones 1L, 1R, 2L, and 2R.
An audio playing sound zone is a zone in which the sounds (e.g., human voices, animal sounds, instrument sounds) emitted by objects located within it (e.g., people, animals, musical instruments) are collected and played. For example, if the target space is a vehicle interior, the audio playing sound zone may be the space where the driver is located. The electronic device can determine the audio playing sound zone in various ways: for example, according to an operation in which the user manually sets it, or by designating all sound zones as audio playing sound zones.
In one scenario, a passenger in a vehicle who wants to sing can select an audio playing sound zone through the touch screen of the vehicle; in the subsequent steps, the microphone array collects the passenger's singing, which, after processing, is played back by the audio playing device.
Step 202, acquiring at least one original audio signal collected by a preset microphone array.
In this embodiment, the electronic device may obtain at least one original audio signal collected by a preset microphone array. The microphone array (e.g., the microphone array 104 shown in fig. 1) is configured to collect the sounds emitted in the target space to obtain at least one original audio signal, each of which corresponds to one microphone.
As an example, as shown in fig. 3, when the target space is a vehicle interior space, microphones a, b, c, d are respectively disposed beside the four seats, that is, the microphones a, b, c, d respectively collect audio signals of the four sound zones 1L, 1R, 2L, 2R.
Step 203, performing signal separation on at least one original audio signal to obtain at least one separated audio signal.
In this embodiment, the electronic device may perform signal separation on at least one original audio signal to obtain at least one separated audio signal.
As an example, the electronic device may employ existing blind source separation techniques to separate the at least one original audio signal. Blind source separation recovers the independent components of a source signal when neither the source signal nor the parameters of the transmission channel are known. It can employ existing algorithms such as ICA (Independent Component Analysis).
After separation, at least one separated audio signal is obtained, where each separated audio signal can be regarded as the audio signal collected from a particular sound zone.
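As a rough illustration of the separation idea (not the implementation claimed by this application), the following pure-Python sketch separates a two-microphone mixture of two independent sources by whitening and then searching for the rotation whose outputs are jointly most non-Gaussian; this is a minimal stand-in for FastICA-style algorithms, and all function names here are illustrative:

```python
import math

def mean(v):
    return sum(v) / len(v)

def whiten(x1, x2):
    # Zero-mean the two mixtures, then decorrelate and scale to unit
    # variance via the analytic eigendecomposition of the 2x2 covariance.
    m1, m2 = mean(x1), mean(x2)
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    cxx = mean([v * v for v in a])
    cyy = mean([v * v for v in b])
    cxy = mean([u * v for u, v in zip(a, b)])
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    gap = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + gap, tr / 2 - gap          # eigenvalues, l1 >= l2
    if abs(cxy) > 1e-12:
        e1 = (l1 - cyy, cxy)                     # eigenvector for l1
    elif cxx >= cyy:
        e1 = (1.0, 0.0)
    else:
        e1 = (0.0, 1.0)
    n = math.hypot(e1[0], e1[1])
    e1 = (e1[0] / n, e1[1] / n)
    e2 = (-e1[1], e1[0])                         # orthogonal eigenvector
    z1 = [(e1[0] * u + e1[1] * v) / math.sqrt(l1) for u, v in zip(a, b)]
    z2 = [(e2[0] * u + e2[1] * v) / math.sqrt(l2) for u, v in zip(a, b)]
    return z1, z2

def kurtosis(y):
    return mean([v ** 4 for v in y]) / mean([v * v for v in y]) ** 2 - 3.0

def separate(x1, x2, steps=180):
    # After whitening, demixing is a pure rotation; pick the angle that
    # maximizes the total non-Gaussianity (|kurtosis|) of the outputs.
    z1, z2 = whiten(x1, x2)
    best_score, best_th = -1.0, 0.0
    for k in range(steps):
        th = math.pi / 2 * k / steps
        c, s = math.cos(th), math.sin(th)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [c * v - s * u for u, v in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if score > best_score:
            best_score, best_th = score, th
    c, s = math.cos(best_th), math.sin(best_th)
    y1 = [c * u + s * v for u, v in zip(z1, z2)]
    y2 = [c * v - s * u for u, v in zip(z1, z2)]
    return y1, y2
```

Searching only [0, π/2) suffices because rotating by a further quarter turn merely swaps the two outputs up to sign, which changes neither the kurtosis score nor which sources are recovered.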
Optionally, before signal separation, the at least one original audio signal may first be preprocessed using existing techniques. For example, the audio signal collected by a microphone and the reference signal played by the audio playing device may be obtained, and adaptive acoustic feedback cancellation may be applied to the at least one original audio signal: the reference signal is adaptively fitted to the acoustic propagation path, and the loudspeaker sound picked up by the microphone is filtered out of the original signal. This prevents acoustic feedback and the resulting howling or smearing during sound reinforcement. After preprocessing, the resulting audio is separated, and the noise collected by the microphone array can also be filtered out.
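The adaptive cancellation described above can be sketched with a normalized LMS (NLMS) filter. This is a generic textbook illustration with invented names, not the preprocessing actually specified by the application:

```python
def nlms_feedback_cancel(mic, ref, order=8, mu=0.5, eps=1e-8):
    """Remove the loudspeaker reference leaking into a microphone signal.

    mic: microphone samples (local speech plus played-back reference)
    ref: reference samples sent to the loudspeaker
    Returns the error signal, i.e. the microphone signal with the
    adaptively estimated feedback-path contribution subtracted.
    """
    w = [0.0] * order        # adaptive FIR estimate of the feedback path
    buf = [0.0] * order      # most recent reference samples, newest first
    out = []
    for n in range(len(mic)):
        buf = [ref[n]] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # predicted leakage
        e = mic[n] - y                               # leakage removed
        norm = sum(xi * xi for xi in buf) + eps
        # NLMS weight update, normalized by the input energy
        w = [wi + (mu / norm) * e * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

With the filter order covering the feedback delay, the residual shrinks as the weights converge, which is exactly the behavior that avoids howling during sound reinforcement.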
Optionally, after signal separation, multi-channel noise reduction may be applied to the separated signals. Specifically, for each separated signal, the remaining signals may be used as references for an adaptive filtering algorithm, which exploits the characteristics of each signal to adaptively remove the noise and the residual speech from non-corresponding sound zones. Alternatively, a pre-trained deep-learning model may be used: each separated signal is fed into the model, which outputs a clean separated audio signal.
Step 204, determining the sound zone corresponding to each of the at least one separated audio signal.
In this embodiment, the electronic device may determine the sound zone corresponding to each of the at least one separated audio signal. Since the separated audio signals do not necessarily correspond one-to-one with the actual sound zones, each separated audio signal must be matched against the original audio signals (or the preprocessed audio signals) to determine its sound zone.
As an example, the similarity between each separated audio signal and each original audio signal (or preprocessed audio signal) may be computed. For each separated audio signal, the original audio signal with the greatest similarity is identified, and the sound zone of the separated audio signal is determined from the microphone corresponding to that original audio signal.
It should be noted that the microphones and sound zones need not correspond one-to-one. For example, a microphone may be placed between two sound zones; after the microphone corresponding to a separated audio signal has been determined, other methods may be used to decide from which of its zones the signal was collected. For instance, a camera in the target space may capture images of each sound zone, and lip-movement recognition on those images can determine which of the zones sharing a microphone contains the active sound source, and hence the sound zone corresponding to the separated audio signal. Alternatively, existing sound source localization techniques may determine the position of the sound source relative to the microphone, and thereby the sound zone where the source is located.
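The similarity matching described for step 204 can be sketched as follows. The patent does not specify the similarity measure, so plain lag-0 normalized correlation is assumed here, and the function names are illustrative:

```python
def zone_for_each_separated(separated, originals):
    """Assign each separated signal the index of the most similar
    original (per-microphone) signal, by normalized correlation."""
    def similarity(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        da = sum((u - ma) ** 2 for u in a) ** 0.5
        db = sum((v - mb) ** 2 for v in b) ** 0.5
        return abs(num) / (da * db + 1e-12)
    return [max(range(len(originals)),
                key=lambda i: similarity(s, originals[i]))
            for s in separated]
```

Because each microphone mainly hears its own zone, the separated signal correlates far more strongly with the original signal of its zone's microphone than with the others, which makes the argmax a reasonable zone assignment.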
Step 205, controlling the audio playing device in the target space to play the separated audio signal corresponding to the audio playing sound zone.
In this embodiment, the electronic device may control an audio playing device (e.g., the audio playing device 105 shown in fig. 1) in the target space to play the separated audio signal corresponding to the audio playing zone.
Specifically, after the sound zone corresponding to each separated audio signal has been determined, the separated audio signal corresponding to the audio playing sound zone can be identified. The electronic device may generate an instruction directing the audio playing device to play that separated audio signal, and the audio playing device plays it based on the instruction.
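Putting steps 201 through 205 together, the control flow can be sketched as follows. Every function here is an illustrative stand-in for the processing described above, not an API from the disclosure:

```python
def play_zone_audio(playback_zones, mic_frames, separate_fn, match_fn, play_fn):
    """Orchestrate one processing pass over a batch of microphone frames.

    playback_zones: zone ids selected in step 201
    mic_frames:     per-microphone original signals (step 202)
    separate_fn:    signal separation (step 203)
    match_fn:       maps separated signals to zone ids (step 204)
    play_fn:        hands a signal to the audio playing device (step 205)
    """
    separated = separate_fn(mic_frames)
    zones = match_fn(separated, mic_frames)
    played = []
    for sig, zone in zip(separated, zones):
        if zone in playback_zones:      # mask non-playback sound zones
            play_fn(zone, sig)
            played.append(zone)
    return played
```

Masking happens simply by never forwarding signals from zones outside the selected set, which is how the non-playing zones stay silent in the output.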
The method provided by the above embodiment of the present disclosure determines audio playing sound zones from a preset number of sound zones in a target space, obtains at least one original audio signal collected by a microphone array, separates it into at least one separated audio signal, determines the sound zone corresponding to each separated signal, and finally controls the audio playing device to play the separated audio signals corresponding to the audio playing sound zones. A fixedly arranged microphone array is thus used to collect and play the audio emitted in a given sound zone: no separate microphone is needed, the user need not hold one or move to its position, hardware resources are saved, and operation is simplified. At the same time, audio signals collected from other, non-playing sound zones are masked during playback, improving the quality of audio playback.
In some alternative implementations, step 201 may be performed as follows:
first, a current audio play mode is determined.
The audio playing mode may include a plurality of modes. For example, a mode in which sounds in a single sound zone are collected and played, a mode in which sounds in a plurality of sound zones are collected and played, and the like.
Then, based on the audio playing mode, the audio playing sound zone is determined from the preset number of sound zones.
As an example, if the audio playing mode is the first mode, it is determined that all of the preset number of sound zones are audio playing sound zones. The first mode may be a manually set mode or a default mode. For example, on a vehicle, when a singing application is opened, the first mode, i.e., a chorus mode, may be enabled by default. In this case, every sound zone is an audio playing sound zone.
And if the audio playing mode is the second mode, determining the audio playing sound zone from the preset number of sound zones based on the sound zone selection operation performed by the user.
The second mode is a mode supporting the user to select at least one sound zone as an audio playing sound zone, and the second mode may be a manually set mode or a default mode. In the second mode, the user may select an audio playback zone. For example, in a vehicle, a passenger may select a sound zone corresponding to a certain seat as an audio playing sound zone by touching a screen, pressing a key, triggering with voice, and the like.
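The mode-based selection described above can be sketched as follows. This is a minimal illustration; the mode names, zone labels, and function signature are hypothetical and not part of this disclosure:

```python
def select_playback_zones(mode, all_zones, user_selection=None):
    """Pick the audio playing sound zones according to the current mode."""
    if mode == "first":
        # First mode (e.g. the default chorus mode): every zone plays.
        return list(all_zones)
    if mode == "second":
        # Second mode: only zones the user explicitly selected,
        # filtered against the valid zone list.
        return [z for z in (user_selection or []) if z in all_zones]
    return []

zones = ["1L", "1R", "2L", "2R"]
print(select_playback_zones("first", zones))           # all four zones
print(select_playback_zones("second", zones, ["1L"]))  # only the chosen zone
```

A touch-screen, key-press, or voice trigger would simply populate `user_selection` before this function runs.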
By setting the audio playing mode, this implementation allows the audio playing sound zone to be configured flexibly: the user does not need to move a microphone or change position for the electronic device to collect and play the sound of the audio playing sound zone, which improves the convenience of audio playing.
In some optional implementations, as shown in fig. 4, the method may further include the steps of:
step 401, acquiring a voice signal for voice collection sent by a user.
The voice signal may be a signal collected by the microphone array.
Step 402, recognizing the voice signal to obtain a voice recognition result.
The voice signal may be recognized using existing speech recognition techniques, and the voice recognition result may be expressed as text.
Step 403, updating the audio playing sound zone from the preset number of sound zones based on the voice recognition result.
Specifically, preset keywords may be extracted from the voice recognition result, and the audio playing sound zone may be updated according to the keywords. For example, if a certain collected voice signal includes the keyword "i want to sing", the sound zone corresponding to the voice signal may be determined according to step 204, and the sound zone corresponding to the voice signal may be determined as the audio playing sound zone.
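The keyword-driven update above can be sketched as a small helper. This is a minimal sketch; the function name and the exact phrase matching are illustrative assumptions, while the keyword "i want to sing" comes from the example in the text:

```python
def update_playback_zones(recognition_text, speaker_zone, playback_zones):
    """Update the set of audio playing sound zones from a recognized utterance."""
    zones = set(playback_zones)
    text = recognition_text.lower()
    # Check the negative phrase first so the positive phrase does not
    # accidentally match inside it.
    if "don't want to sing" in text or "do not want to sing" in text:
        zones.discard(speaker_zone)
    elif "want to sing" in text:
        zones.add(speaker_zone)
    return zones

print(update_playback_zones("I want to sing", "2L", {"1L"}))
```

Here `speaker_zone` is assumed to have been resolved beforehand by the zone-matching of step 204.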
It should be noted that steps 401 to 403 may be executed at any time after step 201, for example, during the playing of the separated audio signal by the audio playing apparatus, or before or after the playing of the separated audio signal.
By recognizing the user's voice and updating the audio playing sound zone based on the voice recognition result, this implementation allows the audio playing sound zone to be adjusted flexibly and conveniently through voice interaction. The user does not need to perform manual operations, which greatly improves the convenience of audio playing.
In some alternative implementations, step 403 may be performed as follows:
first, a target sound zone indicated by a voice recognition result is determined.
Specifically, the electronic device may determine keywords indicating the target sound zone from the speech recognition result. For example, if the speech recognition result contains "chorus", all sound zones may be determined as target sound zones; if the speech recognition result includes "front row reception", 1L and 1R shown in fig. 3 are the target sound zones. If the speech recognition result does not include a keyword indicating the target sound zone but does include a keyword for adjusting the audio playing sound zone, the sound zone generating the speech signal may be determined as the target sound zone according to the method described in step 204. For example, if the voice recognition result includes a keyword such as "i don't want to sing" or "i want to sing", the result contains no keyword indicating a specific sound zone, but the keyword is one for adjusting the audio playing sound zone.
Then, in response to determining that the target sound zone is an audio playing sound zone and that the voice recognition result is information indicating that playing of the separated audio signal corresponding to the target sound zone is to be stopped, the audio playing device is controlled to stop playing the separated audio signal corresponding to the target sound zone.
Following the above example, if the voice recognition result includes the keyword "i don't want to sing" and the target sound zone generating the voice signal is an audio playing sound zone, it may be determined that the voice recognition result indicates stopping the playing of the separated audio signal corresponding to the target sound zone, and an instruction for controlling the audio playing device to stop playing that separated audio signal is generated.
In this implementation, the target sound zone indicated by the voice recognition result is determined, and when the voice recognition result indicates that playing of the separated audio signal corresponding to the target sound zone should stop, the audio playing device is controlled to stop playing that signal. A user can thus flexibly stop the playback of collected audio by voice, without manual operation, further improving the convenience of audio playing.
In some optional implementations, after determining the target sound zone indicated by the speech recognition result, the method may further include:
in response to determining that the speech recognition result is information indicating that the target soundzone is adjusted to the audio playback soundzone, adjusting the target soundzone to the audio playback soundzone.
Specifically, as an example, if the voice recognition result includes the keyword "front row reception", 1L and 1R as shown in fig. 3 are adjusted to the audio playing zone, and if the voice recognition result includes the keyword "rear row reception", 2L and 2R as shown in fig. 3 are adjusted to the audio playing zone.
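The keyword-to-zone mapping used in the examples above ("chorus", "front row reception", "rear row reception") can be expressed as a lookup table. The table layout and function are hypothetical; the zone labels follow fig. 3:

```python
# Hypothetical keyword-to-zone table for the four-zone layout of fig. 3.
ZONE_KEYWORDS = {
    "chorus": ["1L", "1R", "2L", "2R"],
    "front row reception": ["1L", "1R"],
    "rear row reception": ["2L", "2R"],
}

def target_zones(recognition_text):
    """Return the sound zones named by the first matching keyword, if any."""
    text = recognition_text.lower()
    for keyword, zones in ZONE_KEYWORDS.items():
        if keyword in text:
            return zones
    return []  # no zone keyword: fall back to locating the speaker (step 204)
```

An empty result signals the caller to fall back to the speaker-localization path described in step 204.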
By adjusting the audio playing sound zone through voice control, this implementation allows a user at any position to conveniently join the sound playback process without manual operation, further improving the convenience of audio playing.
In some optional implementations, after step 402, the method may further include:
and responding to the information that the voice recognition result is the preset sound effect, determining the sound effect corresponding to the voice recognition result, and playing the sound effect.
As an example, if the voice recognition result includes praise such as "well sung", corresponding sound effect audio of applause, cheering, and the like may be retrieved and played.
By recognizing speech and playing the corresponding sound effect, this implementation makes the content of audio playback richer.
In some optional implementations, as shown in fig. 5, after step 402, the method may further include:
Step 404, in response to determining that the speech recognition result is information indicating that the target sound zone is to be adjusted to the main audio playing sound zone, adjusting the target sound zone to the main audio playing sound zone and adjusting the audio playing sound zones other than the target sound zone to auxiliary audio playing sound zones.
In this step there are at least two audio playing sound zones, one of which is adjusted to be the main audio playing sound zone while the others become auxiliary audio playing sound zones. As an example, if the voice recognition result includes "i want to sing lead", the target sound zone generating the voice signal may be determined as the main audio playing sound zone.
Step 405, suppressing the separated audio signals corresponding to the auxiliary audio playing sound zones to obtain suppressed audio signals.
As an example, the volume of the audio corresponding to the auxiliary audio playing sound zones may be reduced during playback. Alternatively, harmony processing may be performed on the audio signals collected from the main and auxiliary audio playing sound zones: the audio signal collected from the main audio playing sound zone serves as the main melody, and the audio signals collected from the auxiliary audio playing sound zones serve as the harmony part in the mixed playback.
Step 406, mixing and playing the separated audio signal corresponding to the main audio playing sound zone together with the suppressed audio signals.
By determining the main and auxiliary audio playing sound zones through voice control and suppressing the separated audio signals corresponding to the auxiliary audio playing sound zones, this implementation makes the mixed audio highlight the sound of the main audio playing sound zone, so that users can more clearly distinguish the sound collected from it. Users can also flexibly adjust the main and auxiliary audio playing sound zones, which enriches the modes of audio playing control and further improves the convenience of audio playing.
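The suppress-and-mix of steps 405 and 406 can be sketched with numpy. This is a minimal sketch; the attenuation gain and the clip-guard normalization are illustrative assumptions, not values fixed by this disclosure:

```python
import numpy as np

def mix_main_and_auxiliary(main, auxiliaries, aux_gain=0.3):
    """Mix the main zone's signal with attenuated auxiliary-zone signals.

    aux_gain < 1 suppresses the auxiliary zones so the main zone stands out.
    """
    mixed = main.astype(float).copy()
    for aux in auxiliaries:
        mixed += aux_gain * aux.astype(float)
    # Normalize only if the mix would clip beyond full scale.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed /= peak
    return mixed

main = np.array([0.5, -0.5, 0.5])
aux = np.array([0.4, 0.4, -0.4])
print(mix_main_and_auxiliary(main, [aux]))
```

A harmony-processing variant would replace the flat `aux_gain` with pitch-aware processing of the auxiliary signals.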
In some optional implementations, as shown in fig. 6, after step 205, the method may further include:
in step 206, the current audio playback mode is determined.
Step 207, in response to determining that the audio playing mode is the third mode, scoring the separated audio signals respectively played in the at least one audio playing zone.
Wherein the third mode is a mode for scoring the played separated audio signal. As an example, if the user clicks a button on the screen indicating switching to the third mode (e.g., a button displaying "PK once"), or recognizes that the user's voice includes a keyword (e.g., keyword "PK", "race", etc.) indicating switching to the third mode, the current audio playing mode is switched to the third mode. And when detecting that the current audio playing mode is the third mode, the electronic equipment begins to score the played separated audio signals. The scoring method may adopt an existing method for scoring the audio, for example, in a singing scene, scoring is performed according to whether the sound frequency of the user is aligned with the reference frequency, whether the volume is appropriate, and the like.
Step 208, selecting a manager sound zone from the at least one audio playing sound zone based on the scores.
As an example, the audio playing sound zone corresponding to the highest score may be determined as the manager sound zone.
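The scoring and selection of steps 207 and 208 can be sketched as follows. The pitch-alignment score is a hypothetical stand-in for the "existing method for scoring the audio" mentioned above; the tolerance and the frame representation are illustrative assumptions:

```python
def score_performance(user_f0, ref_f0, tol_hz=10.0):
    """Hypothetical score: percentage of frames where the singer's pitch
    is within tol_hz of the reference pitch track."""
    hits = sum(1 for u, r in zip(user_f0, ref_f0) if abs(u - r) <= tol_hz)
    return 100.0 * hits / max(len(ref_f0), 1)

def pick_manager_zone(zone_scores):
    """Select the manager sound zone as the zone with the highest score."""
    return max(zone_scores, key=zone_scores.get)

scores = {
    "1L": score_performance([440, 450, 500], [440, 445, 440]),
    "1R": score_performance([440, 446, 442], [440, 445, 440]),
}
print(pick_manager_zone(scores))
```

Real systems would also weigh volume appropriateness and other criteria, as the text notes.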
Step 209, obtaining the voice signal corresponding to the manager sound zone.
Specifically, through the microphone array, the voice signals of users in the manager sound zone may be collected while the voice signals of users in the other sound zones are shielded.
And step 210, performing voice interactive operation based on the voice signal corresponding to the sound zone of the manager.
Specifically, the voice signal corresponding to the manager sound zone may be recognized, and the audio playing process may be controlled according to the voice recognition result, for example, updating the audio playing sound zone by voice, selecting tracks to play, and the like.
By scoring the separated audio signals played in each audio playing sound zone in the third mode and switching the manager sound zone based on the scoring results, this implementation further enriches the audio playing process and improves the degree of automation in switching the manager sound zone.
Exemplary devices
Fig. 7 is a schematic structural diagram of an audio playing apparatus according to an exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device. As shown in fig. 7, the audio playing apparatus includes: a first determining module 701, configured to determine an audio playing sound zone from a preset number of sound zones in a target space; a first obtaining module 702, configured to obtain at least one original audio signal collected by a preset microphone array; a separation module 703, configured to perform signal separation on the at least one original audio signal to obtain at least one separated audio signal; a second determining module 704, configured to determine the sound zones respectively corresponding to the at least one separated audio signal; and a control module 705, configured to control an audio playing device in the target space to play the separated audio signal corresponding to the audio playing sound zone.
In this embodiment, the first determining module 701 may determine the audio playing sound zone from a preset number of sound zones in the target space. The target space may be any of various spaces, such as the interior of a vehicle or a room. A sound zone may be one of a plurality of regions into which the target space is artificially divided. For example, when the target space is a vehicle interior space, the sound zones may be the spaces in which the driver's seat, the passenger seat, and the seats on both sides of the rear row are respectively located. As shown in fig. 3, the spaces where the four seats are located may be divided into corresponding sound zones 1L, 1R, 2L, and 2R.
The audio playing sound zone may be a sound zone whose emitted sound is collected and played. For example, if the target space is the interior of a vehicle, the audio playing sound zone may be the sound zone where the driver is located. The first determining module 701 may determine the audio playing sound zone in various ways. For example, it may be determined according to an operation in which the user manually sets the audio playing sound zone, or all sound zones may be determined as audio playing sound zones.
In one scenario, a passenger in a vehicle who wants to sing may select an audio playing sound zone by operating a touch screen in the vehicle; in the subsequent steps, the microphone array collects the passenger's singing, and after processing, the singing is played back by the audio playing device.
In this embodiment, the first obtaining module 702 may obtain at least one original audio signal collected by a preset microphone array. The microphone array (e.g., the microphone array 104 shown in fig. 1) is configured to collect sounds emitted in a target space to obtain at least one original audio signal, where each original audio signal corresponds to one microphone.
As an example, as shown in fig. 3, when the target space is a vehicle interior space, microphones a, b, c, d are respectively disposed beside the four seats, that is, the microphones a, b, c, d respectively collect audio signals of the four sound zones 1L, 1R, 2L, 2R.
In this embodiment, the separation module 703 may perform signal separation on at least one original audio signal to obtain at least one separated audio signal.
As an example, the separation module 703 may employ an existing blind source separation technique to perform signal separation on the at least one original audio signal. Blind source separation is the process of recovering each independent component from mixed observations when the parameters of the source signals and the transmission channel are unknown. Blind source separation may employ existing algorithms such as ICA (Independent Component Analysis).
At least one separated audio signal is obtained after separation, and each separated audio signal may be determined as an audio signal collected from a certain sound zone.
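To make the separation model concrete, the following numpy sketch simulates two sources mixed by a matrix A (cross-talk between microphones). A real blind source separation algorithm such as FastICA estimates the unmixing matrix W from the observations alone; here the true inverse is used purely to illustrate the mixing model, which is an explicit simplification:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # source in one sound zone
s2 = np.sin(2 * np.pi * 13 * t)           # source in another sound zone
S = np.stack([s1, s2])                    # 2 sources x 8000 samples

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # mixing matrix (unknown in practice)
X = A @ S                                 # what the two microphones observe

# Blind source separation would estimate W from X alone; here we use the
# true inverse only to show that applying W recovers the sources.
W = np.linalg.inv(A)
S_hat = W @ X
print(np.allclose(S_hat, S))
```

Note that real ICA recovers the sources only up to permutation and scaling, which is exactly why the zone-matching step described for module 704 is needed afterwards.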
Optionally, before performing signal separation on the at least one original audio signal, the at least one original audio signal may first be preprocessed using existing techniques. For example, the audio signal collected by a microphone and the reference signal played by the audio playing device may be obtained, and adaptive acoustic feedback cancellation may be performed on the at least one original audio signal: the reference signal is adaptively fitted to the acoustic propagation path, and the sound played by the audio playing device and picked up by the microphone is filtered out of the at least one original audio signal, thereby avoiding the howling or smearing caused by acoustic feedback during sound reproduction. After preprocessing, the resulting preprocessed audio is separated, and the noise collected by the microphone array can be filtered out.
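The adaptive feedback cancellation described above can be illustrated with a minimal normalized LMS (NLMS) filter. This is a sketch under simplifying assumptions: a short, static feedback path, a sample-by-sample loudspeaker reference, and illustrative filter length and step size not fixed by this disclosure:

```python
import numpy as np

def nlms_feedback_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated copy of the loudspeaker reference
    from the microphone signal (normalized LMS)."""
    w = np.zeros(taps)            # adaptive estimate of the feedback path
    buf = np.zeros(taps)          # most recent reference samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = w @ buf               # predicted feedback component
        e = mic[n] - y            # cleaned sample
        w += mu * e * buf / (buf @ buf + eps)
        out[n] = e
    return out

rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)                # loudspeaker signal
path = np.array([0.5, 0.3, 0.1])               # simulated feedback path
mic = np.convolve(ref, path)[:4000]            # mic hears only feedback
clean = nlms_feedback_cancel(mic, ref)
```

After the filter converges, the residual in `clean` is far smaller than the raw microphone signal, which is the howling-suppression effect the text describes.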
Optionally, after signal separation is performed on the at least one original audio signal, multi-channel noise reduction may be performed on the separated signals. Specifically, for each separated signal, the remaining signals may be used as references in an adaptive filtering algorithm, and the signal characteristics of each channel may be used to adaptively filter out the noise and the residual voice from non-corresponding sound zones. Alternatively, a deep learning model may be adopted: each separated signal is input into a pre-trained model, which outputs a clean separated audio signal.
In this embodiment, the second determining module 704 may determine sound regions corresponding to at least one of the separated audio signals. Specifically, each separated audio signal may not correspond to an actual sound zone in a one-to-one manner, and therefore, the separated audio signal and the original audio signal (or the audio signal after the preprocessing operation) need to be matched to determine the sound zone corresponding to each separated audio signal.
As an example, the similarity between each separated audio signal and each original audio signal (or preprocessed audio signal) may be computed; for each separated audio signal, the original audio signal with the maximum similarity is determined, and the sound zone of the separated audio signal is determined according to the microphone corresponding to that original audio signal.
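This similarity matching can be sketched with numpy. Normalized correlation is used here as one plausible similarity measure; the disclosure does not fix a specific metric, and the function and mapping names are hypothetical:

```python
import numpy as np

def assign_zones(separated, originals, mic_to_zone):
    """Match each separated signal to the most similar original (mic) signal,
    then map that microphone to its sound zone."""
    assigned = []
    for s in separated:
        sims = [
            abs(np.dot(s, o)) / (np.linalg.norm(s) * np.linalg.norm(o) + 1e-12)
            for o in originals
        ]
        mic_index = int(np.argmax(sims))  # mic whose capture best matches s
        assigned.append(mic_to_zone[mic_index])
    return assigned

t = np.linspace(0.0, 1.0, 1000)
sig_a = np.sin(2 * np.pi * 5 * t)
sig_b = np.sin(2 * np.pi * 9 * t)
# Each mic mostly hears one source, with a little cross-talk.
originals = [sig_a + 0.1 * sig_b, sig_b + 0.1 * sig_a]
separated = [sig_b, sig_a]   # separation output arrives in arbitrary order
print(assign_zones(separated, originals, {0: "1L", 1: "1R"}))
```

When one microphone serves two zones, the result would then be refined with the lip-recognition or sound-source-localization step described below.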
It should be noted that the microphones and the sound zones may not be in one-to-one correspondence. For example, a microphone may be disposed between two sound zones; after the microphone corresponding to a certain separated audio signal is determined, other methods may be used to determine from which of those sound zones the separated audio signal originates. For example, a camera may be disposed in the target space to capture images of each sound zone, and lip motion recognition may be performed on the captured images to determine which of the sound zones sharing the same microphone contains an active sound source, thereby determining the sound zone corresponding to the separated audio signal. Alternatively, an existing sound source localization technique may be used to determine the positional relationship between the microphone and the sound source, thereby determining the sound zone in which the sound source is located and, in turn, the sound zone corresponding to the separated audio signal.
In this embodiment, the control module 705 may control an audio playing device (e.g., the audio playing device 105 shown in fig. 1) in the target space to play the separated audio signal corresponding to the audio playing zone.
Specifically, after the sound zone corresponding to each separated audio signal is determined, the separated audio signal corresponding to the audio playing sound zone can be determined. The control module 705 may generate an instruction for instructing the audio playback device to play the separated audio signal, and the audio playback device plays the corresponding separated audio signal based on the instruction.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an audio playing apparatus according to another exemplary embodiment of the present disclosure.
In some optional implementations, the first determining module 701 may include: a first determining unit 7011, configured to determine a current audio playing mode; and a second determining unit 7012, configured to determine the audio playing sound zone from the preset number of sound zones based on the audio playing mode.
In some optional implementations, the apparatus may further include: a second obtaining module 706, configured to obtain a voice signal for voice collection sent by a user; the recognition module 707 is configured to recognize a voice signal to obtain a voice recognition result; and an updating module 708, configured to update the audio playing soundzone from a preset number of soundzones based on the speech recognition result.
In some alternative implementations, the update module 708 may include: a third determining unit 7081 configured to determine a target sound zone indicated by the speech recognition result; the control unit 7082 is configured to, in response to determining that the target sound zone is an audio playing sound zone and the voice recognition result is information indicating that the playing of the separated audio signal corresponding to the target sound zone is stopped, control the audio playing device to stop playing the separated audio signal corresponding to the target sound zone.
In some optional implementations, the update module 708 further includes: a first adjusting unit 7083, configured to adjust the target sound zone to the audio playing sound zone in response to determining that the voice recognition result is information indicating that the target sound zone is adjusted to the audio playing sound zone.
In some optional implementations, the apparatus may further include: the first playing module 709 is configured to determine a sound effect corresponding to the voice recognition result in response to determining that the voice recognition result is information indicating that a preset sound effect is played, and play the sound effect.
In some optional implementations, the apparatus may further include: a second adjusting module 710, configured to adjust the target sound zone to the main audio playing sound zone and adjust the audio playing sound zones other than the target sound zone to auxiliary audio playing sound zones, in response to determining that the voice recognition result is information indicating that the target sound zone is adjusted to the main audio playing sound zone; a suppression module 711, configured to suppress the separated audio signals corresponding to the auxiliary audio playing sound zones to obtain suppressed audio signals; and a second playing module 712, configured to mix and play the separated audio signal corresponding to the main audio playing sound zone together with the suppressed audio signals.
In some optional implementations, the apparatus may further include: a third determining module 713, configured to determine a current audio playing mode; a scoring module 714, configured to score the separated audio signals respectively played in the at least one audio playing sound zone in response to determining that the audio playing mode is the third mode; a selection module 715, configured to select a manager sound zone from the at least one audio playing sound zone based on the scores; a third obtaining module 716, configured to obtain the voice signal corresponding to the manager sound zone; and an interaction module 717, configured to perform voice interaction based on the voice signal corresponding to the manager sound zone.
The audio playing apparatus provided by the above embodiment of the present disclosure determines an audio playing sound zone from a preset number of sound zones in a target space, obtains at least one original audio signal collected by a microphone array, performs signal separation on the at least one original audio signal to obtain at least one separated audio signal, determines the sound zone corresponding to each separated audio signal, and finally controls an audio playing device to play the separated audio signal corresponding to the audio playing sound zone. The fixedly arranged microphone array is thereby used effectively to collect and play back audio signals emitted in a given sound zone: no separate microphone needs to be arranged, and the user does not need to hold, or move to the position of, a separately arranged microphone to complete audio collection and playback. This saves hardware resources and facilitates user operation; in addition, audio signals collected from other, non-playing sound zones are shielded during playback, which improves the quality of audio playing.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 9. The electronic device may be either or both of the terminal device 101 and the server 103 as shown in fig. 1, or a stand-alone device separate from them, which may communicate with the terminal device 101 and the server 103 to receive the collected input signals therefrom.
FIG. 9 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 9, the electronic device 900 includes one or more processors 901 and memory 902.
The processor 901 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
Memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM), cache memory, or the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 901 to implement the audio playing methods of the various embodiments of the present disclosure above and/or other desired functions. Various contents such as an original audio signal may also be stored in the computer-readable storage medium.
In one example, the electronic device 900 may further include: an input device 903 and an output device 904, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the terminal device 101 or the server 103, the input device 903 may be a microphone, a mouse, a keyboard, or the like, for inputting an original audio signal, various instructions, or the like. When the electronic device is a stand-alone device, the input device 903 may be a communication network connector for receiving input original audio signals, various instructions, and the like from the terminal device 101 and the server 103.
The output device 904 may output various information including the separated audio signal to the outside. The output devices 904 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 900 relevant to the present disclosure are shown in fig. 9, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 900 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the audio playback method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the audio playback method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to" and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or", unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. An audio playing method, comprising:
determining an audio playing sound zone from a preset number of sound zones in a target space;
acquiring at least one channel of original audio signal collected by a preset microphone array;
performing signal separation on the at least one channel of original audio signal to obtain at least one channel of separated audio signal;
determining the sound zone corresponding to each of the at least one channel of separated audio signal; and
controlling audio playing equipment in the target space to play the separated audio signal corresponding to the audio playing sound zone.
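The claimed pipeline can be illustrated with a minimal Python sketch. All of the logic below is assumed placeholder behavior (per-channel "separation", round-robin zone assignment), not the patented algorithm; a real system would use blind source separation or beamforming plus acoustic localization to map each separated signal to its sound zone:

```python
def separate_signals(mixed_channels):
    """Placeholder signal separation: one separated signal per input
    channel. A real system would apply e.g. ICA or beamforming."""
    return [list(ch) for ch in mixed_channels]

def assign_zones(separated, zone_count):
    """Placeholder zone assignment: round-robin over the preset zones.
    A real system would localize each source from the microphone array."""
    return {i: i % zone_count for i in range(len(separated))}

def play_for_zones(separated, zone_of_signal, playing_zones):
    """Keep only the separated signals whose sound zone is an audio
    playing sound zone, i.e. the signals the playback device outputs."""
    return [sig for i, sig in enumerate(separated)
            if zone_of_signal[i] in playing_zones]

mixed = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 raw microphone channels
separated = separate_signals(mixed)
zones = assign_zones(separated, zone_count=4)   # 4 preset zones in the space
out = play_for_zones(separated, zones, playing_zones={0, 2})
```

Only the separated signals mapped to an audio playing sound zone reach the playback equipment; signals from other zones are simply not played.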
2. The method of claim 1, wherein the determining an audio playing sound zone from a preset number of sound zones in a target space comprises:
determining a current audio playing mode;
and determining an audio playing sound zone from the preset number of sound zones based on the audio playing mode.
3. The method of claim 1, wherein the method further comprises:
acquiring a voice signal of speech uttered by a user;
recognizing the voice signal to obtain a voice recognition result; and
updating the audio playing sound zone from the preset number of sound zones based on the voice recognition result.
4. The method of claim 3, wherein the updating the audio playing sound zone from the preset number of sound zones based on the voice recognition result comprises:
determining a target sound zone indicated by the voice recognition result;
and in response to determining that the target sound zone is an audio playing sound zone and that the voice recognition result is information indicating that playing of the separated audio signal corresponding to the target sound zone is to be stopped, controlling the audio playing equipment to stop playing the separated audio signal corresponding to the target sound zone.
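The conditional stop behavior of claim 4 amounts to a guarded set update. A hedged sketch, in which the string `"stop"` is an assumed stand-in for the recognized stop-playing intent:

```python
def handle_stop_command(target_zone, playing_zones, intent):
    """Per claim 4: stop playback for the target zone only when it is
    currently an audio playing sound zone AND the recognized intent asks
    to stop it. Returns the updated set of playing zones."""
    if target_zone in playing_zones and intent == "stop":
        return playing_zones - {target_zone}
    return playing_zones

playing = {0, 1, 2}
playing = handle_stop_command(1, playing, "stop")   # zone 1 stops playing
unchanged = handle_stop_command(3, playing, "stop")  # zone 3 was not playing
```

If the target zone is not an audio playing sound zone, the command has no effect, which matches the claim's two-part condition.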
5. The method of claim 3, wherein after the obtaining the speech recognition result, the method further comprises:
and responding to the information that the voice recognition result is the preset sound effect, determining the sound effect corresponding to the voice recognition result, and playing the sound effect.
6. The method of claim 3, wherein after the obtaining the speech recognition result, the method further comprises:
in response to determining that the voice recognition result is information indicating that the target sound zone is to be adjusted to a primary audio playing sound zone, adjusting the target sound zone to be the primary audio playing sound zone, and adjusting the audio playing sound zones other than the target sound zone to be secondary audio playing sound zones;
suppressing the separated audio signals corresponding to the secondary audio playing sound zones to obtain suppressed audio signals; and
mixing and playing the separated audio signal corresponding to the primary audio playing sound zone and the suppressed audio signals.
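Claim 6's suppress-and-mix step can be sketched as gain attenuation followed by sample-wise summation. The `suppression_gain` value is an assumption; the claim does not specify how suppression is performed:

```python
def mix_primary_secondary(primary, secondaries, suppression_gain=0.2):
    """Attenuate (suppress) each secondary-zone separated signal by an
    assumed gain, then mix the suppressed signals into the primary-zone
    signal sample by sample. All signals are equal-length sample lists."""
    mixed = list(primary)
    for sec in secondaries:
        for i, s in enumerate(sec):
            mixed[i] += suppression_gain * s
    return mixed

# Primary zone dominates; the secondary zone is audible but suppressed.
out = mix_primary_secondary([1.0, 1.0], [[0.5, 0.5]])
```

The effect is that the primary audio playing sound zone's signal dominates the mix while the secondary zones remain faintly audible.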
7. The method of claim 1, wherein after the playing the separated audio signal corresponding to the audio playing sound zone, the method further comprises:
determining a current audio playing mode;
in response to determining that the audio playing mode is the third mode, scoring the separated audio signals respectively played in at least one audio playing sound zone;
selecting a manager sound zone from the at least one audio playing sound zone based on the scores;
acquiring a voice signal corresponding to the manager sound zone; and
performing a voice interaction operation based on the voice signal corresponding to the manager sound zone.
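The manager-zone selection of claim 7 is a score-and-argmax step. Mean signal energy is used below as an assumed scoring rule; the claim leaves the scoring method open:

```python
def select_manager_zone(zone_signals):
    """Per claim 7: score the separated audio signal played in each
    audio playing sound zone and pick the highest-scoring zone as the
    manager sound zone. zone_signals maps zone id -> sample list."""
    def score(sig):
        # Assumed scoring rule: mean energy of the signal.
        return sum(x * x for x in sig) / len(sig)
    return max(zone_signals, key=lambda z: score(zone_signals[z]))

manager = select_manager_zone({0: [0.1, 0.1], 2: [0.9, 0.8]})
```

Subsequent voice interaction would then be driven by the voice signal captured in the selected manager sound zone.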
8. An audio playing apparatus, comprising:
a first determining module, configured to determine an audio playing sound zone from a preset number of sound zones in a target space;
a first acquisition module, configured to acquire at least one channel of original audio signal collected by a preset microphone array;
a separation module, configured to perform signal separation on the at least one channel of original audio signal to obtain at least one channel of separated audio signal;
a second determining module, configured to determine the sound zone corresponding to each of the at least one channel of separated audio signal; and
a control module, configured to control audio playing equipment in the target space to play the separated audio signal corresponding to the audio playing sound zone.
9. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any one of claims 1 to 7.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1 to 7.
CN202111095336.7A 2021-09-17 2021-09-17 Audio playing method and device, computer readable storage medium and electronic equipment Pending CN113808611A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111095336.7A CN113808611A (en) 2021-09-17 2021-09-17 Audio playing method and device, computer readable storage medium and electronic equipment
PCT/CN2022/118396 WO2023040820A1 (en) 2021-09-17 2022-09-13 Audio playing method and apparatus, and computer-readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111095336.7A CN113808611A (en) 2021-09-17 2021-09-17 Audio playing method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113808611A 2021-12-17

Family

ID=78895850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111095336.7A Pending CN113808611A (en) 2021-09-17 2021-09-17 Audio playing method and device, computer readable storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN113808611A (en)
WO (1) WO2023040820A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023040820A1 (en) * 2021-09-17 2023-03-23 深圳地平线机器人科技有限公司 Audio playing method and apparatus, and computer-readable storage medium and electronic device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN109637532A (en) * 2018-12-25 2019-04-16 百度在线网络技术(北京)有限公司 Audio recognition method, device, car-mounted terminal, vehicle and storage medium
CN109785819A (en) * 2018-12-22 2019-05-21 深圳唐恩科技有限公司 Correlating method, storage medium, microphone and the singing system of multiple microphones
CN109922290A (en) * 2018-12-27 2019-06-21 蔚来汽车有限公司 Audio-video synthetic method, device, system, equipment and vehicle for vehicle
CN112397065A (en) * 2020-11-04 2021-02-23 深圳地平线机器人科技有限公司 Voice interaction method and device, computer readable storage medium and electronic equipment
CN112468936A (en) * 2019-09-06 2021-03-09 雅马哈株式会社 Vehicle-mounted sound system and vehicle
CN113225716A (en) * 2021-04-19 2021-08-06 北京塞宾科技有限公司 Vehicle-mounted karaoke realization method, system, equipment and storage medium
CN113270082A (en) * 2020-02-14 2021-08-17 广州汽车集团股份有限公司 Vehicle-mounted KTV control method and device and vehicle-mounted intelligent networking terminal

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US6950525B2 (en) * 2001-10-12 2005-09-27 General Motors Corporation Automated system and method for automotive time-based audio verification
US10095470B2 (en) * 2016-02-22 2018-10-09 Sonos, Inc. Audio response playback
CN110996308B (en) * 2019-12-10 2024-03-08 歌尔股份有限公司 Sound playing device, control method thereof, control device thereof and readable storage medium
CN112435682B (en) * 2020-11-10 2024-04-16 广州小鹏汽车科技有限公司 Vehicle noise reduction system, method and device, vehicle and storage medium
CN113014983B (en) * 2021-03-08 2022-12-27 Oppo广东移动通信有限公司 Video playing method and device, storage medium and electronic equipment
CN113345401A (en) * 2021-05-31 2021-09-03 锐迪科微电子(上海)有限公司 Calibration method and device of active noise reduction system of wearable device, storage medium and terminal
CN113808611A (en) * 2021-09-17 2021-12-17 深圳地平线机器人科技有限公司 Audio playing method and device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
WO2023040820A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
CN112118485B (en) Volume self-adaptive adjusting method, system, equipment and storage medium
CN112352441B (en) Enhanced environmental awareness system
US11915687B1 (en) Systems and methods for generating labeled data to facilitate configuration of network microphone devices
JP2020115206A (en) System and method
CN112687286A (en) Method and device for adjusting noise reduction model of audio equipment
JP7453712B2 (en) Audio reproduction method, device, computer readable storage medium and electronic equipment
US20170148438A1 (en) Input/output mode control for audio processing
US20230164509A1 (en) System and method for headphone equalization and room adjustment for binaural playback in augmented reality
CN111863020A (en) Voice signal processing method, device, equipment and storage medium
JP2024507916A (en) Audio signal processing method, device, electronic device, and computer program
WO2023040820A1 (en) Audio playing method and apparatus, and computer-readable storage medium and electronic device
JP6678315B2 (en) Voice reproduction method, voice interaction device, and voice interaction program
US20170206898A1 (en) Systems and methods for assisting automatic speech recognition
CN111696566B (en) Voice processing method, device and medium
CN111627417B (en) Voice playing method and device and electronic equipment
CN114734942A (en) Method and device for adjusting sound effect of vehicle-mounted sound equipment
KR102650763B1 (en) Psychoacoustic enhancement based on audio source directivity
CN110289010B (en) Sound collection method, device, equipment and computer storage medium
CN111696565B (en) Voice processing method, device and medium
CN111696564B (en) Voice processing method, device and medium
CN116320144B (en) Audio playing method, electronic equipment and readable storage medium
US20230267942A1 (en) Audio-visual hearing aid
CN110446142B (en) Audio information processing method, server, device, storage medium and client
CN114664294A (en) Audio data processing method and device and electronic equipment
Haeb-Umbach et al. Speech processing in the networked home environment-a view on the amigo project.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination