CN117854526A - Speech enhancement method, device, electronic equipment and computer readable storage medium - Google Patents

Speech enhancement method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN117854526A
CN117854526A (Application No. CN202410264048.7A)
Authority
CN
China
Prior art keywords
voice
voice acquisition
acquisition equipment
user
users
Prior art date
Legal status
Granted
Application number
CN202410264048.7A
Other languages
Chinese (zh)
Other versions
CN117854526B (en)
Inventor
黄润乾
陈东鹏
张伟彬
李亚桐
Current Assignee
Voiceai Technologies Co ltd
Original Assignee
Voiceai Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Voiceai Technologies Co ltd filed Critical Voiceai Technologies Co ltd
Priority to CN202410264048.7A (granted as CN117854526B)
Publication of CN117854526A
Application granted
Publication of CN117854526B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L17/00 — Speaker identification or verification techniques
    • G10L21/0208 — Noise filtering
    • G10L21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L21/0272 — Voice signal separating
    • G10L21/0308 — Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 — Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a voice enhancement method, device, electronic device and computer readable storage medium, applied to the electronic device in a voice enhancement system comprising M voice acquisition devices. The method includes: acquiring the voice information captured by N of the M voice acquisition devices; determining K users, K voiceprints and K voice acquisition device groups according to the voice information and the N voice acquisition devices; if groups containing more than one voice acquisition device exist among the K groups, determining target voice acquisition device arrays for L users according to L voiceprints and L voice acquisition device groups; and performing multi-channel voice enhancement on the voice information captured by the target voice acquisition device arrays, according to the L voiceprints and the position of each voice acquisition device in the target arrays, to obtain target voice information of the L users. In the embodiment of the application, performing multi-channel voice enhancement improves the voice enhancement effect.

Description

Speech enhancement method, device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of speech technology, and in particular, to a speech enhancement method, apparatus, electronic device, and computer readable storage medium.
Background
Speech enhancement refers to techniques for extracting a useful speech signal from a noisy background, suppressing and reducing noise interference when the speech signal is disturbed or even drowned out by various kinds of noise. Voice enhancement depends heavily on the voice acquisition equipment: good equipment greatly reduces the difficulty of enhancement and thus improves its effect. Given fixed voice acquisition equipment, how to improve the voice enhancement effect is a technical problem to be solved urgently.
Disclosure of Invention
The embodiment of the application discloses a voice enhancement method, a voice enhancement device, electronic equipment and a computer readable storage medium, which are used for improving the voice enhancement effect.
In a first aspect, an embodiment of the present application discloses a speech enhancement method, where the method is applied to an electronic device in a voice enhancement system, the voice enhancement system further includes M voice acquisition devices disposed at different positions, and M is an integer greater than 1. The method includes:
Acquiring voice information acquired by N voice acquisition devices, wherein the N voice acquisition devices are voice acquisition devices in an awake state in the M voice acquisition devices, and N is an integer greater than 0 and less than or equal to M;
determining K users, K voiceprints and K voice acquisition equipment groups according to the voice information and the N voice acquisition equipment, wherein the K users are in one-to-one correspondence with the K voiceprints, the K users are in one-to-one correspondence with the K voice acquisition equipment groups, and K is an integer greater than 1;
under the condition that voice acquisition device groups containing more than one voice acquisition device exist in the K voice acquisition device groups, determining target voice acquisition device arrays of L users according to L voiceprints and L voice acquisition device groups, wherein the L voice acquisition device groups are the groups in the K voice acquisition device groups whose number of voice acquisition devices is greater than 1, the L users are the users corresponding to the L voice acquisition device groups, the L voiceprints are the voiceprints corresponding to the L voice acquisition device groups, and L is an integer greater than 0 and less than or equal to K;
and carrying out multi-channel voice enhancement on the voice information acquired by the target voice acquisition equipment array according to the L voiceprints and the position of each voice acquisition equipment in the target voice acquisition equipment array to obtain target voice information of the L users.
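The patent does not fix a particular multi-channel enhancement algorithm. A minimal delay-and-sum beamformer, which uses the device positions to align channels toward the user before averaging, sketches the idea (function and signal names are illustrative, not from the patent):

```python
def delay_and_sum(signals, delays_samples):
    """Align each channel by its integer sample delay toward the user's
    estimated position, then average the aligned channels.

    signals: list of equal-length sample lists, one per device.
    delays_samples: per-channel delay (in samples) derived from the
    device-to-user distances; delay 0 marks the reference channel.
    """
    usable = min(len(s) - d for s, d in zip(signals, delays_samples))
    return [
        sum(s[i + d] for s, d in zip(signals, delays_samples)) / len(signals)
        for i in range(usable)
    ]

# Channel 2 hears the same pulse one sample later; aligning the channels
# before averaging reinforces the pulse instead of smearing it.
enhanced = delay_and_sum([[0, 1, 2, 3, 0, 0], [0, 0, 1, 2, 3, 0]], [0, 1])
```

Real systems would use fractional delays and adaptive beamforming; this sketch only shows why the per-device positions matter for the multi-channel step.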
In a second aspect, an embodiment of the present application discloses a voice enhancement device, where the device is applied to an electronic device in a voice enhancement system, where the voice enhancement system further includes M voice acquisition devices, where the M voice acquisition devices are disposed at different positions, and M is an integer greater than 1, and includes:
the voice acquisition unit is used for acquiring voice information acquired by N voice acquisition devices, wherein the N voice acquisition devices are the voice acquisition devices in an awake state among the M voice acquisition devices, and N is an integer greater than 0 and less than or equal to M;
the first determining unit is used for determining K users, K voiceprints and K voice acquisition equipment groups according to the voice information and the N voice acquisition equipment, wherein the K users are in one-to-one correspondence with the K voiceprints, the K users are in one-to-one correspondence with the K voice acquisition equipment groups, and K is an integer larger than 1;
the second determining unit is configured to determine, according to L voiceprints and L voice acquisition device groups, target voice acquisition device arrays of L users when voice acquisition device groups containing more than one voice acquisition device exist in the K voice acquisition device groups, where the L voice acquisition device groups are the groups in the K voice acquisition device groups whose number of voice acquisition devices is greater than 1, the L users are the users corresponding to the L voice acquisition device groups, the L voiceprints are the voiceprints corresponding to the L voice acquisition device groups, and L is an integer greater than 0 and less than or equal to K;
And the voice enhancement unit is used for carrying out multi-channel voice enhancement on the voice information acquired by the target voice acquisition equipment array according to the L voiceprints and the position of each voice acquisition equipment in the target voice acquisition equipment array to obtain target voice information of the L users.
As a possible implementation manner, the first determining unit is specifically configured to:
determining the users corresponding to the voice signals included in the voice information to obtain K users;
determining voice acquisition equipment which is respectively awakened by the K users in the N voice acquisition equipment to obtain K voice acquisition equipment groups;
and respectively extracting voiceprints of the K users from the voice information to obtain K voiceprints.
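The grouping step above can be sketched as follows; the wake-event input format is an assumption for illustration, and a real system would derive it from speaker separation and voiceprint matching:

```python
from collections import defaultdict

def build_groups(wake_events):
    """Group awake devices by the user who woke them.

    wake_events: iterable of (user_id, device_id) pairs, one per detected
    wake-up (hypothetical format).  Returns the K users mapped to their
    voice acquisition device groups.
    """
    groups = defaultdict(set)
    for user_id, device_id in wake_events:
        groups[user_id].add(device_id)
    return {user: sorted(devices) for user, devices in groups.items()}

# Two users; alice woke two devices, bob woke one.
groups = build_groups([("alice", "mic1"), ("alice", "mic2"), ("bob", "mic3")])
```

Here K is simply the number of keys in the result, and each value is one voice acquisition device group.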
As a possible implementation manner, the second determining unit is specifically configured to:
determining a first voice acquisition device and a second voice acquisition device in a first voice acquisition device group, wherein the first voice acquisition device group is any voice acquisition device group in the L voice acquisition device groups;
determining the position of a first user according to a first voiceprint, the position of the first voice acquisition equipment and the position of the second voice acquisition equipment, wherein the first voiceprint is the voiceprint corresponding to the first voice acquisition equipment group, and the first user is the user corresponding to the first voice acquisition equipment group;
Determining a voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain target voice acquisition device arrays of L users, wherein the target voice acquisition device arrays of the first user comprise the first voice acquisition device and the second voice acquisition device.
As a possible implementation manner, the second determining unit determining the first voice acquisition device and the second voice acquisition device in the first voice acquisition device group includes:
under the condition that the voice acquisition equipment independently awakened by the first user exists in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the voice acquisition equipment independently awakened by the first user as first voice acquisition equipment;
under the condition that the voice acquisition equipment which is independently awakened by the first user does not exist in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the first voice acquisition equipment group as first voice acquisition equipment;
and determining the voice acquisition equipment with the smallest distance between the voice acquisition equipment and the first voice acquisition equipment in the first voice acquisition equipment group as second voice acquisition equipment.
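A small sketch of this selection rule (the data layout is hypothetical): pick the earliest-woken device, preferring devices woken only by this user, then pick its nearest neighbour in the group:

```python
import math

def pick_first_and_second(group, solo_woken, wake_time, positions):
    """Select the first and second voice acquisition devices for one user.

    group: device ids in the user's voice acquisition device group.
    solo_woken: ids within the group woken by this user alone.
    wake_time: device id -> wake-up timestamp.
    positions: device id -> (x, y) coordinates.
    """
    # Prefer devices woken only by this user; otherwise use the whole group.
    candidates = [d for d in group if d in solo_woken] or list(group)
    first = min(candidates, key=lambda d: wake_time[d])
    # Second device: the remaining group member closest to the first.
    others = [d for d in group if d != first]
    second = min(others, key=lambda d: math.dist(positions[d], positions[first]))
    return first, second

first, second = pick_first_and_second(
    group=["m1", "m2", "m3"],
    solo_woken={"m2", "m3"},
    wake_time={"m1": 1.0, "m2": 2.0, "m3": 3.0},
    positions={"m1": (0, 0), "m2": (1, 0), "m3": (5, 0)},
)
```

In this example m1 woke earliest but was not woken solely by the user, so m2 is chosen first, and m1 (its nearest neighbour) second.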
As a possible implementation manner, the second determining unit determining the location of the first user according to the first voiceprint, the location of the first voice capturing device, and the location of the second voice capturing device includes:
acquiring first voice information acquired by the first voice acquisition equipment;
acquiring second voice information acquired by the second voice acquisition equipment;
extracting the voice signal of the first user from the first voice information according to the first voiceprint to obtain a first voice signal;
extracting the voice signal of the first user from the second voice information according to the first voiceprint to obtain a second voice signal;
and determining the position of the first user according to the first voice signal, the second voice signal, the position of the first voice acquisition device and the position of the second voice acquisition device.
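The user's position can be estimated from the time difference of arrival (TDOA) of the user's voice at the two devices. A minimal brute-force cross-correlation sketch of the TDOA step (production systems typically use GCC-PHAT and then intersect the resulting hyperbola with geometric constraints; the patent does not specify the algorithm):

```python
def estimate_tdoa(sig_a, sig_b, sample_rate):
    """Return the delay (in seconds) of sig_b relative to sig_a, found as
    the lag that maximises their cross-correlation (brute force, for clarity).
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        # Sum over the indices where both shifted signals overlap.
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# The same pulse reaches the second device two samples later.
tdoa = estimate_tdoa([0, 0, 1, 2, 1, 0, 0, 0],
                     [0, 0, 0, 0, 1, 2, 1, 0], sample_rate=1)
```

Given the TDOA and the two known device positions, the user lies on one branch of a hyperbola; fixing a single point requires an extra constraint, which the patent leaves open.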
As a possible implementation manner, the second determining unit determining the voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain the target voice acquisition device arrays of the L users includes:
Determining an initial voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain initial voice acquisition device arrays of L users;
under the condition that the initial voice acquisition equipment arrays of the L users are overlapped, performing de-duplication processing and/or equalization processing on the initial voice acquisition equipment arrays of the L users to obtain target voice acquisition equipment arrays of the L users;
and under the condition that no overlap exists among the initial voice acquisition equipment arrays of the L users, determining the initial voice acquisition equipment arrays of the L users as target voice acquisition equipment arrays of the L users.
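The patent does not specify the de-duplication or equalisation strategy. One plausible sketch, offered purely as an assumption, assigns each shared device to the user whose array is currently smallest:

```python
from collections import defaultdict

def deduplicate_arrays(initial_arrays):
    """initial_arrays: {user: list of device ids}; initial arrays may share
    devices.  Returns arrays in which each device serves exactly one user,
    with ties going to the user holding fewer devices so far.  This is one
    illustrative strategy, not the one mandated by the patent.
    """
    claimants = defaultdict(list)
    for user, devices in initial_arrays.items():
        for device in devices:
            claimants[device].append(user)
    result = {user: [] for user in initial_arrays}
    for device, users in claimants.items():
        # Give the contested device to the currently smallest array.
        winner = min(users, key=lambda u: len(result[u]))
        result[winner].append(device)
    return result

# m2 is claimed by both users; it goes to bob, whose array is empty
# at that point.
arrays = deduplicate_arrays({"alice": ["m1", "m2"], "bob": ["m2", "m3"]})
```

After this step each device contributes to exactly one user's target array.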
As a possible implementation manner, the voice enhancement unit is further configured to, when a voice acquisition device group containing exactly one voice acquisition device exists in the K voice acquisition device groups, perform single-channel voice enhancement on the voice information acquired by a third voice acquisition device to obtain target voice information of a second user, where the third voice acquisition device is the voice acquisition device in any group in the K voice acquisition device groups whose number of voice acquisition devices equals 1, and the second user is the user corresponding to the third voice acquisition device.
As a possible implementation manner, the voice enhancement unit is further configured to, when a voice acquisition device group containing exactly one voice acquisition device exists in the K voice acquisition device groups and the third voice acquisition device includes a plurality of microphones, perform multi-channel voice enhancement on the voice information captured by the plurality of microphones to obtain the target voice information of the second user.
In a third aspect, embodiments of the present application disclose an electronic device comprising a processor and a memory, the processor invoking a computer program stored in the memory to perform the method disclosed in the first aspect.
In a fourth aspect, embodiments of the present application disclose a computer readable storage medium having stored thereon a computer program or computer instructions which, when executed by a processor, implement a method as disclosed in the first aspect above.
In a fifth aspect, embodiments of the present application disclose a computer program product comprising computer program code which, when executed by a processor, causes the above-mentioned method to be performed.
In the embodiment of the application, an electronic device in a voice enhancement system acquires the voice information captured by the N voice acquisition devices in the awake state among the M voice acquisition devices, determines K users, K voiceprints and K voice acquisition device groups according to the voice information and the N voice acquisition devices, determines, when groups containing more than one voice acquisition device exist among the K voice acquisition device groups, target voice acquisition device arrays of L users according to the L voiceprints and the L voice acquisition device groups, and performs multi-channel voice enhancement on the voice information captured by the target voice acquisition device arrays according to the L voiceprints and the position of each voice acquisition device in the target arrays, obtaining target voice information of the L users. Therefore, when a user wakes multiple voice acquisition devices, an array of those devices can be determined to capture the user's voice, and multi-channel voice enhancement can then be applied to the voice information the array captures, which improves the voice enhancement effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a network architecture as disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of a speech enhancement method disclosed in an embodiment of the present application;
FIG. 3 is a schematic diagram of the locations of 4 voice acquisition devices and a user according to an embodiment of the present application;
FIG. 4 is a flow chart of another speech enhancement method disclosed in an embodiment of the present application;
FIG. 5 is a schematic diagram of a speech enhancement apparatus according to an embodiment of the present disclosure;
fig. 6 is a structural view of an electronic device disclosed in an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
The embodiment of the application discloses a voice enhancement method, a voice enhancement device, electronic equipment and a computer readable storage medium, which are used for improving the voice enhancement effect. The following will describe in detail.
For a better understanding of the embodiments of the present application, the related art will be described first.
Voice enhancement depends heavily on the voice acquisition equipment; good equipment greatly reduces the difficulty of enhancement and thus improves its effect. Among voice acquisition devices, microphone arrays perform particularly well, but most devices that currently support voice acquisition have only a single microphone, so voice enhancement methods that rely on them are often limited in effect.
To solve the above problem, the application discloses a voice enhancement method: an electronic device in a voice enhancement system acquires the voice information captured by the N voice acquisition devices in the awake state among the M voice acquisition devices, determines K users, K voiceprints and K voice acquisition device groups according to the voice information and the N voice acquisition devices, determines, when groups containing more than one voice acquisition device exist among the K voice acquisition device groups, target voice acquisition device arrays of L users according to the L voiceprints and the L voice acquisition device groups, and performs multi-channel voice enhancement on the voice information captured by the target voice acquisition device arrays according to the L voiceprints and the position of each voice acquisition device in the target arrays, obtaining target voice information of the L users. Thus, when a user wakes multiple voice acquisition devices, an array of those devices can be determined, and multi-channel voice enhancement applied to the voice information it captures, improving the voice enhancement effect.
For a better understanding of the embodiments of the present application, the network architecture of the present application is first described below.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include an electronic device 101 and a plurality of voice capture devices 102.
The electronic device 101 may be connected to a plurality of voice capture devices 102 via a network. The network may be a wide area network, a local area network, or a combination of both.
The plurality of voice acquisition devices 102 may be interconnected via Wi-Fi, Bluetooth, ZigBee, or the like.
The voice acquisition device 102 may acquire voice information and may transmit the acquired voice information to the electronic device 101. Accordingly, the electronic device 101 may receive voice information from the voice capture device 102, and may thereafter process the voice information.
The voice capture device 102 may also send status information to the electronic device 101 when its status changes. Accordingly, the electronic device 101 may receive the status information and store the latest status of the voice capture device 102. In addition, the electronic device 101 may delete the stored historical status information of the voice capture device 102, ensuring that the status it stores for the voice capture device 102 is the device's current, i.e. latest, status.
The state of the voice acquisition device 102 may be either an awake state or a non-awake state. The non-awake state may be a sleep state or an off state. In the case where the voice acquisition device 102 is in the off state, the voice acquisition device 102 cannot acquire voice information, and thus, the non-awake state in the following embodiment is the sleep state. When the voice acquisition device 102 is in the sleep state, voice information can be normally acquired, but some unnecessary functional modules are in the sleep state, so that the power consumption of the voice acquisition device 102 can be reduced.
The electronic device 101 may be a terminal device having a communication and data processing function, a server, a gateway, or other devices having a communication and data processing function.
The voice acquisition device 102 is a device having a voice acquisition function. The speech acquisition device 102 may be a single microphone device, i.e., a device having only one microphone; a multi-microphone device, i.e. a device with multiple microphones, is also possible.
Based on the above network architecture, please refer to fig. 2, fig. 2 is a flow chart of a voice enhancement method disclosed in an embodiment of the present application. The voice enhancement method can be applied to electronic equipment in a voice enhancement system, and the voice enhancement system can further comprise M voice acquisition devices, wherein the M voice acquisition devices are arranged at different positions, namely the positions of the M voice acquisition devices are fixed and different. M is an integer greater than 1. As shown in fig. 2, the speech enhancement method may include the following steps.
201. Acquire the voice information captured by the N voice acquisition devices.
Each of the M voice capture devices may capture voice information of the user in real time. After the voice information of the user is collected by the voice collection device, whether the current state is the wake-up state or not can be judged, and the collected voice information can be directly sent to the electronic device under the condition that the current state is the wake-up state.
If the voice acquisition device determines that its current state is the non-awake state, it may switch from the non-awake state to the awake state and then send the captured voice information together with wake-up information to the electronic device. The wake-up information may include wake-up time information, i.e. the time at which the voice acquisition device entered the awake state. In one case, the wake-up time information alone may indicate that the device is in the awake state. In another case, the wake-up information may further include wake-up indication information indicating that the device is in the awake state. After receiving the voice information and the wake-up information from the voice acquisition device, the electronic device may update the device's state from non-awake to awake according to the wake-up time information or the wake-up indication information, and may store the device's wake-up time for later use.
In addition, the voice acquisition device may also send an identification of the voice acquisition device to the electronic device, so that the electronic device may determine from the identification of the voice acquisition device which voice information was acquired by. Further, the voice acquisition device may also send the acquisition time information of the voice information to the electronic device, so that the electronic device may determine the acquisition time of the voice information according to the acquisition time information of the voice information. The electronic device may store the time of collection of the voice information for later recall.
If the voice acquisition device does not capture the user's voice within a preset time, indicating that the user is not speaking, it may send non-wake indication information to the electronic device and then switch from the awake state to the non-awake state. The non-wake indication information indicates that the device is in the non-awake state. After receiving it, the electronic device may update the device's state from awake to non-awake accordingly. In addition, the voice acquisition device may also send its identification to the electronic device so that the electronic device can determine which device is in the non-awake state.
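The state bookkeeping on the electronic device side can be sketched as a small registry (the method names and message fields are assumptions for illustration):

```python
class DeviceRegistry:
    """Tracks the latest known state of each voice acquisition device."""

    def __init__(self):
        self.state = {}      # device id -> "awake" or "asleep"
        self.wake_time = {}  # device id -> timestamp of the last wake-up

    def on_wake(self, device_id, wake_ts):
        # Replace any stored historical state with the latest one.
        self.state[device_id] = "awake"
        self.wake_time[device_id] = wake_ts  # kept for later recall

    def on_non_wake(self, device_id):
        self.state[device_id] = "asleep"

    def awake_devices(self):
        """The N devices currently in the awake state."""
        return sorted(d for d, s in self.state.items() if s == "awake")

registry = DeviceRegistry()
registry.on_wake("mic1", 100.0)
registry.on_wake("mic2", 105.0)
registry.on_non_wake("mic1")
```

The `awake_devices` view is what step 201 consumes: the N devices whose voice information the electronic device goes on to acquire.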
The voice acquisition device may send the above information to the electronic device in the form of a data packet or message.
The N voice acquisition devices are all voice acquisition devices in an awake state in the M voice acquisition devices, and N is an integer greater than 0 and less than or equal to M. It can be seen that the number of voice acquisition devices in the awake state in the M voice acquisition devices may be one or more.
It should be understood that the number of voice acquisition devices in the awake state among the M voice acquisition devices may also be 0. In the case where the number of voice acquisition devices in the awake state is 0, the electronic device does not perform the subsequent steps, and thus, consideration is not given to this case.
The electronic device may obtain the voice information collected by the N voice collecting devices in real time or periodically. The acquisition of the voice information acquired by the voice acquisition device may be understood as locally acquiring the voice information acquired by the voice acquisition device, or may be understood as receiving the voice information from the voice acquisition device.
202. Determine K users, K voiceprints, and K voice acquisition device groups according to the voice information collected by the N voice acquisition devices and the N voice acquisition devices themselves.
K users are in one-to-one correspondence with K voiceprints, K users are in one-to-one correspondence with K voice acquisition equipment groups, and K is an integer greater than 1.
It should be understood that K may also be 1; the present application considers only the case where K is greater than 1.
After the voice information collected by the N voice acquisition devices is acquired, the users corresponding to the voice signals included in that voice information may be determined, yielding the K users. Specifically, voice separation and voice segmentation may be performed on the voice information collected by the N voice acquisition devices, and the number of speaking users may then be counted to obtain the K users.
The voice acquisition devices woken up by each of the K users among the N voice acquisition devices may be determined, yielding the K voice acquisition device groups. For any user, all voice acquisition devices among the N voice acquisition devices woken up by that user may be determined first, and the voice acquisition device group corresponding to the user may then be determined from them. A voice acquisition device group may include one voice acquisition device or a plurality of voice acquisition devices.
A voice acquisition device "woken up by a user" may be understood as a device that has collected the voice information of that user within the preset duration, that is, the time interval between the moment the device last collected the user's voice information and the current moment is smaller than the preset duration. When that interval is longer than the preset duration, the device is not considered a voice acquisition device woken up by the user, even if it is in the awake state.
For example, suppose the voice acquisition device 1 collected the voice information of the user 1 at 13:18:00 and the voice information of the user 2 at 13:28:00, and the preset duration is 30 minutes. If the current time is 13:50:00 and the voice acquisition device 1 is in the awake state, then since the interval between 13:50:00 and 13:18:00 is 32 minutes, which is greater than 30 minutes, while the interval between 13:50:00 and 13:28:00 is only 22 minutes, the voice acquisition device 1 is a voice acquisition device woken up by the user 2 but not by the user 1. If the current time is 13:40:00 and the voice acquisition device 1 is in the awake state, both intervals are smaller than 30 minutes, so the voice acquisition device 1 is a voice acquisition device woken up by both the user 1 and the user 2.
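The "woken by a user" test can be sketched directly from the timestamps. The code below mirrors the example; the 30-minute preset duration and the `last_heard` map are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Sketch of the "woken by a user" test: a device counts as woken by a user
# only if it collected that user's voice within the preset duration.
PRESET = timedelta(minutes=30)  # assumed preset duration for illustration

def users_waking(device_last_heard, now):
    """device_last_heard: {user: time the device last collected that user's voice}."""
    return {u for u, t in device_last_heard.items() if now - t < PRESET}

last_heard = {
    "user1": datetime(2024, 3, 1, 13, 18, 0),
    "user2": datetime(2024, 3, 1, 13, 28, 0),
}
print(users_waking(last_heard, datetime(2024, 3, 1, 13, 50, 0)))  # -> {'user2'}
print(users_waking(last_heard, datetime(2024, 3, 1, 13, 40, 0)))  # -> both users
```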
The voiceprints corresponding to the K users can be respectively extracted from the voice information acquired by the N voice acquisition devices, so that the K voiceprints are obtained. Specifically, voice print extraction can be performed on voice signals of K users obtained through voice separation and voice segmentation, so as to obtain K voice prints of K users.
203. When a voice acquisition device group containing more than one voice acquisition device exists among the K voice acquisition device groups, determine the target voice acquisition device arrays of L users according to the L voiceprints and the L voice acquisition device groups.
The L voice acquisition device groups are the groups among the K voice acquisition device groups whose number of voice acquisition devices is greater than 1, the L users are the users corresponding to the L voice acquisition device groups, the L voiceprints are the voiceprints corresponding to the L voice acquisition device groups, and L is an integer greater than 0 and less than or equal to K.
After the K voice acquisition device groups are determined, it may be judged whether any of them contains more than one voice acquisition device. If such groups exist, the voice of at least one of the K users has been collected by multiple voice acquisition devices. All groups containing more than one device may then be determined, yielding the L voice acquisition device groups, and the target voice acquisition device arrays of the L users may be determined according to the L voiceprints and the L voice acquisition device groups.
For the first voice acquisition device group, the first voice acquisition device and the second voice acquisition device in the group may be determined first. The position of the first user may then be determined according to the first voiceprint, the position of the first voice acquisition device, and the position of the second voice acquisition device, and the voice acquisition device array of the first user may be determined according to those two device positions and the position of the first user; doing this for every group yields the target voice acquisition device arrays of the L users. The target voice acquisition device array of the first user includes the first voice acquisition device and the second voice acquisition device. The first voice acquisition device group is any one of the L voice acquisition device groups, the first voiceprint is the voiceprint corresponding to the first voice acquisition device group, and the first user is the user corresponding to the first voice acquisition device group.
It may be judged whether a voice acquisition device independently woken up by the first user exists in the first voice acquisition device group. If such devices exist, the one among them with the earliest wake-up time may be determined as the first voice acquisition device; otherwise, the voice acquisition device with the earliest wake-up time in the whole group may be determined as the first voice acquisition device. In this way a device independently woken up by the first user is preferentially chosen as the first voice acquisition device, which avoids interference between the voices of different users and can improve the voice enhancement effect.
Alternatively, the voice acquisition devices in the first voice acquisition device group may first be sorted by wake-up time from earliest to latest to obtain a first ordering, and it may then be judged whether any device in the group was independently woken up by the first user. If no such device exists, the frontmost device in the first ordering may be determined as the first voice acquisition device. If such devices exist, they may be moved, in wake-up-time order, ahead of the devices not independently woken up by the first user, yielding a second ordering, and the frontmost device in the second ordering may be determined as the first voice acquisition device.
For example, assume L is 2, the voice acquisition device group 1 includes the voice acquisition devices 1, 2, and 3, the voice acquisition device group 2 includes the voice acquisition devices 3, 4, and 5, the devices 1, 2, and 3 were woken up by the user 1 at 13:18:10, 13:18:08, and 13:18:09 respectively, and the devices 3, 4, and 5 were woken up by the user 2 at 13:18:15, 13:18:14, and 13:18:13 respectively. Sorting the devices 1, 2, and 3 by wake-up time from earliest to latest gives ordering 1: device 2, device 3, device 1. Since the devices 1 and 2 were independently woken up by the user 1, adjusting ordering 1 gives ordering 2: device 2, device 1, device 3, so the voice acquisition device 2 may be determined as the first voice acquisition device in the group 1.
Sorting the devices 3, 4, and 5 by wake-up time from earliest to latest gives ordering 3: device 5, device 4, device 3. The devices 4 and 5 were independently woken up by the user 2 and are already ahead of the device 3, so ordering 3 may be used directly as the final ordering 4, and the voice acquisition device 5 may be determined as the first voice acquisition device in the group 2.
It should be appreciated that the foregoing is an exemplary illustration of the manner in which the first speech acquisition device is determined, and is not limiting.
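The ordering-based selection above can be sketched in a few lines: sort by wake-up time, then move independently-woken devices to the front while preserving wake-up-time order. Device names and times follow the example; the function name is an illustrative placeholder.

```python
# Sketch of selecting the "first voice acquisition device" of a group:
# sort by wake-up time (earliest first), then move devices independently
# woken by the user ahead of the rest, keeping wake-time order (ordering 2).
def pick_first_device(wake_times, solo_woken):
    order = sorted(wake_times, key=wake_times.get)       # first ordering
    order = ([d for d in order if d in solo_woken] +      # second ordering
             [d for d in order if d not in solo_woken])
    return order[0]

group1 = {"dev1": "13:18:10", "dev2": "13:18:08", "dev3": "13:18:09"}
group2 = {"dev3": "13:18:15", "dev4": "13:18:14", "dev5": "13:18:13"}
print(pick_first_device(group1, {"dev1", "dev2"}))  # -> dev2
print(pick_first_device(group2, {"dev4", "dev5"}))  # -> dev5
```

Sorting the `HH:MM:SS` strings lexicographically is equivalent to sorting by time here, which keeps the sketch dependency-free.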
The distance between the first voice acquisition device and each other voice acquisition device in the first voice acquisition device group may then be determined according to the positions of the devices in the group, and the device in the group closest to the first voice acquisition device may be determined as the second voice acquisition device.
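The nearest-device rule can be sketched as a minimum over Euclidean distances. The 2-D positions are illustrative assumptions.

```python
import math

# Sketch: the second voice acquisition device is the remaining group member
# closest to the first device. Positions are assumed 2-D coordinates.
def pick_second_device(positions, first):
    fx, fy = positions[first]
    others = (d for d in positions if d != first)
    return min(others, key=lambda d: math.hypot(positions[d][0] - fx,
                                                positions[d][1] - fy))

positions = {"dev1": (0.0, 0.0), "dev2": (1.0, 0.0), "dev3": (5.0, 5.0)}
print(pick_second_device(positions, "dev1"))  # -> dev2
```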
After the first voice acquisition device and the second voice acquisition device are determined, the first voice information most recently collected by the first voice acquisition device and the second voice information most recently collected by the second voice acquisition device may be acquired. When the first voice acquisition device was independently woken up by the first user, the voice signal in the first voice information may be determined to be the voice signal of the first user, yielding the first voice signal. When the first voice acquisition device was not independently woken up by the first user, the voice signal of the first user may be extracted from the first voice information according to the first voiceprint, yielding the first voice signal. Similarly, when the second voice acquisition device was independently woken up by the first user, the voice signal in the second voice information may be determined to be the voice signal of the first user, yielding the second voice signal; otherwise, the voice signal of the first user may be extracted from the second voice information according to the first voiceprint, yielding the second voice signal. The position of the first user may then be determined from the first voice signal, the second voice signal, the position of the first voice acquisition device, and the position of the second voice acquisition device. Specifically, a voice positioning algorithm may be used: the first voice signal, the second voice signal, and the positions of the two devices may be input into the algorithm, and its output is the position of the first user.
In one case, the position of the first user may be stored directly, after which the subsequent steps may be performed.
In another case, it may first be judged whether the determined position of the first user is the same as the stored position of the first user. If they are the same, the position of the first user has not changed, and the subsequent steps need not be performed. If they differ, the position of the first user has changed, and the subsequent steps may be performed; in addition, the stored position of the first user may be updated with the determined position. In this way the target voice acquisition device array of the first user is redetermined only when the position of the first user has changed, which reduces unnecessary processing and thus lowers the power consumption and saves the processing resources of the electronic device.
In yet another case, it may first be judged whether the voice information collected by the M voice acquisition devices is being processed for the first time. If so, the position of the first user may be stored directly and the subsequent steps performed. If not, it may be judged whether the determined position of the first user is the same as the stored position of the first user. If they are the same, the position of the first user has not changed, and the subsequent steps may be skipped; if they differ, the position has changed, the subsequent steps may be performed, and the stored position of the first user may also be updated with the determined position.
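The position-change check in the cases above can be sketched as a small cache that stores on first sight and compares afterwards. The class name, the tolerance value, and the coordinate form are assumptions for illustration.

```python
# Sketch of the position-change check: redetermine the target array only
# when the user's localized position differs from the stored one.
class PositionCache:
    def __init__(self, tolerance=0.1):
        self.stored = {}          # user -> last stored position
        self.tolerance = tolerance  # assumed comparison tolerance

    def position_changed(self, user, position):
        """True (and cache updated) when the array must be redetermined."""
        prev = self.stored.get(user)
        if prev is not None and all(abs(a - b) <= self.tolerance
                                    for a, b in zip(prev, position)):
            return False  # unchanged: skip the subsequent steps
        self.stored[user] = position  # first time or moved: store and proceed
        return True

cache = PositionCache()
print(cache.position_changed("user1", (2.0, 3.0)))  # -> True  (first time)
print(cache.position_changed("user1", (2.0, 3.0)))  # -> False (unchanged)
print(cache.position_changed("user1", (4.0, 3.0)))  # -> True  (moved)
```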
After determining the position of the first user, the initial voice acquisition device array of the first user can be determined according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user, so as to obtain the initial voice acquisition device arrays of L users.
The voice acquisition device array is an array formed by a plurality of voice acquisition devices. The voice acquisition device array at least comprises a first voice acquisition device and a second voice acquisition device. The voice acquisition device array includes a number of voice acquisition devices greater than or equal to 2 and less than or equal to N.
It may first be judged whether the number of voice acquisition devices in the first voice acquisition device group is 2. If so, the first voice acquisition device and the second voice acquisition device may directly be determined as the initial voice acquisition device array of the first user. If the number is greater than 2, the initial voice acquisition device array of the first user may be determined according to the position of the first voice acquisition device, the position of the second voice acquisition device, and the position of the first user.
A first region may be determined according to the positions of the first, second, and fourth voice acquisition devices; that is, the region enclosed by the first voice acquisition device, the second voice acquisition device, and the fourth voice acquisition device may be determined as the first region. The fourth voice acquisition device is one of the devices in the first voice acquisition device group other than the first and second voice acquisition devices.
It may then be judged, according to the position of the first user, whether the first user is outside the first region. If the first user is outside the first region, a first array may be determined from the first, second, and fourth voice acquisition devices, and it may further be judged whether the first array includes all the voice acquisition devices in the first voice acquisition device group. If it does, the first array may be determined as the initial voice acquisition device array of the first user.
When the first user is within the first region, it may be judged whether voice acquisition devices other than the first, second, and fourth voice acquisition devices exist in the first voice acquisition device group. If no such devices exist, the first voice acquisition device and the second voice acquisition device may be determined as the initial voice acquisition device array of the first user. If such devices do exist, a second region may be determined according to the positions of the first voice acquisition device, the second voice acquisition device, and a fifth voice acquisition device, where the fifth voice acquisition device is one of the devices in the first voice acquisition device group other than the first, second, and fourth voice acquisition devices. The processing for the second region is the same as that for the first region; for details, refer to the description of the first region, which is not repeated here.
When it is judged that the first array does not include all the voice acquisition devices in the first voice acquisition device group, a third region may be determined according to the positions of the first, second, fourth, and fifth voice acquisition devices. The processing for the third region is the same as that for the first region; for details, refer to the description of the first region, which is not repeated here.
The fourth voice acquisition device may be any device in the first voice acquisition device group other than the first and second voice acquisition devices. Alternatively, it may be the device in the group, other than the first and second voice acquisition devices, with the smallest distance to the first voice acquisition device and/or the second voice acquisition device, so that candidate devices are considered from near to far relative to the first and second voice acquisition devices.
For example, assume the first voice acquisition device group includes 4 voice acquisition devices. Referring to fig. 3, fig. 3 is a schematic diagram of the positions of the 4 voice acquisition devices and the user disclosed in the embodiments of the present application. As shown in fig. 3, the 4 devices are the voice acquisition devices 1, 2, 3, and 4, with the voice acquisition device 1 as the first voice acquisition device and the voice acquisition device 2 as the second voice acquisition device. The region 1 may be determined from the devices 1, 2, and 3; the user may be outside the region 1; the region 2 may then be determined from the devices 1, 2, 3, and 4, and the user may be within the region 2, so the devices 1, 2, and 3 may be determined as the initial voice acquisition device array of the first user. Alternatively, the region 3 may be determined from the devices 1, 2, and 4; the user may be outside the region 3; the region 2 may then be determined from the devices 1, 2, 3, and 4, and the user may be within the region 2, so the devices 1, 2, and 4 may be determined as the initial voice acquisition device array of the first user.
It can be seen that the manner in which the fourth speech acquisition device is determined is different, and the initial array of speech acquisition devices for the determined first user may be different.
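The geometric test at the heart of the region check can be sketched as a point-in-triangle test via signed areas: the user being inside the region enclosed by three devices means all three cross products share a sign. Coordinates are illustrative assumptions; a real implementation would also handle regions with more than three devices.

```python
# Sketch of the region test used when growing the array: check whether the
# user lies inside the region enclosed by three devices (signed-area test).
def cross(o, a, b):
    # z-component of (a - o) x (b - o); sign gives orientation of the turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_triangle(p, a, b, c):
    s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)

dev1, dev2, dev3 = (0.0, 0.0), (4.0, 0.0), (2.0, 3.0)
print(inside_triangle((2.0, 1.0), dev1, dev2, dev3))  # -> True: region suffices
print(inside_triangle((6.0, 1.0), dev1, dev2, dev3))  # -> False: add another device
```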
After the initial voice acquisition device arrays of the L users are obtained, it may be judged whether any of them overlap. If none overlap, the initial arrays may be determined as the target voice acquisition device arrays of the L users. If some overlap, de-duplication processing and/or equalization processing may be performed on the overlapping initial arrays among the initial arrays of the L users, yielding the target voice acquisition device arrays of the L users. This reduces the interaction between different users and can therefore improve the voice enhancement effect.
That the initial voice acquisition device arrays of the L users overlap may mean that the initial arrays of all of the L users overlap, or that the initial arrays of only some of the L users overlap.
When the initial voice acquisition device array of the first user overlaps that of another of the L users, and the first user's array contains more voice acquisition devices than the other user's array, the devices shared with the other user's array may be removed from the first user's array until the two arrays no longer overlap or the difference between their numbers of devices is smaller than a number threshold.
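The de-duplication rule above can be sketched as follows. The function name, device labels, and the size-gap threshold of 1 are assumptions for illustration.

```python
# Sketch of de-duplication: when two users' initial arrays overlap and one
# array is larger, drop shared devices from the larger array until the
# overlap is gone or the size gap falls below a threshold.
def deduplicate(larger, smaller, size_gap_threshold=1):
    larger = list(larger)  # work on a copy
    for dev in [d for d in larger if d in smaller]:
        if len(larger) - len(smaller) < size_gap_threshold:
            break  # size gap too small to keep shrinking
        larger.remove(dev)
        if not any(d in smaller for d in larger):
            break  # arrays no longer overlap
    return larger

arr1 = ["dev1", "dev2", "dev3", "dev4"]   # first user's initial array
arr2 = ["dev3", "dev5"]                   # another user's initial array
print(deduplicate(arr1, arr2))  # -> ['dev1', 'dev2', 'dev4']
```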
After the K voice acquisition device groups are determined, it may also be judged whether any of them contains exactly one voice acquisition device. If such a group exists, it may be judged whether the third voice acquisition device includes a plurality of microphones. If it does not, that is, the third voice acquisition device includes one microphone, single-channel voice enhancement may be performed on the voice information collected by the third voice acquisition device to obtain the target voice information of the second user. The third voice acquisition device is the voice acquisition device in any group, among the K voice acquisition device groups, whose number of voice acquisition devices equals 1, and the second user is the user corresponding to the third voice acquisition device.
When it is judged that the third voice acquisition device includes one microphone, the voice information most recently collected by the third voice acquisition device may be acquired first, and it may then be judged whether the third voice acquisition device was independently woken up by the second user. If it was, single-channel voice enhancement may be performed directly on that voice information to obtain the target voice information of the second user. If it was not, the voice signal of the second user may be extracted from that voice information according to the second voiceprint, and single-channel voice enhancement may then be performed on the extracted voice signal to obtain the target voice information of the second user. The second voiceprint is the voiceprint corresponding to the second user.
When it is judged that the third voice acquisition device includes a plurality of microphones, multi-channel voice enhancement may be performed on the voice information collected by the plurality of microphones to obtain the target voice information of the second user. Specifically, the voice information most recently collected by the third voice acquisition device may be acquired first, and it may then be judged whether the third voice acquisition device was independently woken up by the second user. If it was, multi-channel voice enhancement may be performed directly on the voice information most recently collected by the plurality of microphones to obtain the target voice information of the second user. If it was not, the voice signal of the second user may be extracted from that voice information according to the second voiceprint, and multi-channel voice enhancement may then be performed on the extracted voice signal to obtain the target voice information of the second user.
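The single-device-group branch reduces to a dispatch on the microphone count. A minimal sketch, with placeholder function names and frame data that are not the patent's API:

```python
# Sketch of the dispatch for a one-device group: single-channel enhancement
# when the lone device has one microphone, multi-channel when it has several.
def enhance_for_single_device_group(mic_count, frames):
    # frames: one sample list per microphone (placeholder representation)
    if mic_count == 1:
        return ("single-channel", frames[0])
    return ("multi-channel", frames)

mode, _ = enhance_for_single_device_group(1, [[0.1, 0.2]])
print(mode)  # -> single-channel
mode, _ = enhance_for_single_device_group(4, [[0.1], [0.2], [0.1], [0.0]])
print(mode)  # -> multi-channel
```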
204. Perform multi-channel voice enhancement on the voice information collected by the target voice acquisition device arrays according to the L voiceprints and the position of each voice acquisition device in the target voice acquisition device arrays to obtain the target voice information of the L users.
After the target voice acquisition device arrays of the L users are determined, the voice information most recently collected by each target array may be acquired, and multi-channel voice enhancement may then be performed on it according to the position of each voice acquisition device in the array to obtain the target voice information of the corresponding user.
For the first user, after the voice information most recently collected by the first user's target voice acquisition device array is acquired, the voice signal of the first user may be extracted from that voice information according to the first voiceprint, and multi-channel voice enhancement may be performed on the extracted voice signal according to the position of each voice acquisition device in the array to obtain the target voice information of the first user.
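One classical way to exploit the device positions in multi-channel enhancement is delay-and-sum beamforming: each channel is shifted by the delay implied by its device's distance to the user, then the aligned channels are averaged so the target speech adds coherently. The sketch below is a minimal illustration with integer sample delays; it is not claimed to be the patent's algorithm (real systems use fractional delays and adaptive beamformers).

```python
# Minimal delay-and-sum sketch of multi-channel enhancement.
def delay_and_sum(channels, delays):
    """channels: list of sample lists; delays[i]: samples channel i lags the reference."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

signal = [0.0, 1.0, 0.0, -1.0, 0.0]
# Three microphones hear the same signal with delays of 0, 1, and 2 samples.
channels = [signal, [0.0] + signal, [0.0, 0.0] + signal]
print(delay_and_sum(channels, [0, 1, 2]))  # -> [0.0, 1.0, 0.0, -1.0, 0.0]
```

After alignment the averaged output reproduces the clean signal, while uncorrelated noise on each channel would be attenuated by the averaging.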
In the speech enhancement method described in fig. 2, an electronic device in a speech enhancement system acquires the voice information collected by the N voice acquisition devices in the awake state among the M voice acquisition devices, determines K users, K voiceprints, and K voice acquisition device groups according to that voice information and the N voice acquisition devices, determines the target voice acquisition device arrays of L users according to the L voiceprints and the L voice acquisition device groups when groups containing more than one device exist among the K groups, and performs multi-channel voice enhancement on the voice information collected by the target arrays according to the L voiceprints and the position of each device in the arrays to obtain the target voice information of the L users. Therefore, when a user has woken up a plurality of voice acquisition devices, the voice acquisition device array collecting that user's voice information can be determined and multi-channel voice enhancement can be performed on the voice information it collects, so the voice enhancement effect can be improved.
Referring to fig. 4, fig. 4 is a flow chart of another speech enhancement method according to an embodiment of the present disclosure. The voice enhancement method can be applied to an electronic device capable of performing data processing. As shown in fig. 4, the voice enhancement method may include the following steps.
401. Acquire the voice information acquired by the N voice acquisition devices.

For a detailed description of step 401, refer to the description of step 201 above.

402. Determine K users, K voiceprints and K voice acquisition device groups according to the voice information acquired by the N voice acquisition devices and the N voice acquisition devices.

For a detailed description of step 402, refer to the description of step 202 above.

403. In the case where a voice acquisition device group in which the number of voice acquisition devices is greater than 1 exists among the K voice acquisition device groups, determine target voice acquisition device arrays of L users according to the L voiceprints and the L voice acquisition device groups.

For a detailed description of step 403, refer to the description of step 203 above.

404. Perform multi-channel voice enhancement on the voice information acquired by the target voice acquisition device arrays according to the L voiceprints and the position of each voice acquisition device in the target voice acquisition device arrays, to obtain target voice information of the L users.

For a detailed description of step 404, refer to the description of step 204 above.

405. In the case where a voice acquisition device group in which the number of voice acquisition devices is equal to 1 exists among the K voice acquisition device groups, and the third voice acquisition device comprises one microphone, perform single-channel voice enhancement on the voice information acquired by the third voice acquisition device, to obtain target voice information of the second user.

For a detailed description of step 405, refer to the related description of step 203 above.

406. In the case where a voice acquisition device group in which the number of voice acquisition devices is equal to 1 exists among the K voice acquisition device groups, and the third voice acquisition device comprises a plurality of microphones, perform multi-channel voice enhancement on the voice information acquired by the plurality of microphones, to obtain target voice information of the second user.

For a detailed description of step 406, refer to the related description of step 203 above.
In the voice enhancement method described in fig. 4, the voice information acquired by the N voice acquisition devices is acquired, and K users, K voiceprints and K voice acquisition device groups are determined according to that voice information and the N voice acquisition devices. In the case where a voice acquisition device group in which the number of voice acquisition devices is greater than 1 exists among the K voice acquisition device groups, target voice acquisition device arrays of L users are determined according to the L voiceprints and the L voice acquisition device groups, and multi-channel voice enhancement is performed on the voice information acquired by the target voice acquisition device arrays according to the L voiceprints and the position of each voice acquisition device in the target voice acquisition device arrays, so as to obtain target voice information of the L users. In the case where a voice acquisition device group in which the number of voice acquisition devices is equal to 1 exists among the K voice acquisition device groups, and the third voice acquisition device comprises one microphone, single-channel voice enhancement is performed on the voice information acquired by the third voice acquisition device to obtain target voice information of the second user; where the third voice acquisition device instead comprises a plurality of microphones, multi-channel voice enhancement is performed on the voice information acquired by the plurality of microphones to obtain target voice information of the second user. It can be seen that voice enhancement can be achieved in each of these situations, which improves the flexibility of voice enhancement. In addition, whenever the conditions for multi-channel voice enhancement are met, multi-channel voice enhancement is performed, which improves the voice enhancement effect.
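The three-way branch that the method of fig. 4 describes (steps 403-404 versus step 405 versus step 406) can be sketched as a simple dispatch function. This is an illustrative reading only; the function and its parameter names are assumptions, not part of the embodiment:

```python
def choose_enhancement(group_device_count, microphone_count):
    """Return which enhancement branch the method of fig. 4 would take.

    group_device_count: number of devices in the user's wake group.
    microphone_count: microphones on that device (only consulted when
    the group contains exactly one device).
    """
    if group_device_count > 1:
        # Steps 403-404: several devices form a multi-channel array.
        return "multi-channel (device array)"
    if microphone_count > 1:
        # Step 406: one device, but it carries several microphones.
        return "multi-channel (single device)"
    # Step 405: one device with one microphone.
    return "single-channel"
```

The design point is that multi-channel processing is preferred whenever the hardware allows it, and single-channel enhancement is only the fallback.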
Referring to fig. 5, fig. 5 is a schematic structural diagram of a voice enhancement device according to an embodiment of the present disclosure. The voice enhancement device can be applied to electronic equipment in a voice enhancement system, the voice enhancement system further comprises M voice acquisition devices, the M voice acquisition devices are arranged at different positions, and M is an integer greater than 1. As shown in fig. 5, the voice enhancement apparatus may include:
an acquiring unit 501, configured to acquire voice information acquired by N voice acquisition devices, where the N voice acquisition devices are voice acquisition devices in an awake state in the M voice acquisition devices, and N is an integer greater than 0 and less than or equal to M;
the first determining unit 502 is configured to determine, according to the voice information collected by the N voice collecting devices and the N voice collecting devices, K users, K voiceprints, and K voice collecting device groups, where the K users are in one-to-one correspondence with the K voiceprints, the K users are in one-to-one correspondence with the K voice collecting device groups, and K is an integer greater than 1;
a second determining unit 503, configured to determine, according to L voiceprints and L voice acquisition device groups, a target voice acquisition device array of L users in the case where the number of voice acquisition devices in the K voice acquisition device groups is greater than 1, where the L voice acquisition device groups are voice acquisition device groups in which the number of voice acquisition devices in the K voice acquisition device groups is greater than 1, the L users are users corresponding to the L voice acquisition device groups, the L voiceprints are voiceprints corresponding to the L voice acquisition device groups, and L is an integer greater than 0 and less than or equal to K;
The voice enhancement unit 504 is configured to perform multi-channel voice enhancement on the voice information collected by the target voice collection device array according to the L voiceprints and the position of each voice collection device in the target voice collection device array, so as to obtain target voice information of L users.
In some embodiments, the first determining unit 502 is specifically configured to:
determining users corresponding to voice signals included in voice information acquired by N voice acquisition devices to obtain K users;
determining voice acquisition equipment which is respectively awakened by K users in N voice acquisition equipment to obtain K voice acquisition equipment groups;
and respectively extracting the voiceprints of the K users from the voice information acquired by the N voice acquisition devices to obtain K voiceprints.
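The grouping step above can be sketched as follows, assuming each wake-up event already carries the identity of the waking user (the speaker identification itself, e.g. voiceprint matching, is outside this sketch, and all names are illustrative assumptions):

```python
from collections import defaultdict

def build_wake_groups(wake_events):
    """wake_events: iterable of (user_id, device_id) pairs, one per wake-up.

    Returns a dict user_id -> sorted list of devices that user woke, i.e.
    the K voice acquisition device groups (K = number of distinct users).
    """
    groups = defaultdict(set)
    for user_id, device_id in wake_events:
        groups[user_id].add(device_id)
    return {user: sorted(devices) for user, devices in groups.items()}
```

Note that a device woken by two users appears in both of their groups, which is exactly the overlap that the later de-duplication step has to resolve.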
In some embodiments, the second determining unit 503 is specifically configured to:
determining a first voice acquisition device and a second voice acquisition device in a first voice acquisition device group, wherein the first voice acquisition device group is any voice acquisition device group in L voice acquisition device groups;
determining the position of a first user according to the first voiceprint, the position of the first voice acquisition equipment and the position of the second voice acquisition equipment, wherein the first voiceprint is the voiceprint corresponding to the first voice acquisition equipment group, and the first user is the user corresponding to the first voice acquisition equipment group;
And determining a voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain target voice acquisition device arrays of L users, wherein the target voice acquisition device arrays of the first user comprise the first voice acquisition device and the second voice acquisition device.
In some embodiments, the second determining unit 503 determines that the first voice capturing device and the second voice capturing device in the first voice capturing device group include:
under the condition that voice acquisition equipment independently awakened by a first user exists in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the voice acquisition equipment independently awakened by the first user as first voice acquisition equipment;
under the condition that the voice acquisition equipment which is independently awakened by the first user does not exist in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the first voice acquisition equipment group as first voice acquisition equipment;
and determining the voice acquisition equipment with the smallest distance with the first voice acquisition equipment in the first voice acquisition equipment group as second voice acquisition equipment.
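The selection rules above (prefer a device woken by the first user alone, break ties by earliest wake-up time, then take the remaining device nearest to it) can be sketched as below; the record layout and names are assumptions made for illustration:

```python
import math

def pick_first_and_second(group, positions):
    """group: list of dicts {'id', 'wake_time', 'solo_wake'} describing one
    user's device group (at least two devices). 'solo_wake' is True when
    only this user woke that device. positions: device id -> (x, y) metres.
    """
    solo = [d for d in group if d["solo_wake"]]
    pool = solo if solo else group          # fall back to the whole group
    first = min(pool, key=lambda d: d["wake_time"])
    rest = [d for d in group if d["id"] != first["id"]]
    second = min(rest, key=lambda d: math.dist(positions[d["id"]],
                                               positions[first["id"]]))
    return first["id"], second["id"]
```

Preferring a solo-woken device is a sensible heuristic: its recording is least likely to contain another user's overlapping speech.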
In some embodiments, the second determining unit 503 determines the location of the first user according to the first voiceprint, the location of the first voice capture device, and the location of the second voice capture device comprises:
acquiring first voice information acquired by first voice acquisition equipment;
acquiring second voice information acquired by second voice acquisition equipment;
extracting a voice signal of a first user from the first voice information according to the first voiceprint to obtain a first voice signal;
extracting a voice signal of a first user from the second voice information according to the first voiceprint to obtain a second voice signal;
and determining the position of the first user according to the first voice signal, the second voice signal, the position of the first voice acquisition device and the position of the second voice acquisition device.
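One plausible way to realise the localisation step above is from the time difference of arrival between the two extracted voice signals. Two microphones only constrain the source to a hyperbola, so the sketch below resolves the ambiguity by projecting onto the line segment between the two devices; that simplification, and every name in the code, are assumptions rather than the embodiment's actual method:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def locate_on_baseline(sig_a, sig_b, pos_a, pos_b, sample_rate):
    """Rough position estimate of the user from two extracted voice signals."""
    sig_a = np.asarray(sig_a, dtype=float)
    sig_b = np.asarray(sig_b, dtype=float)
    corr = np.correlate(sig_a, sig_b, mode="full")
    # Negative lag: sig_a arrives earlier, i.e. the user is closer to device A.
    lag = (int(np.argmax(corr)) - (len(sig_b) - 1)) / sample_rate
    path_diff = lag * SPEED_OF_SOUND   # (distance to A) - (distance to B)
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    baseline = np.linalg.norm(pos_b - pos_a)
    # Along the baseline: dist_to_A - dist_to_B = 2 * dist_a - baseline.
    dist_a = np.clip((baseline + path_diff) / 2.0, 0.0, baseline)
    return pos_a + (pos_b - pos_a) * (dist_a / baseline)
```

Extracting the first user's signal with the voiceprint before correlating, as the embodiment describes, matters here: cross-correlating raw mixtures would let another speaker dominate the peak.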
In some embodiments, the second determining unit 503 determines the voice capturing device array of the first user according to the position of the first voice capturing device, the position of the second voice capturing device, and the position of the first user, and the obtaining the target voice capturing device arrays of the L users includes:
determining an initial voice acquisition equipment array of the first user according to the position of the first voice acquisition equipment, the position of the second voice acquisition equipment and the position of the first user to obtain initial voice acquisition equipment arrays of L users;
Under the condition that the initial voice acquisition equipment arrays of the L users are overlapped, performing de-duplication processing and/or equalization processing on the initial voice acquisition equipment arrays of the L users to obtain target voice acquisition equipment arrays of the L users;
and under the condition that no overlap exists between the initial voice acquisition device arrays of the L users, determining the initial voice acquisition device arrays of the L users as target voice acquisition device arrays of the L users.
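The de-duplication and equalisation processing is only named above, not specified. One plausible reading — every shared device ends up in exactly one user's array, with conflicts resolved in favour of the user whose array is currently smallest so the arrays stay balanced — can be sketched as follows (an assumption for illustration, not the embodiment's algorithm):

```python
def deduplicate_arrays(initial_arrays):
    """initial_arrays: dict user -> set of device ids (sets may overlap).

    Returns disjoint target arrays: each device is assigned to exactly one
    of the users that claimed it, preferring the user whose target array is
    currently smallest (a simple equalisation heuristic; ties broken by
    user id for determinism).
    """
    target = {user: set() for user in initial_arrays}
    all_devices = sorted({d for devs in initial_arrays.values() for d in devs})
    for device in all_devices:
        claimants = [u for u, devs in initial_arrays.items() if device in devs]
        winner = min(claimants, key=lambda u: (len(target[u]), str(u)))
        target[winner].add(device)
    return target
```

The resulting arrays are guaranteed disjoint, so no device has to contribute the same recording to two users' beamformers at once.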
In some embodiments, the voice enhancement unit 504 is further configured to perform single-channel voice enhancement on voice information acquired by a third voice acquisition device to obtain target voice information of a second user when a voice acquisition device group with the number of voice acquisition devices equal to 1 exists in the K voice acquisition device groups, and the third voice acquisition device includes a microphone, where the third voice acquisition device is a voice acquisition device in any one of the K voice acquisition device groups with the number of voice acquisition devices equal to 1, and the second user is a user corresponding to the third voice acquisition device.
In some embodiments, the voice enhancement unit 504 is further configured to, in a case where a voice capturing device group with the number of voice capturing devices equal to 1 exists in the K voice capturing device groups, and the third voice capturing device includes a plurality of microphones, perform multi-channel voice enhancement on voice information captured by the plurality of microphones, and obtain target voice information of the second user.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the acquiring unit 501, the first determining unit 502, the second determining unit 503 and the voice enhancement unit 504 described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided herein, the coupling between the units may be electrical, mechanical, or in other forms.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 6, the electronic device may include a processor 601 and a memory 602. Memory 602 may store one or more computer programs. The one or more computer programs are configured to perform the methods as described in the foregoing method embodiments. The memory 602 may be separate or integrated with the processor 601.
Processor 601 may include one or more processing cores. The processor 601 may connect various portions of the overall electronic device using various interfaces and lines, and may perform various functions of the electronic device and process data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 602 and invoking the data stored in the memory 602. Optionally, the processor 601 may be implemented in at least one hardware form of digital signal processing (digital signal processing, DSP), field programmable gate array (field programmable gate array, FPGA), or programmable logic array (programmable logic array, PLA). The processor 601 may integrate one or a combination of a central processing unit (central processing unit, CPU), a graphics processor (graphics processing unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 601 and may instead be implemented solely by a separate communication chip.
The memory 602 may include random access memory (random access memory, RAM) or read-only memory (ROM). Memory 602 may be used to store instructions, programs, code, a set of codes, or a set of instructions. The memory 602 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The storage data area may also store data created by the electronic device in use (e.g., phonebook, audio-video data, chat-record data), etc.
The processor 601 may be adapted to perform the various operations performed by the electronic device in the method embodiments described above when the computer program instructions stored in the memory 602 are executed. The specific implementation of these operations may be found in the previous embodiments, and will not be described here in detail.
The embodiment of the application also discloses a computer readable storage medium, and the computer readable storage medium stores computer program code, and the computer program code can be called by a processor to execute various operations in the embodiment of the method. The specific implementation of each operation may be referred to the previous embodiments, and will not be described herein.
The computer readable storage medium may be an electronic memory such as flash memory, electrically erasable programmable read-only memory (electrically erasable programmable read only memory, EEPROM), erasable programmable read-only memory (erasable programmable read only memory, EPROM), a hard disk, or ROM. Optionally, the computer readable storage medium may comprise a non-transitory computer-readable storage medium. The computer readable storage medium has storage space for program code that performs any of the method steps described above. This program code can be read from or written to one or more computer program products. The program code may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, and are not intended to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. A method for speech enhancement, the method being applied to an electronic device in a speech enhancement system, the speech enhancement system further comprising M speech acquisition devices, the M speech acquisition devices being arranged in different positions, M being an integer greater than 1, the method comprising:
acquiring voice information acquired by N voice acquisition devices, wherein the N voice acquisition devices are voice acquisition devices in an awake state in the M voice acquisition devices, and N is an integer greater than 0 and less than or equal to M;
determining K users, K voiceprints and K voice acquisition equipment groups according to the voice information and the N voice acquisition equipment, wherein the K users are in one-to-one correspondence with the K voiceprints, the K users are in one-to-one correspondence with the K voice acquisition equipment groups, and K is an integer greater than 1;
under the condition that voice acquisition equipment groups with the number of voice acquisition equipment larger than 1 exist in the K voice acquisition equipment groups, determining target voice acquisition equipment arrays of L users according to L voice prints and L voice acquisition equipment groups, wherein the L voice acquisition equipment groups are voice acquisition equipment groups with the number of voice acquisition equipment larger than 1 in the K voice acquisition equipment groups, the L users are users corresponding to the L voice acquisition equipment groups, the L voice prints are voice prints corresponding to the L voice acquisition equipment groups, and L is an integer larger than 0 and smaller than or equal to K;
And carrying out multi-channel voice enhancement on the voice information acquired by the target voice acquisition equipment array according to the L voiceprints and the position of each voice acquisition equipment in the target voice acquisition equipment array to obtain target voice information of the L users.
2. The method of claim 1, wherein said determining K users, K voiceprints, and K groups of voice capture devices from said voice information and said N voice capture devices comprises:
determining the users corresponding to the voice signals included in the voice information to obtain K users;
determining voice acquisition equipment which is respectively awakened by the K users in the N voice acquisition equipment to obtain K voice acquisition equipment groups;
and respectively extracting voiceprints of the K users from the voice information to obtain K voiceprints.
3. The method of claim 1, wherein determining the target voice capture device array for the L users based on the L voiceprints and the L voice capture device groups comprises:
determining a first voice acquisition device and a second voice acquisition device in a first voice acquisition device group, wherein the first voice acquisition device group is any voice acquisition device group in the L voice acquisition device groups;
Determining the position of a first user according to a first voiceprint, the position of the first voice acquisition equipment and the position of the second voice acquisition equipment, wherein the first voiceprint is the voiceprint corresponding to the first voice acquisition equipment group, and the first user is the user corresponding to the first voice acquisition equipment group;
determining a voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain target voice acquisition device arrays of L users, wherein the target voice acquisition device arrays of the first user comprise the first voice acquisition device and the second voice acquisition device.
4. The method of claim 3, wherein the determining the first voice capture device and the second voice capture device in the first voice capture device group comprises:
under the condition that the voice acquisition equipment independently awakened by the first user exists in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the voice acquisition equipment independently awakened by the first user as first voice acquisition equipment;
Under the condition that the voice acquisition equipment which is independently awakened by the first user does not exist in the first voice acquisition equipment group, determining the voice acquisition equipment with earliest awakening time in the first voice acquisition equipment group as first voice acquisition equipment;
and determining the voice acquisition equipment with the smallest distance between the voice acquisition equipment and the first voice acquisition equipment in the first voice acquisition equipment group as second voice acquisition equipment.
5. The method of claim 3, wherein the determining the location of the first user based on the first voiceprint, the location of the first voice capture device, and the location of the second voice capture device comprises:
acquiring first voice information acquired by the first voice acquisition equipment;
acquiring second voice information acquired by the second voice acquisition equipment;
extracting the voice signal of the first user from the first voice information according to the first voiceprint to obtain a first voice signal;
extracting the voice signal of the first user from the second voice information according to the first voiceprint to obtain a second voice signal;
and determining the position of the first user according to the first voice signal, the second voice signal, the position of the first voice acquisition device and the position of the second voice acquisition device.
6. The method of claim 3, wherein the determining the first user's voice capture device array based on the location of the first voice capture device, the location of the second voice capture device, and the location of the first user, the obtaining the L user's target voice capture device array comprises:
determining an initial voice acquisition device array of the first user according to the position of the first voice acquisition device, the position of the second voice acquisition device and the position of the first user to obtain initial voice acquisition device arrays of L users;
under the condition that the initial voice acquisition equipment arrays of the L users are overlapped, performing de-duplication processing and/or equalization processing on the initial voice acquisition equipment arrays of the L users to obtain target voice acquisition equipment arrays of the L users;
and under the condition that no overlap exists among the initial voice acquisition equipment arrays of the L users, determining the initial voice acquisition equipment arrays of the L users as target voice acquisition equipment arrays of the L users.
7. The method according to any one of claims 1-6, further comprising:
Under the condition that the number of voice acquisition equipment in the K voice acquisition equipment groups is equal to 1, and the third voice acquisition equipment comprises a microphone, voice information acquired by the third voice acquisition equipment is subjected to single-channel voice enhancement to obtain target voice information of a second user, the third voice acquisition equipment is voice acquisition equipment in any voice acquisition equipment group with the number of voice acquisition equipment in the K voice acquisition equipment groups being equal to 1, and the second user is the user corresponding to the third voice acquisition equipment.
8. The method of claim 7, wherein the method further comprises:
and under the condition that the number of the voice acquisition equipment groups in the K voice acquisition equipment groups is equal to 1, and the third voice acquisition equipment comprises a plurality of microphones, carrying out multi-channel voice enhancement on voice information acquired by the microphones to obtain target voice information of the second user.
9. A speech enhancement apparatus, wherein the apparatus is applied to an electronic device in a speech enhancement system, the speech enhancement system further comprises M voice acquisition devices, the M voice acquisition devices are arranged at different positions, and M is an integer greater than 1, the apparatus comprising:
The voice acquisition unit is used for acquiring voice information acquired by N voice acquisition devices, wherein the N voice acquisition devices are voice acquisition devices in an awakening state in the M voice acquisition devices, and N is an integer which is more than 0 and less than or equal to M;
the first determining unit is used for determining K users, K voiceprints and K voice acquisition equipment groups according to the voice information and the N voice acquisition equipment, wherein the K users are in one-to-one correspondence with the K voiceprints, the K users are in one-to-one correspondence with the K voice acquisition equipment groups, and K is an integer larger than 1;
a second determining unit, configured to determine, in the case where a voice acquisition device group in which the number of voice acquisition devices is greater than 1 exists among the K voice acquisition device groups, target voice acquisition device arrays of L users according to L voiceprints and L voice acquisition device groups, where the L voice acquisition device groups are the voice acquisition device groups among the K voice acquisition device groups in which the number of voice acquisition devices is greater than 1, the L users are the users corresponding to the L voice acquisition device groups, the L voiceprints are the voiceprints corresponding to the L voice acquisition device groups, and L is an integer greater than 0 and less than or equal to K; and
And the voice enhancement unit is used for carrying out multi-channel voice enhancement on the voice information acquired by the target voice acquisition equipment array according to the L voiceprints and the position of each voice acquisition equipment in the target voice acquisition equipment array to obtain target voice information of the L users.
10. An electronic device comprising a processor and a memory, the processor invoking a computer program stored in the memory to perform the method of any of claims 1-8.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or computer instructions, which, when executed by a processor, implement the method according to any of claims 1-8.
CN202410264048.7A 2024-03-08 2024-03-08 Speech enhancement method, device, electronic equipment and computer readable storage medium Active CN117854526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410264048.7A CN117854526B (en) 2024-03-08 2024-03-08 Speech enhancement method, device, electronic equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN117854526A true CN117854526A (en) 2024-04-09
CN117854526B CN117854526B (en) 2024-05-24

Family

ID=90542088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410264048.7A Active CN117854526B (en) 2024-03-08 2024-03-08 Speech enhancement method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117854526B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053368A (en) * 2021-03-09 2021-06-29 锐迪科微电子(上海)有限公司 Speech enhancement method, electronic device, and storage medium
CN114598963A (en) * 2022-03-30 2022-06-07 北京地平线机器人技术研发有限公司 Voice processing method and device, computer readable storage medium and electronic equipment


Also Published As

Publication number Publication date
CN117854526B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN111223497B (en) Nearby wake-up method and device for terminal, computing equipment and storage medium
CN110858483A (en) Intelligent device, voice awakening method, voice awakening device and storage medium
CN109274831B (en) Voice call method, device, equipment and readable storage medium
CN108665895B (en) Method, device and system for processing information
EP3185521B1 (en) Voice wake-up method and device
CN109151211B (en) Voice processing method and device and electronic equipment
CN109284080B (en) Sound effect adjusting method and device, electronic equipment and storage medium
CN205051764U (en) Electronic equipment
CN108320751B (en) Voice interaction method, device, equipment and server
CN110968353A (en) Central processing unit awakening method and device, voice processor and user equipment
KR20200094732A (en) Method and system for classifying time series data
CN110600058A (en) Method and device for awakening voice assistant based on ultrasonic waves, computer equipment and storage medium
CN111192590A (en) Voice wake-up method, device, equipment and storage medium
CN112233676A (en) Intelligent device awakening method and device, electronic device and storage medium
CN110956968A (en) Voice wake-up and voice wake-up function triggering method and device, and terminal equipment
CN117854526B (en) Speech enhancement method, device, electronic equipment and computer readable storage medium
WO2024099359A1 (en) Voice detection method and apparatus, electronic device and storage medium
CN113808585A (en) Earphone awakening method, device, equipment and storage medium
CN108877799A (en) Voice control device and method
CN113470646A (en) Voice wake-up method, device and equipment
CN104780258A (en) Noise removing method based on acceleration sensor, host processor and dispatching terminal
CN118155641A (en) Speech enhancement method, device, electronic equipment and computer readable storage medium
CN114400003B (en) Control method and system for automatic switching microphone, electronic equipment and storage medium
CN110517682A (en) Audio recognition method, device, equipment and storage medium
CN203911924U (en) Bluetooth device with voice wake-up

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant