CN117059115A - Voice enhancement method, device, system, storage medium and hearing aid earphone - Google Patents


Info

Publication number
CN117059115A
Authority
CN
China
Prior art keywords
voice
target
speaker
voice signal
voiceprint information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311126152.1A
Other languages
Chinese (zh)
Inventor
高顺
李良斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202311126152.1A
Publication of CN117059115A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice enhancement method, device, system, storage medium and hearing aid earphone, relating to the technical field of hearing aid earphones. The method comprises: in a voice enhancement mode, collecting a first voice signal and identifying, based on acquired target voiceprint information, a target voice signal of a target main speaker in the first voice signal; enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal; and outputting the enhanced voice. The target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in an initial voice segment. With this technical solution, the hearing aid earphone can enhance the main speaker's voice within the collected sound, so that the wearer of the hearing aid earphone hears the main speaker more clearly.

Description

Voice enhancement method, device, system, storage medium and hearing aid earphone
Technical Field
The invention relates to the technical field of hearing aid earphones, and in particular to a voice enhancement method, device, system, storage medium and hearing aid earphone.
Background
A hearing aid earphone, i.e., an earphone with a hearing aid function, can amplify external sound and help the wearer, particularly a hearing-impaired person, obtain external sound information more clearly.
Currently, the operating parameters of a hearing aid earphone are configured according to the wearer's hearing loss, so that ambient sound is adapted to that hearing loss. In scenes where the wearer needs to listen to a specific person, such as lectures, classes and speeches, what the wearer mainly needs to hear is the main speaker. However, current hearing aid earphones apply the same processing strategy to the voices of all surrounding people and cannot highlight the voice of a specific person.
Disclosure of Invention
The invention provides a voice enhancement method, device, system, storage medium and hearing aid earphone to solve the problem that prior-art hearing aid earphones cannot highlight the voice of a specific person in the collected sound. The invention enhances the main speaker's voice within the collected sound, so that the wearer of the hearing aid earphone hears the main speaker more clearly.
The invention provides a voice enhancement method applied to a hearing aid earphone, comprising the following steps:
in a voice enhancement mode, collecting a first voice signal, and identifying a target voice signal of a target main speaker in the first voice signal based on acquired target voiceprint information, wherein the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in an initial voice segment;
enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal;
and outputting the enhanced voice.
According to the voice enhancement method provided by the invention, before the first voice signal is collected, the method further comprises the following steps:
when an initial voice analysis instruction is detected, acquiring the initial voice segment, and performing speaker separation on the initial voice segment to obtain at least one speaker voice signal;
for each speaker voice signal, determining a score for the speaker voice signal based on its attribute features;
determining the speaker voice signal with the highest score as the second voice signal of the target main speaker;
and extracting voiceprint features from the second voice signal to obtain the target voiceprint information.
According to the voice enhancement method provided by the invention, the attribute features comprise at least one of speech duration, volume, speech clarity and speech source; determining the score of the speaker voice signal based on its attribute features comprises:
performing a weighted summation of all the attribute features of the speaker voice signal to obtain the score of the speaker voice signal.
According to the voice enhancement method provided by the invention, before the first voice signal is collected, the method further comprises the following steps:
when a communication connection request sent by a terminal device is received, establishing a communication connection with the terminal device;
receiving the target voiceprint information sent by the terminal device, and entering the voice enhancement mode when the target voiceprint information is received, wherein the target voiceprint information is selected by the terminal device from a voiceprint information base according to a main speaker selection operation, and the voiceprint information base stores the correspondence between main speakers and voiceprint information.
The voice enhancement method provided by the invention further comprises the following steps:
when an instruction to exit the voice enhancement mode is detected, acquiring the operating parameters that the hearing aid earphone had before entering the voice enhancement mode as target operating parameters;
and restoring the operating parameters of the hearing aid earphone to the target operating parameters.
The invention also provides a voice enhancement device applied to a hearing aid earphone, comprising:
a voice acquisition module, configured to collect a first voice signal in a voice enhancement mode;
a voice recognition module, configured to identify a target voice signal of a target main speaker in the first voice signal based on acquired target voiceprint information, wherein the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in an initial voice segment;
a voice enhancement module, configured to enhance the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal;
and a voice output module, configured to output the enhanced voice.
The invention also provides a hearing aid earphone comprising a microphone, a sound-producing unit, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the microphone, the sound-producing unit and the memory are each communicatively connected to the processor; when executing the computer program, the processor implements any of the voice enhancement methods described above.
The invention also provides a voice enhancement system comprising a terminal device and the above hearing aid earphone, the terminal device being communicatively connected to the hearing aid earphone;
the terminal device is configured to determine a target main speaker according to a main speaker selection operation, select the target voiceprint information corresponding to the target main speaker from a voiceprint information base, and send the target voiceprint information to the hearing aid earphone;
wherein the voiceprint information base stores the correspondence between main speakers and voiceprint information.
According to the voice enhancement system provided by the invention, the terminal device is further configured to output a main speaker management interface when a main speaker management instruction is detected, and to manage the voiceprint information base in response to management operations on the main speaker management interface.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the voice enhancement methods described above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the voice enhancement methods described above.
In the voice enhancement method, device, system, storage medium and hearing aid earphone provided by the invention, target voiceprint information is obtained by extracting voiceprint features from the second voice signal of the target main speaker in the initial voice segment. In the voice enhancement mode, this information is used to identify the target main speaker's voice signal in the collected first voice signal; only that target voice signal is enhanced to obtain the enhanced voice corresponding to the first voice signal, which is then output. Because only the main speaker's voice in the collected signal is enhanced, it is clearer in the output, and the wearer of the hearing aid earphone hears the main speaker more clearly.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a voice enhancement method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a voice enhancement device according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a hearing aid earphone according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a voice enhancement system according to an embodiment of the invention.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
It should be noted that ordinal terms such as "first" and "second" are used in the invention only to distinguish the described objects and carry no sequential or other technical meaning.
A hearing aid earphone is an electronic device that helps hearing-impaired people obtain external sound information more clearly. In some scenes, such as lectures, classroom teaching and speeches, what matters most is hearing the main speaker: the teacher in a classroom, or the lecturer at a lecture venue. However, current hearing aid earphones only configure their operating parameters according to the wearer's hearing loss and process all collected ambient sound identically, so they cannot highlight the voice of a specific person. In such scenes, optimizing and enhancing the main speaker's voice within the collected sound would greatly help the wearer hear the main speaker, reduce interference from other noise, and improve the hearing-assistance performance of the hearing aid earphone.
Based on this, embodiments of the invention provide a voice enhancement method. Using acquired target voiceprint information, generated by extracting voiceprint features from the second voice signal of the target main speaker in an initial voice segment, the method identifies, in the voice enhancement mode, the target main speaker's voice signal in the collected first voice signal, enhances that target voice signal to obtain enhanced voice corresponding to the first voice signal, and outputs the enhanced voice. The hearing aid earphone thereby enhances the main speaker's voice within the collected sound, and the wearer hears the main speaker more clearly.
The voice enhancement method of the invention is described below with reference to FIG. 1. The method may be applied to a hearing aid earphone, or to a voice enhancement device usable in a hearing aid earphone; the voice enhancement device may be implemented in software, hardware, or a combination of both.
FIG. 1 is a schematic flow chart of a voice enhancement method according to an embodiment of the invention. Referring to FIG. 1, the voice enhancement method may include the following steps 110 to 140.
Step 110: in the voice enhancement mode, collect a first voice signal.
The hearing aid earphone enters the voice enhancement mode upon receiving a voice enhancement instruction. The voice enhancement instruction includes, but is not limited to, at least one of a voice instruction, a touch instruction and a wireless communication instruction.
For example, a touch area may be provided on the hearing aid earphone, and the wearer can switch the earphone into the voice enhancement mode by a touch operation on that area, such as a single click, a double click, or a press held for a set period. In lectures, classroom teaching or speeches, for instance, a wearer who needs to hear the main speaker clearly can put the hearing aid earphone into the voice enhancement mode by clicking its touch area.
For example, a physical key may be provided on the hearing aid earphone, and the wearer can switch the earphone into the voice enhancement mode by pressing the key.
For example, the hearing aid earphone may collect a voice signal through its microphone. When a voice wake-up word is detected in the voice signal, command keyword recognition is performed on it, and if a command keyword such as 'enter voice enhancement mode' or 'enhance voice' is recognized, the hearing aid earphone enters the voice enhancement mode.
For example, the hearing aid earphone may also establish a communication connection with the wearer's terminal device. To enter the voice enhancement mode, the wearer selects it in a mode control interface displayed by the terminal device; after confirmation, the terminal device sends a mode control instruction for entering the voice enhancement mode to the hearing aid earphone, which enters the voice enhancement mode upon receiving the instruction.
After entering the voice enhancement mode, the hearing aid earphone collects voice signals from the surrounding environment to obtain the first voice signal.
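The instruction sources above (touch area, physical key, wake-up word plus command keyword, terminal device) all converge on one mode switch. The sketch below shows that convergence as a small state machine. It is an illustration only: the gesture-to-action mapping, the wake-up word "hello earphone", and the assumption that an upstream speech recognizer delivers text are all hypothetical, since the text fixes none of them.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    NORMAL = auto()
    ENHANCE = auto()


# Hypothetical mappings: the text lists touch, key, voice and terminal
# instructions but does not fix which gesture or phrase does what.
TOUCH_ACTIONS = {"click": "enter", "double_click": "exit"}
COMMAND_KEYWORDS = ("enter voice enhancement mode", "enhance voice")


class ModeController:
    def __init__(self, wake_word: str = "hello earphone") -> None:
        self.mode = Mode.NORMAL
        self.wake_word = wake_word  # hypothetical wake-up word

    def _apply(self, action: Optional[str]) -> Mode:
        # Unknown or missing actions leave the mode unchanged.
        if action == "enter":
            self.mode = Mode.ENHANCE
        elif action == "exit":
            self.mode = Mode.NORMAL
        return self.mode

    def on_touch(self, gesture: str) -> Mode:
        return self._apply(TOUCH_ACTIONS.get(gesture))

    def on_key_press(self) -> Mode:
        return self._apply("enter")

    def on_transcript(self, text: str) -> Mode:
        # `text` is assumed to come from an upstream speech recognizer.
        text = text.lower()
        # Command keywords are only honoured after the wake-up word is heard.
        if self.wake_word in text and any(k in text for k in COMMAND_KEYWORDS):
            return self._apply("enter")
        return self.mode

    def on_terminal_instruction(self, action: str) -> Mode:
        # Mode control instruction sent by the paired terminal device.
        return self._apply(action)
```

For example, `ModeController().on_transcript("Hello earphone, enhance voice")` returns `Mode.ENHANCE`, whereas "enhance voice" without the wake-up word leaves the mode unchanged. Routing every source through `_apply` keeps the mode logic in one place regardless of where the instruction came from.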
Step 120: identify a target voice signal of the target main speaker in the first voice signal based on the acquired target voiceprint information.
The target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in the initial voice segment.
After entering the voice enhancement mode, the hearing aid earphone may analyze the first-collected initial voice segment to determine the main speaker's voice signal, and then extract voiceprint features from that voice signal to obtain the main speaker's voiceprint information. In lectures, classroom teaching and speeches, the lecturer, the teacher and the speaker, respectively, are the target main speaker, and the corresponding voiceprint information is the target voiceprint information. Alternatively, the wearer may send an initial voice analysis instruction to the hearing aid earphone by a touch operation on its touch area; upon receiving the instruction, the hearing aid earphone collects an initial voice segment of preset duration, extracts the target voiceprint information of the target main speaker from it, and enters the voice enhancement mode once the target voiceprint information is obtained. It will be appreciated that the mode control instruction for entering the voice enhancement mode may therefore be the initial voice analysis instruction, or an indication issued when the target voiceprint information has been obtained.
After obtaining the target voiceprint information of the target main speaker, the hearing aid earphone can use it to identify the target main speaker's voice signal in each subsequently collected first voice signal. Specifically, the first voice signal may be compared with the target voiceprint information; if the first voice signal contains a speaker voice signal that matches the target voiceprint information, the matching speaker voice signal is the target voice signal of the target main speaker.
For example, after obtaining the target voiceprint information of the target main speaker, the hearing aid earphone may store it in a voiceprint information base. When the wearer needs to listen to the same speaker again, for example when attending several lectures by the same teacher, the stored target voiceprint information can be retrieved directly.
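The text only says that the first voice signal is "compared with" the target voiceprint information. One common way to make such a comparison, assumed here rather than taken from the patent, is to map each candidate speaker voice signal to a fixed-length embedding with some speaker-embedding model (not shown) and compare it with the stored voiceprint by cosine similarity. The candidate dictionary and the 0.7 threshold below are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def match_target(candidates: Dict[str, Sequence[float]],
                 target_voiceprint: Sequence[float],
                 threshold: float = 0.7) -> Optional[str]:
    """Return the id of the candidate embedding that best matches the
    target voiceprint, or None if no candidate reaches the threshold."""
    best_id, best_score = None, threshold
    for cid, embedding in candidates.items():
        score = cosine(embedding, target_voiceprint)
        if score >= best_score:
            best_id, best_score = cid, score
    return best_id
```

Returning `None` when nothing clears the threshold matters in practice: if the main speaker is silent, no other voice should be mistaken for the target and enhanced.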
Step 130: enhance the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal.
After determining the target main speaker's voice signal within the first voice signal, the hearing aid earphone may apply a speech enhancement algorithm to that target voice signal only, obtaining the enhanced voice corresponding to the first voice signal.
By way of example, the speech enhancement algorithm may include, but is not limited to, at least one of: wavelet-analysis-based speech enhancement, Kalman-filter-based speech enhancement, signal-subspace-based enhancement, auditory-masking-based speech enhancement, independent-component-analysis-based speech enhancement, and neural-network-based speech enhancement.
Step 140: output the enhanced voice.
In the voice enhancement method provided by this embodiment, target voiceprint information, generated by extracting voiceprint features from the second voice signal of the target main speaker in the initial voice segment, is used in the voice enhancement mode to identify the target main speaker's voice signal in the collected first voice signal. That target voice signal is then enhanced to obtain the enhanced voice corresponding to the first voice signal, and the enhanced voice is output. Because only the main speaker's voice in the collected signal is enhanced, it is clearer in the output, and the wearer of the hearing aid earphone hears the main speaker more clearly.
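Whatever algorithm is chosen, the essential point of this step is that only the target voice signal is enhanced. The sketch below illustrates that idea with a plain gain-based remix, which is none of the algorithms listed above: it assumes the first voice signal has already been split into a target component (the main speaker) and a residual (everything else), boosts the former, attenuates the latter, and clips to full scale. The +6 dB and -12 dB defaults are hypothetical.

```python
from typing import List, Sequence


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude gain."""
    return 10 ** (db / 20)


def enhance_target(target: Sequence[float], residual: Sequence[float],
                   target_gain_db: float = 6.0,
                   residual_gain_db: float = -12.0) -> List[float]:
    """Boost the main speaker, attenuate the rest, and clip to [-1, 1]."""
    if len(target) != len(residual):
        raise ValueError("target and residual must have the same length")
    gt, gr = db_to_gain(target_gain_db), db_to_gain(residual_gain_db)
    return [max(-1.0, min(1.0, gt * t + gr * r))
            for t, r in zip(target, residual)]
```

Hard clipping keeps the sketch short; a real device would more likely use a limiter or compressor so that the boost does not distort loud passages.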
Based on the voice enhancement method of the embodiment corresponding to FIG. 1, in an example embodiment the method may further include, before collecting the first voice signal: when an initial voice analysis instruction is detected, collecting an initial voice segment and performing speaker separation on it to obtain at least one speaker voice signal; for each speaker voice signal, determining its score based on its attribute features; determining the speaker voice signal with the highest score as the second voice signal of the target main speaker; and extracting voiceprint features from the second voice signal to obtain the target voiceprint information.
By way of example, the attribute features may include at least one of speech duration, volume, speech clarity and speech source, where the speech source indicates whether the person speaks directly or through a microphone and loudspeaker. Accordingly, determining the score of the speaker voice signal based on its attribute features may include performing a weighted summation of all the attribute features of the speaker voice signal to obtain its score.
For example, on entering a classroom, the wearer may perform a touch operation on the touch area of the hearing aid earphone, such as a single click, to send an initial voice analysis instruction. After detecting the instruction, the hearing aid earphone collects voice data from the surrounding environment and stops once the collected data meets a preset condition, yielding the initial voice segment. The preset condition may be that speech has been detected continuously for a preset duration, or that the clarity of the continuously detected speech reaches a clarity threshold. After obtaining the initial voice segment, the hearing aid earphone performs speaker separation on it to obtain at least one speaker voice signal.
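The text does not say how speaker separation or voiceprint extraction is performed. The sketch below stands in with one common approach, assumed rather than taken from the patent: each short segment of the initial voice segment is first mapped to a fixed-length embedding by some speaker-embedding model (not shown, hypothetical), segments are grouped by greedy cosine-similarity clustering, and a speaker's voiceprint is taken as the normalized mean of that speaker's segment embeddings. Note that this groups segments by speaker (diarization-style) rather than separating overlapping waveforms; the 0.75 threshold is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def voiceprint(embeddings: Sequence[Vector]) -> List[float]:
    """Voiceprint = L2-normalized mean of unit-length segment embeddings."""
    units = [normalize(e) for e in embeddings]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)


def separate_speakers(embeddings: Sequence[Vector],
                      threshold: float = 0.75) -> List[List[int]]:
    """Group segment indices by speaker with greedy online clustering."""
    clusters: List[List[int]] = []
    centroids: List[List[float]] = []
    for i, emb in enumerate(embeddings):
        best_k, best_sim = -1, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best_k, best_sim = k, sim
        if best_k < 0:
            clusters.append([i])            # unseen voice: start a new speaker
            centroids.append(normalize(emb))
        else:
            clusters[best_k].append(i)      # known voice: refresh its centroid
            centroids[best_k] = voiceprint([embeddings[j] for j in clusters[best_k]])
    return clusters
```

Each resulting cluster plays the role of one "speaker voice signal"; once the highest-scoring cluster is chosen as the second voice signal, `voiceprint` over its embeddings gives a candidate target voiceprint.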
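The two stop conditions for the initial voice segment can be sketched as a loop over audio frames. This is a hedged illustration: it assumes each frame already carries a voice-activity flag and a clarity estimate in [0, 1] from some upstream detector (both hypothetical), interprets "clarity of continuously detected speech" as the mean clarity over the current unbroken speech run, and adds a hypothetical minimum run length so that a single clear frame cannot end collection. The 20 ms frame length, 5 s preset duration and 0.8 threshold are also assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Frame:
    samples: List[float]
    is_speech: bool   # from a voice activity detector (hypothetical)
    clarity: float    # per-frame clarity estimate in [0, 1] (hypothetical)


def collect_initial_segment(frames: Iterable[Frame], frame_ms: int = 20,
                            preset_ms: int = 5000,
                            clarity_threshold: float = 0.8,
                            min_clarity_ms: int = 1000) -> Optional[List[Frame]]:
    collected: List[Frame] = []
    run_ms, run_frames, run_clarity = 0, 0, 0.0
    for frame in frames:
        collected.append(frame)
        if not frame.is_speech:
            # A non-speech frame breaks continuity, so restart the run.
            run_ms, run_frames, run_clarity = 0, 0, 0.0
            continue
        run_ms += frame_ms
        run_frames += 1
        run_clarity += frame.clarity
        if run_ms >= preset_ms:
            return collected  # condition 1: continuous speech for the preset duration
        if run_ms >= min_clarity_ms and run_clarity / run_frames >= clarity_threshold:
            return collected  # condition 2: continuous speech is clear enough
    return None  # input ended before either condition was met
```

With 20 ms frames, a run of clear speech (clarity 0.9) stops collection after 1 s, while murkier speech (clarity 0.5) has to continue uninterrupted for the full 5 s.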
Where a loudspeaker is used, speech picked up by a microphone and played through the loudspeaker is clearer than unamplified speech, louder speech is clearer, and the person who speaks the longest is the core speaker (i.e., the main speaker). On this basis, the attribute features of each speaker voice signal, including speech source, speech clarity, volume and speech duration, can be given different weights: speech source $X_1$ has weight $a$, speech clarity $X_2$ has weight $b$, volume $X_3$ has weight $c$, and speech duration $X_4$ has weight $d$. The score $P$ of the speaker voice signal can then be determined with the following scoring formula (1):

$$P = a X_1 + b X_2 + c X_3 + d X_4 \tag{1}$$

where $X_1 = 1$ if the speaker uses the microphone and $X_1 = 0$ if the speaker does not. The score $P$ characterizes how likely the speaker voice signal is to be the main speaker's voice signal.
On this basis, after the speaker voice signals are separated, the score of each speaker voice signal can be determined with scoring formula (1) from attribute features such as speech duration, volume, speech clarity and speech source, and the speaker voice signal with the highest score is then determined to be the voice signal of the target main speaker.
For example, suppose speaker voice signal A of speaker 1, speaker voice signal B of speaker 2 and speaker voice signal C of speaker 3 are separated from the initial voice segment, and scoring formula (1) gives scores $P_A$, $P_B$ and $P_C$ with $P_A > P_B > P_C$. Speaker 1 is then determined to be the main speaker, and speaker voice signal A is the second voice signal of the target main speaker. Voiceprint features are then extracted from speaker voice signal A to obtain the target voiceprint information of main speaker 1. The hearing aid earphone can then use this target voiceprint information to enhance speaker 1's voice in the voice signals subsequently collected by the microphone, so that the wearer hears speaker 1 more clearly.
For example, the target voiceprint information may be saved to the voiceprint information base, and the stored target voiceprint information of speaker 1 can be retrieved directly the next time the wearer listens to speaker 1.
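Scoring formula (1) and the highest-score selection can be sketched as follows. The patent gives neither the weights $a$, $b$, $c$, $d$ nor the scale of $X_2$ to $X_4$; the sketch assumes, hypothetically, that clarity, volume and duration have already been normalized to [0, 1] so that they are comparable with the binary $X_1$, and uses illustrative weights (0.4, 0.2, 0.2, 0.2).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SpeakerFeatures:
    uses_microphone: bool  # X1: speech source (1 if via microphone, else 0)
    clarity: float         # X2: speech clarity, assumed normalized to [0, 1]
    volume: float          # X3: volume, assumed normalized to [0, 1]
    duration: float        # X4: speech duration, assumed normalized to [0, 1]


# Hypothetical weights (a, b, c, d); the patent does not give values.
WEIGHTS: Tuple[float, float, float, float] = (0.4, 0.2, 0.2, 0.2)


def score(f: SpeakerFeatures,
          w: Tuple[float, float, float, float] = WEIGHTS) -> float:
    """Formula (1): P = a*X1 + b*X2 + c*X3 + d*X4."""
    a, b, c, d = w
    x1 = 1.0 if f.uses_microphone else 0.0
    return a * x1 + b * f.clarity + c * f.volume + d * f.duration


def pick_main_speaker(signals: Dict[str, SpeakerFeatures]) -> str:
    """Return the id of the speaker voice signal with the highest score."""
    return max(signals, key=lambda k: score(signals[k]))
```

With made-up features A = (microphone, 0.9, 0.8, 0.7), B = (no microphone, 0.6, 0.5, 0.4) and C = (no microphone, 0.3, 0.4, 0.1), the scores are 0.88, 0.30 and 0.16, reproducing the ordering $P_A > P_B > P_C$ from the example, so `pick_main_speaker` selects A.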
Based on the voice enhancement method of the embodiment corresponding to FIG. 1, in an example embodiment the method may further include, before collecting the first voice signal: when a communication connection request sent by a terminal device is received, establishing a communication connection with the terminal device; and receiving the target voiceprint information sent by the terminal device, and entering the voice enhancement mode when it is received, where the target voiceprint information is selected by the terminal device from the voiceprint information base according to a main speaker selection operation, and the voiceprint information base stores the correspondence between main speakers and voiceprint information.
For example, after the hearing aid earphone has derived the main speaker's voiceprint information from an initial voice segment, it may store that information in the voiceprint information base. Through the terminal device, the wearer can manage the voiceprint information of each main speaker recorded in the base, for example by marking, deleting, viewing or selecting entries. When the wearer listens to the same main speaker again, for example when attending the same teacher's class again, the wearer can open the main speaker management interface on the terminal device, which may display a main speaker selection menu. After the wearer selects a main speaker from the menu, the terminal device looks up the target voiceprint information of the selected target main speaker in the voiceprint information base and sends it to the hearing aid earphone. On receiving the target voiceprint information, the hearing aid earphone enters the voice enhancement mode and uses it to enhance the main speaker's voice in subsequently collected voice signals.
In this way, the terminal device can switch the hearing aid earphone into the voice enhancement mode quickly, without analyzing an initial voice segment every time.
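The voiceprint information base and the terminal-side selection flow can be sketched as a small in-memory store. This is an illustration under stated assumptions: the class and method names are hypothetical, persistence is omitted, and `send` stands for whatever link (for example a wireless connection) actually carries the voiceprint to the hearing aid earphone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class VoiceprintEntry:
    voiceprint: List[float]
    label: str = ""  # free-form mark set by the wearer


class VoiceprintLibrary:
    """Stores the correspondence between main speakers and voiceprints."""

    def __init__(self) -> None:
        self._entries: Dict[str, VoiceprintEntry] = {}

    def add(self, speaker: str, voiceprint: List[float]) -> None:
        self._entries[speaker] = VoiceprintEntry(list(voiceprint))

    def mark(self, speaker: str, label: str) -> None:
        self._entries[speaker].label = label

    def delete(self, speaker: str) -> None:
        self._entries.pop(speaker, None)

    def view(self) -> List[str]:
        return sorted(self._entries)

    def select(self, speaker: str) -> Optional[List[float]]:
        entry = self._entries.get(speaker)
        return None if entry is None else entry.voiceprint


def send_selected(library: VoiceprintLibrary, speaker: str,
                  send: Callable[[List[float]], None]) -> bool:
    """Terminal side: look up the selected main speaker and send the
    voiceprint to the earphone via the (hypothetical) `send` callable."""
    voiceprint = library.select(speaker)
    if voiceprint is None:
        return False
    send(voiceprint)
    return True
```

For example, after `lib.add("teacher_1", vp)`, choosing "teacher_1" in the selection menu would amount to `send_selected(lib, "teacher_1", link_send)`, where `link_send` is a placeholder for the real transport.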
Based on the voice enhancement methods of the above embodiments, in an example embodiment the method may further include: when an instruction to exit the voice enhancement mode is detected, acquiring the operating parameters that the hearing aid earphone had before entering the voice enhancement mode as the target operating parameters; and restoring the operating parameters of the hearing aid earphone to the target operating parameters.
For example, when the wearer of the hearing aid earphone wants to exit the voice enhancement mode, the wearer can double-click the touch area to send an exit instruction, and the hearing aid earphone then returns to the operating state it had before entering the voice enhancement mode.
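Restoring the pre-mode operating parameters amounts to taking a snapshot on entry and reinstating it on exit. A minimal sketch, assuming the parameters can be held in a dictionary; the parameter names in the usage note are hypothetical.

```python
import copy
from typing import Any, Dict, Optional


class OperatingParameters:
    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = copy.deepcopy(params)
        self._saved: Optional[Dict[str, Any]] = None

    def enter_enhancement_mode(self, overrides: Dict[str, Any]) -> None:
        # Snapshot the pre-mode parameters (the "target operating
        # parameters") only once, so re-entering does not overwrite them.
        if self._saved is None:
            self._saved = copy.deepcopy(self.params)
        self.params.update(overrides)

    def exit_enhancement_mode(self) -> None:
        # Restore the snapshot taken on entry; exiting twice is a no-op.
        if self._saved is not None:
            self.params = self._saved
            self._saved = None
```

For example, with initial parameters `{"gain_db": 10, "noise_reduction": "low"}`, entering the mode with `{"noise_reduction": "high"}` and then exiting restores the original dictionary. The deep copy guards against nested settings being mutated while the mode is active.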
The voice enhancement apparatus provided by the present invention is described below; the apparatus described below and the voice enhancement method described above correspond to each other and may be read in conjunction.
Fig. 2 is a schematic structural diagram of a voice enhancement apparatus according to an embodiment of the present invention. Referring to Fig. 2, the voice enhancement apparatus 200 may include: a voice acquisition module 210, configured to collect a first voice signal in the voice enhancement mode; a voice recognition module 220, configured to identify, based on the obtained target voiceprint information, a target voice signal of a target main speaker in the first voice signal, where the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in the initial voice segment; a voice enhancement module 230, configured to enhance the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal; and a voice output module 240, configured to output the enhanced voice.
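To make the module chain of Fig. 2 concrete, the sketch below gates audio frame by frame: each frame's speaker embedding is compared with the target voiceprint by cosine similarity, matching frames are amplified, and the rest are attenuated. This is a deliberately simplified stand-in, not the patent's algorithm. The per-frame embeddings are assumed to come from a speaker encoder that is not shown, and the threshold and gain values are illustrative. Practical systems more often use a neural target-speaker extraction model that predicts a time-frequency mask conditioned on the voiceprint.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enhance(frames: List[List[float]], frame_embeddings: List[List[float]],
            target_voiceprint: List[float], threshold: float = 0.7,
            boost: float = 2.0, attenuate: float = 0.25) -> List[List[float]]:
    """Boost frames whose embedding matches the target voiceprint; attenuate the rest.

    frames: the collected first voice signal, split into frames of samples in [-1, 1].
    frame_embeddings: one (assumed, externally computed) speaker embedding per frame.
    """
    if len(frames) != len(frame_embeddings):
        raise ValueError("need exactly one embedding per frame")
    out = []
    for samples, emb in zip(frames, frame_embeddings):
        # "Recognition" step: is this frame the target main speaker?
        is_target = cosine(emb, target_voiceprint) >= threshold
        gain = boost if is_target else attenuate
        # "Enhancement" step, clipped to [-1, 1] so boosted samples do not overflow.
        out.append([max(-1.0, min(1.0, s * gain)) for s in samples])
    return out  # "Output" step: frames ready for the sound generating unit
```

Frame-level gating is coarse because it cannot separate overlapping talkers within one frame; it is used here only because it shows the recognize-then-enhance flow in a few lines.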
In an example embodiment, the voice enhancement apparatus 200 further includes: a speaker separation module, configured to collect an initial voice segment when an initial voice analysis instruction is detected and to perform speaker separation on the initial voice segment to obtain at least one speaker voice signal; a scoring module, configured to determine, for each speaker voice signal, a score based on the attribute features of that speaker voice signal; a main speaker determination module, configured to take the speaker voice signal with the highest score as the second voice signal of the target main speaker; and a voiceprint feature extraction module, configured to extract voiceprint features from the second voice signal to obtain the target voiceprint information.
In an example embodiment, the attribute features include at least one of voice duration, volume, voice clarity, and voice source; correspondingly, the scoring module is configured to compute a weighted sum of the attribute features of a speaker voice signal to obtain its score.
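The scoring and main speaker selection can be sketched as a weighted sum, score = Σ wᵢ·fᵢ over the attribute features fᵢ, followed by picking the highest-scoring signal. The weights and feature keys below are hypothetical and are not given in the patent. The sketch also assumes each attribute feature has already been normalised to [0, 1]; for voice source, for example, 1.0 might mean the speaker is in front of the wearer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeakerSignal:
    speaker_id: str
    # Attribute features, assumed normalised to [0, 1].
    features: Dict[str, float]


# Hypothetical weights; the patent does not specify values.
WEIGHTS = {"duration": 0.4, "volume": 0.2, "clarity": 0.3, "source": 0.1}


def score(signal: SpeakerSignal, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of attribute features; a missing feature contributes 0."""
    return sum(w * signal.features.get(name, 0.0) for name, w in weights.items())


def pick_main_speaker(signals: List[SpeakerSignal]) -> SpeakerSignal:
    """Return the highest-scoring speaker signal (the 'second voice signal')."""
    if not signals:
        raise ValueError("speaker separation produced no signals")
    return max(signals, key=score)
```

Treating a missing feature as 0 matches the "at least one of" wording: a deployment that measures only duration and volume can reuse the same function with a smaller weight table.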
In an example embodiment, the voice enhancement apparatus 200 further includes: a communication module, configured to establish a communication connection with a terminal device upon receiving a communication connection request sent by the terminal device; and an information receiving module, configured to receive the target voiceprint information sent by the terminal device and to enter the voice enhancement mode upon receiving it. The terminal device selects the target voiceprint information from a voiceprint information base according to a main speaker selection operation, and the voiceprint information base stores the correspondence between main speakers and voiceprint information.
In an example embodiment, the voice enhancement apparatus 200 further includes: a parameter acquisition module, configured to obtain, when an instruction to exit the voice enhancement mode is detected, the operating parameters that the hearing aid earphone used before entering the voice enhancement mode as target operating parameters; and a parameter restoration module, configured to restore the operating parameters of the hearing aid earphone to the target operating parameters.
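On the earphone side, the communication module and the information receiving module can be modelled as a small state machine. The earphone stays idle until a terminal connects, then enters the voice enhancement mode once that terminal delivers a target voiceprint. The sketch assumes that voiceprints from any device other than the connected terminal are ignored, which the patent does not state. `EarphoneLink` and `Mode` are hypothetical names.

```python
from enum import Enum, auto
from typing import List, Optional


class Mode(Enum):
    IDLE = auto()
    CONNECTED = auto()
    ENHANCING = auto()  # voice enhancement mode


class EarphoneLink:
    """Hypothetical earphone-side handler for the terminal connection."""

    def __init__(self) -> None:
        self.mode = Mode.IDLE
        self.peer: Optional[str] = None
        self.target_voiceprint: Optional[List[float]] = None

    def on_connection_request(self, terminal_id: str) -> None:
        # Communication module: accept the request and remember the terminal.
        self.peer = terminal_id
        self.mode = Mode.CONNECTED

    def on_voiceprint(self, terminal_id: str, voiceprint: List[float]) -> bool:
        # Information receiving module: accept only from the connected terminal
        # (an assumption), then enter the voice enhancement mode.
        if self.mode is Mode.IDLE or terminal_id != self.peer:
            return False
        self.target_voiceprint = list(voiceprint)
        self.mode = Mode.ENHANCING
        return True
```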
Fig. 3 illustrates the structure of a hearing aid earphone. As shown in Fig. 3, the hearing aid earphone may include a processor 310, a communication interface 320, a memory 330, a communication bus 340, a microphone 350, and a sound generating unit 360. The processor 310, communication interface 320, memory 330, microphone 350, and sound generating unit 360 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform the voice enhancement method provided by the method embodiments above, which may include, for example: in the voice enhancement mode, collecting a first voice signal and identifying a target voice signal of a target main speaker in the first voice signal based on the obtained target voiceprint information; enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal; and outputting the enhanced voice; where the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in the initial voice segment.
By way of example, the processor 310 may be a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Further, the logic instructions in the memory 330 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and comprises several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention further provides a voice enhancement system. Fig. 4 schematically illustrates the structure of this voice enhancement system. Referring to Fig. 4, the voice enhancement system includes a terminal device 410 and a hearing aid earphone 420, and the terminal device 410 is communicatively connected to the hearing aid earphone 420. The hearing aid earphone 420 may be the hearing aid earphone of the embodiment corresponding to Fig. 3. The terminal device 410 is configured to determine a target main speaker according to a main speaker selection operation, select the target voiceprint information corresponding to the target main speaker from the voiceprint information base, and send the target voiceprint information to the hearing aid earphone. The voiceprint information base stores the correspondence between main speakers and voiceprint information.
By way of example, the terminal device 410 may include, but is not limited to, a cell phone, a computer, a tablet computer, a wearable device, and the like.
In an example embodiment, the terminal device 410 is further configured to output a main speaker management interface when a main speaker management instruction is detected, and to manage the voiceprint information base in response to management operations performed on the main speaker management interface.
For example, the terminal device 410 may provide a physical key for entering the main speaker management instruction; when the key is pressed, the terminal device 410 detects the main speaker management instruction. Alternatively, the terminal device 410 may receive a main speaker management instruction entered by the user on its touch screen.
For example, the main speaker management interface may include a main speaker selection menu through which the user can mark, delete, view, and select entries in the voiceprint information base.
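The management operations listed above (mark, delete, view, select) map naturally onto a small keyed store. In this sketch, which is an assumption rather than the patent's design, the voiceprint information base is an in-memory dictionary keyed by a speaker ID, a "mark" is a free-text label, and `view` produces the lines a selection menu might display. `Entry` and `VoiceprintLibrary` are hypothetical names, and a real terminal app would also persist the entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    voiceprint: List[float]
    label: str = ""  # user-assigned mark, e.g. "Math teacher"


@dataclass
class VoiceprintLibrary:
    """Hypothetical voiceprint information base: main speaker ID -> entry."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def add(self, speaker_id: str, voiceprint: List[float]) -> None:
        self.entries[speaker_id] = Entry(list(voiceprint))

    def mark(self, speaker_id: str, label: str) -> bool:
        entry = self.entries.get(speaker_id)
        if entry is None:
            return False
        entry.label = label
        return True

    def delete(self, speaker_id: str) -> bool:
        return self.entries.pop(speaker_id, None) is not None

    def view(self) -> List[str]:
        # One line per speaker, sorted by ID, for the selection menu.
        return [f"{sid}: {e.label or '(unlabelled)'}"
                for sid, e in sorted(self.entries.items())]

    def select(self, speaker_id: str) -> Optional[List[float]]:
        # Voiceprint to send to the hearing aid earphone, or None if unknown.
        entry = self.entries.get(speaker_id)
        return entry.voiceprint if entry is not None else None
```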
In another aspect, the present invention further provides a computer program product including a computer program that can be stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program performs the voice enhancement method provided by the method embodiments above, which may include: in the voice enhancement mode, collecting a first voice signal and identifying a target voice signal of a target main speaker in the first voice signal based on the obtained target voiceprint information; enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal; and outputting the enhanced voice; where the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in the initial voice segment.
In yet another aspect, the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the voice enhancement method provided by the method embodiments above, which may include, for example: in the voice enhancement mode, collecting a first voice signal and identifying a target voice signal of a target main speaker in the first voice signal based on the obtained target voiceprint information; enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal; and outputting the enhanced voice; where the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in the initial voice segment.
By way of example, computer-readable storage media includes non-transitory computer-readable storage media.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the embodiments above, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the technical solution above in essence, or the part of it that contributes to the prior art, may be embodied as a software product. The software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the embodiments above are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice enhancement method, applied to a hearing aid earphone, the method comprising:
in a voice enhancement mode, collecting a first voice signal, and identifying a target voice signal of a target main speaker in the first voice signal based on obtained target voiceprint information, wherein the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in an initial voice segment;
enhancing the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal;
and outputting the enhanced voice.
2. The method of claim 1, wherein prior to collecting the first speech signal, the method further comprises:
when an initial voice analysis instruction is detected, collecting the initial voice segment and performing speaker separation on the initial voice segment to obtain at least one speaker voice signal;
determining, for each of the speaker voice signals, a score for the speaker voice signal based on attribute features of the speaker voice signal;
determining the speaker voice signal with the highest score among the speaker voice signals as the second voice signal of the target main speaker;
and extracting voiceprint features of the second voice signal to obtain the target voiceprint information.
3. The method of claim 2, wherein the attribute features include at least one of voice duration, volume, voice clarity, and voice source; the determining the score of the speaker voice signal based on the attribute features of the speaker voice signal comprises:
performing a weighted summation of the attribute features of the speaker voice signal to obtain the score of the speaker voice signal.
4. The method of claim 1, wherein prior to collecting the first speech signal, the method further comprises:
upon receiving a communication connection request sent by a terminal device, establishing a communication connection with the terminal device;
receiving the target voiceprint information sent by the terminal device, and entering the voice enhancement mode upon receiving the target voiceprint information; wherein the terminal device selects the target voiceprint information from a voiceprint information base according to a main speaker selection operation, and the voiceprint information base stores the correspondence between main speakers and voiceprint information.
5. The voice enhancement method according to any one of claims 1 to 4, further comprising:
when an instruction to exit the voice enhancement mode is detected, obtaining the operating parameters that the hearing aid earphone used before entering the voice enhancement mode as target operating parameters;
and restoring the operating parameters of the hearing aid earphone to the target operating parameters.
6. A voice enhancement apparatus, applied to a hearing aid earphone, the apparatus comprising:
a voice acquisition module, configured to collect a first voice signal in a voice enhancement mode;
a voice recognition module, configured to identify a target voice signal of a target main speaker in the first voice signal based on obtained target voiceprint information, wherein the target voiceprint information is generated by extracting voiceprint features from a second voice signal of the target main speaker in an initial voice segment;
a voice enhancement module, configured to enhance the target voice signal in the first voice signal to obtain enhanced voice corresponding to the first voice signal;
and a voice output module, configured to output the enhanced voice.
7. A hearing aid earphone, comprising a microphone, a sound generating unit, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the microphone, the sound generating unit, and the memory are each communicatively connected to the processor; and the processor, when executing the computer program, implements the voice enhancement method according to any one of claims 1 to 5.
8. A voice enhancement system, comprising a terminal device and the hearing aid earphone of claim 7, the terminal device being communicatively connected to the hearing aid earphone;
the terminal device is configured to determine a target main speaker according to a main speaker selection operation, select target voiceprint information corresponding to the target main speaker from a voiceprint information base, and send the target voiceprint information to the hearing aid earphone;
wherein the voiceprint information base stores the correspondence between main speakers and voiceprint information.
9. The voice enhancement system according to claim 8, wherein the terminal device is further configured to output a main speaker management interface when a main speaker management instruction is detected, and to manage the voiceprint information base in response to a management operation performed on the main speaker management interface.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the voice enhancement method according to any one of claims 1 to 5.
CN202311126152.1A 2023-09-01 2023-09-01 Voice enhancement method, device, system, storage medium and hearing aid earphone Pending CN117059115A (en)

Priority Applications (1)

Application number: CN202311126152.1A (published as CN117059115A); priority date: 2023-09-01; filing date: 2023-09-01; title: Voice enhancement method, device, system, storage medium and hearing aid earphone

Publications (1)

CN117059115A, published 2023-11-14

Family ID: 88662632

Country Status (1)

CN (1): CN117059115A

Similar Documents

Publication Publication Date Title
US11450337B2 (en) Multi-person speech separation method and apparatus using a generative adversarial network model
US10923137B2 (en) Speech enhancement and audio event detection for an environment with non-stationary noise
JP6651973B2 (en) Interactive processing program, interactive processing method, and information processing apparatus
CN112242149B (en) Audio data processing method and device, earphone and computer readable storage medium
CN108476072B (en) Method and system for determining sound parameters associated with sound types
CN214226506U (en) Sound processing circuit, electroacoustic device, and sound processing system
CN111491236A (en) Active noise reduction earphone, awakening method and device thereof and readable storage medium
CN109308900B (en) Earphone device, voice processing system and voice processing method
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
CN110232909A (en) A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN113035225B (en) Visual voiceprint assisted voice separation method and device
CN111800700B (en) Method and device for prompting object in environment, earphone equipment and storage medium
US20190304457A1 (en) Interaction device and program
US11940896B2 (en) Information processing device, information processing method, and program
CN117059115A (en) Voice enhancement method, device, system, storage medium and hearing aid earphone
JP6755843B2 (en) Sound processing device, voice recognition device, sound processing method, voice recognition method, sound processing program and voice recognition program
JP2014149571A (en) Content search device
US12073844B2 (en) Audio-visual hearing aid
CN114650492A (en) Wireless personal communication via a hearing device
KR101022457B1 (en) Method to combine CASA and soft mask for single-channel speech separation
CN110992951A (en) Method for protecting personal privacy based on countermeasure sample
KR102239676B1 (en) Artificial intelligence-based active smart hearing aid feedback canceling method and system
KR102239675B1 (en) Artificial intelligence-based active smart hearing aid noise canceling method and system
JP7316971B2 (en) CONFERENCE SUPPORT SYSTEM, CONFERENCE SUPPORT METHOD, AND PROGRAM
US20230290356A1 (en) Hearing aid for cognitive help using speaker recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination