CN118057837A - Transparent transmission mode switching method and switching device - Google Patents

Transparent transmission mode switching method and switching device

Info

Publication number
CN118057837A
Authority
CN
China
Prior art keywords
sound
user
earphone
time
voice
Prior art date
Legal status
Pending
Application number
CN202310931487.4A
Other languages
Chinese (zh)
Inventor
杨昭
韩荣
韩欣宇
王耀光
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310931487.4A
Publication of CN118057837A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Headphones And Earphones (AREA)

Abstract

The application provides a pass-through (transparent transmission) mode switching method and device that can turn on the pass-through mode, or keep it on, while the user is in a listening state, which helps improve user experience. The method provided by the application can be applied to an earphone and may include the following steps: the earphone collects and stores the sound of a first object at a first time; the earphone collects a first environmental sound in the environment of a user wearing the earphone at a second time, where the first object is speaking in that environment at the second time and the second time is after the first time; and in response to the first environmental sound, the earphone turns on the pass-through mode.

Description

Transparent transmission mode switching method and switching device
This application is a divisional application of the patent application No. 202211452699.6, entitled "Transparent transmission mode switching method and switching device", filed on November 21, 2022.
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a method and an apparatus for switching a transparent transmission mode.
Background
With the popularity and development of terminal devices, users use true wireless stereo (TWS) earphones more and more frequently, and wearing TWS earphones for long periods has become the norm. So that wearing a TWS earphone does not hinder communication with other people, the art proposes a pass-through (transparent transmission) mode for TWS earphones. In the pass-through mode, the TWS earphone filters the ambient sound and transmits other people's voices to the user, so that the user can hear the voices in the environment and can therefore converse with others while wearing the TWS earphone.
Currently, a TWS earphone may control whether to turn on the pass-through mode as follows: if the TWS earphone detects that its user is speaking, it turns on the pass-through mode; if it does not detect the user's speech for a period of time, it turns off the pass-through mode.
However, this approach has the following problem: when the user is only listening, the TWS earphone keeps the pass-through mode off, so the user cannot hear the other person's speech.
Disclosure of Invention
The application provides a pass-through (transparent transmission) mode switching method and device that can turn on the pass-through mode, or keep it on, while the user is in a listening state.
In a first aspect, a method for switching the pass-through mode is provided, applied to an earphone and including: the earphone collects and stores the sound of a first object at a first time; the earphone collects a first environmental sound in the environment of a user wearing the earphone at a second time, where the first object is speaking in that environment at the second time and the second time is after the first time; and in response to the first environmental sound, the earphone turns on the pass-through mode.
The first object is called a key object in the specific embodiments; there may be one or more first objects, which the application does not limit. The first object is a person other than the user wearing the earphone. The first environmental sound can be understood as the sound within the environment of the user wearing the earphone.
The time at which the earphone collects the sound of the first object may be referred to as the first time. The second time is after the first time; that is, the earphone first collects and stores the sound of the first object and later collects the first environmental sound. Once the earphone has stored the sound of the first object, it can enable the pass-through mode whenever the first object speaks.
Because the earphone stores the sound of the first object, when it collects the first environmental sound at the second time while the first object is speaking, the earphone can recognize the sound of the first object and turn on the pass-through mode. It will be appreciated that the user wearing the earphone may be in a listening state while the first object is speaking in the user's environment, and at that moment the earphone is in the pass-through mode.
As shown in fig. 5 of the specific embodiments, in the method provided by the application, collecting the first environmental sound in the environment of the user wearing the earphone at the second time corresponds to S501 in fig. 5, in which the earphone collects the sound of the user's environment, the collection time being the second time. The first object being in a speaking state in the environment at the second time corresponds to the earphone judging that a human voice exists in the sound of the user's environment and that this human voice is the voice of the key object, namely S502 and S503.
According to the pass-through mode switching method provided by the application, when the first object in the user's environment is speaking, the earphone can turn on the pass-through mode, which makes it convenient for the user to listen to the first object, avoids the earphone exiting the pass-through mode while the user is listening, and improves user experience.
With reference to the first aspect, in certain implementations of the first aspect, at the second time the first object faces the user and the first object is in a speaking state for a first duration; or, at the second time, the first object faces the user and the user faces the first object.
The first object is speaking in the environment at the second time; meanwhile, the earphone detects that the first object is speaking toward the user at the second time and that the first object's speaking duration is greater than or equal to the first duration, so the earphone turns on the pass-through mode or keeps it on.
With this implementation, the earphone can accurately judge that the user is in a listening state and turn on the pass-through mode or keep it on, which makes it convenient for the user to converse with the first object and improves user experience.
Alternatively, the first object is speaking in the environment at the second time; meanwhile, the earphone detects that, at the second time, the first object faces the user and the user also faces the first object, so the earphone turns on the pass-through mode or keeps it on.
With this implementation, the earphone can accurately judge that the user is conversing with the first object and turn on the pass-through mode or keep it on, which makes it convenient for the user to converse with the first object and improves user experience.
In short, the pass-through mode switching method judges the user's state more accurately based on the orientations of the user and the first object, and turns on the pass-through mode, or keeps it on, when the user is in a listening state, improving user experience, as the sketch after this paragraph illustrates.
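The two conditions can be made concrete with a minimal sketch. The predicate names and the numeric value of the first duration below are illustrative assumptions, not identifiers or values from the application:

```python
FIRST_DURATION_S = 3.0  # hypothetical value of the "first duration"

def user_in_listening_state(first_object_faces_user: bool,
                            user_faces_first_object: bool,
                            speaking_duration_s: float) -> bool:
    # Condition 1: the first object faces the user and has kept
    # speaking for at least the first duration.
    if first_object_faces_user and speaking_duration_s >= FIRST_DURATION_S:
        return True
    # Condition 2: the first object and the user face each other.
    return first_object_faces_user and user_faces_first_object
```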
With reference to the first aspect, in certain implementations of the first aspect, the headphones include a left headphone and a right headphone, the left headphone acquiring a first transfer function of the sound of the first object, the right headphone acquiring a second transfer function of the sound of the first object; the method further comprises the steps of: if the ratio of the first transfer function to the second transfer function is smaller than or equal to a first preset threshold, the earphone determines that the first object faces the user, and the user faces the first object; if the ratio is greater than a first preset threshold and less than or equal to a second preset threshold, the earphone determines that the first object faces the user and the user does not face the first object; or if the ratio is greater than the second preset threshold, the earphone determines that the first object is not facing the user and the user is not facing the first object.
The first environmental sound includes the sound of the first object; when the earphone collects the first environmental sound, it accordingly collects that voice. The headphones include a left headphone and a right headphone; the left headphone acquires a first transfer function of the sound of the first object and the right headphone acquires a second transfer function, which may be represented by the symbols L'(s) and R'(s) respectively, although the present application is not limited thereto.
The first preset threshold may be represented by the symbol λ1 and the second preset threshold by the symbol λ2, but the present application is not limited thereto.
If L'(s)/R'(s) ≤ λ1, the earphone determines that the first object faces the user and the user faces the first object; if λ1 < L'(s)/R'(s) ≤ λ2, the earphone determines that the first object faces the user but the user does not face the first object; and if L'(s)/R'(s) > λ2, the earphone determines that the first object does not face the user and the user does not face the first object.
According to the pass-through mode switching method, the orientations of the user and the first object are determined from the difference between the transfer functions of the first object's sound acquired by the left and right earphones, so that the user's state can be judged accurately and the pass-through mode can be turned on accurately.
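As a minimal sketch of this threshold test, assuming the transfer functions have been reduced to scalar magnitudes L'(s) and R'(s) and that λ1 and λ2 are the preset thresholds (all names illustrative):

```python
def orientation_from_transfer_ratio(l_mag: float, r_mag: float,
                                    lambda1: float, lambda2: float) -> str:
    """Classify mutual orientation from the ratio of the left/right
    transfer-function magnitudes of the first object's voice."""
    ratio = l_mag / r_mag
    if ratio <= lambda1:
        return "first object faces user; user faces first object"
    if ratio <= lambda2:
        return "first object faces user; user does not face first object"
    return "neither the first object nor the user faces the other"
```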
With reference to the first aspect, in certain implementations of the first aspect, turning on the pass-through mode in response to the first environmental sound includes: the earphone, in response to the first environmental sound, acquires the human voice in the first environmental sound; and when the human voice in the first environmental sound matches the sound of the first object, the earphone turns on the pass-through mode.
In response to the first environmental sound, the earphone may first perform voice detection on the first environmental sound, and if a human voice exists, then judge whether this voice matches the first object.
Illustratively, the earphone may perform voice detection on the sound of the user's environment using a voice activity detection (VAD) algorithm.
One or more human voices may be present in the first environmental sound; the application does not limit this. The first object speaks in the user's environment at the second time, so the human voice in the first environmental sound may include the sound of the first object. When the sound of the first object exists among the human voices in the first environmental sound, the earphone turns on the pass-through mode.
It can be appreciated that if no human voice exists in the first environmental sound, the earphone makes no subsequent judgment.
The pass-through mode switching method can detect the first environmental sound and, when a human voice exists, match it against the sound of the first object to determine whether to turn on the pass-through mode.
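The gating described here can be sketched as follows; the helper functions are hypothetical placeholders for the VAD and voiceprint-matching steps, not the application's actual algorithms:

```python
import numpy as np

def vad(frame: np.ndarray, threshold: float = 1e-3) -> bool:
    # Placeholder energy-based voice activity check.
    return float(np.mean(frame ** 2)) > threshold

def matches_first_object(frame: np.ndarray, stored: np.ndarray) -> bool:
    # Placeholder: real logic would compare voiceprint features of the
    # frame against the stored first-object sample.
    return bool(len(stored))

def on_first_ambient_sound(frame: np.ndarray, stored: np.ndarray,
                           enable_pass_through) -> None:
    if not vad(frame):                        # no human voice: stop here
        return
    if matches_first_object(frame, stored):   # voice matches first object
        enable_pass_through()
```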
With reference to the first aspect, in certain implementations of the first aspect, the user faces the first object when the earphone collects the sound of the first object at the first time; and turning on the pass-through mode when the human voice in the first environmental sound matches the sound of the first object includes: if the speaker corresponding to a human voice in the first environmental sound does not face the user, the earphone determines a compensation function based on the angle of the speaker relative to the user and a first correspondence, where the first correspondence includes a plurality of angles and the compensation function corresponding to each of those angles, the plurality of angles including the angle of the speaker relative to the user; the earphone compensates the human voice in the first environmental sound based on the compensation function to obtain the compensated human voice; and when the compensated human voice in the first environmental sound matches the sound of the first object, the earphone turns on the pass-through mode.
If the speaker corresponding to a human voice in the first environmental sound does not face the user, the earphone can compensate that voice. A first correspondence is preset in the earphone; a suitable compensation function can be selected from it based on the angle of the speaker relative to the user, and the voice in the first environmental sound is compensated with that function to obtain the compensated voice. The first correspondence is determined experimentally by developers and preset in the earphone.
Illustratively, the compensation functions of different orientations in the specific embodiment are used to represent the compensation functions corresponding to different angles in the first correspondence. The calibration process of the first correspondence relationship may be as shown in fig. 9 in the specific embodiment.
According to the pass-through mode switching method, when the speaker corresponding to a human voice in the first environmental sound does not face the user, the voice is compensated first and the compensated voice is then matched against the sound of the first object, improving matching accuracy.
Optionally, the human voices in the first environmental sound include the first object's voice; based on this method, the earphone may likewise compensate the first object's sound in the first environmental sound to obtain the compensated sound of the first object. Based on the compensated sound, the earphone can determine the orientation of the first object when speaking relative to the user, judge the user's state more accurately, and switch the pass-through mode reasonably, which helps improve user experience.
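A minimal sketch of the angle-to-compensation-function lookup, assuming the first correspondence has been calibrated offline as a table of FIR filters; all angle values and coefficients below are placeholders:

```python
import numpy as np

# Hypothetical first correspondence: speaker angle (degrees) mapped to an
# FIR compensation filter calibrated offline (cf. fig. 9); the angles and
# coefficients are placeholders.
FIRST_CORRESPONDENCE = {
    0: np.array([1.0]),                 # speaker facing the user
    45: np.array([1.2, -0.1]),
    90: np.array([1.5, -0.2, 0.05]),
}

def compensate(voice: np.ndarray, speaker_angle_deg: float) -> np.ndarray:
    # Pick the calibrated angle closest to the estimated speaker angle,
    # then apply its compensation function before voiceprint matching.
    nearest = min(FIRST_CORRESPONDENCE, key=lambda a: abs(a - speaker_angle_deg))
    return np.convolve(voice, FIRST_CORRESPONDENCE[nearest], mode="same")
```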
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the earphone collects a second environmental sound in the user's environment at a third time, where a second object is speaking in the environment at the third time, the first object is not speaking at the third time, and the third time is after the second time; and in response to the second environmental sound, if the user is speaking at the third time, the earphone keeps the pass-through mode on.
After the pass-through mode is turned on, the earphone collects the second environmental sound at the third time. At this moment the second object is speaking in the environment and the first object is not, so the second environmental sound includes the sound of the second object but not the sound of the first object. If the user is speaking at the third time, the earphone keeps the pass-through mode on. That is, the conditions for the earphone to turn on the pass-through mode, or keep it on, include: the user is speaking, or the sound of the first object exists in the environmental sound.
With the pass-through mode switching method, the earphone can be in the pass-through state both when the user speaks actively and when the user is not speaking but listening, which makes it convenient for the user to converse with others and improves user experience.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: in response to the second environmental sound, if the user is not speaking at the third time, the earphone exits the pass-through mode.
At the third time, if the user is not speaking and the sound of the first object is absent from the environment, the earphone exits the pass-through mode.
With this method, when the user is not speaking and the first object's sound is absent from the environment, the earphone exits the pass-through mode rather than keeping it on indefinitely, saving power.
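The keep-or-exit decision at the third time reduces to a simple test; a sketch, with hypothetical callback names:

```python
def on_second_ambient_sound(user_speaking: bool,
                            keep_pass_through, exit_pass_through) -> None:
    # At the third time only the second object speaks, so the decision
    # rests entirely on whether the user is speaking.
    if user_speaking:
        keep_pass_through()   # user is conversing: stay in pass-through
    else:
        exit_pass_through()   # no first object, no user speech: save power
```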
With reference to the first aspect, in certain implementations of the first aspect, collecting and storing the sound of the first object at the first time includes: the earphone receives a first instruction from a terminal device at the first time, where the terminal device is connected to the earphone and the first instruction instructs the earphone to collect the sound of the first object; and the earphone collects the sound of the first object based on the first instruction and stores it.
The terminal device and the earphone may be connected wirelessly or by wire; the application does not limit this. The terminal device may be, for example, a mobile phone, and may include a control for collecting data. When the terminal device detects the user's operation triggering this control, it may, in response, send the first instruction to the earphone to instruct it to collect the sound of the first object; the earphone then collects and stores the sound of the first object.
By way of example, an implementation of specifically capturing sound of a first object may be as shown in fig. 7 in a specific embodiment.
There are many possible implementations of the way in which the earpiece stores the sound of the first object.
In one possible implementation, the earphone itself stores the sound of the first object.
According to the implementation mode, the earphone stores the sound of the first object, the sound of the first object is not required to be acquired from the outside, signaling overhead is saved, and recognition efficiency can be improved.
In another possible implementation, the headset may send the sound of the first object to the terminal device, which stores the sound of the first object. When the earphone matches the ambient sound with the sound of the first object, the sound of the first object may be acquired from the terminal device.
According to the implementation mode, the earphone does not need to store the sound of the key object, and the memory space of the earphone can be saved.
Optionally, the earphone may further receive a second instruction of the terminal device, where the second instruction is used to instruct the earphone to end collecting the sound of the first object, and the earphone does not collect the sound of the first object any more based on the second instruction.
With the pass-through mode switching method provided by the application, the sound of the first object is collected as instructed by the terminal device, so that the pass-through mode can conveniently be turned on when the first object speaks later.
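The instruction exchange might look like the following sketch; the opcodes and wire format are assumptions, since the application does not specify an encoding:

```python
COLLECT_START = 0x01  # hypothetical opcode for the "first instruction"
COLLECT_STOP = 0x02   # hypothetical opcode for the "second instruction"

class Earphone:
    def __init__(self):
        self.recording = False
        self.stored_voice = None

    def on_instruction(self, opcode: int) -> None:
        if opcode == COLLECT_START:
            self.recording = True    # start collecting the first object's voice
        elif opcode == COLLECT_STOP:
            self.recording = False   # stop collecting on the second instruction
            self.stored_voice = self.finish_recording()

    def finish_recording(self) -> bytes:
        # Placeholder: return the captured audio; alternatively the sample
        # could be sent to the terminal device for storage.
        return b""
```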
In a second aspect, a transparent transmission mode switching device is provided, including: the device comprises an acquisition module and a processing module. Wherein, collection module is used for: collecting sound of a first object at a first time; the processing module is used for: storing sound of the first object at a first time; the acquisition module is also used for: collecting a first environmental sound in an environment where a user wearing the switching device is located at a second time, wherein the first object is in a speaking state in the environment at the second time, and the second time is after the first time; the processing module is also used for: in response to the first ambient sound, a pass-through mode is turned on.
With reference to the second aspect, in certain implementations of the second aspect, the first object faces the user at a second time, and the first object is in a speaking state for a first duration; or at a second time, the first object is user-oriented and the user is first object-oriented.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: if the ratio of the first transfer function of the sound of the first object to the second transfer function of the sound of the first object is smaller than or equal to a first preset threshold value, determining that the first object faces the user, and the user faces the first object; if the ratio is greater than a first preset threshold and less than or equal to a second preset threshold, determining that the first object faces the user and the user does not face the first object; or if the ratio is greater than the second preset threshold, determining that the first object is not facing the user and the user is not facing the first object.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: in response to the first environmental sound, acquire the human voice in the first environmental sound; and when the human voice in the first environmental sound matches the sound of the first object, turn on the pass-through mode.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: if a speaker corresponding to a voice in the first environmental sound is not facing the user, determining a compensation function based on the angle of the speaker relative to the user and a first corresponding relation, wherein the first corresponding relation comprises a plurality of angles and the compensation function corresponding to each angle in the plurality of angles, and the plurality of angles comprise the angle of the speaker relative to the user; compensating the voice in the first environmental sound based on the compensation function to obtain the compensated voice in the first environmental sound; and under the condition that the human voice in the compensated first environmental sound is matched with the sound of the first object, starting a transparent transmission mode.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: collecting a second environmental sound in an environment where a user is located at a third time, wherein a second object is in a speaking state in the environment at the third time, the first object is not in the speaking state in the environment at the third time, and the third time is after the second time; and responding to the second environmental sound, and if the user is in a speaking state at a third time, keeping the transparent transmission mode in an on state.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: and responding to the second environmental sound, and if the user is not in a speaking state at a third time, exiting the transparent transmission mode.
With reference to the second aspect, in certain implementations of the second aspect, the switching device further includes a receiving module. The receiving module is used for: receiving a first instruction of a terminal device at a first time, wherein the terminal device is connected with the switching device, and the first instruction is used for instructing the switching device to collect the sound of a first object; the acquisition module is also used for: based on the first instruction, collecting sound of a first object; the processing module is also used for: the sound of the first object is stored.
In a third aspect, the present application provides a pass-through mode switching device, including: a processor and a memory; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to cause the switching device to perform the method described in the first aspect.
In a fourth aspect, the application provides a headset which may be used to perform the method as described in the first aspect.
In a fifth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as described in the first aspect.
In a sixth aspect, the application provides a computer program product comprising a computer program which, when run, causes a computer to perform the method as described in the first aspect.
In a seventh aspect, the present application provides a chip comprising a processor for invoking a computer program in memory to perform the method according to the first aspect.
It should be understood that the second to seventh aspects of the present application correspond to the technical solutions of the first aspect of the present application, and the advantages obtained by each aspect and the corresponding possible embodiments are similar, and are not repeated.
Drawings
FIG. 1 is a scene diagram applicable to a TWS headset;
FIG. 2 is a scene diagram to which the method of an embodiment of the application is applicable;
FIG. 3 is another scene diagram to which the method of an embodiment of the application is applicable;
fig. 4 is a schematic structural diagram of an earphone according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for switching a transparent transmission mode according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a TWS earphone according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an interface for key object sound collection;
fig. 8 is a schematic diagram of calculating a binaural time difference according to an embodiment of the application;
FIG. 9 is a schematic diagram of a calibration compensation function provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a different chat scenario provided by embodiments of the present application;
FIG. 11 is a schematic block diagram of a transparent mode switching device according to an embodiment of the present application;
Fig. 12 is a schematic block diagram of another transparent mode switching device according to an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
With the popularity and development of terminal devices, users use true wireless stereo (TWS) earphones more and more frequently, and wearing TWS earphones for long periods has become the norm. So that wearing a TWS earphone does not hinder communication with other people, current TWS earphones provide a pass-through mode, in which the TWS earphone filters the ambient sound and transmits human voices to the user, so that the user can hear the human voices in the environment and can therefore converse with others while wearing the TWS earphone.
In general, when the TWS earphone detects that the user speaks, it turns on the pass-through mode; if it does not detect the user speaking within a period of time, it exits the pass-through mode.
By way of example, fig. 1 shows a scenario involving a TWS headset. As shown in fig. 1, a user 101 wears a TWS headset 102. When the TWS headset 102 detects the sound of the user 101, it may turn on the pass-through mode. If the TWS headset 102 does not detect the sound of the user 101 for a period of time, it may exit the pass-through mode.
With this implementation, in a conversation scenario, when the user is not actively speaking the TWS headset does not turn on the pass-through mode, so the user cannot hear the other party's voice and the conversation is affected. If the user speaks actively, the TWS headset turns on the pass-through mode; but if the user then listens to the other party for a long time without speaking, the TWS headset often exits the pass-through mode, so the user cannot hear the other party's voice, the conversation is affected, and in severe cases user experience is poor.
By way of example, fig. 2 shows another scenario. As shown in fig. 2, the user 101 wears a TWS headset 102, and a friend 103 of the user 101 is talking to the user 101, saying: "Let's go get a meal together." At this time, the TWS headset does not detect the sound of the user 101, that is, the user 101 is not actively speaking, so the TWS headset does not turn on the pass-through mode; the user 101 therefore cannot hear the voice of the friend 103, and the conversation is affected.
By way of example, fig. 3 shows a further scenario. As shown in a of fig. 3, the user 101 wears a TWS headset 102 and is talking to his friend 103, asking: "How was your outing yesterday?" At this time, the TWS headset detects the sound of the user 101, that is, the user 101 speaks actively, so the TWS headset turns on the pass-through mode, and the user 101 can hear his friend 103 and converse. The friend 103 replies: "We first went to XXXXX, then to XXXXX, and finally to XXXXX; it was great fun. At XXXXX we did many interesting things, XXXXXXXXXXXXXXXX." The friend 103 talks for a long time while the user 101 listens; because the sound of the user 101 is not detected, the TWS headset exits the pass-through mode, so the user 101 cannot hear the friend 103, the conversation is affected, and in severe cases user experience is poor.
Therefore, embodiments of the application provide a pass-through mode switching method and device that can judge whether a human voice in the environment is the voice of a conversation object: if the conversation object's voice exists, the pass-through mode is turned on or kept on; if the human voice in the environment is not the conversation object's voice and the user is not actively speaking, the pass-through mode is exited. In this way, the pass-through mode can be turned on or exited more reasonably in conversation scenarios, improving user experience.
The method provided by embodiments of the application can be applied to any earphone with a pass-through mode, not only TWS earphones, and to any scenario in which the user speaks actively or converses with others, not only the scenarios shown in fig. 1, 2 and 3 above.
In order to better understand the embodiments of the present application, the following describes the structure of the earphone according to the embodiments of the present application. Fig. 4 is a schematic structural diagram of an earphone according to an embodiment of the present application.
The headset may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset interface 170D, a sensor module 180, keys 190, indicators 192, a camera 193, a display 194, and the like.
It will be appreciated that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the headset. In other embodiments of the application, the headset may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components may be provided. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units. Wherein the different processing units may be separate devices or may be integrated in one or more processors. A memory may also be provided in the processor 110 for storing instructions and data. The processor 110 may be configured to perform the methods provided by embodiments of the present application.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the headset. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the headset, or may be used to transfer data between the headset and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other headphones, such as AR devices, etc.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used for connecting the charge management module 140 and the processor 110.
The wireless communication function of the headset may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Antennas in headphones may be used to cover single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to headphones. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation.
The wireless communication module 160 may provide solutions for wireless communication applied to the headset, including wireless local area network (WLAN) (e.g., wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), etc.
Headphones may implement audio functionality via the audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone interface 170D, and application processor, etc. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. Headphones may listen to music, or to hands-free conversations, through speaker 170A. A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When the headset picks up a phone call or voice message, the voice can be picked up by placing the receiver 170B close to the human ear. The earphone interface 170D is used to connect a wired earphone.
The microphone 170C, also referred to as a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. In an embodiment of the present application, the earphone may receive a sound signal via the microphone 170C and convert it into an electrical signal for subsequent processing; the earphone may have at least one microphone 170C.
The sensor module 180 may include one or more of the following sensors, for example: a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, or a bone conduction sensor, etc. (not shown in fig. 4).
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The headset may receive key inputs, generating key signal inputs related to user settings and function control of the headset. The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The camera 193 is used to capture still images or video. In some embodiments, the headset may include 1 or N cameras 193, N being a positive integer greater than 1.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. In some embodiments, the headset may include 1 or N display screens 194, N being a positive integer greater than 1.
The software system of the earphone may adopt a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, a cloud architecture, or the like, which will not be described herein.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be implemented independently or combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Before the method provided by the embodiments of the present application is described, the terms used in the embodiments are explained below.
First, in embodiments of the present application, the words "first", "second", and the like are used to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that these words do not limit quantity or order of execution, and do not imply a definite difference.
Second, in embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Third, in embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b and c, wherein a, b, c may be single or plural.
Fig. 5 is a schematic flowchart of a method 500 for switching a transparent transmission mode according to an embodiment of the present application. The method 500 may be applied to the scenario illustrated in fig. 2 and 3 described above, but embodiments of the present application are not limited thereto. The method 500 may be performed by a headset, such as the TWS headset described above.
As shown in fig. 5, the method 500 may include the steps of:
S501, collecting the sound of the environment where the user is located.
The earphone may collect sound of the environment in which the user is located through a Microphone (MIC). The MIC may be the microphone 170C shown in fig. 4, but the embodiment of the application is not limited thereto. The user wears the earphone, and the sound of the environment where the user is located can be understood as the sound of the environment where the earphone is located.
If the earphone includes a feedforward microphone (feed forward microphone, FF MIC) and a feedback microphone (feed back microphone, FB MIC), the earphone may collect sound of the environment where the user is located through the FF MIC or the FB MIC, which is not limited by the embodiment of the present application.
Illustratively, the headphones are TWS headphones that can collect sound of the environment in which the user is located through the FF MIC. Fig. 6 shows a schematic structural diagram of a TWS headset. As shown in fig. 6, the TWS headset includes FF MIC and FB MIC. When the TWS headset is worn by a user, the FB MIC of the TWS headset is proximate to the user's ear, and the FF MIC of the TWS headset is relatively far from the user's ear and relatively close to the environment in which the user is located. The TWS earphone can collect the sound of the environment where the user is located through the FF MIC, and the collected sound of the environment where the user is located is relatively accurate.
S502, judging whether a human voice exists in the sound of the environment where the user is located.
The earphone can detect the voice of the environment where the user is located, and judge whether the voice of the user exists in the voice of the environment where the user is located. The specific implementation mode of the earphone for detecting the voice is not limited.
Illustratively, the earphone may perform voice detection on the sound of the user's environment using a voice activity detection (VAD) algorithm. For example, in the example shown in fig. 6, the earphone is a TWS earphone; after the TWS earphone collects the sound of the user's environment through the FF MIC, it can perform voice detection on that sound with the VAD algorithm.
If the sound of the user exists in the environment, the earphone may perform the subsequent processing on the sound of the user, that is, S503 may be executed. If no human voice exists in the voice of the environment of the user, the headset may not process the collected voice of the environment of the user, i.e., exit the method 500.
Optionally, if no human voice exists in the voice of the environment where the user is located, the earphone may also continuously collect the voice of the environment where the user is located, and continuously detect whether the voice of the human voice exists in the environment according to the newly collected information, that is, execute S501 and S502 described above.
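As a rough illustration of the VAD step, a simple energy-plus-zero-crossing test is sketched below; a production implementation would use a trained VAD model rather than these hypothetical thresholds:

```python
import numpy as np

def simple_vad(frame: np.ndarray, energy_thresh: float = 1e-3,
               zcr_thresh: float = 0.25) -> bool:
    # Voiced speech tends to have noticeable energy and a fairly low
    # zero-crossing rate compared with hiss-like noise.
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return energy > energy_thresh and zcr < zcr_thresh
```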
S503, if there is a sound of a person in the sound of the environment where the user is located, determining whether the sound of the person is a sound of a key object.
The key object may also be referred to as a key attention object or a conversation object. When the key object speaks, or when the user converses with the key object, the user expects the earphone to turn on the pass-through mode. Judging whether the person's voice is the voice of the key object can also be understood as the earphone judging whether the speaker in the user's environment is the key object.
There are various implementations of how the earphone judges whether the person's voice is the voice of the key object.
In one possible implementation manner, the earphone stores the sound of the key object, and it can be directly determined whether the sound of the person is the sound of the key object.
For example, the earphone may be as shown in fig. 4 described above, and the internal memory 121 of the earphone may store sounds of a key object. If the sound of the user exists in the environment, the earphone may read the sound of the key object from the internal memory 121 and determine whether the sound of the user is the sound of the key object.
According to the implementation mode, the earphone stores the sound of the key object, the sound of the key object is not required to be acquired from the outside, signaling overhead is saved, and recognition efficiency can be improved.
In another possible implementation, the headset may obtain the sound of the key object from a device paired with the headset, such as a mobile phone, to judge whether the person's voice is the key object's voice. The device paired with the headset may be referred to as a terminal device and may communicate with the headset wirelessly or by wire.
Illustratively, the device paired with the headset is a cell phone. If the voice of the person exists in the voice of the environment where the user is located, the earphone can send an instruction for acquiring the voice of the key object to the mobile phone. And the mobile phone receives the instruction and sends the sound of the key object to the earphone based on the instruction. After the earphone receives the sound of the key object, whether the sound of the person is the sound of the key object is judged.
According to the implementation mode, the earphone does not need to store the sound of the key object, and the memory space of the earphone can be saved.
If the sound of the person is the sound of the key object, the earphone may continue to perform the subsequent judgment on the sound of the person, that is, execute S504. If the person's voice is not the voice of the emphasized object, the headset may not continue to make subsequent determinations about the person's voice, i.e., exit the method 500.
Optionally, if the sound of the person is not the sound of the key object, the earphone may also continue to collect the sound of the environment where the user is located, and continue to detect whether the sound of the person exists in the environment according to the newly collected information, that is, execute S501 and S502 described above.
S504, judging whether the voice of the person faces the user.
The headset may determine whether the person's voice, that is, the key object's voice, is directed at the user; in other words, whether the person in the environment (the key object, or the speaker) is speaking to the user.
Illustratively, the headset is a TWS headset; it may calculate a binaural level difference (interaural level difference, ILD) and a binaural time difference (interaural time difference, ITD) based on the person's sound signals received by the left and right earphones, and determine the position of the person's voice relative to the user based on the ILD and ITD, so as to judge whether the voice is directed at the user.
If the person's voice is directed at the user, the earphone can turn on the pass-through mode; if it is not, the earphone may exit the method 500.
Optionally, if the sound of the person is not directed to the user, the earphone may also continue to collect the sound of the environment where the user is located, and continue to detect whether the sound of the person exists in the environment according to the newly collected information, that is, execute S501 and S502 described above.
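A minimal sketch of estimating the ILD and ITD from one frame of left/right microphone audio, assuming time-aligned NumPy arrays; this is an illustrative baseline, not the patent's exact computation:

```python
import numpy as np

def ild_itd(left: np.ndarray, right: np.ndarray, fs: int):
    """Estimate the binaural level difference (ILD, dB) and the binaural
    time difference (ITD, s) from left/right microphone frames."""
    # ILD: ratio of the RMS energies of the two channels, in dB.
    ild_db = 20 * np.log10(np.sqrt(np.mean(left ** 2)) /
                           np.sqrt(np.mean(right ** 2)))
    # ITD: lag of the cross-correlation peak between the channels.
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    return ild_db, lag / fs
```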
S505, if the voice of the person is directed to the user, the transmission mode is turned on.
If a human voice exists in the environment, the voice is the voice of the key object, and the voice is directed at the user, the earphone turns on the pass-through mode. The human voice may also be called the speaker's voice; the embodiment of the application does not limit this. The speaker in the environment is the key object and faces the user when speaking; whether the user faces the speaker at this moment is not limited by the embodiment of the application.
The earphone can turn on the pass-through mode when the speaker in the environment is the key object, faces the user when speaking, and the user faces the speaker. The earphone may also turn on the pass-through mode when the speaker is the key object and faces the user while the user does not face the speaker, for example when the user's side faces the speaker, provided the time for which the speaker speaks toward the user exceeds a preset duration.
In this implementation, in a scenario in which the key object is talking with the user, the earphone turns on the pass-through mode even if the user does not speak, so the user can hear the key object's voice and converse conveniently, improving user experience. Moreover, even if the user does not face the key object, the earphone still turns on the pass-through mode while the user listens, further improving user experience.
Optionally, before turning on the pass-through mode, the earphone may first check whether the pass-through mode is already on. If it is on, the earphone keeps it on; if not, the earphone turns it on.
With this implementation, checking whether the pass-through mode is on before turning it on avoids turning it on repeatedly.
S506, judging whether the user is in a self-talk state.
The headset may determine whether the user is in a speaking state, i.e., a self-talk state.
For example, the headset may collect a signal of the bone conduction sensor, and determine whether the user is in a self-talk state based on the signal of the bone conduction sensor. The bone conduction sensor may be the bone conduction sensor in the sensor module 180 shown in fig. 4 described above.
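A sketch of such a bone-conduction self-talk test, assuming a frame of bone-conduction samples; the energy threshold is a placeholder:

```python
import numpy as np

def is_self_talking(bone_frame: np.ndarray, threshold: float = 1e-2) -> bool:
    # Bone conduction picks up the wearer's own voice strongly and other
    # people's voices weakly, so a frame-energy test is a plausible
    # stand-in for the self-talk decision of S506.
    return float(np.mean(bone_frame ** 2)) > threshold
```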
The embodiment of the application is not limited to the sequence of S501 and S506, and the earphone may execute S501 and S506 simultaneously, may execute S501 before S506, and may execute S506 before S501.
If the user is in the self-talk state, the earphone may turn on the pass-through mode, that is, execute S505; if the pass-through mode is already on, the earphone keeps it on. If the user is not in the self-talk state, the earphone judges whether the human voice in the environment is the voice of the key object; if it is and it is directed at the user, the earphone may turn on the pass-through mode. If the user is not in the self-talk state and the human voice in the environment is not the voice of the key object, the earphone checks whether the pass-through mode is on. If it is not on, the earphone exits the method 500, or continues to collect the sound of the user's environment and detect whether a human voice exists based on the newly collected information, that is, executes S501 and S502 above. If it is on, the earphone may exit the pass-through mode.
With the pass-through mode switching method provided by the embodiment of the application, in a scenario in which the user is not actively speaking but a key object in the user's environment is speaking to the user, the earphone can turn on the pass-through mode or keep it on, so that the user can converse with the key object; this avoids the earphone exiting the pass-through mode while the user listens, and improves user experience.
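Putting S501 to S506 together, one pass of the flow in fig. 5 might be sketched as follows; every headset method here is a hypothetical stand-in for the corresponding step:

```python
def switching_pass(headset) -> None:
    """One pass of the flow in fig. 5 (S501 to S506)."""
    frame = headset.capture_ambient()            # S501
    if headset.detect_self_talk():               # S506: user speaks
        headset.enable_pass_through()            # S505
        return
    if not headset.vad(frame):                   # S502: no human voice
        return
    if not headset.is_key_object(frame):         # S503: not the key object
        if headset.pass_through_on():
            headset.exit_pass_through()
        return
    if headset.voice_faces_user(frame):          # S504: voice directed at user
        headset.enable_pass_through()            # S505
```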
As an optional embodiment, in S503, if a human voice exists in the sound of the user's environment, judging whether it is the voice of the key object may include: extracting the voiceprint features of the human voice and matching them against the voiceprint features of the key object's voice, so as to judge whether the human voice is the voice of the key object.
The human voice existing in the sound of the user's environment may be the voice of one person or the voices of several persons; the embodiment of the present application does not limit this.
Voiceprint features can include features such as a spectrogram (Spectrogram), a Pitch trace (Pitch contour), and a long-time average spectrum (LTAS).
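As a minimal numpy sketch of two of these feature types, the magnitude spectrogram and the LTAS (pitch tracking is omitted; all names and parameter values are illustrative, and the input is assumed to be at least one FFT frame long):

```python
import numpy as np

def voiceprint_features(x: np.ndarray, fs: int, n_fft: int = 512,
                        hop: int = 256):
    """Extract a magnitude spectrogram and the long-time average spectrum.

    A bare-bones sketch; production systems would add pitch tracking,
    normalization and perceptual weighting.
    """
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    spec = np.stack(frames, axis=1)   # shape: (freq bins, time frames)
    ltas = spec.mean(axis=1)          # long-time average spectrum
    return spec, ltas
```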
Illustratively, the voiceprint features of a human voice may be represented by a time-frequency feature matrix, denoted pred_FT_i(f, t), where i indexes the ith person whose voice exists in the sound of the user's environment. If the earphone is a TWS earphone, it comprises a left earphone and a right earphone: the TWS earphone may extract the voiceprint features pred_left_FT_i(f, t) from the sound collected by the left earphone and the voiceprint features pred_right_FT_i(f, t) from the sound collected by the right earphone.
Here, pred_left_FT_i(f, t) may be written as:

$$\mathrm{pred\_left\_FT}_i(f,t)=\begin{bmatrix} plft_1(f_1(1{\sim}N),t_{11}) & \cdots & plft_1(f_1(1{\sim}N),t_{1P})\\ \vdots & \ddots & \vdots\\ plft_M(f_M(1{\sim}N),t_{M1}) & \cdots & plft_M(f_M(1{\sim}N),t_{MP}) \end{bmatrix}$$
where i indexes the ith person whose voice exists in the sound of the user's environment, M is the number of features, N is the number of frequency bins, and P is the total number of frames.
pred_right_FT_i(f, t) may similarly be written as:

$$\mathrm{pred\_right\_FT}_i(f,t)=\begin{bmatrix} prft_1(f_1(1{\sim}N),t_{11}) & \cdots & prft_1(f_1(1{\sim}N),t_{1P})\\ \vdots & \ddots & \vdots\\ prft_M(f_M(1{\sim}N),t_{M1}) & \cdots & prft_M(f_M(1{\sim}N),t_{MP}) \end{bmatrix}$$
There may be one key-object voice or several; the specific number may be determined by the user's needs, which the embodiment of the present application does not limit. The earphone may store the voiceprint features of the key objects' voices itself, or obtain them from a device paired with the earphone; the embodiment of the present application does not limit this either. The voiceprint features are extracted by the earphone from the key objects' voices and may be stored in the earphone's memory or sent to the paired device for storage. Before extracting voiceprint features from a key object's voice, the earphone first collects that voice.
Illustratively, the earphone is a TWS earphone, which may collect the sound of a key object while facing it, based on an instruction from a device paired with the earphone, such as a mobile phone. Fig. 7 shows a schematic interface for key-object sound collection. As shown in interface a in Fig. 7, the mobile phone displays an interface for collecting the sound of key objects, on which the user can enter the required number of speakers, for example 8, in the corresponding input box. Based on this operation the mobile phone determines that there are 8 key objects and displays 8 identifiers (A, B, C, D, E, F, G and H) in the target-person selection area, one per key object, each with a selection box for choosing the corresponding key object. The interface also shows the note "data is collected when the mobile phone faces the target speaker", prompting the user to keep the phone facing the key object during collection. The interface further displays an acquisition progress bar and a start control: when the mobile phone detects that the user has triggered the start control, it responds by sending a sound-collection instruction to the TWS earphone, records the elapsed collection time, computes the collection progress from the elapsed time and the preset total collection time, and displays that progress on the progress bar.
As shown in interface a in Fig. 7, the mobile phone detects that the user has ticked the selection box corresponding to identifier H and, in response, determines that the key object being collected corresponds to H. As shown in interface b in Fig. 7, the mobile phone detects the user triggering the start control and, in response, sends a sound-collection instruction to the TWS earphone; the TWS earphone receives the instruction and collects sound through the FF MIC. The mobile phone also records the elapsed collection time, computes the collection progress from it and the preset total collection time, and displays the progress on the progress bar. As shown in interface c in Fig. 7, if the progress is 70%, the mobile phone displays 70% on the progress bar. On detecting the start-control trigger, the mobile phone may also replace the start control with a stop control so that the user can stop the collection. As shown in interface d in Fig. 7, when the mobile phone detects that the progress has reached 100%, it displays 100% on the progress bar and replaces the stop control with a "completed" text control to indicate that collection has finished.
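The progress shown on the bar is simply the elapsed collection time over the preset total. A trivial sketch, assuming a positive total duration (the function name is illustrative):

```python
def collection_progress(elapsed_s: float, total_s: float) -> int:
    """Percentage for the acquisition progress bar: elapsed collection
    time over the preset total collection time, capped at 100."""
    return min(100, int(100 * elapsed_s / total_s))
```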
After collecting the sound of a key object by the method shown in Fig. 7, the earphone can extract the key object's voiceprint features from that sound. The voiceprint features of the key objects may be denoted FT_i(f, t), where i indexes the ith key object. If the earphone is a TWS earphone comprising a left earphone and a right earphone, it may extract the voiceprint features left_FT_i(f, t) from the key object's sound collected by the left earphone and right_FT_i(f, t) from the sound collected by the right earphone.
Here, left_FT_i(f, t) may be written as:

$$\mathrm{left\_FT}_i(f,t)=\begin{bmatrix} lft_1(f_1(1{\sim}N),t_{11}) & \cdots & lft_1(f_1(1{\sim}N),t_{1P})\\ \vdots & \ddots & \vdots\\ lft_M(f_M(1{\sim}N),t_{M1}) & \cdots & lft_M(f_M(1{\sim}N),t_{MP}) \end{bmatrix}$$
where i indexes the ith key object, M is the number of features, N is the number of frequency bins, and P is the total number of frames.
right_FT_i(f, t) may similarly be written as:

$$\mathrm{right\_FT}_i(f,t)=\begin{bmatrix} rft_1(f_1(1{\sim}N),t_{11}) & \cdots & rft_1(f_1(1{\sim}N),t_{1P})\\ \vdots & \ddots & \vdots\\ rft_M(f_M(1{\sim}N),t_{M1}) & \cdots & rft_M(f_M(1{\sim}N),t_{MP}) \end{bmatrix}$$
The earphone can match the voiceprint features of the human voice against the voiceprint features of the key objects' voices to determine whether the human voice is the voice of a key object. If a human voice exists in the sound of the user's environment and there are several key-object voices, the earphone may match the human voice against the voiceprint features of each key object in turn to obtain a matching result.
For example, if the voiceprint features of the human voice are expressed as pred_left_FT_i(f, t) and pred_right_FT_i(f, t) above, and the voiceprint features extracted from the key object's voice are expressed as left_FT_i(f, t) and right_FT_i(f, t) above, the earphone may compute the mean squared error (MSE) between the two sets of features and determine, based on the mean squared error, whether the human voice is the key object's voice.
For example, the mean squared errors between the voiceprint features of the human voice and the voiceprint features extracted from the key object's voice may be denoted MSE_left_FT_i(f, t) and MSE_right_FT_i(f, t).
Here, MSE_left_FT_i(f, t) may be written as the entrywise squared-difference matrix:

$$\mathrm{MSE\_left\_FT}_i(f,t)=\begin{bmatrix} \mathrm{MSE\_left\_ft}_{11} & \cdots & \mathrm{MSE\_left\_ft}_{1P}\\ \vdots & \ddots & \vdots\\ \mathrm{MSE\_left\_ft}_{M1} & \cdots & \mathrm{MSE\_left\_ft}_{MP} \end{bmatrix}$$

where, for example,

$$\mathrm{MSE\_left\_ft}_{11}=\bigl(plft_1(f_1(1{\sim}N),t_{11})-lft_1(f_1(1{\sim}N),t_{11})\bigr)^2;$$
$$\mathrm{MSE\_left\_ft}_{1P}=\bigl(plft_1(f_1(1{\sim}N),t_{1P})-lft_1(f_1(1{\sim}N),t_{1P})\bigr)^2;$$
$$\mathrm{MSE\_left\_ft}_{M1}=\bigl(plft_M(f_M(1{\sim}N),t_{M1})-lft_M(f_M(1{\sim}N),t_{M1})\bigr)^2;$$
$$\mathrm{MSE\_left\_ft}_{MP}=\bigl(plft_M(f_M(1{\sim}N),t_{MP})-lft_M(f_M(1{\sim}N),t_{MP})\bigr)^2.$$
MSE_right_FT_i(f, t) may similarly be written as:

$$\mathrm{MSE\_right\_FT}_i(f,t)=\begin{bmatrix} \mathrm{MSE\_right\_ft}_{11} & \cdots & \mathrm{MSE\_right\_ft}_{1P}\\ \vdots & \ddots & \vdots\\ \mathrm{MSE\_right\_ft}_{M1} & \cdots & \mathrm{MSE\_right\_ft}_{MP} \end{bmatrix}$$

where, for example,

$$\mathrm{MSE\_right\_ft}_{11}=\bigl(prft_1(f_1(1{\sim}N),t_{11})-rft_1(f_1(1{\sim}N),t_{11})\bigr)^2;$$
$$\mathrm{MSE\_right\_ft}_{1P}=\bigl(prft_1(f_1(1{\sim}N),t_{1P})-rft_1(f_1(1{\sim}N),t_{1P})\bigr)^2;$$
$$\mathrm{MSE\_right\_ft}_{M1}=\bigl(prft_M(f_M(1{\sim}N),t_{M1})-rft_M(f_M(1{\sim}N),t_{M1})\bigr)^2;$$
$$\mathrm{MSE\_right\_ft}_{MP}=\bigl(prft_M(f_M(1{\sim}N),t_{MP})-rft_M(f_M(1{\sim}N),t_{MP})\bigr)^2.$$
If both MSE_left_FT_i(f, t) and MSE_right_FT_i(f, t) are smaller than a preset threshold λ, the earphone may determine that the human voice is the key object's voice. Conversely, if at least one of them is greater than or equal to λ, the earphone may determine that the human voice is not the key object's voice. The preset threshold λ is calibrated by developers through extensive experiments and preset in the earphone.
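A minimal sketch of this matching rule, aggregating the entrywise squared errors into per-ear means for simplicity (the function name and the example threshold value are assumptions, not calibrated values from the original):

```python
import numpy as np

def is_key_object(pred_left: np.ndarray, pred_right: np.ndarray,
                  left_ft: np.ndarray, right_ft: np.ndarray,
                  lam: float = 0.05) -> bool:
    """Match an observed voice against one stored key-object voiceprint.

    `pred_left`/`pred_right` are the observed feature matrices,
    `left_ft`/`right_ft` the stored ones; `lam` stands in for the
    experimentally calibrated threshold λ. Both per-ear errors must
    fall below λ, mirroring the rule described above.
    """
    mse_left = float(np.mean((pred_left - left_ft) ** 2))
    mse_right = float(np.mean((pred_right - right_ft) ** 2))
    return mse_left < lam and mse_right < lam
```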
According to the method provided by the embodiment of the present application, whether a human voice present in the sound of the user's environment is the voice of a key object is determined from the voiceprint features of the voice, so that an accurate judgment can be obtained.
As an optional embodiment, the earphone extracting the voiceprint features of the human voice may include: if a human voice exists in the environment, the earphone may compute the angle of the human voice relative to the earphone; if the angle is not 0° (that is, the voice is not directed toward the earphone), the earphone compensates the sound spectrum of the human voice to obtain a compensated spectrum, and extracts the voiceprint features from the compensated spectrum.
If the earphone is a TWS earphone, the angle of the human voice relative to the earphone can be represented by an interaural level difference (ILD) and an interaural time difference (ITD).
The ITD may be computed as:

$$\mathrm{ITD}(\theta_s)=\frac{a}{c}\,\bigl(\sin\theta_s+\theta_s\bigr)$$
where a is a constant (for example, 0.0875 m), c is the speed of sound, and θs is the angle of incidence relative to straight ahead. It can be seen that the ITD is 0 when the incidence angle θs is 0°.
Fig. 8 is a schematic diagram of computing the binaural time difference according to an embodiment of the present application. As shown in Fig. 8, when the human voice (i.e., the sound source) is to the front left, the angle between the direction of the voice and straight ahead is θs; with ITD(θs) = τ, the left- and right-channel signals after the cue is added are:
$$p_L(t)=\bigl[1+m\cos(2\pi f_m t)\bigr]\cos(2\pi f_c t)$$
$$p_R(t)=\bigl[1+m\cos\bigl(2\pi f_m (t-\tau)\bigr)\bigr]\cos(2\pi f_c t)$$
where f_m is the modulation frequency, f_c is the signal (carrier) frequency, and m is the modulation index.
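The two channel signals can be synthesized directly from these formulas. A numpy sketch, assuming the Woodworth-style ITD reconstruction above and illustrative parameter values (θs in radians; none of the defaults come from the original):

```python
import numpy as np

def binaural_with_itd(theta_s: float, fs: int = 48000, dur: float = 1.0,
                      f_m: float = 4.0, f_c: float = 1000.0, m: float = 0.5,
                      a: float = 0.0875, c: float = 343.0):
    """Synthesize pL(t), pR(t) with an ITD cue tau = (a/c)(sin θs + θs)."""
    t = np.arange(int(fs * dur)) / fs
    tau = (a / c) * (np.sin(theta_s) + theta_s)   # assumed ITD model
    p_left = (1 + m * np.cos(2 * np.pi * f_m * t)) * np.cos(2 * np.pi * f_c * t)
    p_right = (1 + m * np.cos(2 * np.pi * f_m * (t - tau))) * np.cos(2 * np.pi * f_c * t)
    return p_left, p_right
```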
The ILD is simpler to compute: the earphone can directly superimpose the amplitude-difference information from the spatial cue library onto the left- and right-channel signals to obtain it.
For example, if ILD(θs) = x dB, then p_L(t) − p_R(t) = x dB.
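Going the other way, the earphone needs the two cues from real microphone frames. A rough numpy sketch, not the exact procedure of this embodiment: ILD estimated as an RMS level difference in dB, ITD as the peak lag of the cross-correlation:

```python
import numpy as np

def estimate_ild_itd(left: np.ndarray, right: np.ndarray, fs: int):
    """Estimate (ILD in dB, ITD in seconds) from one stereo frame."""
    eps = 1e-12
    ild_db = 20 * np.log10((np.sqrt(np.mean(left ** 2)) + eps) /
                           (np.sqrt(np.mean(right ** 2)) + eps))
    corr = np.correlate(left, right, mode="full")   # all relative lags
    lag = int(np.argmax(corr)) - (len(right) - 1)   # lag of the peak
    return ild_db, lag / fs
```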
The earphone may determine the azimuth of the human voice relative to the earphone based on the ILD and the ITD. If the voice is not directed toward the earphone (that is, the angle is not 0°), the earphone compensates the sound spectrum of the human voice to obtain a compensated spectrum.
The earphone may determine a compensation function based on the azimuth of the human voice relative to the earphone and compensate the voice's sound spectrum with it to obtain the compensated spectrum. Compensation functions for different azimuths are preset in the earphone, and the required one is selected from them according to the azimuth of the human voice relative to the earphone. The preset compensation functions are determined by developers through experiments and preset in the earphone.
Illustratively, Fig. 9 shows a schematic diagram of calibrating the compensation functions. As shown in Fig. 9, an experimenter wears the TWS earphone, and developers play a test signal s(t) from different angles (e.g., 0°, 30°, 60°, 90°, 120°, 150°, 180°, 210°, 240°, 270°, 300° and 330°) at a radius of 1 meter (m) from the experimenter. The TWS earphone receives the signal at each angle. The signal received by the left earphone may be denoted l_j(t) and the signal received by the right earphone r_j(t), where j indicates the angle of the test signal s(t) relative to the TWS earphone and takes the value 0°, 30°, 60°, 90°, 120°, 150°, 180°, 210°, 240°, 270°, 300° or 330°. When j is 0°, the signal received by the TWS earphone is s(t) itself.
The TWS earphone may apply the Laplace transform to l_j(t) and r_j(t) to obtain L_j(s) and R_j(s), and to s(t) to obtain S(s). Based on L_j(s), R_j(s) and S(s), the TWS earphone can determine the compensation functions of the left and right ears at the different angles, which may be written as:

$$H_{L,j}(s)=\frac{S(s)}{L_j(s)},\qquad H_{R,j}(s)=\frac{S(s)}{R_j(s)}$$
The TWS earphone saves the compensation functions of the left and right ears at the different angles. If a human voice exists in the sound of the user's environment and its angle relative to the TWS earphone is 60°, the TWS earphone can compensate the transfer functions L'(s) and R'(s) of the voice received by the left and right earphones with the corresponding compensation functions, obtaining the compensated transfer functions:

$$\hat L'(s)=L'(s)\,H_{L,60°}(s),\qquad \hat R'(s)=R'(s)\,H_{R,60°}(s)$$

The earphone may determine the compensated sound spectrum of the human voice from the compensated transfer functions and extract the voiceprint features from it.
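A minimal sketch of this calibration and its use, with the FFT standing in for the Laplace transform and the ratio-form compensation function assumed above (signal lengths are assumed equal; all names are illustrative):

```python
import numpy as np

def calibration_compensation(s: np.ndarray, l_j: np.ndarray) -> np.ndarray:
    """Build one per-angle compensation function from a calibration pair.

    `s` is the played test signal s(t), `l_j` the signal recorded by the
    left earphone at angle j. The compensation is the spectral ratio
    S(f)/L_j(f); applying it maps a recording back toward the frontal
    reference. A simplified, assumed formulation.
    """
    eps = 1e-12
    return np.fft.rfft(s) / (np.fft.rfft(l_j) + eps)

def apply_compensation(x: np.ndarray, comp: np.ndarray) -> np.ndarray:
    """Compensate a recording of the same length as the calibration pair."""
    return np.fft.irfft(np.fft.rfft(x) * comp, n=len(x))
```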
According to the method provided by the embodiment of the application, under the condition that the sound of the person in the environment where the earphone is located is not directed towards the earphone, the sound spectrum of the sound of the person is compensated, and then the voiceprint characteristics are extracted from the compensated sound spectrum, so that the accuracy of the voiceprint characteristics is improved.
Optionally, when the earphone collects the sound of a key object, it can also compute the angle of the key object relative to the user (or, equivalently, the earphone). If the angle is 0°, the processing proceeds as described above; if it is not 0°, the earphone may record the angle of the key object's voice. Later, if a human voice exists in the sound of the environment and its angle relative to the earphone differs from the recorded angle of the key object's voice, the earphone compensates the sound spectrum of the human voice so that the compensated voice has the same angle relative to the earphone as the key object's voice.
In this implementation, compensation functions that compensate to different angles are preset in the earphone. Developers preset these compensation functions in the earphone before it leaves the factory.
As an optional embodiment, determining in S504 whether the human voice is directed toward the user may include: the earphone determines, based on the compensated spectrum, whether the human voice is directed toward the user (i.e., whether the speaker is speaking toward the user) and/or whether the user is facing the human voice (i.e., whether the user is facing the speaker). If the speaker is speaking toward the user and the user is facing the speaker, the earphone can turn on the transparent transmission mode. If the speaker is speaking toward the user but the user is not facing the speaker, and the speaker has been speaking toward the user for longer than a preset duration (e.g., 10 seconds or 20 seconds), the earphone may also turn on the transparent transmission mode.
Illustratively, continuing the example above with the compensated transfer functions $\hat L'(s)$ and $\hat R'(s)$, the earphone may determine from them whether the speaker is speaking toward the user and whether the user is facing the speaker. Specifically, the earphone computes the ratio

$$\sigma=\frac{\hat L'(s)}{\hat R'(s)}\quad\text{or}\quad\sigma=\frac{\hat R'(s)}{\hat L'(s)}.$$

If σ ≤ λ1, the earphone can determine that the speaker is speaking toward the user and the user is facing the speaker; if λ1 < σ ≤ λ2, that the speaker is speaking toward the user but the user is not facing the speaker; and if σ > λ2, that the speaker is not speaking toward the user and the user is not facing the speaker. The preset thresholds λ1 and λ2 are calibrated by developers through extensive experiments and preset in the earphone.
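A small sketch of the resulting decision, with illustrative (not calibrated) values for λ1, λ2 and the preset duration:

```python
def classify_orientation(sigma: float, lam1: float, lam2: float) -> str:
    """Map the transfer-function ratio sigma to a conversation geometry."""
    if sigma <= lam1:
        return "speaker->user, user->speaker"
    if sigma <= lam2:
        return "speaker->user, user away"
    return "not a conversation with the user"

def should_passthrough(sigma: float, toward_user_seconds: float,
                       lam1: float = 0.8, lam2: float = 1.5,
                       min_duration: float = 10.0) -> bool:
    """Pass-through decision sketch; threshold values are assumptions."""
    state = classify_orientation(sigma, lam1, lam2)
    if state == "speaker->user, user->speaker":
        return True
    if state == "speaker->user, user away":
        # turn on only after the speaker has addressed the user long enough
        return toward_user_seconds > min_duration
    return False
```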
Fig. 10 shows schematic diagrams of different conversation scenarios. As shown in a in Fig. 10, the user 101 wears the TWS earphone 102, and the friend 103 of the user 101 is talking to the user 101, saying: "Let's go have a meal together." The TWS earphone detects the voice of the friend 103; since σ ≤ λ1, the earphone determines that the friend 103 is speaking toward the user 101 and the user 101 is facing the friend 103, and the earphone may turn on the transparent transmission mode.

As shown in b in Fig. 10, the user 101 wears the TWS earphone 102, and the friend 103 is talking toward the user 101, saying: "Let's go have a meal together." The TWS earphone detects the voice of the friend 103; since λ1 < σ ≤ λ2, the earphone determines that the friend 103 is speaking toward the user 101 but the user 101 is not facing the friend 103. If the earphone detects that the friend 103 has been speaking toward the user for longer than the preset duration, it may turn on the transparent transmission mode.

As shown in c in Fig. 10, the user 101 wears the TWS earphone 102, and the friend 103 is talking to another person 104, saying: "Let's go have a meal together." The TWS earphone detects the voice of the friend 103; since σ > λ2, the earphone determines that the friend 103 is not speaking toward the user 101 and the user 101 is not facing the friend 103, so the earphone may leave the transparent transmission mode off.
According to the method provided by the embodiment of the present application, the terminal device determines, based on the compensated sound spectrum, whether the human voice is directed toward the user and/or whether the user is facing the human voice, and decides whether to turn on the transparent transmission mode based on the result.
The sequence numbers of the processes in the above embodiments do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic, and should not limit the implementation process of the embodiments of the present application.
The method provided by the embodiment of the present application is described in detail above with reference to fig. 1 to 10, and the apparatus provided by the embodiment of the present application will be described in detail below with reference to fig. 11 and 12.
Fig. 11 is a schematic block diagram of a transparent mode switching device 1100 according to an embodiment of the present application. As shown in fig. 11, the switching device 1100 includes: an acquisition module 1110 and a processing module 1120. Wherein, acquisition module 1110 is used for: collecting sound of a first object at a first time; the processing module 1120 is configured to: storing sound of the first object at a first time; the acquisition module 1110 is also configured to: collecting a first environmental sound in an environment where a user wearing the switching device is located at a second time, wherein the first object is in a speaking state in the environment at the second time, and the second time is after the first time; the processing module 1120 is further configured to: in response to the first ambient sound, a pass-through mode is turned on.
Optionally, at the second time the first object faces the user and has been in a speaking state for a first duration; or, at the second time, the first object faces the user and the user faces the first object.
Optionally, the processing module 1120 is further configured to: if the ratio of the first transfer function of the sound of the first object to the second transfer function of the sound of the first object is smaller than or equal to a first preset threshold value, determining that the first object faces the user, and the user faces the first object; if the ratio is greater than a first preset threshold and less than or equal to a second preset threshold, determining that the first object faces the user and the user does not face the first object; or if the ratio is greater than the second preset threshold, determining that the first object is not facing the user and the user is not facing the first object.
Optionally, the processing module 1120 is further configured to: responding to the first environmental sound, and acquiring the voice of the person in the first environmental sound; and under the condition that the voice of the first environment is matched with the voice of the first object, starting the transparent transmission mode.
Optionally, the processing module 1120 is further configured to: if a speaker corresponding to a voice in the first environmental sound is not facing the user, determining a compensation function based on the angle of the speaker relative to the user and a first corresponding relation, wherein the first corresponding relation comprises a plurality of angles and the compensation function corresponding to each angle in the plurality of angles, and the plurality of angles comprise the angle of the speaker relative to the user; compensating the voice in the first environmental sound based on the compensation function to obtain the compensated voice in the first environmental sound; and under the condition that the human voice in the compensated first environmental sound is matched with the sound of the first object, starting a transparent transmission mode.
Optionally, the processing module 1120 is further configured to: collecting a second environmental sound in an environment where a user is located at a third time, wherein a second object is in a speaking state in the environment at the third time, the first object is not in the speaking state in the environment at the third time, and the third time is after the second time; and responding to the second environmental sound, and if the user is in a speaking state at a third time, keeping the transparent transmission mode in an on state.
Optionally, the processing module 1120 is further configured to: and responding to the second environmental sound, and if the user is not in a speaking state at a third time, exiting the transparent transmission mode.
Optionally, the switching device further includes a receiving module. The receiving module is used for: receiving a first instruction of a terminal device at a first time, wherein the terminal device is connected with the switching device, and the first instruction is used for instructing the switching device to collect the sound of a first object; the acquisition module 1110 is also configured to: based on the first instruction, collecting sound of a first object; the processing module 1120 is further configured to: the sound of the first object is stored.
It should be appreciated that the switching device 1100 herein is embodied in the form of a functional module. The term module herein may refer to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor, etc.) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an alternative example, it will be understood by those skilled in the art that the switching device 1100 may be specifically an earphone in the foregoing method embodiment, or the functions of the earphone in the foregoing method embodiment may be integrated in the switching device 1100, and the switching device 1100 may be used to execute each flow and/or step corresponding to the earphone in the foregoing method embodiment, which is not repeated herein.
The switching device 1100 has the function of implementing the corresponding steps executed by the earphone in the method embodiments above; the functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In an embodiment of the present application, the switching device 1100 in fig. 11 may also be a chip or a chip system, for example: system on chip (SoC).
Fig. 12 is a schematic block diagram of another transparent mode switching apparatus 1200 according to an embodiment of the present application. As shown in fig. 12, the switching device 1200 includes: processor 1210, transceiver 1220 and memory 1230. Wherein the processor 1210, the transceiver 1220 and the memory 1230 are in communication with each other through an internal connection path, the memory 1230 is used for storing instructions, and the processor 1210 is used for executing the instructions stored in the memory 1230 to control the transceiver 1220 to transmit signals and/or receive signals.
It should be understood that the switching device 1200 may be specifically an earphone in the above-described method embodiment, or the functions of the earphone in the above-described method embodiment may be integrated in the switching device 1200, and the switching device 1200 may be used to perform the steps and/or processes corresponding to the earphone in the above-described method embodiment. The memory 1230 may optionally include read-only memory and random access memory and provide instructions and data to the processor 1210. A portion of memory 1230 may also include non-volatile random access memory. For example, the memory 1230 may also store information of device type. The processor 1210 may be configured to execute instructions stored in the memory 1230, and when the processor 1210 executes the instructions, the processor 1210 may perform the steps and/or processes corresponding to the headphones in the above-described method embodiments.
It is to be appreciated that, in embodiments of the application, the processor 1210 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor executes instructions in the memory to perform the steps of the method described above in conjunction with its hardware. To avoid repetition, a detailed description is not provided herein.
The application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program is used for realizing the method corresponding to the ear phone in the embodiment of the method.
The application also provides a chip system for supporting the earphone in the above method embodiments in implementing the functions shown in the embodiments of the application.
The present application also provides a computer program product comprising a computer program (which may also be referred to as code, or instructions) which, when run on a computer, is adapted to perform the method corresponding to the headset shown in the method embodiments described above.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a specific implementation of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art may easily think about changes or substitutions within the technical scope of the embodiments of the present application, and all changes and substitutions are included in the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A transparent transmission mode switching method, applied to an earphone, the method comprising:
The earphone collects first environmental sounds in an environment where a user wearing the earphone is located at a second time, and a first object is in a speaking state in the environment at the second time;
the earphone responds to the first environmental sound to acquire human voice in the first environmental sound;
Under the condition that the voice of the first environment sound is matched with the voice of the first object, determining that the first object faces the user at the second time, and the first object is in a speaking state within a first duration, and starting a transmission mode by the earphone;
The earphone collects second environmental sounds in an environment where the user is located at a third time, the first object is in an un-speaking state in the environment at the third time, and the user is in an un-speaking state at the third time; the earphone responds to the second environmental sound and exits the transparent transmission mode, and the third time is after the second time;
the earphone collects third environmental sounds in the environment where the user is located at a fourth time, the first object is in a speaking state in the environment at the fourth time, and the fourth time is located after the third time;
the earphone responds to the third environmental sound to acquire the voice of the person in the third environmental sound;
And under the condition that the human voice in the third environmental sound is matched with the sound of the first object, determining that the first object faces the user at the fourth time, and that the user faces the first object, and starting a transparent transmission mode by the earphone.
2. The method of claim 1, wherein the headphones comprise a left headphone that obtains a first transfer function of the sound of the first object and a right headphone that obtains a second transfer function of the sound of the first object;
The method further comprises the steps of:
if the ratio of the first transfer function to the second transfer function is smaller than or equal to a first preset threshold, the earphone determines that the first object faces the user, and the user faces the first object;
If the ratio is greater than the first preset threshold and less than or equal to a second preset threshold, the earphone determines that the first object faces the user and the user does not face the first object; or alternatively
If the ratio is greater than the second preset threshold, the earphone determines that the first object is not facing the user and the user is not facing the first object.
3. The method of claim 1, wherein the earphone turns on a pass-through mode if the human voice in the first ambient sound matches the sound of the first object, comprising:
If a speaker corresponding to a voice in the first environmental sound is not facing the user, the earphone determines a compensation function based on an angle of the speaker relative to the user and a first corresponding relation, wherein the first corresponding relation comprises a plurality of angles and a compensation function corresponding to each angle in the plurality of angles, and the plurality of angles comprise angles of the speaker relative to the user;
The earphone compensates the voice in the first environmental sound based on the compensation function to obtain the compensated voice in the first environmental sound;
and under the condition that the human voice in the compensated first environmental sound is matched with the sound of the first object, the earphone starts a transparent transmission mode.
4. The method according to claim 1, wherein the method further comprises:
The earphone collects second environmental sounds in the environment where the user is located at the third time, the first object is not in a speaking state in the environment at the third time, and the user is in a speaking state at the third time;
In response to the second ambient sound, the earpiece keeps the pass-through mode in an on state.
5. The method of any of claims 1-4, wherein before the headset captures a first environmental sound within an environment in which a user wearing the headset is located at a second time, the method further comprises:
The earphone collects and stores sound of the first object at a first time, and the second time is located after the first time.
6. The method of claim 5, wherein the headset collects and stores sound of the first object at a first time, comprising:
The earphone receives a first instruction of a terminal device at the first time, wherein the terminal device is connected with the earphone, and the first instruction is used for indicating the earphone to collect the sound of the first object;
The earphone collects the sound of the first object based on the first instruction and stores the sound of the first object.
7. A transmission-through mode switching device, comprising: a processor and a memory;
the memory stores computer-executable instructions;
The processor executing computer-executable instructions stored in the memory to cause the switching device to perform the method of any one of claims 1 to 6.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 6.