WO2023245390A1 - Smart earphone control method and apparatus, electronic device and storage medium - Google Patents


Info

Publication number
WO2023245390A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
smart
data
voice data
Prior art date
Application number
PCT/CN2022/099966
Other languages
French (fr)
Chinese (zh)
Inventor
彭聪
Original Assignee
北京小米移动软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司
Priority to PCT/CN2022/099966 (WO2023245390A1)
Priority to CN202280004138.1A (CN117751585A)
Publication of WO2023245390A1

Definitions

  • the present application relates to the field of computer application technology, and in particular to control methods, devices, electronic devices and storage media for smart earphones.
  • Terminal devices are commonly paired with smart earphones, which brings convenience to people's daily use.
  • Smart earphones with a noise reduction function can isolate surrounding noise, and their use is becoming increasingly popular.
  • This application proposes a control method, device, electronic device and storage medium for smart earphones to improve the convenience of using smart earphones.
  • an embodiment of one aspect of this application proposes a control method for a smart earphone, including: acquiring the audio data played on the terminal device;
  • identifying the audio data to determine that the terminal device is in a target working scenario, where the microphone of the smart earphone is in a voice collection state in the target working scenario;
  • in the target working scenario, acquiring a mode switching instruction for the smart earphone; and, according to the mode switching instruction, controlling the smart earphone to switch between at least two voice modes.
  • Another embodiment of the present application provides a control device for smart headphones, including:
  • An acquisition module used to acquire audio data played on the terminal device
  • the first determination module is used to identify the audio data to determine that the terminal device is in a target working scenario; in the target working scenario, the microphone of the smart headset is in a voice collection state;
  • the acquisition module is also used to acquire the mode switching instruction of the smart headset in the target working scenario
  • a control module configured to control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
  • Another aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the program, the method described in the foregoing aspect is implemented.
  • Another embodiment of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the method described in the foregoing aspect is implemented.
  • Another aspect of the present application provides a computer program product on which a computer program is stored.
  • when the program is executed by a processor, the method described in the foregoing aspect is implemented.
  • the control method, device, electronic device and storage medium for a smart earphone proposed in this application acquire the audio data played on the terminal device and identify the audio data to determine that the terminal device is in a target working scenario, in which the microphone of the smart earphone is in a voice collection state;
  • in the target working scenario, a mode switching instruction for the smart earphone is acquired, and the smart earphone is controlled to switch between at least two voice modes according to the instruction; because the switch happens while the earphone is worn, the user does not need to frequently take off or put on the smart earphone, which improves the convenience of using it.
  • Figure 1 is a schematic flowchart of a smart headset control method provided by an embodiment of the present application
  • FIG. 2 is a schematic flow chart of another smart headset control method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flow chart of another smart headset control method provided by an embodiment of the present application.
  • Figure 4 is a schematic flow chart of another smart headset control method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a control device for a smart headset provided by an embodiment of the present application.
  • Figure 6 is a block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for controlling a smart headset provided by an embodiment of the present application.
  • the execution subject of the smart earphone control method in the embodiment of the present application is a control device of the smart earphone.
  • the device can be installed in an electronic device.
  • the electronic device can be a smart earphone.
  • the smart earphone is a noise-cancelling earphone with a noise reduction function.
  • the method may include the following steps:
  • Step 101 Obtain audio data played on the terminal device.
  • the terminal device may be a smartphone, a handheld computer, a smart wearable device, a computer, etc., which are not limited in this embodiment.
  • the smart earphone is a Bluetooth earphone with a noise reduction function.
  • the smart earphone and the terminal device are connected through Bluetooth; after the connection is established, the terminal device sends the played audio data to the smart earphone at a set frequency, for example in real time or every 200 ms, where the audio data can be human voice data, audio-visual entertainment data, etc., as in the framing sketch below.
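  • The following is a minimal sketch of how decoded audio might be chunked into fixed-length frames for periodic transmission to the earphone, assuming the 200 ms interval mentioned above and an illustrative 16 kHz sample rate; the frame layout and padding behaviour are assumptions, not details specified by the application.

```python
import numpy as np

SAMPLE_RATE = 16_000                     # assumed sample rate of the decoded audio
FRAME_MS = 200                           # transmission interval used as an example above
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

def frames_for_transmission(audio: np.ndarray) -> np.ndarray:
    """Split a mono audio buffer into fixed-length frames, zero-padding the tail."""
    n_frames = int(np.ceil(len(audio) / FRAME_LEN))
    padded = np.zeros(n_frames * FRAME_LEN, dtype=audio.dtype)
    padded[:len(audio)] = audio
    return padded.reshape(n_frames, FRAME_LEN)

# Example: a one-second buffer yields five 200 ms frames.
frames = frames_for_transmission(np.zeros(SAMPLE_RATE, dtype=np.float32))
assert frames.shape == (5, FRAME_LEN)
```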
  • Step 102 Recognize the audio data to determine that the terminal device is in the target working scene.
  • in the target working scenario the microphone of the smart earphone is in the voice collection state, which means that when the terminal device is in the target working scenario, the microphone of the smart earphone is turned on and can be used to collect voice data from the environment.
  • the audio data is analysed to identify whether it is audio-visual entertainment data or human voice data;
  • audio-visual entertainment data includes, for example, music, movies and short videos;
  • human voice data is, for example, the voice data of participants in a conference call scenario, or the voice data of people using instant chat software.
  • in a conference call or instant chat, the voice data of the user wearing the smart earphone needs to be transmitted to the users of the other participating terminal devices, so the microphone on the smart earphone needs to be turned on to collect that user's voice; in this case it is determined that the terminal device is in the target working scenario;
  • if the audio data is identified as audio-visual entertainment data, for example music being listened to or a video being watched in an application, the smart earphone is only used to deliver the audio played by the terminal device to the ear, the microphone on the smart earphone is not turned on, and there is no need to collect the user's voice or environmental sound.
  • in one implementation, audio data of a set duration is obtained and the frequency bands it contains are identified; in response to the audio containing the first target frequency band but neither the second target frequency band nor the third target frequency band, it is determined that the terminal device is in the target working scenario;
  • the lower frequency limit of the first target band is greater than the upper frequency limit of the second target band, and the upper frequency limit of the first target band is less than the lower frequency limit of the third target band;
  • in other words, of the low, medium and high frequency bands, the first target band is the medium band, the second target band is the low band, and the third target band is the high band.
  • user voice data contains little low-frequency and high-frequency content, meaning that its energy lies mainly in the medium band, whereas audio-visual entertainment data such as music contains more low-frequency and high-frequency content;
  • therefore, by identifying the frequency bands contained in the audio data, it can be determined whether the audio data is human voice data or audio-visual entertainment data; when the audio data is recognised as human voice data, it is determined that the terminal device is in the target working scenario.
  • for robustness, several clips of the set duration can be collected and analysed; if the frequency bands contained in all of them meet the above condition, it is determined that the terminal device is in the target working scenario, as in the band-energy sketch below.
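  • As a rough illustration of the band test described above, the sketch below computes the energy in low, medium and high bands of a fixed-length clip and flags the clip as voice-like when the medium band dominates; the band edges, sample rate and threshold are illustrative assumptions rather than values taken from the application.

```python
import numpy as np

# Illustrative band edges in Hz; the application does not fix exact values.
LOW_BAND = (20.0, 300.0)       # second target band
MID_BAND = (300.0, 3400.0)     # first target band (typical speech range)
HIGH_BAND = (3400.0, 8000.0)   # third target band

def band_energy(clip: np.ndarray, sr: int, band: tuple) -> float:
    """Sum of spectral power between the two band edges."""
    spectrum = np.abs(np.fft.rfft(clip)) ** 2
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return float(spectrum[mask].sum())

def looks_like_voice(clip: np.ndarray, sr: int = 16_000, ratio: float = 0.8) -> bool:
    """True when the medium band carries most of the energy, i.e. the clip is
    voice-like rather than music/video audio with strong low and high bands."""
    low = band_energy(clip, sr, LOW_BAND)
    mid = band_energy(clip, sr, MID_BAND)
    high = band_energy(clip, sr, HIGH_BAND)
    total = low + mid + high + 1e-12
    return mid / total >= ratio
```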
  • the source information carried by the audio data is identified, the source of the audio data is determined, and in response to the source of the audio data being the target source, it is determined that the terminal device is in the target working scene.
  • specifically, the source information carried by the audio data indicates the application software to which the audio data belongs; if that application is a target application, it is determined that the terminal device is in the target working scenario. For example, if the source information indicates that the audio data belongs to an audio-visual entertainment application rather than a call application, such as a conference call application, it is determined that the terminal device is not in the target working scenario; otherwise, it is determined that it is, as in the mapping sketch below.
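  • A minimal sketch of the source-based check is given below: the source application reported with the audio stream is mapped to whether it corresponds to the target (call) scenario. The package names are hypothetical placeholders, not identifiers from the application.

```python
# Hypothetical application identifiers, for illustration only.
CALL_APPS = {"com.example.conference", "com.example.chat"}
MEDIA_APPS = {"com.example.music", "com.example.video"}

def in_target_scenario(source_app: str) -> bool:
    """Return True when the audio source is a call-type application."""
    if source_app in CALL_APPS:
        return True
    if source_app in MEDIA_APPS:
        return False
    return False  # unknown sources are treated as not-target in this sketch
```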
  • Step 103 Obtain the mode switching instruction of the smart headset in the target working scenario.
  • the smart earphone has at least two voice modes; the target working scenario is a scenario in which multiple users communicate, for example a conference call scenario or a voice chat scenario;
  • in one implementation, the function bound to a set button of the smart earphone can be changed: normally the button provides play and pause functions, while in the target working scenario it is switched to control the at least two voice modes;
  • therefore, in the target working scenario, in response to the user pressing the set button of the smart earphone, a mode switching instruction is obtained, and the instruction is used to switch the smart earphone to the voice mode it indicates.
  • in one example, the smart earphone has two voice modes, namely a noise reduction mode and a communication mode;
  • in the noise reduction mode, external sound is noise-reduced so that, after processing, it is quieter than a set volume; in the communication mode, external sound is not noise-reduced but is kept at its original level or amplified so that it is louder than the set volume. If it is detected that the user presses the set button, the voice mode the earphone was in before the press is determined: if the smart earphone is in the noise reduction mode, a switching instruction for the communication mode is generated in response to the button operation; similarly, if it is in the communication mode, a switching instruction for the noise reduction mode is generated. A toggle of this kind is sketched below.
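  • The following is a small sketch of the button-driven toggle described above, assuming the earphone controller tracks the current voice mode and receives button events as callbacks; the class and method names are illustrative.

```python
from enum import Enum

class VoiceMode(Enum):
    NOISE_REDUCTION = "noise_reduction"
    COMMUNICATION = "communication"

class EarphoneController:
    def __init__(self, initial: VoiceMode = VoiceMode.NOISE_REDUCTION):
        self.mode = initial

    def on_button_pressed(self, in_target_scenario: bool) -> VoiceMode:
        """In the target scenario the set button toggles between the two voice
        modes; outside it, the button keeps its play/pause role (not shown)."""
        if in_target_scenario:
            self.mode = (VoiceMode.COMMUNICATION
                         if self.mode is VoiceMode.NOISE_REDUCTION
                         else VoiceMode.NOISE_REDUCTION)
        return self.mode

# Example: two presses in the target scenario toggle the mode back and forth.
controller = EarphoneController()
assert controller.on_button_pressed(True) is VoiceMode.COMMUNICATION
assert controller.on_button_pressed(True) is VoiceMode.NOISE_REDUCTION
```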
  • in another implementation, the mode switching instruction can be obtained through voice recognition and used to change the voice mode of the smart earphone;
  • the acquired voice signal is analysed to identify the keywords it contains, and the corresponding mode switching instruction is determined from those keywords; the instruction is used to switch the smart earphone to the voice mode it indicates, as in the keyword sketch below;
  • for the voice modes of the smart earphone, refer to the description in the previous implementation, which is not repeated here.
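  • The voice-triggered variant can be sketched as a keyword lookup over the recognised text; the keywords and mode names below are assumptions for illustration, and the speech-to-text step itself is treated as a black box.

```python
from typing import Optional

# Hypothetical keyword-to-mode mapping applied to recognised speech.
KEYWORD_TO_MODE = {
    "noise reduction": "noise_reduction",
    "communication": "communication",
}

def mode_from_keywords(recognised_text: str) -> Optional[str]:
    """Return the requested voice mode, or None if no keyword is present."""
    text = recognised_text.lower()
    for keyword, mode in KEYWORD_TO_MODE.items():
        if keyword in text:
            return mode
    return None

assert mode_from_keywords("switch to communication mode") == "communication"
assert mode_from_keywords("play the next song") is None
```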
  • Step 104 Control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • according to the mode switching instruction, the smart earphone is controlled to switch between the at least two voice modes, so that the switch happens while the earphone is being worn, which improves the efficiency of voice mode switching and the convenience of using the smart earphone.
  • This embodiment takes two voice modes, namely the noise reduction mode and the communication mode, as an example to illustrate.
  • in one scenario, the smart earphone is in the noise reduction mode while the user makes a voice call through a client program; according to the mode switching instruction, the earphone switches to the communication mode, in which the volume of the voice call is reduced below the set volume so that the wearer is not prevented by the audio played by the terminal device from hearing external sounds;
  • in another scenario, the smart earphone is in the communication mode and, after detecting the mode switching instruction, switches from the communication mode to the noise reduction mode, in which external sounds are noise-reduced so that the volume heard by the user is below the set volume.
  • the audio data played on the terminal device is obtained, and the audio data is identified to determine that the terminal device is in the target working scene.
  • the microphone of the smart headset is in the voice collection state.
  • obtain the mode switching instruction of the smart headset and control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • FIG. 2 is a schematic flow chart of another smart earphone control method provided by an embodiment of the present application.
  • the voice mode including the noise reduction mode and the communication mode is used as an example for explanation.
  • it specifically illustrates how, after the smart earphone switches to the noise reduction mode, the earphone performs noise reduction processing on the acquired environmental voice data to avoid environmental human voices affecting the target working scenario.
  • the method includes the following steps:
  • Step 201 Obtain audio data played on the terminal device.
  • Step 202 Recognize the audio data to determine that the terminal device is in the target working scene.
  • the microphone of the smart headset is in the voice collection state in the target work scenario.
  • Step 203 Obtain the mode switching instruction of the smart headset in the target working scenario.
  • Step 204 Control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • Step 205 In response to controlling the smart headset to switch to the noise reduction mode, obtain the audio data played on the terminal device.
  • executing steps 205 to 208 after step 204 is only an example; that is to say, the execution timing of steps 205 to 208 is not limited.
  • in the noise reduction mode, the audio data played on the terminal device is obtained, where the audio data is voice data collected by other terminal devices, and the users of those other terminal devices and the user of the terminal device are all in the target working scenario;
  • the voice data contains human voice data which, for ease of distinction, is called the first user's voice data.
  • the target work scenario is a conference call scenario.
  • the conference call scenario there are three users participating in the conference call, namely User 1, User 2 and User 3.
  • User 1 is the user corresponding to the terminal device
  • User 2 and User 3 correspond to the other terminal devices participating in the conference call;
  • the audio data obtained from the terminal device then includes the voice data of two first users, that is, the voice data of User 2 and User 3.
  • Step 206 Recognize the audio data to identify the first user's voice data included in the audio data.
  • the audio data is analysed to identify the segments corresponding to different voices; since different people's voices occupy different frequency bands and have different spectral characteristics, these characteristics can be used to identify the first user's voice data contained in the audio data;
  • one or more sets of first user voice data may be identified, depending on the number of participants in the target working scenario: the more participants, the more first user voice data will be recognised;
  • alternatively, feature recognition such as timbre recognition can be performed on the audio data using a trained speech recognition model to identify the first user's voice data contained in the audio data;
  • in the example above, the audio data contains the voice data of two first users, which can be called the voice data of first user A and the voice data of first user B.
  • Step 207 Obtain the first environmental voice data collected by the microphone on the smart headset.
  • in the target working scenario, the microphone on the smart earphone is activated and collects environmental voice data in real time; to distinguish it from the environmental voice data collected in other modes, it is called the first environmental voice data, and it is provided to the smart earphone for processing.
  • Step 208 Perform noise reduction processing on the first environmental voice data according to the first user's voice data.
  • specifically, the first environmental voice data can be analysed to obtain the human voice data it contains; for the recognition method, refer to the explanations in the previous steps, whose principles are the same and are not repeated here;
  • the volume of the segments in the first environmental voice data that do not belong to the first user's voice data is then reduced, or those segments are deleted directly;
  • in this way the environmental voice data undergoes noise reduction processing, which reduces its impact on the voice data played on the terminal device.
  • in another implementation, the voice data of the second user using the smart earphone is also obtained, and based on the first user's voice data and the second user's voice data, noise reduction processing is applied to the sound data in the first environmental voice data other than the first user's and the second user's voice data;
  • the noise reduction level can be set as needed; for example, it can be set to the highest level so that all sound other than the first user's and the second user's voice data is filtered out of the first environmental voice data, maximising the call or communication quality in the target working scenario. One way to gate the ambient signal is sketched below.
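  • One way to read steps 205 to 208 is as a gating operation on the ambient signal: frames that a matcher does not attribute to the target voices (the first user's, and optionally the second user's) are attenuated or silenced. The sketch below is an illustrative frame-wise version with an assumed black-box matcher; an actual implementation would use speaker embeddings or a trained model as mentioned above.

```python
import numpy as np

def attenuate_non_target(env: np.ndarray,
                         is_target_frame,          # callable: frame -> bool (assumed matcher)
                         sr: int = 16_000,
                         frame_ms: int = 20,
                         attenuation: float = 0.05) -> np.ndarray:
    """Scale down frames of the ambient signal that are not attributed to the
    target (first/second user) voices; attenuation=0.0 deletes them outright."""
    frame_len = sr * frame_ms // 1000
    out = env.astype(np.float32).copy()
    for start in range(0, len(env), frame_len):
        frame = env[start:start + frame_len]
        if not is_target_frame(frame):
            out[start:start + frame_len] *= attenuation
    return out

# Example with a trivial energy-threshold matcher standing in for a real one.
noise = np.random.default_rng(0).normal(0, 0.01, 16_000).astype(np.float32)
cleaned = attenuate_non_target(noise, is_target_frame=lambda f: float(np.abs(f).mean()) > 0.1)
assert np.abs(cleaned).mean() < np.abs(noise).mean()
```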
  • in summary, in response to controlling the smart earphone to switch to the noise reduction mode, the audio data played on the terminal device is obtained and analysed to identify the first user's voice data it contains;
  • the environmental voice data collected by the microphone on the smart earphone is then obtained and noise-reduced based on the first user's voice data, which reduces the impact of the environmental voice data on the voice data played on the terminal device and improves the convenience of using the smart earphone.
  • FIG. 3 is a schematic flow chart of another smart earphone control method provided by an embodiment of the present application.
  • the voice mode including the noise reduction mode and the communication mode is used as an example for explanation.
  • it specifically illustrates how, after the smart earphone switches to the communication mode, the earphone obtains from the ambient voice data the voice data of a third user other than the second user wearing the earphone, so that communication is possible without taking off the smart earphone.
  • the method consists of the following steps:
  • Step 301 Obtain audio data played on the terminal device.
  • Step 302 Recognize the audio data to determine that the terminal device is in the target working scene.
  • the microphone of the smart headset is in the voice collection state in the target work scenario.
  • Step 303 Obtain the mode switching instruction of the smart headset in the target working scenario.
  • Step 304 Control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • Step 305 In response to controlling the smart earphone to switch to the communication mode according to the mode switching instruction, obtain the first sound direction of the second user using the smart earphone.
  • executing steps 305 to 307 after step 304 is only an example; steps 305 to 307 can also be executed before step 304. That is to say, the execution timing of steps 305 to 307 is not restricted in this embodiment.
  • when the smart earphone is in the communication mode, the second user wearing the earphone will communicate with surrounding users;
  • in one implementation, a voice prompt can be played to the second user through the smart earphone, asking the second user to speak a set sentence, so that the first sound direction corresponding to the second user's voice can be determined from the collected audio;
  • in another implementation, a first sound direction of the second user stored by the smart earphone can be obtained: because the way the second user wears the earphone is usually fixed, the direction from which the earphone picks up the second user's voice is also fixed, so the first sound direction can be read from the storage unit of the smart earphone.
  • Step 306 Control the microphone to collect the sound data of the third user in the environment at a second sound direction other than the first sound direction of the second user.
  • the third user may be a user who makes a sound in the environment, may be a user communicating with the second user, or may be another user who is not communicating with the second user but is talking.
  • the second sound direction is the direction from which the third user makes the sound in the environment.
  • in one implementation, the microphone is a microphone array: the first microphone in the array that collects sound from the second user's first sound direction is controlled to stop collecting the second user's voice data, while a second microphone other than the first microphone collects the voice data of the third user in the environment from a second sound direction other than the first sound direction, so that the collected sound data does not include the second user's voice, improving the accuracy of sound collection in the communication mode;
  • among the directions other than the first sound direction, the direction with the strongest sound intensity can be taken as the second sound direction, as in the sketch below.
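  • The direction selection just described can be sketched as follows: exclude directions close to the wearer's stored first sound direction, then pick the remaining direction with the strongest measured energy. The per-direction energies and the angular tolerance are assumed inputs here; a real earphone would derive them from beamforming on the microphone array.

```python
from typing import Dict, Optional

def pick_third_user_direction(direction_energies: Dict[int, float],
                              wearer_direction_deg: int,
                              tolerance_deg: int = 20) -> Optional[int]:
    """direction_energies maps a direction in degrees to measured sound energy.
    Directions within tolerance_deg of the wearer's own voice direction are
    excluded; the loudest remaining direction is taken as the third user's."""
    def angular_distance(a: int, b: int) -> int:
        return abs(((a - b + 180) % 360) - 180)

    candidates = {d: e for d, e in direction_energies.items()
                  if angular_distance(d, wearer_direction_deg) > tolerance_deg}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Example: the wearer's voice arrives from 0 degrees; the loudest other source wins.
energies = {0: 5.0, 90: 1.2, 180: 2.7, 270: 0.4}
assert pick_third_user_direction(energies, wearer_direction_deg=0) == 180
```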
  • Step 307 Use smart headphones to play the third user's voice data.
  • the third user's voice data collected by the microphone is sent to the smart earphone;
  • after amplifying the third user's voice data, the smart earphone plays it, which raises its volume and improves the reliability with which the second user receives the communicating user's voice, allowing the second user to clearly hear the communicating third user without taking off the earphone, thereby improving the communication effect.
  • in one implementation, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off so that the environmental voice data collected by the microphone is not noise-reduced, and data transmission between the microphone and the terminal device can be prohibited;
  • when the smart earphone switches to the communication mode according to the mode switching instruction, the user can be prompted, in the interactive interface of the terminal device, to trigger a prohibition instruction, which the terminal device sends to the smart earphone over Bluetooth;
  • based on the obtained prohibition instruction, the smart earphone prohibits data transmission between the microphone and the terminal device, and in particular prohibits transmission of the third user's voice data, so that the content of the conversation between the second user and the surrounding third user is not transmitted to, or played by, the terminal device in the target working scenario;
  • the target working scenario is, for example, a conference call scenario, so this prevents the other participants in the conference call from hearing the third user. A transmission-gate sketch is given below.
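  • The transmission gate implied above can be sketched as a flag that stops microphone frames from being forwarded over the Bluetooth link once communication mode is entered or a prohibition instruction arrives; the interfaces and mode strings are hypothetical.

```python
class MicUplinkGate:
    """Tracks whether microphone audio may be forwarded to the terminal device."""

    def __init__(self):
        self.uplink_enabled = True

    def on_mode_change(self, mode: str):
        # Entering communication mode stops forwarding microphone audio.
        self.uplink_enabled = (mode != "communication")

    def on_prohibition_instruction(self):
        # Prohibition instruction received from the terminal device over Bluetooth.
        self.uplink_enabled = False

    def maybe_send(self, frame, send_to_terminal):
        """Forward the frame only while the uplink is allowed."""
        if self.uplink_enabled:
            send_to_terminal(frame)

# Example: after switching to communication mode, frames are no longer forwarded.
sent = []
gate = MicUplinkGate()
gate.maybe_send(b"frame-1", sent.append)
gate.on_mode_change("communication")
gate.maybe_send(b"frame-2", sent.append)
assert sent == [b"frame-1"]
```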
  • in this way, the smart earphone does not collect the voice data of the second user wearing it but collects and plays the voice data of the third user communicating with the second user, so that the user can communicate smoothly with surrounding users without taking off the smart earphone, improving the reliability of communication while wearing it;
  • FIG. 4 is a schematic flow chart of another smart earphone control method provided by an embodiment of the present application. It specifically illustrates how, after the smart earphone switches to the communication mode, the earphone obtains from the environmental voice data the voice data of a third user other than the second user wearing the earphone, enabling communication without taking off the smart earphone.
  • the method consists of the following steps:
  • Step 401 Obtain audio data played on the terminal device.
  • Step 402 Recognize the audio data to determine that the terminal device is in the target working scene.
  • the microphone of the smart headset is in the voice collection state in the target work scenario.
  • Step 403 Obtain the mode switching instruction of the smart headset in the target working scenario.
  • Step 404 Control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • Step 405 In response to controlling the smart headset to switch to the communication mode, obtain the second environmental voice data collected by the microphone on the smart headset.
  • executing steps 405 to 407 after step 404 is only an example, and the execution timing of steps 405 to 407 is not limited in this embodiment;
  • the second environmental voice data collected by the microphone on the smart earphone may be the same as or different from the first environmental voice data in the previous embodiment; the naming only distinguishes the environmental voice data collected in different modes. Refer to the explanation of the first environmental voice data in the foregoing embodiment, which is not repeated here.
  • Step 406 Determine the voice data of the third user in the second environmental voice data based on the voice data of the second user using the smart headset.
  • the third user and the second user are different users.
  • the third user is the user communicating with the second user.
  • Step 407 Use smart headphones to play the third user's voice data.
  • specifically, the second user's voice data is analysed to obtain its frequency band or timbre feature information, and based on that information the voice data in the second environmental voice data that does not belong to the second user is determined to be the third user's voice data;
  • noise that is not human voice can be filtered out of the third user's voice data, which is then amplified and played through the smart earphone, ensuring that the second user wearing the earphone can clearly hear the third user;
  • the third user's voice data may be the voice data of one user or of multiple users; one way to separate and amplify it is sketched below.
  • the method of identifying the second user's voice data may refer to the explanations in the foregoing embodiments. The principles are the same and will not be described again here.
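  • The following is a rough sketch of steps 405 to 407, assuming a black-box matcher that decides whether a frame matches the wearer's (second user's) stored voice signature and a simple voiced/unvoiced detector; unmatched voiced frames are treated as the third user's voice and amplified, and everything else is silenced. Audio is assumed to be float samples in [-1, 1].

```python
import numpy as np

def extract_third_user_voice(env: np.ndarray,
                             matches_second_user,   # callable: frame -> bool (assumed matcher)
                             is_voiced,             # callable: frame -> bool (assumed detector)
                             sr: int = 16_000,
                             frame_ms: int = 20,
                             gain: float = 2.0) -> np.ndarray:
    """Keep voiced frames that do not match the wearer's voice, amplify them,
    and silence the wearer's own voice and non-speech noise."""
    frame_len = sr * frame_ms // 1000
    out = np.zeros(len(env), dtype=np.float32)
    for start in range(0, len(env), frame_len):
        frame = env[start:start + frame_len].astype(np.float32)
        if is_voiced(frame) and not matches_second_user(frame):
            out[start:start + len(frame)] = np.clip(frame * gain, -1.0, 1.0)
    return out
```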
  • in one implementation, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off so that the environmental voice data collected by the microphone is not noise-reduced;
  • in addition, while the second user wearing the earphone communicates with the surrounding third user, data transmission between the microphone and the terminal device is prohibited, that is, the microphone is prohibited from sending the collected second environmental voice data to the terminal device, thereby preventing the terminal device from playing the second environmental voice data containing the third user's voice in the target working scenario;
  • the target working scenario is, for example, a conference call scenario, so this prevents the other participants in the conference call from hearing the exchanges between the second user and the third user.
  • in another implementation, when the smart earphone switches to the communication mode according to the mode switching instruction, the user can be prompted, in the interactive interface of the terminal device, to trigger a prohibition instruction, which the terminal device sends to the smart earphone over Bluetooth;
  • according to the prohibition instruction, the smart earphone prohibits data transmission between the microphone and the terminal device, and in particular prohibits transmission of the second environmental voice data collected by the microphone, so that the content of the conversation between the second user and the surrounding third users is not transmitted to, or played by, the terminal device in the target working scenario;
  • the target working scenario is, for example, a conference call scenario, so this prevents the other participants in the conference call from hearing the content of the exchange between the second user and the third user.
  • in summary, in response to controlling the smart earphone to switch to the communication mode, the second environmental voice data collected by the microphone on the smart earphone is obtained, the third user's voice data in the second environmental voice data is determined based on the voice data of the second user using the smart earphone, and the third user's voice data is played using the smart earphone.
  • embodiments of the present application also provide a control device for smart earphones.
  • FIG. 5 is a schematic structural diagram of a control device for a smart headset provided by an embodiment of the present application.
  • the device may include:
  • the acquisition module 51 is used to acquire the audio data played on the terminal device.
  • the first determination module 52 is used to identify the audio data to determine that the terminal device is in a target working scenario; in the target working scenario, the microphone of the smart headset is in a voice collection state.
  • the obtaining module 51 is also used to obtain the mode switching instruction of the smart headset in the target working scenario.
  • the control module 53 is configured to control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
  • At least two voice modes include a noise reduction mode
  • the device further includes: a recognition module and a processing module.
  • the acquisition module 51 is also configured to acquire audio data played on the terminal device in response to controlling the smart headset to switch to the noise reduction mode;
  • an identification module, used to identify the audio data so as to identify the first human voice data included in the audio data;
  • the acquisition module 51 is also used to acquire the first environmental voice data collected by the microphone on the smart headset;
  • a processing module configured to perform noise reduction processing on the first environmental voice data according to the first human voice data.
  • the above processing module is specifically used for:
  • noise reduction processing is performed on the sound data in the first environmental voice data other than the sound data of the first user and the sound data of the second user.
  • At least two voice modes include a communication mode
  • the device further includes: a second determination module and a playback module.
  • the acquisition module 51 is also configured to acquire the second environmental voice data collected by the microphone on the smart earphone in response to controlling the smart earphone to switch to the communication mode;
  • a second determination module configured to determine the voice data of a third user in the second environmental voice data according to the voice data of the second user using the smart headset; wherein the third user and the second user The users are different users;
  • a playback module is used to play the third user's voice data using the smart earphone.
  • At least two voice modes include a communication mode
  • the device further includes:
  • the acquisition module 51 is configured to acquire the first sound direction of the second user using the smart earphone in response to controlling the smart earphone to switch to the communication mode;
  • a collection module configured to control the microphone to collect the sound data of a third user in the environment at a second sound direction other than the first sound direction of the second user; wherein the third user and the second The users are different users;
  • the playback module is also used to play the third user's voice data using the smart earphone.
  • the device further includes:
  • a first shutdown module configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and prohibit data transmission between the microphone and the terminal device.
  • the device further includes:
  • a second shutdown module, configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and, in response to obtaining a prohibition instruction sent by the terminal device, to prohibit data transmission between the microphone and the terminal device.
  • the first determination module 52 is specifically used to:
  • obtain audio data of a set duration and identify the frequency bands it contains; in response to the audio containing the first target frequency band and not containing the second target frequency band or the third target frequency band, determine that the terminal device is in the target working scenario, wherein the lower frequency limit of the first target band is greater than the upper frequency limit of the second target band, and the upper frequency limit of the first target band is less than the lower frequency limit of the third target band.
  • the first determination module 52 is specifically used to:
  • identify the source information carried by the audio data to determine the source of the audio data; in response to the source of the audio data being the target source, determine that the terminal device is in the target working scenario.
  • the audio data played on the terminal device is acquired, and the audio data is identified to determine that the terminal device is in the target working scene.
  • the microphone of the smart headset is in the voice collection state.
  • obtain the mode switching instruction of the smart headset and control the smart headset to switch between at least two voice modes according to the mode switching instruction.
  • this application also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the program, the method described in the foregoing method embodiments is implemented.
  • this application also proposes a non-transitory computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the method described in the foregoing method embodiments is implemented.
  • this application also proposes a computer program product on which a computer program is stored.
  • when the computer program is executed by a processor, the method described in the foregoing method embodiments is implemented.
  • Figure 6 is a block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communications component 816.
  • Processing component 802 generally controls the overall operations of electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operations at electronic device 800 . Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • Power component 806 provides power to various components of electronic device 800 .
  • Power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800 .
  • Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 808 includes a front-facing camera and/or a rear-facing camera.
  • the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera can be a fixed optical lens system or have a focal length and optical zoom capabilities.
  • Audio component 810 is configured to output and/or input audio signals.
  • audio component 810 includes a microphone (MIC) configured to receive external audio signals when electronic device 800 is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signal may be further stored in memory 804 or sent via communication component 816 .
  • audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 814 includes one or more sensors for providing various aspects of status assessment for electronic device 800 .
  • for example, sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; sensor component 814 can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 4G or 5G, or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 816 also includes a near field communications (NFC) module to facilitate short-range communications.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • in an exemplary embodiment, electronic device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method.
  • in an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, is also provided; the instructions can be executed by the processor 820 of the electronic device 800 to complete the above method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • in the description of this specification, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" mean that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they are not inconsistent with each other.
  • first and second are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a non-exhaustive list of more specific examples of the computer-readable medium includes the following: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CDROM);
  • the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the paper or other medium can be optically scanned, for example, and then edited, interpreted, or otherwise processed in a suitable manner as necessary to obtain the program electronically, which is then stored in a computer memory.
  • various parts of the present application can be implemented in hardware, software, firmware, or a combination thereof.
  • various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, it can be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
  • the program can be stored in a computer-readable storage medium.
  • each functional unit in various embodiments of the present application can be integrated into a processing module, or each unit can exist physically alone, or two or more units can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the storage media mentioned above can be read-only memory, magnetic disks or optical disks, etc.

Landscapes

  • Telephone Function (AREA)

Abstract

A smart earphone control method and apparatus, an electronic device and a storage medium. The method comprises: acquiring audio data played on a terminal device (101); identifying the audio data to determine that the terminal device is in a target working scenario (102), wherein, in the target working scenario, a microphone of a smart earphone is in a voice acquisition state; in the target working scenario, acquiring a mode switching instruction for the smart earphone (103); and, according to the mode switching instruction, controlling the smart earphone to be switched between at least two voice modes (104). In a target working scenario, a smart earphone is controlled to be switched between at least two modes, so that when a user uses the smart earphone, the user does not need to frequently take off or put on the smart earphone, thus improving the convenience of using the smart earphone.

Description

Control method, device, electronic device and storage medium for smart earphone
Technical field
The present application relates to the field of computer application technology, and in particular to a control method, device, electronic device and storage medium for a smart earphone.
Background
Terminal devices are commonly paired with smart earphones, which brings convenience to people's daily use, and smart earphones with a noise reduction function can isolate surrounding noise, so their use is becoming increasingly popular.
However, when using a smart earphone with a noise reduction function, the user usually also needs to communicate with people nearby. In that case, the user has to frequently take off the smart earphone to talk and then put it back on, which reduces the convenience of using the smart earphone.
Summary of the invention
This application proposes a control method, device, electronic device and storage medium for a smart earphone, to improve the convenience of using smart earphones.
An embodiment of one aspect of this application proposes a control method for a smart earphone, including:
acquiring the audio data played on the terminal device;
identifying the audio data to determine that the terminal device is in a target working scenario, where the microphone of the smart earphone is in a voice collection state in the target working scenario;
in the target working scenario, acquiring a mode switching instruction for the smart earphone; and
according to the mode switching instruction, controlling the smart earphone to switch between at least two voice modes.
An embodiment of another aspect of this application proposes a control device for a smart earphone, including:
an acquisition module, configured to acquire the audio data played on the terminal device;
a first determination module, configured to identify the audio data to determine that the terminal device is in a target working scenario, where the microphone of the smart earphone is in a voice collection state in the target working scenario;
the acquisition module being further configured to acquire, in the target working scenario, a mode switching instruction for the smart earphone; and
a control module, configured to control the smart earphone to switch between at least two voice modes according to the mode switching instruction.
An embodiment of another aspect of this application proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the method described in the foregoing aspect is implemented.
An embodiment of another aspect of this application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described in the foregoing aspect is implemented.
An embodiment of another aspect of this application proposes a computer program product on which a computer program is stored; when the program is executed by a processor, the method described in the foregoing aspect is implemented.
The control method, device, electronic device and storage medium for a smart earphone proposed in this application acquire the audio data played on the terminal device and identify the audio data to determine that the terminal device is in a target working scenario, in which the microphone of the smart earphone is in a voice collection state; in the target working scenario, a mode switching instruction for the smart earphone is acquired, and the smart earphone is controlled to switch between at least two voice modes according to the instruction. By switching the earphone between the two modes in the target working scenario, the user does not need to frequently take off or put on the smart earphone, which improves the convenience of using it.
Additional aspects and advantages of the application will be set forth in part in the description below, and in part will become apparent from the description or may be learned by practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic flowchart of a smart earphone control method provided by an embodiment of the present application;
Figure 2 is a schematic flowchart of another smart earphone control method provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of another smart earphone control method provided by an embodiment of the present application;
Figure 4 is a schematic flowchart of another smart earphone control method provided by an embodiment of the present application;
Figure 5 is a schematic structural diagram of a control device for a smart earphone provided by an embodiment of the present application;
Figure 6 is a block diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
The embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, are intended to explain the present application, and should not be construed as limiting the present application.
The control method and apparatus, electronic device and storage medium for a smart earphone according to the embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a control method for a smart earphone provided by an embodiment of the present application.
The control method for a smart earphone in the embodiments of the present application is executed by a control apparatus for the smart earphone. The apparatus may be provided in an electronic device, and the electronic device may be the smart earphone itself, the smart earphone being a noise-cancelling earphone with a noise reduction function.
As shown in FIG. 1, the method may include the following steps.
Step 101: acquire audio data played on a terminal device.
The terminal device may be a smartphone, a handheld computer, a smart wearable device, a computer or the like, which is not limited in this embodiment.
In an example of the embodiments of this application, the smart earphone is a Bluetooth earphone with a noise reduction function, and the smart earphone and the terminal device are connected via Bluetooth. After the connection is established, the terminal device sends the played audio data to the smart earphone at a set frequency, for example in real time or once every 200 ms. The audio data may be human voice data, audio-visual entertainment data or the like.
Step 102: recognize the audio data to determine that the terminal device is in a target working scenario.
In the target working scenario, the microphone of the smart earphone is in a voice collection state; that is, when the terminal device is in the target working scenario, the microphone of the smart earphone is turned on and can be used to collect voice data from the environment.
In the embodiments of the present application, the audio data is recognized to determine whether it is audio-visual entertainment data or human voice data. Audio-visual entertainment data, such as music, movies and short videos, generally contains background music. If the audio data is recognized as voice data, for example the voice data of participants in a conference call or of users of an instant messaging application, the voice of the user wearing the smart earphone needs to be transmitted to the users of the other terminal devices participating in the conference call or chat; the microphone on the smart earphone therefore needs to be turned on to collect that user's voice, and it is determined that the terminal device is in the target working scenario. If the audio data is recognized as audio-visual entertainment data, for example music or video played in an application, the smart earphone only delivers the audio played by the terminal device to the user's ear, the microphone on the smart earphone is not turned on, and there is no need to collect the voice of the user wearing the smart earphone or sound from the environment.
In one implementation of the embodiments of the present application, audio data of a set duration is acquired and recognized to determine the voice frequency bands it contains; in response to the voice frequency bands containing a first target voice band but containing neither a second target voice band nor a third target voice band, it is determined that the terminal device is in the target working scenario. The lower frequency limit of the first target voice band is greater than the upper frequency limit of the second target voice band, and the upper frequency limit of the first target voice band is less than the lower frequency limit of the third target voice band. As an example, if the voice spectrum is divided into a low-frequency band, a mid-frequency band and a high-frequency band, the first target voice band is the mid-frequency band, the second target voice band is the low-frequency band, and the third target voice band is the high-frequency band. A user's voice usually contains little low-frequency or high-frequency content, that is, its energy lies mainly in the mid-frequency band, whereas audio-visual entertainment data contains music, which carries considerably more low-frequency and high-frequency content. By identifying the frequency bands contained in the audio data, it can therefore be determined whether the audio data is human voice data or audio-visual entertainment data, and when the audio data is recognized as human voice data, it is determined that the terminal device is in the target working scenario.
In another implementation of the embodiments of the present application, the audio data is recognized, and if the proportion of the first target voice band in the entire spectrum of the audio data is greater than a set threshold, that is, the proportion of the second target voice band and/or the third target voice band is below a set threshold, the audio data is considered to be a user's voice data, and it is determined that the terminal device is in the target working scenario.
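The band-based check described above can be expressed concretely. The following is a minimal sketch, assuming illustrative band boundaries (roughly 300 Hz to 3400 Hz for the mid "voice" band) and an energy-ratio threshold that are not specified in the application; it is an illustration of the technique, not the patented implementation itself.

```python
import numpy as np


def band_energy(samples: np.ndarray, sample_rate: int, lo_hz: float, hi_hz: float) -> float:
    """Return the spectral energy of `samples` between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return float(spectrum[mask].sum())


def is_target_scenario(samples: np.ndarray, sample_rate: int,
                       mid_ratio_threshold: float = 0.8) -> bool:
    """Decide whether a fixed-duration audio segment looks like plain speech.

    Speech concentrates its energy in the mid band, while music-bearing
    entertainment audio also carries significant low- and high-band energy.
    The band edges and threshold below are assumed values for illustration.
    """
    low = band_energy(samples, sample_rate, 20.0, 300.0)        # assumed low band
    mid = band_energy(samples, sample_rate, 300.0, 3400.0)      # assumed voice band
    high = band_energy(samples, sample_rate, 3400.0, sample_rate / 2)
    total = low + mid + high
    if total == 0.0:
        return False
    # "Contains the mid band but not the others" is approximated here as the
    # mid band dominating the total energy by a set ratio.
    return mid / total >= mid_ratio_threshold
```

In practice such a check would be run on consecutive short segments, as noted below, and the scenario confirmed only when all of them pass.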
It should be noted that, in order to improve recognition accuracy, multiple segments of audio data of the set duration may be collected and recognized, and the terminal device is determined to be in the target working scenario only when the voice frequency bands contained in all of the segments meet the above requirements.
In yet another implementation of the embodiments of the present application, source information carried by the audio data is recognized to determine the source of the audio data, and in response to the source of the audio data being a target source, it is determined that the terminal device is in the target working scenario. Specifically, the source information carried by the audio data indicates the application to which the audio data belongs; if that application is recognized as a target application, it is determined that the terminal device is in the target working scenario. For example, if the source information indicates that the audio data belongs to an audio-visual entertainment application rather than a call-type application such as a conference call application, it is determined that the terminal device is not in the target working scenario; otherwise, it is determined that the terminal device is in the target working scenario.
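A source-based check can be as simple as a lookup against a list of call-type applications. The sketch below assumes the source information is available as an application identifier string; the package names are hypothetical examples, not identifiers named in the application.

```python
# Hypothetical identifiers of call-type ("target source") applications.
TARGET_SOURCES = {
    "com.example.conference",   # conference-call app (hypothetical)
    "com.example.voicechat",    # instant voice-chat app (hypothetical)
}


def is_target_scenario_by_source(app_id: str) -> bool:
    """Return True when the audio data comes from a call-type application."""
    return app_id in TARGET_SOURCES
```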
Step 103: acquire a mode switching instruction of the smart earphone in the target working scenario.
In the embodiments of the present application, the smart earphone has at least two voice modes in the target working scenario, and the at least two voice modes can be switched by a mode switching instruction. The target working scenario is a scenario in which multiple users communicate, for example a conference call scenario or a voice chat scenario.
In one implementation of the embodiments of the present application, when it is detected that the terminal device is in the target working scenario, the function assigned to a set button of the smart earphone may be changed. For example, in a non-target working scenario, set button A controls play and pause, whereas in the target working scenario its function switches to selecting among the at least two voice modes. Thus, in the target working scenario, in response to the user pressing the set button of the smart earphone, a mode switching instruction of the smart earphone is obtained, and the mode switching instruction is used to switch the smart earphone to the voice mode indicated by the instruction. For example, the smart earphone has two voice modes, a noise reduction mode and a communication mode. In the noise reduction mode, external sound is subjected to noise reduction, so that after noise reduction the external sound is lower than a set volume; in the communication mode, external sound is not subjected to noise reduction but is kept at its original level or amplified, so that the external sound is higher than the set volume. If it is detected that the user presses the set button, the voice mode the smart earphone was in before the button press is determined: if the smart earphone is in the noise reduction mode, a communication-mode switching instruction is generated in response to the button operation; similarly, if the smart earphone is in the communication mode, a noise-reduction-mode switching instruction is generated in response to the button operation.
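The button handling amounts to a two-state toggle that is only armed while the target working scenario is active. The following is a minimal sketch under that reading; the mode names and handler structure are chosen here for illustration and are not prescribed by the application.

```python
from enum import Enum


class VoiceMode(Enum):
    NOISE_REDUCTION = "noise_reduction"
    COMMUNICATION = "communication"


class EarphoneController:
    def __init__(self) -> None:
        self.in_target_scenario = False
        self.mode = VoiceMode.NOISE_REDUCTION

    def on_set_button_pressed(self) -> None:
        """Outside the target scenario the button keeps its play/pause role;
        inside it, a press toggles between the two voice modes."""
        if not self.in_target_scenario:
            self.toggle_playback()
            return
        if self.mode is VoiceMode.NOISE_REDUCTION:
            self.switch_mode(VoiceMode.COMMUNICATION)
        else:
            self.switch_mode(VoiceMode.NOISE_REDUCTION)

    def switch_mode(self, target: VoiceMode) -> None:
        # Apply the mode switching instruction.
        self.mode = target

    def toggle_playback(self) -> None:
        # Play/pause handling outside the target scenario (not detailed here).
        pass
```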
In another implementation of the embodiments of the present application, when it is detected that the terminal device is in the target working scenario, the functional mode of the smart earphone may be changed by a mode switching instruction obtained through voice recognition. Specifically, an acquired voice signal may be recognized to identify the keywords it contains, and the corresponding mode switching instruction is determined based on the keywords, where the mode switching instruction is used to switch the smart earphone to the voice mode indicated by the instruction. For the voice modes of the smart earphone, reference may be made to the description in the previous implementation, which is not repeated here.
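For the voice-triggered variant, recognized keywords are mapped to the same switching instructions. A brief sketch, assuming the keyword spotting itself is done elsewhere and yields plain text; the keyword table is hypothetical, since the actual spoken commands are not specified in the application.

```python
from typing import Optional

# Hypothetical keyword-to-mode table.
KEYWORD_TO_MODE = {
    "noise reduction": "noise_reduction",
    "communication": "communication",
}


def mode_from_keyword(recognized_text: str) -> Optional[str]:
    """Map a recognized keyword to a mode switching instruction, or None."""
    for keyword, mode in KEYWORD_TO_MODE.items():
        if keyword in recognized_text.lower():
            return mode
    return None
```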
Step 104: control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
The smart earphone is then controlled to switch between the at least two voice modes according to the mode switching instruction, so that the user can switch between the at least two voice modes while still wearing the smart earphone, which improves the efficiency of voice mode switching and the convenience of using the smart earphone.
This embodiment is described by taking two voice modes, the noise reduction mode and the communication mode, as an example. In one scenario, the smart earphone is in the noise reduction mode, that is, the user wearing the smart earphone is making a voice call through a client application; according to the mode switching instruction, the earphone switches to the communication mode, in which the sound of the voice call is reduced to below the set volume, so that even while wearing the smart earphone the user's perception of external sound is not hindered by the audio played on the terminal device. In another scenario, the smart earphone is in the communication mode; after the mode switching instruction is detected, the smart earphone switches from the communication mode to the noise reduction mode, in which external sound is subjected to noise reduction so that the external sound heard by the user is lower than the set volume. By controlling the switching of the smart earphone between the two modes in the target working scenario, the user can communicate and attend meetings smoothly without frequently taking off or putting on the smart earphone, which improves the convenience of using the smart earphone.
In the control method for a smart earphone in the embodiments of the present application, audio data played on the terminal device is acquired and recognized to determine that the terminal device is in the target working scenario, in which the microphone of the smart earphone is in the voice collection state; in the target working scenario, a mode switching instruction of the smart earphone is acquired, and the smart earphone is controlled to switch between at least two voice modes according to the mode switching instruction. By controlling the smart earphone to switch between at least two modes in the target working scenario, the user does not need to frequently take off or put on the smart earphone while using it, which improves the convenience of using the smart earphone.
Based on the previous embodiment, FIG. 2 is a schematic flowchart of another control method for a smart earphone provided by an embodiment of the present application. In this embodiment, the voice modes including the noise reduction mode and the communication mode are used as an example, and the description focuses on how, after the smart earphone switches to the noise reduction mode, the smart earphone performs noise reduction processing on the collected environmental voice data to avoid the influence of human voice data on the target mode. As shown in FIG. 2, the method includes the following steps.
Step 201: acquire audio data played on the terminal device.
Step 202: recognize the audio data to determine that the terminal device is in the target working scenario.
In the target working scenario, the microphone of the smart earphone is in the voice collection state.
Step 203: acquire a mode switching instruction of the smart earphone in the target working scenario.
Step 204: control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
For steps 201 to 204, reference may be made to the explanations in the foregoing embodiment; the principles are the same and are not repeated here.
Step 205: in response to controlling the smart earphone to switch to the noise reduction mode, acquire the audio data played on the terminal device.
It should be noted that executing steps 205 to 208 after step 204 is only an example; the execution order of steps 205 to 208 is not limited.
In the embodiments of the present application, in response to controlling the smart earphone to switch to the noise reduction mode according to the mode switching instruction, the audio data played on the terminal device is acquired. The audio data is voice data collected by other terminal devices whose users, together with the user of this terminal device, are in the target working scenario. The voice data contains human voice data, which, for ease of distinction, is referred to as the voice data of a first user.
For example, the target working scenario is a conference call scenario in which three users participate, namely user 1, user 2 and user 3, where user 1 is the user of the terminal device and users 2 and 3 correspond to other terminal devices. The acquired audio data played on the terminal device then includes the voice data of two first users, that is, the voice data of user 2 and user 3.
Step 206: recognize the audio data to identify the voice data of the first user contained in the audio data.
In one implementation of the embodiments of the present application, the audio data is recognized to identify the voice frequency bands corresponding to different human voices in the audio data. Since different people's voices have different frequencies, the different voice frequency bands, or rather the different spectral characteristics of the voices, can be identified, and the voice data of the first user contained in the audio data can thus be obtained. One or more sets of first-user voice data may be identified, depending on the number of participants in the target working scenario: the more participants there are, the more first-user voice data is identified.
In another implementation of the embodiments of the present application, features such as timbre may be recognized in the audio data based on a trained speech recognition model to identify the voice data of the first user contained in the audio data.
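Either variant boils down to comparing short-term spectral features of the incoming audio against per-speaker references. The sketch below substitutes a simple MFCC-plus-cosine-similarity comparison for the trained speech recognition model mentioned above; it is an illustrative stand-in under that assumption, not the model the application refers to, and the similarity threshold is an assumed value.

```python
import numpy as np
import librosa


def timbre_embedding(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Average MFCC vector as a crude timbre fingerprint of a voice segment."""
    mfcc = librosa.feature.mfcc(y=samples, sr=sample_rate, n_mfcc=20)
    return mfcc.mean(axis=1)


def same_speaker(segment_a: np.ndarray, segment_b: np.ndarray,
                 sample_rate: int, threshold: float = 0.9) -> bool:
    """Return True when two segments likely come from the same speaker."""
    a = timbre_embedding(segment_a, sample_rate)
    b = timbre_embedding(segment_b, sample_rate)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return cos >= threshold
```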
For example, in a conference call scenario with three participants, recognizing the acquired audio data yields the voice data of two first users, which may be referred to as the voice data of first user A and the voice data of first user B.
Step 207: acquire first environmental voice data collected by the microphone on the smart earphone.
In the embodiments of the present application, the microphone on the smart earphone is activated and can collect environmental voice data in real time; to distinguish it from the environmental voice data collected in other modes, it is referred to as the first environmental voice data and is sent to the smart earphone.
Step 208: perform noise reduction processing on the first environmental voice data according to the voice data of the first user.
In the embodiments of the present application, the first environmental voice data may be recognized to identify the human voice data it contains; for the method of recognizing the first environmental voice data to obtain the human voice data, reference may be made to the explanations in the foregoing steps, the principles are the same and are not repeated here.
In one implementation, according to the voice frequency bands of the first user's voice data, the volume of the frequency bands in the first environmental voice data that do not belong to the first user's voice data is reduced, or those bands are removed directly, so as to perform noise reduction on the first environmental voice data and reduce its influence on the voice data played on the terminal device.
In another implementation, the voice data of a second user who is using the smart earphone is acquired, and according to the voice data of the first user and the voice data of the second user, noise reduction processing is performed on the sound data in the first environmental voice data other than the voice data of the first user and the voice data of the second user. The noise reduction level can be set as required; as one implementation, the noise reduction can be set to the maximum so that all sound data in the first environmental voice data other than the voice data of the first user and the voice data of the second user is filtered out, maximizing the quality of calls or communication in the target working scenario. By performing noise reduction on the sound data that belongs to neither the first user nor the second user, the smart earphone can filter out environmental sound in the noise reduction mode in the target working scenario, so that the audio played by the terminal device and heard by the second user is cleaner, which improves the voice quality in the target working scenario.
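One way to realize the "attenuate everything outside the wanted voices" behaviour is a spectral mask built from the frequency regions attributed to the first and second users' voices. A minimal single-frame sketch follows, assuming those regions have already been estimated and expressed in Hz; a real noise suppressor would work frame by frame with overlap and smoothing.

```python
import numpy as np


def suppress_outside_bands(frame: np.ndarray, sample_rate: int,
                           keep_bands_hz: list,
                           attenuation: float = 0.05) -> np.ndarray:
    """Attenuate spectral content outside the given frequency bands.

    `keep_bands_hz` holds (low, high) ranges attributed to the first and
    second users' voices; everything else is scaled down by `attenuation`
    (set it to 0.0 for maximum noise reduction).
    """
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    keep = np.zeros_like(freqs, dtype=bool)
    for lo, hi in keep_bands_hz:
        keep |= (freqs >= lo) & (freqs < hi)
    spectrum[~keep] *= attenuation
    return np.fft.irfft(spectrum, n=len(frame))
```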
In the control method for a smart earphone in the embodiments of the present application, in response to controlling the smart earphone to switch to the noise reduction mode, the audio data played on the terminal device is acquired and recognized to identify the voice data of the first user it contains, the environmental voice data collected by the microphone on the smart earphone is acquired, and noise reduction processing is performed on the environmental voice data according to the voice data of the first user, so as to reduce the influence of the environmental voice data on the voice data played on the terminal device and improve the convenience of using the smart earphone.
Based on the previous embodiment, FIG. 3 is a schematic flowchart of another control method for a smart earphone provided by an embodiment of the present application. In this embodiment, the voice modes including the noise reduction mode and the communication mode are used as an example, and the description focuses on how, after the smart earphone switches to the communication mode, the smart earphone obtains, from the environmental voice data, the voice data of a third user other than the second user who is using the smart earphone, so that communication is possible without taking off the smart earphone. As shown in FIG. 3, the method includes the following steps.
Step 301: acquire audio data played on the terminal device.
Step 302: recognize the audio data to determine that the terminal device is in the target working scenario.
In the target working scenario, the microphone of the smart earphone is in the voice collection state.
Step 303: acquire a mode switching instruction of the smart earphone in the target working scenario.
Step 304: control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
For steps 301 to 304, reference may be made to the explanations in the foregoing embodiments; the principles are the same and are not repeated here.
Step 305: in response to controlling the smart earphone to switch to the communication mode according to the mode switching instruction, acquire a first sound direction of the second user using the smart earphone.
It should be noted that executing steps 305 to 307 after step 304 is only an example; steps 305 to 307 may also be executed before step 304, and the execution order of steps 305 to 307 is not limited in this embodiment.
In the embodiments of the present application, when the smart earphone is in the communication mode, the second user wearing the smart earphone communicates with surrounding users. In one implementation, when switching to the communication mode, a voice prompt may be issued to the second user through the smart earphone asking the second user to speak a set sentence, so as to determine the first sound direction from which the second user's set sentence is collected. In another implementation, when switching to the communication mode, the first sound direction of the second user stored by the smart earphone may be obtained: since the way the second user wears the smart earphone is usually fixed, the first sound direction in which the smart earphone collects the second user's voice is also fixed, and the second user's first sound direction can therefore be read from the storage unit of the smart earphone.
Step 306: control the microphone to collect the voice data of a third user in the environment from a second sound direction other than the first sound direction of the second user.
The third user may be any user making a sound in the environment, either a user communicating with the second user or another user who is talking but not communicating with the second user. The second sound direction is the direction from which the third user's sound comes.
In one implementation of the embodiments of the present application, the microphone is a microphone array. The first microphone in the array, which is used to collect sound from the first sound direction of the second user, is controlled to stop collecting the second user's voice data, and the second microphones in the array other than the first microphone are controlled to collect the voice data of the third user in the environment from a second sound direction other than the first sound direction of the second user, so that the sound data collected by the microphone does not contain the second user's voice data, which improves the accuracy of sound collection in the communication mode.
Optionally, if there are multiple sound directions in the environment other than the first sound direction, the direction with the highest sound intensity may be taken as the second sound direction.
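Selecting the second sound direction then reduces to comparing the short-term energy of the candidate directions and excluding the one attributed to the second user. A sketch, assuming per-direction (per-beam) signals are already available from the array front end; the dictionary-of-beams interface is an assumption for illustration.

```python
import numpy as np


def pick_second_direction(beams: dict, first_direction: int) -> int:
    """Return the direction (beam index) with the highest energy, excluding
    the first sound direction that belongs to the second user.

    `beams` maps a direction index to its time-domain signal (np.ndarray).
    """
    candidates = {d: float(np.sum(sig ** 2)) for d, sig in beams.items()
                  if d != first_direction}
    if not candidates:
        raise ValueError("no candidate directions besides the second user's")
    return max(candidates, key=candidates.get)
```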
Step 307: play the voice data of the third user through the smart earphone.
In the embodiments of the present application, the voice data of the third user collected by the microphone is sent to the smart earphone, which amplifies the third user's voice data and then plays it. This raises the volume of the third user's voice and improves the reliability with which the second user receives the voice of the user they are communicating with, so that the second user can clearly hear the third user without taking off the earphone, improving the communication.
Further, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off and data transmission between the microphone and the terminal device is prohibited. Prohibiting data transmission between the microphone and the terminal device may be performed by the smart earphone or by the terminal device, as described separately below.
In one implementation of the embodiments of the present application, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off so that the environmental voice data collected by the microphone is not subjected to noise reduction processing.
At the same time, in order to avoid interfering with the target working scenario the smart earphone is in, data transmission between the microphone and the terminal device is prohibited while the second user wearing the earphone communicates with the surrounding third user; that is, the microphone is prohibited from sending the collected voice data of the third user to the terminal device, so that the terminal device does not play the third user's voice data in the target working scenario, for example a conference call scenario, and the other participants in the conference call are prevented from hearing the conversation between the second user and the third user.
In another implementation of the embodiments of the present application, when the smart earphone is switched to the communication mode according to the mode switching instruction, the user may be prompted to trigger a prohibition instruction in the interactive interface of the terminal device, so that the terminal device sends the prohibition instruction to the smart earphone via Bluetooth. According to the received prohibition instruction sent by the terminal device, the smart earphone prohibits data transmission between the microphone and the terminal device, specifically the transmission of the third user's voice data, so that the voice content of the conversation between the second user and the surrounding third user is not transmitted to the terminal device and the terminal device does not play it in the target working scenario, for example a conference call scenario; the other participants in the conference call are thus prevented from hearing the conversation between the second user and the third user.
In the embodiments of the present application, in response to controlling the smart earphone to switch to the communication mode, the sound direction of the second user using the smart earphone is acquired, the microphone is controlled to collect the voice data of the third user from a sound direction other than that of the second user, and the third user's voice data is played through the smart earphone. By collecting the voice data of the third user communicating with the second user rather than the voice data of the second user wearing the smart earphone, the user can communicate smoothly with surrounding users without taking off the smart earphone, which improves the reliability of communication while wearing the smart earphone.
Based on the previous embodiment, FIG. 4 is a schematic flowchart of another control method for a smart earphone provided by an embodiment of the present application, which specifically describes how, after the smart earphone switches to the communication mode, the smart earphone obtains, from the environmental voice data, the voice data of a third user other than the second user who is using the smart earphone, so that communication is possible without taking off the smart earphone. As shown in FIG. 4, the method includes the following steps.
Step 401: acquire audio data played on the terminal device.
Step 402: recognize the audio data to determine that the terminal device is in the target working scenario.
In the target working scenario, the microphone of the smart earphone is in the voice collection state.
Step 403: acquire a mode switching instruction of the smart earphone in the target working scenario.
Step 404: control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
For steps 401 to 404, reference may be made to the explanations in the foregoing embodiments; the principles are the same and are not repeated here.
Step 405: in response to controlling the smart earphone to switch to the communication mode, acquire second environmental voice data collected by the microphone on the smart earphone.
It should be noted that executing steps 405 to 407 after step 404 is only an example; the execution order of steps 405 to 407 is not limited in this embodiment.
The content of the second environmental voice data collected by the microphone on the smart earphone may be the same as or different from that of the first environmental voice data in the foregoing embodiment; the terms only distinguish environmental voice data collected in different modes. Reference may be made to the explanation of the first environmental voice data in the foregoing embodiment, which is not repeated here.
Step 406: determine the voice data of the third user in the second environmental voice data according to the voice data of the second user using the smart earphone.
The third user and the second user are different users; in a conference call scenario, the third user is a user communicating with the second user.
Step 407: play the voice data of the third user through the smart earphone.
In the embodiments of the present application, the second user's voice data is recognized to obtain the voice frequency bands or timbre feature information of the second user's voice data, and the second user's voice data contained in the second environmental voice data is identified according to those voice frequency bands or timbre features; the sound data in the second environmental voice data other than the second user's voice data is then the voice data of the third user. The third user's voice data is played through the smart earphone to raise the volume of the third user's voice; optionally, noise that is not human voice may be filtered out of the third user's voice data, which is then amplified and played through the smart earphone, so that the second user wearing the smart earphone can clearly hear the third user.
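The final "filter out non-voice noise and amplify" step can be approximated with a band-pass filter around the speech range followed by a gain with clipping protection. A minimal sketch using SciPy; the band edges and gain are illustrative assumptions, not values given in the application.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def enhance_third_user(samples: np.ndarray, sample_rate: int,
                       gain: float = 2.0) -> np.ndarray:
    """Band-pass the signal to a typical speech range and amplify it for playback."""
    # 4th-order Butterworth band-pass around an assumed speech range.
    sos = butter(4, [300.0, 3400.0], btype="bandpass", fs=sample_rate, output="sos")
    voiced = sosfilt(sos, samples)
    boosted = voiced * gain
    # Prevent clipping for float audio in [-1, 1].
    return np.clip(boosted, -1.0, 1.0)
```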
It should be noted that the voice data of the third user may be the voice data of one user or of multiple users.
For the method of recognizing the second user's voice data, reference may be made to the explanations in the foregoing embodiments; the principles are the same and are not repeated here.
Further, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off and data transmission between the microphone and the terminal device is prohibited. Prohibiting data transmission between the microphone and the terminal device may be performed by the smart earphone or by the terminal device, as described separately below.
In one implementation of the embodiments of the present application, in response to controlling the smart earphone to switch to the communication mode, the noise reduction mode is turned off so that the environmental voice data collected by the microphone is not subjected to noise reduction processing.
At the same time, in order to avoid interfering with the target working scenario the smart earphone is in, data transmission between the microphone and the terminal device is prohibited while the second user wearing the earphone communicates with the surrounding third user; that is, the microphone is prohibited from sending the collected second environmental voice data to the terminal device, so that the terminal device does not play the second environmental voice data containing the third user's voice data in the target working scenario, for example a conference call scenario, and the other participants in the conference call are prevented from hearing the conversation between the second user and the third user.
In another implementation of the embodiments of the present application, when the smart earphone is switched to the communication mode according to the mode switching instruction, the user may be prompted to trigger a prohibition instruction in the interactive interface of the terminal device, so that the terminal device sends the prohibition instruction to the smart earphone via Bluetooth. According to the prohibition instruction sent by the terminal device, the smart earphone prohibits data transmission between the microphone and the terminal device, specifically the transmission of the second environmental voice data collected by the microphone, so that the voice content of the conversation between the second user and the surrounding third user is not transmitted to the terminal device and the terminal device does not play it in the target working scenario, for example a conference call scenario; the other participants in the conference call are thus prevented from hearing the conversation between the second user and the third user.
In the control method for a smart earphone in the embodiments of the present application, in response to controlling the smart earphone to switch to the communication mode, the second environmental voice data collected by the microphone on the smart earphone is acquired, the voice data of the third user in the second environmental voice data is determined according to the voice data of the second user using the smart earphone, and the third user's voice data is played through the smart earphone. By determining and playing the third user's voice data in the environmental voice data, the second user can communicate smoothly with the surrounding third user without taking off the smart earphone, which improves the reliability of communication while wearing the smart earphone and removes the need to frequently take off or put on the smart earphone when switching modes, increasing the convenience of using the smart earphone.
In order to implement the above embodiments, the embodiments of the present application further provide a control apparatus for a smart earphone.
FIG. 5 is a schematic structural diagram of a control apparatus for a smart earphone provided by an embodiment of the present application.
As shown in FIG. 5, the apparatus may include:
an acquisition module 51, configured to acquire the audio data played on the terminal device;
a first determination module 52, configured to recognize the audio data to determine that the terminal device is in a target working scenario, where the microphone of the smart earphone is in a voice collection state in the target working scenario;
the acquisition module 51 being further configured to acquire a mode switching instruction of the smart earphone in the target working scenario; and
a control module 53, configured to control the smart earphone to switch between the at least two voice modes according to the mode switching instruction.
Further, in one implementation of the embodiments of the present application, the at least two voice modes include a noise reduction mode, and the apparatus further includes a recognition module and a processing module.
The acquisition module 51 is further configured to acquire the audio data played on the terminal device in response to controlling the smart earphone to switch to the noise reduction mode.
The recognition module is configured to recognize the audio data to identify the first human voice data contained in the audio data.
The acquisition module 51 is further configured to acquire the first environmental voice data collected by the microphone on the smart earphone.
The processing module is configured to perform noise reduction processing on the first environmental voice data according to the first human voice data.
In one implementation of the embodiments of the present application, the above processing module is specifically configured to:
acquire second voice data of a first target user using the smart earphone; and
perform, according to the first voice data and the second voice data, noise reduction processing on the sound data in the first environmental voice data other than the voice data of the first user and the voice data of the second user.
In one implementation of the embodiments of the present application, the at least two voice modes include a communication mode, and the apparatus further includes a second determination module and a playback module.
The acquisition module 51 is further configured to acquire the second environmental voice data collected by the microphone on the smart earphone in response to controlling the smart earphone to switch to the communication mode.
The second determination module is configured to determine the voice data of a third user in the second environmental voice data according to the voice data of the second user using the smart earphone, where the third user and the second user are different users.
The playback module is configured to play the voice data of the third user through the smart earphone.
In one implementation of the embodiments of the present application, the at least two voice modes include a communication mode, and the apparatus further includes:
the acquisition module 51, configured to acquire the first sound direction of the second user using the smart earphone in response to controlling the smart earphone to switch to the communication mode;
a collection module, configured to control the microphone to collect the voice data of a third user in the environment from a second sound direction other than the first sound direction of the second user, where the third user and the second user are different users; and
the playback module, further configured to play the voice data of the third user through the smart earphone.
In one implementation of the embodiments of the present application, the apparatus further includes:
a first shutdown module, configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and to prohibit data transmission between the microphone and the terminal device.
In one implementation of the embodiments of the present application, the apparatus further includes:
a second shutdown module, configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and, in response to receiving a prohibition instruction sent by the terminal device, to prohibit data transmission between the microphone and the terminal device according to the prohibition instruction.
In one implementation of the embodiments of the present application, the first determination module 52 is specifically configured to:
recognize the audio data and determine the voice frequency bands contained in the audio data; and
in response to the voice frequency bands containing a first target voice band but containing neither a second target voice band nor a third target voice band, determine that the terminal device is in the target working scenario, where the lower frequency limit of the first target voice band is greater than the upper frequency limit of the second target voice band, and the upper frequency limit of the first target voice band is less than the lower frequency limit of the third target voice band.
In one implementation of the embodiments of the present application, the first determination module 52 is specifically configured to:
recognize the source information carried by the audio data and determine the source of the audio data; and
in response to the source of the audio data being a target source, determine that the terminal device is in the target working scenario.
It should be noted that the foregoing explanations of the method embodiments also apply to the apparatus of this embodiment and are not repeated here.
In the control apparatus for a smart earphone in the embodiments of the present application, audio data played on the terminal device is acquired and recognized to determine that the terminal device is in the target working scenario, in which the microphone of the smart earphone is in the voice collection state; in the target working scenario, a mode switching instruction of the smart earphone is acquired, and the smart earphone is controlled to switch between at least two voice modes according to the mode switching instruction. By controlling the smart earphone to switch between at least two modes in the target working scenario, the user does not need to frequently take off or put on the smart earphone while using it, which improves the convenience of using the smart earphone.
为了实现上述实施例,本申请还提出一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时,实现如前述方法实施例所述的方法。In order to implement the above embodiments, this application also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the aforementioned method is implemented. methods described in the examples.
为了实现上述实施例,本申请还提出一种非临时性计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时,实现如前述方法实施例所述的方法。In order to implement the above embodiments, this application also proposes a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method as described in the foregoing method embodiments is implemented.
为了实现上述实施例,本申请还提出一种计算机程序产品,其上存储有计算机程序,所述计算机程序被处理器执行时实现如前述方法实施例所述的方法。In order to implement the above embodiments, this application also proposes a computer program product on which a computer program is stored. When the computer program is executed by a processor, the method as described in the foregoing method embodiments is implemented.
图6为本申请实施例提供的一种电子设备的框图。例如,电子设备800可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。Figure 6 is a block diagram of an electronic device provided by an embodiment of the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
参照图6,电子设备800可以包括以下一个或多个组件:处理组件802,存储器804,电力组件806,多媒体组件808,音频组件810,输入/输出(I/O)接口812,传感器组件814,以及通信组件816。6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communications component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 4G, or 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions can be executed by the processor 820 of the electronic device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, unless they are inconsistent with each other, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example, may be considered a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from the instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example, by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then storing it in a computer memory.
It should be understood that various parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps involved in implementing the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present application, and those of ordinary skill in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (20)

  1. A control method for a smart earphone, characterized by comprising:
    acquiring audio data played on the terminal device;
    identifying the audio data to determine that the terminal device is in a target working scenario, wherein in the target working scenario, a microphone of the smart earphone is in a voice collection state;
    in the target working scenario, acquiring a mode switching instruction for the smart earphone; and
    controlling, according to the mode switching instruction, the smart earphone to switch between the at least two voice modes.
  2. The method according to claim 1, characterized in that the at least two voice modes include a noise reduction mode, and the method further comprises:
    in response to controlling the smart earphone to switch to the noise reduction mode, acquiring the audio data played on the terminal device;
    identifying the audio data to obtain voice data of a first user included in the audio data;
    acquiring first environmental voice data collected by the microphone of the smart earphone; and
    performing noise reduction processing on the first environmental voice data according to the voice data of the first user.
  3. The method according to claim 2, characterized in that performing noise reduction processing on the first environmental voice data according to the voice data of the first user comprises:
    acquiring voice data of a second user who uses the smart earphone; and
    performing, according to the voice data of the first user and the voice data of the second user, noise reduction processing on voice data in the first environmental voice data other than the voice data of the first user and the voice data of the second user.
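By way of a non-limiting illustration of the noise reduction in claims 2 and 3, the sketch below shows one drastically simplified way to attenuate ambient sound that matches neither of two reference voices. It uses average-spectrum cosine similarity as a crude stand-in for the voiceprint matching a real product would use; the function names, frame size, similarity threshold, and attenuation factor are all assumptions made for the example, and the inputs are assumed to be 1-D float arrays at the same sample rate, each longer than one frame.

```python
import numpy as np


def spectral_profile(voice, frame=512):
    """Average magnitude spectrum of a reference voice sample (crude 'voiceprint')."""
    frames = [voice[i:i + frame] for i in range(0, len(voice) - frame + 1, frame)]
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)


def suppress_unknown_voices(ambient, first_user_voice, second_user_voice,
                            frame=512, threshold=0.6, attenuation=0.1):
    """Heavily attenuate frames of `ambient` that resemble neither reference voice."""
    refs = [spectral_profile(first_user_voice, frame),
            spectral_profile(second_user_voice, frame)]
    out = np.array(ambient, dtype=float)
    for i in range(0, len(out) - frame + 1, frame):
        spec = np.abs(np.fft.rfft(out[i:i + frame]))
        sims = [float(np.dot(spec, r)) /
                (float(np.linalg.norm(spec) * np.linalg.norm(r)) + 1e-9) for r in refs]
        if max(sims) < threshold:        # frame matches neither user 1 nor user 2
            out[i:i + frame] *= attenuation
    return out
```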
  4. The method according to claim 1, characterized in that the at least two voice modes include a communication mode, and the method further comprises:
    in response to controlling the smart earphone to switch to the communication mode, acquiring second environmental voice data collected by the microphone of the smart earphone;
    determining voice data of a third user in the second environmental voice data according to voice data of a second user who uses the smart earphone, wherein the third user and the second user are different users; and
    playing the voice data of the third user through the smart earphone.
  5. The method according to claim 1, characterized in that the at least two voice modes include a communication mode, and the method further comprises:
    in response to controlling the smart earphone to switch to the communication mode, acquiring a first sound direction of a second user who uses the smart earphone; and
    controlling the microphone to collect voice data of a third user in the environment in a second sound direction other than the first sound direction, wherein the third user and the second user are different users; and playing the voice data of the third user through the smart earphone.
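By way of a non-limiting illustration of the direction-based collection in claims 4 and 5, the sketch below mixes signals arriving from directions outside an exclusion cone around the wearer's own voice. It assumes an upstream beamformer (not shown) has already separated the capture into per-direction signals of equal length; the function name, the exclusion angle, and this dictionary-of-directions interface are assumptions made for the example.

```python
import numpy as np


def pick_other_directions(direction_signals, wearer_direction_deg, exclusion_deg=30.0):
    """Mix signals from arrival directions outside an exclusion cone around the wearer.

    `direction_signals` maps an arrival direction in degrees to the signal captured
    from that direction (assumed to come from an upstream beamformer).
    """
    kept = []
    for deg, sig in direction_signals.items():
        diff = abs(((deg - wearer_direction_deg) + 180.0) % 360.0 - 180.0)
        if diff > exclusion_deg:          # keep only directions away from the wearer
            kept.append(np.asarray(sig, dtype=float))
    if not kept:
        return np.zeros(1)
    return np.mean(np.stack(kept), axis=0)   # simple average of the remaining directions
```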
  6. The method according to claim 4 or 5, characterized in that the method further comprises:
    in response to controlling the smart earphone to switch to the communication mode, turning off the noise reduction mode, and prohibiting data transmission between the microphone and the terminal device.
  7. The method according to claim 4 or 5, characterized in that the method further comprises:
    in response to controlling the smart earphone to switch to the communication mode, turning off the noise reduction mode; and
    in response to acquiring a prohibition instruction sent by the terminal device, prohibiting, according to the prohibition instruction, data transmission between the microphone and the terminal device.
  8. The method according to claim 1, characterized in that identifying the audio data to determine that the terminal device is in a target working scenario comprises:
    identifying the audio data to determine voice frequency bands included in the audio data; and
    in response to the voice frequency bands including a first target voice frequency band and including neither a second target voice frequency band nor a third target voice frequency band, determining that the terminal device is in the target working scenario, wherein a lower frequency limit of the first target voice frequency band is greater than an upper frequency limit of the second target voice frequency band, and an upper frequency limit of the first target voice frequency band is less than a lower frequency limit of the third target voice frequency band.
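By way of a non-limiting illustration of the band test in claim 8, the sketch below checks for energy in a "first" band while requiring near-silence in a lower "second" band and a higher "third" band. The concrete band edges and energy thresholds are invented for the example (the claim only fixes their ordering), and the audio is assumed to be a 1-D float array sampled at 16 kHz or above so that the third band exists in the spectrum.

```python
import numpy as np


def band_energy(audio, sample_rate, low_hz, high_hz):
    """Sum of magnitude-spectrum energy between low_hz (inclusive) and high_hz (exclusive)."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    return float(spectrum[(freqs >= low_hz) & (freqs < high_hz)].sum())


def in_target_scene(audio, sample_rate,
                    first_band=(300.0, 3400.0),    # assumed voice band
                    second_band=(20.0, 250.0),     # assumed lower band (upper limit < 300 Hz)
                    third_band=(3500.0, 8000.0),   # assumed higher band (lower limit > 3400 Hz)
                    present=10.0, absent=1.0):
    """True if the first band is present while the second and third bands are (nearly) absent."""
    return (band_energy(audio, sample_rate, *first_band) > present
            and band_energy(audio, sample_rate, *second_band) < absent
            and band_energy(audio, sample_rate, *third_band) < absent)
```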
  9. The method according to claim 1, characterized in that identifying the audio data to determine that the terminal device is in a target working scenario comprises:
    identifying source information carried by the audio data to determine a source of the audio data; and
    in response to the source of the audio data being a target source, determining that the terminal device is in the target working scenario.
  10. A control apparatus for a smart earphone, characterized by comprising:
    an acquisition module, configured to acquire audio data played on the terminal device;
    a first determination module, configured to identify the audio data to determine that the terminal device is in a target working scenario, wherein in the target working scenario, a microphone of the smart earphone is in a voice collection state;
    the acquisition module being further configured to acquire, in the target working scenario, a mode switching instruction for the smart earphone; and
    a control module, configured to control, according to the mode switching instruction, the smart earphone to switch between the at least two voice modes.
  11. The apparatus according to claim 10, characterized in that the at least two voice modes include a noise reduction mode, and the apparatus further comprises:
    the acquisition module being further configured to acquire, in response to controlling the smart earphone to switch to the noise reduction mode, the audio data played on the terminal device;
    an identification module, configured to identify the audio data to obtain first human voice data included in the audio data;
    the acquisition module being further configured to acquire first environmental voice data collected by the microphone of the smart earphone; and
    a processing module, configured to perform noise reduction processing on the first environmental voice data according to the first human voice data.
  12. The apparatus according to claim 11, characterized in that the processing module is specifically configured to:
    acquire second voice data of a first target user who uses the smart earphone; and
    perform, according to the first voice data and the second voice data, noise reduction processing on voice data in the first environmental voice data other than the voice data of the first user and the voice data of the second user.
  13. The apparatus according to claim 10, characterized in that the at least two voice modes include a communication mode, and the apparatus further comprises:
    the acquisition module being further configured to acquire, in response to controlling the smart earphone to switch to the communication mode, second environmental voice data collected by the microphone of the smart earphone;
    a second determination module, configured to determine voice data of a third user in the second environmental voice data according to voice data of a second user who uses the smart earphone, wherein the third user and the second user are different users; and
    a playback module, configured to play the voice data of the third user through the smart earphone.
  14. The apparatus according to claim 10, characterized in that the at least two voice modes include a communication mode, and the apparatus further comprises:
    the acquisition module being further configured to acquire, in response to controlling the smart earphone to switch to the communication mode, a first sound direction of a second user who uses the smart earphone;
    a collection module, configured to control the microphone to collect voice data of a third user in the environment in a second sound direction other than the first sound direction of the second user, wherein the third user and the second user are different users; and
    the playback module being configured to play the voice data of the third user through the smart earphone.
  15. The apparatus according to claim 13 or 14, characterized in that the apparatus further comprises:
    a first turn-off module, configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and to prohibit data transmission between the microphone and the terminal device.
  16. The apparatus according to claim 13 or 14, characterized in that the apparatus further comprises:
    a second turn-off module, configured to turn off the noise reduction mode in response to controlling the smart earphone to switch to the communication mode, and to prohibit, in response to acquiring a prohibition instruction sent by the terminal device, data transmission between the microphone and the terminal device according to the prohibition instruction.
  17. The apparatus according to claim 10, characterized in that the first determination module is specifically configured to:
    identify the audio data to determine voice frequency bands included in the audio data; and
    in response to the voice frequency bands including a first target voice frequency band and including neither a second target voice frequency band nor a third target voice frequency band, determine that the terminal device is in the target working scenario, wherein a lower frequency limit of the first target voice frequency band is greater than an upper frequency limit of the second target voice frequency band, and an upper frequency limit of the first target voice frequency band is less than a lower frequency limit of the third target voice frequency band.
  18. The apparatus according to claim 10, characterized in that the first determination module is specifically configured to:
    identify source information carried by the audio data to determine a source of the audio data; and
    in response to the source of the audio data being a target source, determine that the terminal device is in the target working scenario.
  19. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the method according to any one of claims 1-9 is implemented.
  20. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the method according to any one of claims 1-9 is implemented.
PCT/CN2022/099966 2022-06-20 2022-06-20 Smart earphone control method and apparatus, electronic device and storage medium WO2023245390A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/099966 WO2023245390A1 (en) 2022-06-20 2022-06-20 Smart earphone control method and apparatus, electronic device and storage medium
CN202280004138.1A CN117751585A (en) 2022-06-20 2022-06-20 Control method and device of intelligent earphone, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/099966 WO2023245390A1 (en) 2022-06-20 2022-06-20 Smart earphone control method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023245390A1 true WO2023245390A1 (en) 2023-12-28

Family

ID=89378989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099966 WO2023245390A1 (en) 2022-06-20 2022-06-20 Smart earphone control method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN117751585A (en)
WO (1) WO2023245390A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170318374A1 (en) * 2016-05-02 2017-11-02 Microsoft Technology Licensing, Llc Headset, an apparatus and a method with automatic selective voice pass-through
CN108156550A (en) * 2017-12-27 2018-06-12 上海传英信息技术有限公司 The playing method and device of headphone
CN111464905A (en) * 2020-04-09 2020-07-28 电子科技大学 Hearing enhancement method and system based on intelligent wearable device and wearable device
CN113873378A (en) * 2020-06-30 2021-12-31 华为技术有限公司 Earphone noise processing method and device and earphone
CN113873379A (en) * 2020-06-30 2021-12-31 华为技术有限公司 Mode control method and device and terminal equipment
CN113099338A (en) * 2021-03-08 2021-07-09 头领科技(昆山)有限公司 Intelligent control's audio chip and wireless earphone of making an uproar that falls

Also Published As

Publication number Publication date
CN117751585A (en) 2024-03-22

Similar Documents

Publication Publication Date Title
EP3163748B1 (en) Method, device and terminal for adjusting volume
US20230315380A1 (en) Devices with enhanced audio
CN105451111B (en) Earphone control method for playing back, device and terminal
CN106454644B (en) Audio playing method and device
CN110166890B (en) Audio playing and collecting method and device and storage medium
KR102538775B1 (en) Method and apparatus for playing audio, electronic device, and storage medium
CN107493500A (en) Multimedia resource player method and device
CN104636110B (en) Control the method and device of volume
CN104991754A (en) Recording method and apparatus
CN109087650B (en) Voice wake-up method and device
JP2017528929A (en) Event notification method and apparatus
CN106888327B (en) Voice playing method and device
CN108549529A (en) Control the method and device of audio output
JP2017508413A (en) Voice call prompt method, apparatus, program, and recording medium
CN108845787A (en) Method, apparatus, terminal and the storage medium that audio is adjusted
CN111009239A (en) Echo cancellation method, echo cancellation device and electronic equipment
CN106375846B (en) The processing method and processing device of live audio
WO2018058331A1 (en) Method and apparatus for controlling volume
CN111988704B (en) Sound signal processing method, device and storage medium
WO2023245390A1 (en) Smart earphone control method and apparatus, electronic device and storage medium
US11388281B2 (en) Adaptive method and apparatus for intelligent terminal, and terminal
CN106936972B (en) Method, device and system for controlling audio channel
CN108491180B (en) Audio playing method and device
CN112118502B (en) Earphone control method, device, equipment and storage medium
CN114567840B (en) Audio output method and device, mobile terminal and storage medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280004138.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22947171

Country of ref document: EP

Kind code of ref document: A1