WO2020239001A1 - Humming recognition method and related device

Info

Publication number: WO2020239001A1
Application number: PCT/CN2020/092802
Authority: WO
Inventors: 叶波; 吴小进; 周昕宇
Original assignee: 华为技术有限公司
Priority date: 2019-05-31
Filing date: 2020-05-28
Publication date: 2020-12-03
Also published as: CN112015943A

Abstract

Disclosed in the present application are a humming recognition method and a related device. In the humming recognition method, an electronic device can continuously acquire voice in the external environment, and upon determining that the voice is produced by a preset user, the electronic device sends to a music recognition server a first audio file comprising the voice to perform humming recognition. Upon receiving a recognized second audio file and its start playback position sent by the music recognition server, the electronic device can start playing back the second audio file from the end position of the voice, wherein the start playback position of the second audio file corresponds to the end position of the first audio file. In this way, the steps of operations conducted by a user to trigger a terminal to perform humming recognition can be reduced, and the efficiency of humming recognition can be improved; in addition, the audio can be played back subsequently to the humming of the user, so that the user experience can be improved.

Description

Humming recognition method and related equipment

This application claims priority to Chinese patent application No. 201910472410.9, entitled "A humming recognition method and related equipment", filed with the State Intellectual Property Office of China on May 31, 2019, the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of computer technology, in particular to a humming recognition method and related equipment.

Background

Humming recognition is currently a research hotspot in the field of audio retrieval. Unlike retrieving audio by text (for example, a song name, singer, or lyrics) or by a piece of recorded music, humming recognition retrieves audio based on a music segment hummed by the user.

At present, a user triggers a terminal to perform humming recognition mainly in one of two ways. In the first way, the user first finds an application that has a humming recognition function, then finds the functional control for humming recognition in that application, and then operates the control to trigger the terminal to perform humming recognition. In the second way, the user first wakes up an intelligent voice assistant (for example, Siri or Tmall Genie) with a wake-up word, and then inputs a voice command to trigger the terminal to perform humming recognition. In the prior art, therefore, the way in which the user triggers the terminal to perform humming recognition is relatively complicated.

Summary of the invention

This application provides a humming recognition method and related equipment, which can reduce the operation steps a user performs to trigger a terminal to perform humming recognition and improve the efficiency of humming recognition, while also playing audio that follows on from the user's humming and improving the user experience.

The above goals and other goals will be achieved through the features in the independent claims. Further implementations are embodied in the dependent claims, description and drawings.

In a first aspect, an embodiment of the present application provides a humming recognition method, which may include: an electronic device collects sound in an external environment through an audio input module; if the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is configured to find a second audio file in an audio resource library according to the first audio file and to determine a start playback position of the second audio file; wherein the similarity between the feature of the second audio file and the feature of the first audio file is higher than the similarity between the feature of a third audio file and the feature of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; the electronic device receives the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file; and the electronic device plays the second audio file from the start playback position through an audio output module. In this way, the operation steps for the user to trigger the terminal to perform humming recognition can be reduced, and the efficiency of humming recognition can be improved. At the same time, audio can be played following on from the user's humming, and the user experience can be improved.
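The device-side steps of the first aspect can be sketched as follows. This is only an illustrative outline under stated assumptions: the names `RecognitionResult`, `handle_captured_sound`, `voiceprint_matches`, `recognize`, and `play` are hypothetical and do not come from the application, and the voiceprint check, the server call, and the audio output are passed in as plain callables rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    """Hypothetical server reply: the matched file plus the first indication information."""
    second_audio_file: str    # identifier of the matched audio file
    start_position_s: float   # start playback position, in seconds


def handle_captured_sound(
    first_audio_file: bytes,
    voiceprint_matches: Callable[[bytes], bool],
    recognize: Callable[[bytes], Optional[RecognitionResult]],
    play: Callable[[str, float], None],
) -> bool:
    """Run one pass of the flow; return True if playback was started."""
    # Only sound whose voiceprint matches the pre-stored one is sent to the server.
    if not voiceprint_matches(first_audio_file):
        return False
    result = recognize(first_audio_file)
    if result is None:
        # Assumption: the server may find no match; the text does not cover this case.
        return False
    # Play from the position that corresponds to the end of the hummed clip.
    play(result.second_audio_file, result.start_position_s)
    return True
```

Passing the three collaborators in as callables keeps the sketch independent of any particular voiceprint engine, network protocol, or audio API.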

With reference to the first aspect, in a possible implementation manner, the method further includes: the electronic device obtains the user's lip shape information through a camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the lip shape information to the music recognition server; wherein the music recognition server is further configured to convert the lip shape information into text information, and finding the second audio file in the audio resource library according to the first audio file includes: finding the second audio file in the audio resource library according to the first audio file and the text information corresponding to the lip shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.

With reference to the first aspect, in a possible implementation manner, the electronic device obtaining the user's lip shape information through a camera includes: if the electronic device determines that the sound is a human voice, obtaining the user's lip shape information through the camera.

With reference to the first aspect, in a possible implementation manner, the electronic device collecting sound in the external environment through the audio input module includes: if the electronic device determines that the audio input module and/or the audio output module is not occupied, the electronic device collects sound in the external environment through the audio input module.

With reference to the first aspect, in a possible implementation manner, the tag of the second audio file is included in the user tag of the first user.

With reference to the first aspect, in a possible implementation manner, after the electronic device plays the second audio file from the start playback position through the audio output module, the method further includes: the electronic device displays identification information of the second audio file and a play control, wherein the display state of the play control is a first state, and the first state indicates that the second audio file is being played; if the electronic device detects a first user operation acting on the play control in the first state, then in response to the first user operation, the electronic device pauses playing the second audio file and sets the display state of the play control to a second state, where the second state indicates that the second audio file is paused.
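The two display states of the play control can be modeled as a small state holder. The sketch below is an assumption-laden illustration: `PlayControlState`, `PlaybackPanel`, and `on_first_user_operation` are hypothetical names, and only the transition described above (first state to second state) is modeled; an operation in the second state is left as a no-op because this implementation manner does not describe it.

```python
from enum import Enum


class PlayControlState(Enum):
    PLAYING = "first state"   # the second audio file is being played
    PAUSED = "second state"   # the second audio file is paused


class PlaybackPanel:
    """Hypothetical holder for the displayed play control and the player status."""

    def __init__(self, audio_id: str) -> None:
        self.audio_id = audio_id                # identification information shown to the user
        self.state = PlayControlState.PLAYING   # display state right after playback starts
        self.is_playing = True

    def on_first_user_operation(self) -> None:
        """Handle a user operation on the play control while it is in the first state."""
        if self.state is PlayControlState.PLAYING:
            self.is_playing = False               # pause the second audio file
            self.state = PlayControlState.PAUSED  # switch the control's display state
```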

With reference to the first aspect, in a possible implementation manner, the method further includes: when detecting that the electronic device is in a locked state, the electronic device stops collecting sounds in the external environment through the audio input module.

With reference to the first aspect, in a possible implementation manner, the method further includes: when it is detected that the electronic device is at a preset location, the electronic device stops collecting sound in the external environment through the audio input module.

With reference to the first aspect, in a possible implementation manner, the electronic device playing the second audio file from the start playback position through the audio output module includes: if the electronic device determines that its location is inconsistent with the preset location, the electronic device plays the second audio file from the start playback position through the audio output module.

With reference to the first aspect, in a possible implementation manner, the method further includes: the electronic device stops collecting sounds in the external environment through the audio input module within the first time period.

With reference to the first aspect, in a possible implementation manner, the electronic device collecting sound in the external environment through the audio input module includes: if the electronic device determines that its humming recognition function is enabled, the electronic device collects sound in the external environment through the audio input module.

With reference to the first aspect, in a possible implementation manner, the method further includes: when it is detected that the duration for which the ambient light brightness is less than a preset value is greater than a preset time, the electronic device stops collecting sound in the external environment through the audio input module.
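The ambient light condition amounts to timing how long the brightness has stayed below a threshold. A minimal sketch follows; the class name `DarknessMonitor`, the lux unit, and the idea of feeding timestamped sensor readings into one method are all assumptions made for illustration, not details from the application.

```python
from typing import Optional


class DarknessMonitor:
    """Tracks how long ambient brightness has stayed below a preset value."""

    def __init__(self, lux_threshold: float, max_dark_seconds: float) -> None:
        self.lux_threshold = lux_threshold        # the preset brightness value
        self.max_dark_seconds = max_dark_seconds  # the preset time
        self._dark_since: Optional[float] = None  # when the current dark spell began

    def should_stop_collecting(self, lux: float, now: float) -> bool:
        """Return True once brightness has been below the threshold for longer than the preset time."""
        if lux >= self.lux_threshold:
            self._dark_since = None               # bright again: reset the timer
            return False
        if self._dark_since is None:
            self._dark_since = now                # a dark spell starts with this reading
        return (now - self._dark_since) > self.max_dark_seconds
```

For example, with a threshold of 10 lux and a preset time of 60 seconds, readings of 5 lux at t = 0 s and t = 61 s would cause the second call to return True.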

With reference to the first aspect, in a possible implementation manner, the music recognition server is further configured to, when the music recognition server determines that the sound signal is a music fragment, find the second audio file in the audio resource library according to the first audio file.

With reference to the first aspect, in a possible implementation manner, during the period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), the electronic device gradually increases the volume of the second audio file from low to high.
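One simple way to realize such a fade-in is a gain factor that rises from 0 to 1 over the fade period. The sketch below assumes a linear ramp and a default 5-second period; the text only says the volume increases gradually, so the curve shape and the function name `fade_in_gain` are illustrative assumptions.

```python
def fade_in_gain(elapsed_s: float, fade_duration_s: float = 5.0) -> float:
    """Return a volume multiplier in [0.0, 1.0] for a linear fade-in.

    elapsed_s is the time since the second audio file started playing;
    fade_duration_s is the preset moment at which full volume is reached.
    """
    if fade_duration_s <= 0 or elapsed_s >= fade_duration_s:
        return 1.0  # fade finished (or disabled): play at full volume
    # Clamp negative elapsed times to silence, then ramp linearly.
    return max(0.0, elapsed_s / fade_duration_s)
```

A player would multiply each output sample (or the output volume setting) by this gain while the fade is in progress.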

With reference to the first aspect, in a possible implementation manner, after the electronic device plays the second audio file from the start playback position through the audio output module, the electronic device may also detect whether the second audio file is stored in a pre-stored music folder. If so, the electronic device may play other audio files in the music folder after it has finished playing the second audio file.

In a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes an audio input module, an audio output module, a processor, and a memory. The memory is configured to store program instructions, and the processor is configured to perform the following operations according to the program instructions: collecting sound in an external environment through the audio input module; if it is determined that the voiceprint information of the sound is consistent with pre-stored voiceprint information, sending a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is configured to find a second audio file in an audio resource library according to the first audio file and to determine a start playback position of the second audio file; wherein the similarity between the feature of the second audio file and the feature of the first audio file is higher than the similarity between the feature of a third audio file and the feature of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; receiving the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file; and playing the second audio file from the start playback position through the audio output module. With this electronic device, the operation steps for the user to trigger humming recognition can be reduced, and the efficiency of humming recognition can be improved. At the same time, audio can be played following on from the user's humming, and the user experience can be improved.

With reference to the second aspect, in a possible implementation manner, the electronic device further includes a camera, and the processor is further configured to perform the following operations according to the program instructions: obtaining the user's lip shape information through the camera; and if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, sending the lip shape information to the music recognition server; wherein the music recognition server is further configured to convert the lip shape information into text information, and is specifically configured to find the second audio file in the audio resource library according to the first audio file and the text information corresponding to the lip shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.

With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the sound is a human voice, obtaining the user's lip shape information through the camera.

With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the audio input module and/or the audio output module is not occupied, collecting sound in the external environment through the audio input module.

With reference to the second aspect, in a possible implementation manner, the tag of the second audio file is included in the user tag of the first user.

With reference to the second aspect, in a possible implementation manner, the electronic device further includes a display screen, and the processor is further configured to perform the following operations according to the program instructions: displaying identification information of the second audio file and a play control on the display screen, wherein the display state of the play control is a first state, and the first state indicates that the second audio file is being played; and if a first user operation acting on the play control in the first state is detected, in response to the first user operation, pausing playback of the second audio file and setting the display state of the play control to a second state, where the second state indicates that the second audio file is paused.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operation according to the program instructions: when detecting that the electronic device is in a locked state, stopping collecting sound in the external environment through the audio input module.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operation according to the program instructions: when it is detected that the electronic device is at a preset location, stopping collecting sound in the external environment through the audio input module.

With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the location of the electronic device is inconsistent with the preset location, playing the second audio file from the start playback position through the audio output module.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: stop collecting sounds in the external environment through the audio input module within the first time period.

With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the humming recognition function of the electronic device is turned on, collecting sound in the external environment through the audio input module.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operation according to the program instructions: when it is detected that the duration for which the ambient light brightness is less than a preset value is greater than a preset time, stopping collecting sound in the external environment through the audio input module.

With reference to the second aspect, in a possible implementation manner, the music recognition server is further configured to, when the music recognition server determines that the sound signal is a music fragment, find the second audio file in the audio resource library according to the first audio file.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operation according to the program instructions: during the period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), gradually increasing the volume of the second audio file from low to high.

With reference to the second aspect, in a possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: detecting whether the second audio file is stored in a pre-stored music folder, and if so, playing other audio files in the music folder after the second audio file has finished playing.

In a third aspect, an embodiment of the present application provides another humming recognition method. The method includes: an open platform obtains a first audio file, where the first audio file contains sound in an external environment; if the open platform determines that the voiceprint information of the first audio file is consistent with pre-stored voiceprint information, the open platform finds a second audio file in an audio resource library according to the first audio file and determines a start playback position of the second audio file; wherein the similarity between the feature of the second audio file and the feature of the first audio file is higher than the similarity between the feature of a third audio file and the feature of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; and the open platform plays the second audio file from the start playback position, or the open platform controls another application of the electronic device to play the second audio file from the start playback position. In this way, the operation steps for the user to trigger humming recognition can be reduced, and the efficiency of humming recognition can be improved. At the same time, audio can be played following on from the user's humming, and the user experience can be improved.

With reference to the third aspect, in a possible implementation manner, the method further includes: the open platform obtains the user's lip shape information through the electronic device; if the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the lip shape information into text information; and finding the second audio file in the audio resource library according to the first audio file includes: finding the second audio file in the audio resource library according to the first audio file and the text information corresponding to the lip shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.

With reference to the third aspect, in a possible implementation manner, the open platform obtaining the user's lip shape information includes: if the open platform determines that the sound contained in the first audio file is a human voice, obtaining the user's lip shape information through the electronic device.

With reference to the third aspect, in a possible implementation manner, the open platform obtaining the first audio file includes: if the open platform determines that the audio input module and/or the audio output module is not occupied by another application, the open platform obtains the first audio file.

With reference to the third aspect, in a possible implementation manner, the tag of the second audio file is included in the user tag of the first user.

With reference to the third aspect, in a possible implementation manner, after the open platform plays the second audio file from the start playback position, the method further includes: the open platform displays, through the electronic device, identification information of the second audio file and a play control, wherein the display state of the play control is a first state, and the first state indicates that the second audio file is being played; if the open platform detects a first user operation acting on the play control in the first state, then in response to the first user operation, the open platform pauses playing the second audio file, or controls another application of the electronic device to pause playing the second audio file, and sets the display state of the play control to a second state, where the second state indicates that the second audio file is paused.

With reference to the third aspect, in a possible implementation manner, the method further includes: when detecting that the electronic device is in a locked state, the open platform stops acquiring the first audio file.

With reference to the third aspect, in a possible implementation manner, the method further includes: when it is detected that the electronic device is at a preset location, the open platform stops acquiring the first audio file.

With reference to the third aspect, in a possible implementation manner, the open platform playing the second audio file from the start playback position, or the open platform controlling another application of the electronic device to play the second audio file from the start playback position, includes: if the open platform determines that the location of the electronic device is inconsistent with a preset location, the open platform plays the second audio file from the start playback position, or the open platform controls another application of the electronic device to play the second audio file from the start playback position.

With reference to the third aspect, in a possible implementation manner, the method further includes: the open platform stops acquiring the first audio file within the first time period.

With reference to the third aspect, in a possible implementation manner, the open platform acquiring the first audio file includes: if the humming recognition function of the electronic device is enabled, the open platform acquiring the first audio file.

With reference to the third aspect, in a possible implementation manner, the method further includes: when it is detected that the duration for which the ambient light brightness of the electronic device is less than a preset value is greater than a preset time, the open platform stops obtaining the first audio file.

With reference to the third aspect, in a possible implementation manner, the open platform is further configured to, when it determines that the first audio file is a music fragment, find the second audio file in the audio resource library according to the first audio file.

With reference to the third aspect, in a possible implementation manner, during the period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), the open platform gradually increases the volume of the second audio file from low to high.

With reference to the third aspect, in a possible implementation manner, after the open platform controls another application of the electronic device to play the second audio file from the start playback position, the open platform may also detect whether the second audio file is stored in a pre-stored music folder of the electronic device. If so, the open platform may control the other application of the electronic device to play other audio files in the music folder after it has finished playing the second audio file.

In a fourth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on an electronic device, the electronic device is caused to execute any possible implementation manner of the first aspect; or when the computer program product runs on an open platform, the open platform is caused to execute any possible implementation manner of the third aspect.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions run on an electronic device, the electronic device is caused to execute any possible implementation manner of the first aspect; or when the instructions run on an open platform, the open platform is caused to execute any possible implementation manner of the third aspect.

In the humming recognition method provided in the present application, the electronic device can continuously acquire sound in the external environment. When it determines that the sound was made by a preset user, the electronic device sends a first audio file containing the sound to the music recognition server for humming recognition. After the electronic device receives the recognized second audio file and its start playback position from the music recognition server, it can start playing the second audio file from the position at which the sound ended. The start playback position of the second audio file corresponds to the end position of the first audio file. In this way, the operation steps for the user to trigger the terminal to perform humming recognition can be reduced, and the efficiency of humming recognition can be improved. At the same time, audio can be played following on from the user's humming, and the user experience can be improved.

Description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or the background art, the following will describe the drawings that need to be used in the embodiments of the present application or the background art.

FIG. 1A is a schematic structural diagram of a smart terminal provided by an embodiment of the present application;

FIG. 1B is a software structure block diagram of a smart terminal provided by an embodiment of the present application;

FIG. 1C is a schematic structural diagram of a smart home device provided by an embodiment of the present application;

FIG. 1D is a schematic structural diagram of a vehicle-mounted device provided by an embodiment of the present application;

FIG. 2 is a user interface for displaying application menus on a smart terminal provided by an embodiment of the present application;

FIGS. 3A-3B are some user interfaces that display recognition results provided by embodiments of the present application;

FIG. 3C is a user interface displayed when a smart terminal is in a locked state according to an embodiment of the present application;

FIGS. 3D-3F are other user interfaces that display recognition results provided by embodiments of the present application;

FIG. 3G is a user interface for humming recognition provided by an embodiment of the present application;

FIGS. 4A-4B are some user interfaces for setting the humming recognition function provided by embodiments of the present application;

FIGS. 5A-5C are some other user interfaces for setting the humming recognition function provided by the embodiments of the present application;

FIGS. 5D-5F are some user interfaces for setting access rights for the humming recognition function provided by embodiments of the present application;

FIG. 5G is a user interface for entering voiceprint information provided by an embodiment of the present application;

FIGS. 6A-6B are other user interfaces for setting the humming recognition function provided by the embodiments of the present application;

FIG. 6C is another user interface for entering voiceprint information provided by an embodiment of the present application;

FIGS. 7A-7B are user interfaces for setting the humming recognition function on some vehicle-mounted devices provided by the embodiments of the present application;

FIG. 7C is another user interface for entering voiceprint information provided by an embodiment of the present application;

FIGS. 8A-8B are user interfaces for displaying recognition results on some vehicle-mounted devices provided by embodiments of the present application;

FIG. 9 is a flowchart of a humming recognition method provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of the present application, "plurality" means two or more.

Hereinafter, the terms "first" and "second" are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, the features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.

First, some concepts involved in this application are specifically introduced.

Humming recognition is a way of performing audio retrieval using a music fragment hummed by a user. It works as follows: the electronic device obtains a music fragment hummed by the user and sends it to the server; the server uses a similarity measure to find the audio file most similar to the user's humming, and then feeds that audio file back to the electronic device. Optionally, the server extracts a feature (for example, a fundamental frequency sequence) from the music fragment, uses the feature to perform a search, and matches the audio file most similar to the user's hummed fragment from a pre-stored audio resource library. Since the user's hummed fragment cannot exactly match the corresponding segment of the actual audio file in the library, humming recognition is a fuzzy match. For fuzzy matching, algorithms such as string edit distance and dynamic time warping (DTW) can be used to improve recognition accuracy.
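As an illustration only, the fuzzy matching described above can be sketched with a textbook DTW over pitch sequences. The names `dtw_distance`, `normalize` and `best_match`, the use of MIDI-like pitch numbers as the fundamental frequency feature, and the mean subtraction used to tolerate humming in a different key are all assumptions of this sketch; they are not details given in this application.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def normalize(seq):
    """Subtract the mean pitch (assumption: makes the match key-invariant)."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]

def best_match(hummed, library):
    """Return the name of the library entry with the smallest DTW distance."""
    query = normalize(hummed)
    return min(library, key=lambda name: dtw_distance(query, normalize(library[name])))

# Hypothetical library of pitch sequences (MIDI-like note numbers).
library = {
    "song_a": [60, 62, 64, 65, 67],
    "song_b": [60, 60, 67, 67, 69, 69, 67],
}
# song_a hummed two semitones higher, with some notes held longer.
hummed = [62, 62, 64, 66, 66, 67, 69]
print(best_match(hummed, library))
```

DTW tolerates the stretched notes because one library note may align with several hummed frames; a production system would typically add path constraints and indexing to keep the search over a large library tractable.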

A user interface (UI) is a medium for interaction and information exchange between an application or operating system and a user. It converts between the internal form of information and a form acceptable to the user. The user interface of an application is source code written in a specific computer language such as Java or extensible markup language (XML). The interface source code is parsed and rendered on the electronic device 300 and finally presented as content that the user can recognize, such as pictures, text, buttons and other controls. Controls are the basic elements of the user interface. Typical controls include buttons, widgets, toolbars, menu bars, text boxes, scroll bars, pictures and text. The attributes and content of the controls in the interface are defined by tags or nodes. For example, XML specifies the controls contained in the interface through nodes such as <Textview>, <ImgView>, and <VideoView>. A node corresponds to a control or attribute in the interface, and the node is parsed and rendered as user-visible content. In addition, the interfaces of many applications, such as hybrid applications, usually include web pages. A web page, also called a page, can be understood as a special control embedded in the application interface. A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS). Web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with similar functions. The specific content contained in a web page is also defined by tags or nodes in the source code of the web page. For example, HTML defines the elements and attributes of a web page through <p>, <img>, <video>, and <canvas>.

The commonly used form of user interface is a graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It can be an icon, window, control and other interface elements displayed on the display screen of the electronic device.

The following embodiments of the present application provide a humming recognition method and an electronic device, which enable the electronic device to follow the user's humming and play the audio file corresponding to a music fragment while the user is humming it, thereby reducing the operation steps the user must perform to trigger the terminal to perform humming recognition and improving the efficiency of humming recognition.

In the following embodiments of the present application, an electronic device (for example, a smart terminal, a smart home device, or an in-vehicle device) performs the humming recognition operation provided in the embodiments of the present application when it has the authority for humming recognition. The humming recognition operation can be implemented by the following steps. First, the electronic device collects sound in the external environment through the audio input module (for example, a microphone). Then, if the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends a first audio file containing this sound to the music recognition server for humming recognition, so as to identify the audio file that matches the music fragment hummed by the user and to determine the start playback position of that audio file. The start playback position of the recognized audio file corresponds to the end position of the first audio file. After the electronic device receives the audio file and the information containing the start playback position fed back by the music recognition server, the electronic device can play the audio file from the start playback position, thereby playing the audio file following the user's humming. The system architecture and implementation process for performing the humming recognition operation are described further below and are not expanded on here.
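The steps above can be sketched, purely for illustration, as a small device-side routine. All names here (`RecognitionResult`, `handle_captured_sound`, and the injected callables) are hypothetical, and voiceprint consistency is reduced to a simple equality check, whereas a real system would more likely compare a similarity score against a threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RecognitionResult:
    audio_file: str          # identifier of the recognized audio file
    start_position_s: float  # corresponds to the end position of the first audio file

def handle_captured_sound(
    sound: bytes,
    stored_voiceprint: str,
    extract_voiceprint: Callable[[bytes], str],
    send_to_server: Callable[[bytes], Optional[RecognitionResult]],
    play: Callable[[str, float], None],
) -> Optional[RecognitionResult]:
    """Run one pass of the humming recognition operation on captured sound."""
    # Step 2: only sound produced by the preset user is sent for recognition.
    if extract_voiceprint(sound) != stored_voiceprint:
        return None
    # Step 3: the first audio file goes to the music recognition server.
    result = send_to_server(sound)
    if result is None:
        return None
    # Step 4: play from the position that continues the user's humming.
    play(result.audio_file, result.start_position_s)
    return result

# Usage with stand-in callables (no real microphone, server, or player).
played = []
fake_server = lambda s: RecognitionResult("song.mp3", 42.5)
handle_captured_sound(b"hum", "alice", lambda s: "alice", fake_server,
                      lambda f, p: played.append((f, p)))
print(played)
```

Passing the voiceprint extractor, server call and player in as callables keeps the control flow testable without committing to any particular audio or network API.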

In the following embodiments of the present application, before the electronic device performs the humming recognition operation provided by the embodiments of the present application, it needs to determine whether its own audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied, for example, to play audio/video, make a call, or perform voice navigation, the electronic device does not perform the humming recognition operation; if the audio input module and/or audio output module is not occupied, the electronic device performs the humming recognition operation. Optionally, after the audio input module and/or audio output module of the electronic device is released, for example, when the audio/video playback ends, the call is hung up, or the voice navigation ends, the electronic device can perform the humming recognition operation. In other words, the priority of the humming recognition operation provided in the embodiments of the present application is lower than the priority of other operations of the electronic device that need to occupy the audio input module and/or the audio output module.

In the following embodiments of the present application, when the electronic device is performing the humming recognition operation provided by the embodiments of the present application, if it detects another request that needs to use the audio input module and/or audio output module, the electronic device invokes the audio input module and/or audio output module to perform the operation corresponding to the request. In particular, if the time for which the request needs to occupy the audio output module is less than a preset value (for example, 1 second), for example, when the request is a notification sound (such as a short message sound or an application push sound), the humming recognition operation can continue to occupy the audio input module while the operation corresponding to the request occupies the audio output module.
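The priority rules in the two preceding paragraphs can be summarized in a minimal sketch. The function names and the boolean module-state flags are hypothetical, and reading "and/or" as "either module being occupied blocks humming recognition" is an assumption of this sketch; only the 1-second preset value is taken from the example in the text.

```python
SHORT_OUTPUT_THRESHOLD_S = 1.0  # preset value; 1 second is the example in the text

def may_start_humming_recognition(input_busy: bool, output_busy: bool) -> bool:
    """Humming recognition has the lowest priority: start only if both modules are free."""
    return not (input_busy or output_busy)

def humming_keeps_microphone(
    needs_input: bool,
    needs_output: bool,
    output_duration_s: float = 0.0,
    threshold_s: float = SHORT_OUTPUT_THRESHOLD_S,
) -> bool:
    """Decide whether humming recognition may keep the audio input module
    while another audio request is being served."""
    if needs_input:
        return False  # the other request takes the audio input module
    if not needs_output:
        return True   # the request does not touch the audio modules at all
    # A short output-only request (e.g. a notification sound) can share the device.
    return output_duration_s < threshold_s

print(may_start_humming_recognition(input_busy=False, output_busy=True))
print(humming_keeps_microphone(needs_input=False, needs_output=True, output_duration_s=0.5))
```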

In the following embodiments of the present application, the humming recognition operation provided by the embodiments of the present application may be performed by a system application or a third-party application of the electronic device. In one possible implementation, the system application or the third-party application is dedicated to performing the humming recognition operation provided in the embodiments of the present application; in another possible implementation, the system application or the third-party application can also execute other services (or functions), and the humming recognition operation provided in the embodiments of the present application is integrated into the system application or the third-party application only as one service (or function).

It is understandable that "humming recognition" is only a name used in this embodiment, and its representative meaning has been recorded in this embodiment, and its name does not constitute any limitation to this embodiment. For example, in some possible implementations, "humming recognition" may also be referred to as "listening to song recognition", "humming retrieval" and other names.

In the embodiment of the present application, the electronic device that performs the humming recognition operation may be a smart terminal, a smart home device, or a vehicle-mounted device. The following first introduces an exemplary smart terminal 100 provided in the following embodiments of the present application.

FIG. 1A shows a schematic diagram of the structure of the smart terminal 100.

The smart terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an acceleration sensor 180C, a distance sensor 180D, a proximity light sensor 180E, a fingerprint sensor 180F, a touch sensor 180G, an ambient light sensor 180H, and so on.

It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the smart terminal 100. In other embodiments of the present application, the smart terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a modem processor, an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, etc. The different processing units may be independent devices or may be integrated in one or more processors. In some embodiments, the smart terminal 100 may also include one or more processors 110.

The controller may be the nerve center and command center of the smart terminal 100. The controller can generate operation control signals according to the instruction operation code and timing signals, and complete the control of fetching and executing instructions.

A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated access, reduces the waiting time of the processor 110, and improves the efficiency of the smart terminal 100.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.

The I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may couple the touch sensor 180K, charger, flash, camera module 193, etc., through different I2C bus interfaces. For example, the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the smart terminal 100.

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset.

The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
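For readers unfamiliar with the three PCM steps mentioned above (sampling, quantization and encoding), the following is a minimal software sketch. It is not a description of the PCM bus interface itself; the function name `pcm16_encode`, the 8000 Hz sample rate, and the 16-bit little-endian format are assumptions chosen for illustration.

```python
import math
import struct

INT16_MAX = 32767

def pcm16_encode(signal, num_samples, sample_rate_hz=8000):
    """Sample signal(t), quantize to signed 16-bit codes, and pack as little-endian bytes."""
    codes = []
    for k in range(num_samples):
        x = signal(k / sample_rate_hz)     # sampling: evaluate at t = k / fs
        x = max(-1.0, min(1.0, x))         # clip to the representable range
        codes.append(round(x * INT16_MAX)) # quantization: nearest integer level
    return struct.pack("<%dh" % num_samples, *codes)  # encoding: 2 bytes per sample

# Example: 10 ms of a 440 Hz tone.
tone = lambda t: math.sin(2 * math.pi * 440 * t)
data = pcm16_encode(tone, num_samples=80)
print(len(data))
```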

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset.

The MIPI interface can be used to connect the processor 110 with the display screen 194, the camera module 193 and other peripheral devices. The MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc. In some embodiments, the processor 110 and the camera module 193 communicate through a CSI interface to implement the camera function of the smart terminal 100, so as to obtain the user's mouth shape information. The processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the smart terminal 100.

The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera module 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the smart terminal 100, and can also be used to transfer data between the smart terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. This interface can also be used to connect to other smart terminals, such as AR devices.

It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the smart terminal 100. In other embodiments, the smart terminal 100 may also adopt interface connection modes different from those in the above-mentioned embodiments, or a combination of multiple interface connection modes.

The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB interface 130. In some embodiments of wireless charging, the charging management module 140 may receive the wireless charging input through the wireless charging coil of the smart terminal 100. While the charging management module 140 charges the battery 142, it can also supply power to the smart terminal through the power management module 141.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera module 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.

The wireless communication function of the smart terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the smart terminal 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna can be used in combination with a tuning switch.

The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the smart terminal 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 may receive electromagnetic waves via the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.

The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low-frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied to the smart terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and other technologies. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation via the antenna 2. Exemplarily, the wireless communication module 160 may include a Bluetooth module, a Wi-Fi module, and the like. In a possible implementation, the smart terminal can determine its own location through the wireless communication module 160.

In some embodiments, the antenna 1 of the smart terminal 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the smart terminal 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite-based augmentation systems (SBAS).

The smart terminal 100 can implement a display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphical rendering. The processor 110 may include one or more GPUs, which execute instructions to generate or change display information.

The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, the smart terminal 100 may include one or N display screens 194, where N is a positive integer greater than one.

The smart terminal 100 can realize a camera function through a camera module 193, an ISP, a video codec, a GPU, a display screen 194, an application processor AP, a neural network processor NPU, and the like.

The camera module 193 can be used to collect color image data of the subject. The ISP can be used to process the color image data collected by the camera module 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye. ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera module 193.

In some embodiments, the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats.

In some embodiments, the smart terminal 100 may include 1 or N camera modules 193, where N is a positive integer greater than 1. Specifically, the smart terminal 100 may include a front camera module 193 and a rear camera module 193. The front camera module 193 can usually be used to collect color image data of the photographer facing the display screen 194, and the rear camera module 193 can be used to collect color image data of the objects (such as people or landscapes) that the photographer is facing.

Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the smart terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.

Video codecs are used to compress or decompress digital video. The smart terminal 100 may support one or more video codecs. In this way, the smart terminal 100 can play or record videos in a variety of encoding formats, such as: moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.

NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information and can continuously learn by itself. Through the NPU, applications such as intelligent cognition of the smart terminal 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, etc.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the smart terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save audio files, photos, videos and other data in an external memory card.

The internal memory 121 may be used to store one or more computer programs, and the one or more computer programs include instructions. The processor 110 can run the above-mentioned instructions stored in the internal memory 121 to enable the smart terminal 100 to execute the humming recognition method provided in some embodiments of the present application, as well as various functional applications and data processing. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system, and can also store one or more application programs (such as a gallery or contacts). The data storage area can store data (such as photos and contacts) created during the use of the smart terminal 100. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).

The smart terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.

The audio output module 170A, also called a "speaker" or "loudspeaker", is used to convert audio electrical signals into sound signals. Through the speaker 170A of the smart terminal 100, the user can listen to music or to a hands-free call.

The audio output module 170B, also called "receiver" and "earpiece", is used to convert audio electrical signals into sound signals. When the smart terminal 100 answers a call or voice message, it can receive the voice by bringing the receiver 170B close to the human ear.

The audio input module 170C, also called a "microphone" or "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal to the microphone 170C. The smart terminal 100 may be provided with at least one microphone 170C. In other embodiments, the smart terminal 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the smart terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize a directional recording function. In the embodiment of the present application, when the "humming recognition" function is enabled on the smart terminal 100, the microphone 170C can collect sound signals near the smart terminal 100.

In some embodiments, the CPU, a digital processor, or an audio processor in the processor 110 may process the sound collected by the microphone 170C. In one embodiment, when the processor 110 determines that the sound collected within a preset time is a human voice, the processor 110 extracts voiceprint information from the sound. If the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the first audio file containing the sound is sent to the music recognition server through the mobile communication module 150 or the wireless communication module 160.

In some embodiments, the processor 110 includes a user portrait module. The user portrait module can collect user information of the user who uses the smart terminal. The user information may include the user's attributes (age, gender, occupation, etc.), living habits, user behavior, and other information. In a possible implementation, the smart terminal can abstract the user information to form a user tag and send it to the server for storage. In another possible implementation, the smart terminal may send the user information to the server, and the server analyzes the user information to form a user tag and stores it. The user tag corresponds to the user account (also called the user ID) of the user who uses the smart terminal. In the embodiments of this application, tags can be abstracted from the user's habits or preferences in playing audio files, such as rock, folk songs, and pop; favorite singers can also be recorded to form tags, for example, Li Zongsheng, Liang Jingru, and Eason Chan. In a possible implementation, the tag of the recognized second audio file is included in the user tag of the first user, and the first user may be a user who uses the smart terminal, or may be the user corresponding to the user account logged in on the smart terminal.
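The relationship between the tag of a recognized audio file and the user tag can be illustrated with a simple set check. This is a sketch under assumptions: the application does not define the tag data structure or the matching rule, so representing tags as strings and treating "included" as "at least one tag in common" are illustrative choices, and the function name is hypothetical.

```python
def tags_match_user(audio_tags, user_tags):
    """True if at least one tag of the candidate audio file appears
    among the user's tags (assumed interpretation of 'included')."""
    return not set(audio_tags).isdisjoint(user_tags)


# Example user tags built from the genres and singers named in the text.
user_tags = {"rock", "pop", "Eason Chan"}
print(tags_match_user({"pop", "ballad"}, user_tags))
print(tags_match_user({"folk songs"}, user_tags))
```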

The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The smart terminal 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the smart terminal 100 detects the intensity of the touch operation through the pressure sensor 180A. The smart terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
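The threshold rule in the short message example above can be sketched as follows. The numeric threshold, the normalized intensity scale, and the instruction names are hypothetical values chosen only for illustration; the application does not specify them.

```python
# Hypothetical threshold on an assumed normalized intensity scale of 0.0 to 1.0.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_sms_icon_touch(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if intensity < threshold:
        # Intensity below the first pressure threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```

Note that an intensity exactly equal to the threshold falls into the "create" branch, matching the "greater than or equal to" wording in the text.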

The gyro sensor 180B may be used to determine the movement posture of the smart terminal 100. In some embodiments, the angular velocity of the smart terminal 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the smart terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the smart terminal 100 through a reverse movement to achieve anti-shake. The gyro sensor 180B can also be used for navigation and somatosensory game scenes. In some embodiments, the smart terminal 100 may determine its own moving direction through the gyro sensor 180B, so as to improve the accuracy of determining its own position.

The acceleration sensor 180C can detect the magnitude of the acceleration of the smart terminal 100 in various directions (generally three-axis). When the smart terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the smart terminal 100, and be used in applications such as horizontal and vertical screen switching, pedometers, etc. In some possible implementation manners, the user interface exemplified in the following embodiments may switch between horizontal and vertical screens as the posture of the smart terminal changes.

The distance sensor 180D is used to measure distance. The smart terminal 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the smart terminal 100 may use the distance sensor 180D to measure distance to achieve rapid focusing and improve the accuracy of the acquired lip information.

The proximity light sensor 180E may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The smart terminal 100 emits infrared light to the outside through the light emitting diode. The smart terminal 100 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the smart terminal 100. When insufficient reflected light is detected, the smart terminal 100 can determine that there is no object near the smart terminal 100. The smart terminal 100 may use the proximity light sensor 180E to detect that the user is holding the smart terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor 180E can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180F is used to sense the brightness of the ambient light. The smart terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light. The ambient light sensor 180F can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180F can also cooperate with the proximity light sensor 180E to detect whether the smart terminal 100 is in a pocket to prevent accidental touch. In a possible implementation, when the smart terminal detects through the ambient light sensor 180F that the ambient light brightness has remained below the preset value for longer than the preset time, the smart terminal stops collecting sound in the external environment through the audio input module.

The fingerprint sensor 180G is used to collect fingerprints. The smart terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, releasing the locked state of the smart terminal 100.
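The ambient-light condition for stopping sound collection (brightness below the preset value for longer than the preset time) can be sketched as follows. The sample format, the units, and the function name are assumptions made for illustration; the application does not state how brightness is sampled.

```python
def should_stop_collecting(samples, brightness_threshold, min_dark_duration):
    """Decide whether sound collection should stop.

    samples: list of (timestamp_seconds, brightness) pairs in time order
             (an assumed representation of ambient light readings).
    Returns True once brightness has stayed below brightness_threshold
    for longer than min_dark_duration seconds without interruption.
    """
    dark_since = None  # timestamp at which the current dark period began
    for timestamp, brightness in samples:
        if brightness < brightness_threshold:
            if dark_since is None:
                dark_since = timestamp
            if timestamp - dark_since > min_dark_duration:
                return True
        else:
            # A bright reading interrupts the dark period.
            dark_since = None
    return False
```

In this sketch a single bright reading resets the timer, so brief dark intervals (for example, a hand passing over the sensor) would not stop collection.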

The touch sensor 180H can also be called a touch panel or a touch-sensitive surface. The touch sensor 180H may be disposed on the display screen 194; the touch sensor 180H and the display screen 194 together form what is also called a “touch screen”. The touch sensor 180H is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. The visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180H may also be disposed on the surface of the smart terminal 100 at a position different from that of the display screen 194.

The button 190 includes a power button, a volume button, and so on. The button 190 may be a mechanical button or a touch button. The smart terminal 100 may receive key input and generate key signal input related to user settings and function control of the smart terminal 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as photographing, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects. Different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.

The SIM card interface 195 is used to connect to a SIM card. The SIM card can be inserted into or pulled out of the SIM card interface 195 to achieve contact with and separation from the smart terminal 100. The smart terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The smart terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the smart terminal 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the smart terminal 100 and cannot be separated from the smart terminal 100.

The smart terminal 100 exemplarily shown in FIG. 1A can display the various user interfaces described in the following embodiments through the display screen 194. The smart terminal 100 can detect touch operations in each user interface through the touch sensor 180H, such as a click operation (for example, a touch operation on an icon or a double-click operation), an upward or downward swipe, or a circle-drawing gesture. In some embodiments, the smart terminal 100 may detect a motion gesture performed by the user while holding the smart terminal 100, for example, shaking the smart terminal, through the gyro sensor 180B, the acceleration sensor 180C, etc. In some embodiments, the smart terminal 100 can detect non-touch gesture operations through the camera module 193 (such as a 3D camera or a depth camera).

The software system of the smart terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the smart terminal 100.

FIG. 1B is a software structure block diagram of a smart terminal 100 provided by an embodiment of the present application.

The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.

The application layer can include a series of application packages.

As shown in Figure 1B, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.

The application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer. The application framework layer includes some predefined functions.

As shown in Figure 1B, the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.

The window manager is used to manage window programs. The window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.

The content provider is used to store and retrieve data and make these data accessible to applications. The data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.

The view system includes visual controls, such as controls that display text and controls that display pictures. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.

The phone manager is used to provide the communication function of the smart terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.

The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify the user of a completed download, a message reminder, etc. The notification manager can also present a notification in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is played, the smart terminal vibrates, or the indicator light flashes.

The Android runtime includes core libraries and virtual machines. The Android runtime is responsible for the scheduling and management of the Android system.

The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.

The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system library can include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example: OpenGL ES), a 2D graphics engine (for example: SGL), etc.

The surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio, video, and image encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.

The software system shown in Figure 1B involves the presentation of applications that use sharing capabilities (such as the gallery and file manager), an instant sharing module that provides sharing capabilities, and a print service and print spooler that provide printing capabilities; the application framework layer provides a printing framework, a WLAN service, and a Bluetooth service; and the kernel and underlying layers provide WLAN and Bluetooth capabilities and basic communication protocols.

In the following, in conjunction with a scenario of setting the humming recognition authority, the workflow of the software and hardware of the smart terminal 100 is exemplified.

When the touch sensor 180H receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as touch coordinates and the time stamp of the touch operation). The original input event is stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example the case where the touch operation is a click operation and the corresponding control is the switch control of the humming recognition function: the humming recognition application calls the interface of the application framework layer to start the humming recognition application, then starts the microphone driver by calling the kernel layer, and the sound in the external environment is collected through the microphone 170C.
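The workflow above (original input event, control identification, microphone start) can be sketched in simplified form. This is not Android code: the class names, the rectangular hit test, the control name `humming_recognition_switch`, and the `start_microphone` callback are all hypothetical stand-ins for the kernel layer, the application framework layer, and the microphone driver.

```python
from dataclasses import dataclass


@dataclass
class RawInputEvent:
    """Original input event as produced by the (simulated) kernel layer."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    """A rectangular on-screen control (assumed representation)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def find_control(event, controls):
    """Framework-layer step: identify the control that the event falls on."""
    for control in controls:
        if control.contains(event.x, event.y):
            return control
    return None


def handle_event(event, controls, start_microphone):
    """If the event hits the humming recognition switch, start the microphone."""
    control = find_control(event, controls)
    if control is not None and control.name == "humming_recognition_switch":
        start_microphone()  # stands in for starting the microphone driver
        return True
    return False
```

A caller would pass a callback that begins sound collection; touches that land outside the switch control leave the microphone untouched.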

FIG. 1C exemplarily shows a schematic structural diagram of a smart home device 110 provided in an embodiment of the present application.

For example, the smart home device may be a device such as a smart speaker or a smart TV. As shown in FIG. 1C, the smart home device 110 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, an RJ45 communication processing module 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components can be connected via a bus. Among them:

The processor 102 can be used to read and execute computer readable instructions. In specific implementation, the processor 102 may mainly include a controller, an arithmetic unit, and a register. Among them, the controller is mainly responsible for instruction decoding, and sends out control signals for the operation corresponding to the instruction. The arithmetic unit is mainly responsible for performing fixed-point or floating-point arithmetic operations, shift operations and logical operations, etc., and can also perform address operations and conversions. The register is mainly responsible for storing the register operands and intermediate operation results temporarily stored during the execution of the instruction. In a specific implementation, the hardware architecture of the processor 102 may be an application specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, or an NP architecture, etc.

In some embodiments, the processor 102 may be used to parse the signals received by the wireless communication processing module 104, for example, a request to modify setting information sent by the smart terminal 100, or the recognized audio file and the indication information of the starting playback position sent by the music recognition server. The processor 102 may be configured to perform corresponding processing operations according to the parsing result, such as modifying the setting information of the smart home device 110 according to the request, or playing the recognized audio file from the starting playback position.

In some embodiments, the processor 102 may also be used to process sounds in the external environment collected by the smart home device 110. For example, the processor 102 may extract the voiceprint information of the sound. If the processor 102 determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the audio file containing the sound is sent to the music recognition server through the wireless communication processing module 104.

In some embodiments, the processor 102 may also be used to generate a signal sent by the wireless communication processing module 104, such as a signal sent to the smart terminal 100 for feedback of the recognition status (such as successful recognition, recognition failure, etc.).

The memory 103 is coupled with the processor 102, and is used to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 103 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, such as an embedded operating system such as DuerOS and AliGenie. The memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, a music recognition server), or additional devices.

The wireless communication processing module 104 may include one or more of the Bluetooth (BT) communication processing module 104A and the WLAN communication processing module 104B.

In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can monitor signals transmitted by other devices (such as the smart terminal 100), such as a playback request or a request to change setting information, and can send response signals, such as a request response, so that other devices (such as the smart terminal 100) can discover the smart home device 110, establish a wireless communication connection with other devices, and communicate with other devices through one or more wireless communication technologies in Bluetooth or WLAN.

In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can also transmit signals, such as broadcast Bluetooth signals and beacon signals, so that other devices (such as the smart terminal 100) can discover the smart home device 110, establish a wireless communication connection with other devices (such as the smart terminal 100), and communicate with other devices (such as the smart terminal 100) through one or more wireless communication technologies in Bluetooth or WLAN.

The wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown). The cellular mobile communication processing module can communicate with other devices (such as servers) through cellular mobile communication technology.

The power switch 105 can be used to control the power supply to the smart home device 110.

The RJ45 communication processing module 106 may be used to process data received or sent through the RJ45 interface. The RJ45 interface is mainly used to connect a modem.

The USB interface 107 can be used to communicate with other devices (for example, a computer, a notebook computer, etc.) through a data cable.

The audio input module 108 can be used to collect sounds in the external environment and convert the sounds into electrical signals. In a possible implementation manner, the smart home device 110 may receive a voice command input by the user through the audio input module 108, and in response to the voice command, the smart home device performs an operation corresponding to the voice command.

The audio output module 109 is used to convert audio electrical signals into sound signals, and the smart home device 110 can play the sound signals through the audio output module 109.

In a possible implementation, the smart home device 110 may further include a display screen 110 (not shown), and the display screen 110 may be used to display images, videos, and the like. The display screen 110 includes a display panel. The display panel can adopt a liquid crystal display (LCD), organic light-emitting diode (OLED), active-matrix organic light-emitting diode (AMOLED), flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro-OLED, quantum dot light-emitting diode (QLED), etc. In some embodiments, the smart home device 110 may include 1 or N display screens 110, where N is a positive integer greater than 1.

It can be understood that the structure illustrated in FIG. 1C does not constitute a specific limitation on the smart home device 110. In other embodiments of the present application, the smart home device 110 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

FIG. 1D exemplarily shows a schematic structural diagram of a vehicle-mounted device 120 provided in the present application.

For example, the vehicle-mounted device may be a vehicle-mounted speaker or a vehicle-mounted computer. As shown in FIG. 1D, the vehicle-mounted device 120 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, a display screen 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components can be connected via a bus. Among them:

In some embodiments, the processor 102 may also be used to process the sound in the external environment collected by the vehicle-mounted device 120. For example, the processor 102 may extract the voiceprint information of the sound. If the processor 102 determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the audio file containing the sound is sent to the music recognition server through the wireless communication processing module 104.

The memory 103 is coupled with the processor 102, and is used to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 103 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, for example an embedded operating system such as uClinux, GENIVI, or eCos. The memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, a music recognition server), or additional devices.

In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can monitor signals transmitted by other devices (such as the smart terminal 100), such as a playback request or a request to change setting information, and can send response signals, such as a request response, so that other devices (such as the smart terminal 100) can discover the vehicle-mounted device 120, establish a wireless communication connection with other devices, and communicate with other devices through one or more wireless communication technologies in Bluetooth or WLAN.

In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can also transmit signals, such as broadcast Bluetooth signals and beacon signals, so that other devices (such as the smart terminal 100) can discover the vehicle-mounted device 120, establish a wireless communication connection with other devices (such as the smart terminal 100), and communicate with other devices (such as the smart terminal 100) through one or more wireless communication technologies in Bluetooth or WLAN.

The power switch 105 can be used to control the power supply to the vehicle-mounted device 120 from the power source.

The display screen 106 can be used to display images, videos, etc. The display screen 106 includes a display panel. The display panel can adopt a liquid crystal display (LCD), organic light-emitting diode (OLED), active-matrix organic light-emitting diode (AMOLED), flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro-OLED, quantum dot light-emitting diode (QLED), etc. In some embodiments, the vehicle-mounted device 120 may include 1 or N display screens 106, where N is a positive integer greater than 1.

The USB interface 107 can be used to communicate with other devices such as a display, the smart terminal 100 or an audio external device through a data line.

The audio input module 108 can be used to collect sounds in the external environment and convert the sounds into electrical signals. In a possible implementation manner, the in-vehicle device 120 may receive a voice instruction input by the user through the audio input module 108, and in response to the voice instruction, the in-vehicle device performs an operation corresponding to the voice instruction.

The audio output module 109 is used to convert audio electrical signals into sound signals, and the vehicle-mounted device 120 can play the sound signals through the audio output module 109.

In some embodiments, the in-vehicle device 120 may also include a serial interface such as an RS-232 interface. The serial interface can be connected to other devices, such as speakers and other audio playback devices, so that the audio playback devices cooperate to play the recognized audio files.

It can be understood that the structure illustrated in FIG. 1D does not constitute a specific limitation on the in-vehicle device 120. In other embodiments of the present application, the in-vehicle device 120 may include more or fewer components than shown in the figure, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The following describes an exemplary user interface on the smart terminal 100 for displaying application menus.

FIG. 2 exemplarily shows a user interface 21 of the smart terminal 100 for displaying an application menu. As shown in FIG. 2, the user interface 21 may include: a status bar 201, a tray 217 with icons of commonly used applications, a calendar widget 213, a weather widget 215, and other application icons. Among them:

The status bar 201 may include: one or more signal strength indicators 203 for mobile communication signals (also called cellular signals), one or more signal strength indicators 205 for wireless fidelity (Wi-Fi) signals, a battery status indicator 209, and a time indicator 211.

The calendar widget 213 can be used to indicate the current time, such as date, day of the week, hour and minute information, etc.

The weather widget 215 can be used to indicate the type of weather, such as cloudy to clear, light rain, etc., and can also be used to indicate information such as temperature.

The tray 217 with icons of commonly used application programs can display: a phone icon 219, a contact icon 221, a short message icon 223, and a camera icon 225.

Other application icons can be, for example: a WeChat icon 227, a QQ icon 229, a Twitter icon 231, a Facebook icon 233, a mailbox icon 235, a cloud sharing icon 237, a memo icon 239, an Alipay icon 221, a gallery icon 225, and a settings icon 227. The user interface 21 may also include a page indicator 229. Other application icons can be distributed on other pages. The page indicator 229 can be used to indicate the number of pages and which page the user is currently browsing. For example, if the page indicator 229 displays 3 small dots, of which the second is black and the other two are white, the current mobile phone includes 3 pages and the user is browsing the second page. In addition, the user can swipe left and right on the current page to browse application icons on other pages. In some embodiments, the user interface 21 exemplarily shown in FIG. 2 may be a user interface in the home screen.

In some other embodiments, the smart terminal 100 may also include a home screen key. The home screen key can be a physical key or a virtual key. The home screen key can be used to receive a user's instruction and, in response to the instruction, return the currently displayed UI to the home screen, so that the user can view the home screen at any time. The instruction can be an operation in which the user presses the home screen key once, an operation in which the user presses the home screen key twice within a short period of time, or an operation in which the user long-presses the home screen key for a predetermined time. In some other embodiments of the present application, the home screen key can also be integrated with a fingerprint recognizer, so that fingerprints are collected and recognized when the home screen key is pressed.

It is understandable that FIG. 2 only exemplarily shows the user interface on the smart terminal 100, and should not constitute a limitation to the embodiment of the present application.

Next, some embodiments of humming recognition provided in the embodiments of the present application will be introduced.

In the embodiment of the present application, the smart terminal 100 can play the recognized audio file following the user's humming progress. In addition, the smart terminal 100 can display the recognition result through the display screen 194. The recognition result may be displayed when the smart terminal 100 is in use, or when the smart terminal 100 is in a locked state. These two application scenarios are introduced below. It should be noted that, in the embodiment of the present application, the smart terminal 100 being in use means that the smart terminal 100 is being used by the user and the display screen 194 of the smart terminal 100 stays on. The display screen 194 can display user interfaces such as the desktop, application program interfaces, the pull-down notification bar, and the negative one screen. The smart terminal 100 being in the locked state means that the screen of the smart terminal is locked. In most cases, the smart terminal 100 needs to receive a password input by the user or verify another unlocking method (for example, fingerprint unlocking or face unlocking) before it can be unlocked. Generally, the user can turn off the screen of the smart terminal 100 and enter the locked state by pressing the power button of the smart terminal 100 or tapping a "lock screen" virtual control. In addition, the lock screen interface refers to the user interface displayed by the smart terminal 100 after it enters the locked state and before it is unlocked. While the smart terminal 100 is in the locked state, it may display the lock screen interface, or it may be in the screen-off (also referred to as black screen) state.

First, an embodiment of displaying the recognition result when the smart terminal 100 is in use is introduced.

FIG. 3A exemplarily shows a user interface 31 that displays the recognition result within the interface of an application program in use on the smart terminal 100. It should be noted that the application program that performs the humming recognition operation and the application program in use may be the same application program or different application programs, which is not limited in the embodiment of the present application. In addition, this application does not limit the application in use, which can be WeChat, QQ, Weibo, a mailbox, or another application. Illustratively, FIG. 3A takes a chat interface in WeChat as an example. As shown in FIG. 3A, the user interface 31 may include: a display area 318, an input area 319, and a notification window 315. Specifically:

The display area 318 may be used to display chat content, and the chat content may include text/voice communication content between the user using the smart terminal 100 and the user of another social account.

The input area 319 can be used to input chat content. The input area 319 can include a first control 319A, a second control 319B, a third control 319C, and a fourth control 319D. The first control 319A is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays a voice input button. Generally, the user can input voice information by long-pressing the voice input button. It should be noted that, when the voice input button receives the user's operation, the smart terminal 100 needs to collect the voice information input by the user. The audio input module is then occupied by the voice input service of the social application, and the smart terminal 100 does not perform the humming recognition operation provided in the embodiment of the present application. The second control 319B is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays a keyboard/handwriting pad. Generally, the smart terminal 100 can receive text information input by the user through the keyboard/handwriting pad. The third control 319C is used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays a plurality of emoticons/animated pictures for the user to select. The fourth control 319D is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays multiple input type selection boxes, such as pictures, shooting, documents, red envelopes, and video calls, for the user to choose from. Similarly, when the "shooting" or "video call" selection box receives the user's operation, the smart terminal 100 needs to collect the audio and video information input by the user. The audio input module and/or audio output module is then occupied by the video input service of the social application, and the smart terminal 100 does not perform the humming recognition operation provided in the embodiment of the present application.

The notification window 315 is used to display the recognition result of the music segment hummed by the user. The notification window 315 may include: a humming recognition icon 316, a first display area 314, a playback control 310, and a control 312.

Among them, the humming recognition icon 316 is used to indicate the source of the notification window 315, in order to facilitate the user to quickly understand that the notification window 315 is the recognition result output by the humming recognition service (or called a function or application). It should be noted that the humming recognition icon 316 is only an example icon. In a specific implementation process, the humming recognition icon may also be other patterns, such as musical notes or icons of other styles, which are not limited in the embodiment of the present application.

The first display area 314 can be used to display the identification information of the recognized audio file, and can provide the user with more information about the recognized audio file. The identification information of the audio file may be the song name, lyrics, artist name, album name, album cover picture, artist poster, and so on. As shown in the figure, the first display area 314 contains the name of the song ("Across the Ocean to See You"). The first display area 314 may also include operation instruction information (for example, "click to stop playback" contained in the first display area 314), which may provide the user with operation reminders and improve the convenience of user operations. Optionally, the first display area 314 may also contain singer information or lyrics information of the currently played audio file. In another possible situation, the smart terminal 100 may also display the lyrics information of the currently played audio file in the form of a floating window. The floating window is a movable window displayed floating over the display interface of the smart terminal 100.

The playback control 310 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 pauses or continues to play the audio file. Specifically, after the smart terminal 100 recognizes the audio file corresponding to the music clip hummed by the user, it plays the audio file following the user's singing progress. At this time, the playback control 310 is displayed in the first state, which indicates that the audio file is being played. Optionally, during the playback of the recognized audio file, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiment of the present application. When the playback control 310 is displayed in the first state, if the playback control 310 receives a user's operation, the smart terminal 100 pauses playing the audio file and displays the playback control 310 in the second state, which indicates that the audio file is paused. It is understandable that when the playback control 310 is displayed in the second state, if the playback control 310 receives a user's operation, the smart terminal 100 continues to play the audio file and displays the playback control 310 in the first state.
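The two display states of the playback control described above can be sketched as a small state toggle. This is only an illustrative model, not the implementation of the embodiment; the names `PlayState` and `PlaybackControl` are hypothetical and do not appear in this application.

```python
from enum import Enum


class PlayState(Enum):
    PLAYING = "first state"   # the audio file is being played
    PAUSED = "second state"   # the audio file is paused


class PlaybackControl:
    """Hypothetical model of the two-state playback control."""

    def __init__(self):
        # Playback starts as soon as the audio file is recognized,
        # so the control is initially shown in the first state.
        self.state = PlayState.PLAYING

    def on_user_operation(self):
        """Toggle between playing and paused; return the new state."""
        if self.state is PlayState.PLAYING:
            self.state = PlayState.PAUSED
        else:
            self.state = PlayState.PLAYING
        return self.state
```

Each user operation on the control flips the state, which matches the pause/continue behavior described for the first and second states.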

The control 312 may be used to receive a user's operation. In response to the user's operation, the smart terminal 100 pauses playing the audio file, re-acquires the user's sound signal, and performs humming recognition on the re-acquired sound signal. In a possible situation, while re-acquiring the sound signal, the smart terminal 100 may display a prompt message (for example, "recognizing..."), which indicates that the smart terminal 100 is re-acquiring the sound signal for humming recognition. In another possible situation, in response to the user's operation on the control 312, the smart terminal 100 pauses playing the audio file and jumps to the user interface 35 for humming recognition. The user interface 35 is introduced in detail later and is not described further here.

In a possible implementation manner, the notification window 315 disappears after being displayed for a preset time, and the preset time may be 4 seconds, 5 seconds, and so on. Alternatively, when the notification window 315 receives an upward sliding operation from the user, the smart terminal 100 no longer displays the notification window 315 in the user interface 31 in response to the operation.

In a possible implementation manner, the notification window 315 may also be displayed in a pull-down notification bar. For this display manner, refer to FIG. 3B. As shown in FIG. 3B, when a downward sliding gesture on the status bar 201 is detected, in response to the gesture, the smart terminal 100 may display a pull-down notification bar 318 on the user interface 21, and the pull-down notification bar 318 includes a notification window 315 and a control window 313, where:

For the notification window 315, refer to the description in FIG. 3A, which will not be repeated here.

The control window 313 may display multiple switch controls, for example, the switch control 317 displaying "humming recognition", and may also display switch controls with other functions (such as Wi-Fi, Bluetooth, flashlight, etc.). The control window 313 will be described in detail in the subsequent introduction to the setting interface of humming recognition, and will not be specifically expanded here.

In a possible situation, while the humming recognition function is turned on, an icon 311 for humming recognition is displayed in the status bar 201. It can be understood that the status bar 201 may be included in multiple display interfaces of the smart terminal 100. In this way, the user can conveniently see that the humming recognition function is on from multiple display interfaces of the smart terminal 100.

Next, an embodiment of displaying the recognition result when the smart terminal 100 is in the locked state is introduced.

FIG. 3C exemplarily shows the user interface 32 displayed when the smart terminal 100 is in the locked state. The user interface 32 may also be referred to as a lock screen interface. As shown in FIG. 3C, the user interface 32 includes a status bar 201, a calendar widget 213, and a lock screen wallpaper 523. Specifically:

The status bar 201 can refer to the description in FIG. 2, which will not be repeated here. In particular, the status bar 201 here includes a humming recognition icon 311 and a lock icon 323. The humming recognition icon 311 is used to indicate that the humming recognition function is on, and the lock icon 323 is used to indicate that the smart terminal 100 is in a locked state.

The calendar widget 213 can refer to the description in FIG. 2, which will not be repeated here. Optionally, the user interface 32 may also include a weather widget 215.

The lock screen wallpaper 523 may be a picture set by the user, or a picture preset by the smart terminal 100, or a picture downloaded by the smart terminal 100 from the network.

FIG. 3D exemplarily shows yet another user interface 32 displaying the recognition result.

As shown in FIG. 3D, when the smart terminal 100 is in the locked state and recognizes the audio file of the music clip hummed by the user, the smart terminal 100 displays a notification window 324 above the user interface 32. The notification window 324 can include: a humming recognition icon 316, a second display area 322, a playback control 310, a control 312, and a volume control 328.

For the humming recognition icon 316, the playback control 310, and the control 312, please refer to the related description in FIG. 3A, which will not be repeated here.

The second display area 322 has the same function as the first display area 314 in FIG. 3A, and both can display the identification information of the recognized audio file. The difference is that the second display area 322 here includes not only the name of the audio file, "Across the Ocean to See You", but also the singer of the audio file, "Li Zongsheng", and the lyric information of the currently playing audio file, "A strange city, in a familiar corner...", in which the bold part of the lyric information, "No matter what you will face", is the part of the lyrics that the user is currently singing. It is understandable that the lyric information changes with the playing progress of the audio file, so that the lyric information stays synchronized with the playback of the audio file.
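One way to keep the highlighted lyric line synchronized with playback is to look up the current playback position in a sorted list of line start times. The sketch below assumes that each lyric line carries a start timestamp in seconds; the function name `current_lyric_index` and the timestamps are hypothetical and are not specified in this application.

```python
from bisect import bisect_right


def current_lyric_index(line_start_times, position_s):
    """Return the index of the lyric line active at position_s.

    line_start_times must be sorted in ascending order (seconds).
    Returns -1 if playback has not yet reached the first line.
    """
    return bisect_right(line_start_times, position_s) - 1


# Hypothetical start times for three lyric lines.
starts = [0.0, 4.5, 9.0]
highlighted = current_lyric_index(starts, 5.0)  # second line (index 1)
```

As the playback position advances, the returned index moves forward, and the display area can render that line in bold.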

In a possible implementation manner, the tag of the second audio file is included in the user tag of the first user, and the meaning of the first user can refer to the above introduction. Different smart terminals may produce different recognition results for a hummed section of the same song. For example, when user 1 sings "Across the Ocean to See You", the audio file recognized by the smart terminal of user 1 may be the version sung by Li Zongsheng; when user 2 sings "Across the Ocean to See You", the audio file recognized by the smart terminal of user 2 may be the version sung by Jingru Liang. It is understandable that the recognition results differ because the user tags of user 1 and user 2 differ.
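The tag-based choice between versions of the same song can be illustrated as follows. This is a minimal sketch under assumptions: the candidate list, the tag values ("folk", "pop"), the fallback to the first candidate, and the function name `pick_version` are all hypothetical; this application only states that the tag of the selected audio file is included in the user tag of the user.

```python
def pick_version(candidates, user_tags):
    """Return the first candidate whose tags overlap the user's tags.

    candidates: list of dicts with a "tags" set (hypothetical structure).
    Falls back to the first candidate when nothing matches (assumption),
    and returns None for an empty candidate list.
    """
    for candidate in candidates:
        if user_tags & candidate["tags"]:
            return candidate
    return candidates[0] if candidates else None


# Two versions of the same song, with illustrative tags.
versions = [
    {"singer": "Li Zongsheng", "tags": {"folk"}},
    {"singer": "Jingru Liang", "tags": {"pop"}},
]
choice_user1 = pick_version(versions, {"folk"})
choice_user2 = pick_version(versions, {"pop"})
```

With these illustrative tags, the two users receive different versions of the same song, which mirrors the example of user 1 and user 2 above.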

The volume control 328 can be used to adjust the volume of the audio file being played. The volume control 328 may be used to receive a user's operation, and in response to the operation, the smart terminal 100 adjusts the volume of the played audio file. Optionally, when the received user operation is a slide to the left, the smart terminal 100 reduces the playback volume of the audio file; when the received user operation is a slide to the right, the smart terminal 100 increases the playback volume of the audio file. In a possible implementation, the ratio of the distance between the volume control 328 and the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum playback volume of the system.
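The correspondence between the slider position and the volume can be written as a simple proportion. The sketch below assumes a linear mapping and clamps the handle to the ends of the line segment; the function and parameter names are hypothetical.

```python
def volume_from_slider(handle_x, segment_left_x, segment_length, max_volume):
    """Map the slider handle position to a volume value.

    The ratio (handle distance from left end) / (segment length) is taken
    to equal (current volume) / (maximum volume). Linear mapping and
    clamping to [0, 1] are assumptions of this sketch.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    ratio = (handle_x - segment_left_x) / segment_length
    ratio = min(1.0, max(0.0, ratio))
    return ratio * max_volume
```

For example, a handle one quarter of the way along the line segment yields one quarter of the maximum volume.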

In a possible implementation manner, from the moment the smart terminal 100 starts to play the audio file until a preset time (for example, the 5th second, the 6th second, etc.), the smart terminal 100 gradually increases the playback volume of the audio file from low to high. For example, the volume may gradually increase from the minimum volume to the volume value set by the user, or from 30% of the volume value set by the user to 100% of that value. Other ways of increasing the volume are also possible, which is not limited in the embodiment of this application. It should be noted that the volume value set by the user is the volume value indicated by the volume control 328. Optionally, the volume value set by the user is the value most recently adjusted by the user.
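The gradual volume increase can be sketched as a ramp from a starting fraction of the user-set volume up to 100% over the preset time. A linear ramp is assumed here purely for illustration; this application does not restrict the shape of the increase, and the names below are hypothetical.

```python
def fade_in_volume(elapsed_s, ramp_s, target_volume, start_fraction=0.0):
    """Return the playback volume elapsed_s seconds after playback starts.

    The volume rises linearly (an assumption) from
    start_fraction * target_volume to target_volume over ramp_s seconds,
    then stays at target_volume.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target_volume
    if elapsed_s <= 0:
        return start_fraction * target_volume
    progress = elapsed_s / ramp_s
    fraction = start_fraction + (1.0 - start_fraction) * progress
    return fraction * target_volume
```

Calling `fade_in_volume(t, 5, user_volume, 0.3)` corresponds to the example of rising from 30% to 100% of the user-set volume by the 5th second.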

In a possible implementation manner, the notification window 324 disappears after the recognized audio file is played.

In a possible implementation manner, the smart terminal 100 may also display the content of the notification window 324 above the user interface 32 in the form of a user interface, and the user interface may be as shown in the user interface 33 of FIG. 3E. In FIG. 3E, the user interface 33 contains the content contained in the notification window 324, for example, the humming recognition icon 316, the second display area 322, the playback control 310, the control 312, and the volume control 328. Optionally, the user interface 32 may also display a background picture. For example, the background picture may be a poster of a song singer, a picture of an album included in the recognized audio file, and so on. Optionally, when the user interface 33 receives a user's operation (for example, a sliding operation to the right), the smart terminal 100 displays the user interface 32 below the user interface 33 (that is, the lock screen interface).

FIG. 3F exemplarily shows yet another user interface 34 displaying the recognition result.

In one embodiment, if an operation input by the user on the notification window 315 (for example, a click operation, a long press operation, a press operation, etc.) is detected in the user interface 31 or the user interface 21, or if an operation input by the user on the notification window 324 (for example, a click operation, a long press operation, a press operation, etc.) is detected in the user interface 32, the smart terminal 100 displays the user interface 34. Optionally, before jumping from the user interface 32 to the user interface 34, the smart terminal 100 receives an unlocking operation input by the user (for example, fingerprint unlocking, password unlocking, face unlocking, etc.). If the unlocking succeeds, the smart terminal 100 jumps from the user interface 32 to the user interface 34.

The user interface 34 includes: a humming recognition icon 316, a second display area 322, a playback control 310, a control 312, a volume control 328, a control 330, a control 332, and a control 334, where:

For the humming recognition icon 316, the second display area 322, the playback control 310, the control 312, and the volume control 328, refer to the above description, which will not be repeated here.

The control 330 can be used to add the recognized audio file to favorites. The control 330 can receive a user's operation. In response to the user's operation, the smart terminal 100 adds the identifier of the recognized audio file to a preset favorites folder (which may also be called a "favorite music" folder; this is not limited in this application), so that the user can conveniently find or play the recognized audio file next time.

The control 332 can be used to download the recognized audio file. The control 332 can receive a user's operation, and in response to the user's operation, the smart terminal 100 downloads the audio resource of the identified audio file from the network. Optionally, in response to the user's operation, the smart terminal 100 displays a selection box that contains sound quality options such as "standard quality", "high quality", and "lossless quality". The selection box is used to receive a user's selection operation of an option, and in response to the user's selection operation of an option, the smart terminal 100 downloads audio resources of sound quality corresponding to the option.

The control 334 can be used to share the recognized audio file. The control 334 can receive a user's operation, and in response to the user's operation, the smart terminal 100 displays a sharing box, which contains multiple sharing objects, such as QQ, WeChat, Weibo, Twitter, and so on. The sharing box is used to receive a user's selection operation of a sharing object. In response to the user's selection operation of a sharing object, the smart terminal 100 sends the audio file identifier or audio resource to the sharing object corresponding to the selection operation.

FIG. 3G exemplarily shows a user interface 35 for humming recognition.

In one embodiment, if a user operation (for example, a click operation) on the control 312 is detected in the user interface 21, the user interface 31, the user interface 32, the user interface 33, or the user interface 34, the smart terminal 100 displays the user interface 35 for humming recognition.

The user interface 35 includes a humming recognition icon 316, an indicator 350, a control 352, and a control 354. Specifically:

The humming recognition icon 316 can refer to the above description, and will not be repeated here.

The indicator 350 may indicate the time information of the music segment that the user has hummed. The time information increases as the user continues humming and stays synchronized with the duration of the user's humming. The indicator 350 may also display operation prompt information that guides the user in inputting the sound signal (for example, "hum a few more lines for more accurate recognition" contained in the indicator 350), which provides the user with operation reminders and thereby improves the accuracy of humming recognition. The operation prompt information may also be other content. For example, when it is detected that the volume of the user's voice is low, prompt information such as "increase the volume (or move closer to the device) for more accurate recognition" may be displayed.

The control 352 may be used to receive a user's operation (for example, a long press operation), and in response to the user's operation, the smart terminal 100 collects the user's input sound signal through the microphone 170C. When detecting that the user's finger leaves the display screen 194, the smart terminal 100 performs humming recognition according to the collected sound signal. Optionally, when the smart terminal 100 receives the recognized audio file from the music recognition server, the smart terminal 100 may display the user interface 34 for displaying the recognition result.

The above introduces some user interfaces of the smart terminal 100 for displaying recognition results and performing humming recognition. In this embodiment of the present application, before the smart terminal 100 implements the humming recognition function, the user can turn the humming recognition function on or off through a setting interface of the smart terminal 100. Some setting interfaces for humming recognition are introduced below.

FIG. 4A exemplarily shows a user interface 41 for setting the humming recognition function.

Similar to the way of displaying the pull-down notification bar 318 described above, when a downward sliding gesture on the status bar 201 is detected, in response to the gesture, the smart terminal 100 can display a pull-down notification bar 401 on the user interface 41, and the pull-down notification bar 401 includes a control window 313, in which:

The control window 313 may display multiple switch controls, for example, the switch control 317 displaying "humming recognition", and may also display switch controls with other functions (such as Wi-Fi, Bluetooth, flashlight, etc.). The switch control 317 has two display states. The first display state (also known as the "ON" state) indicates that the humming recognition function is turned on, and the second display state (also known as the "OFF" state) indicates that the humming recognition function is turned off. When the display state of the switch control 317 is the second display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313 (such as a touch operation on the switch control 317), in response to the operation, the smart terminal 100 can turn on "humming recognition" and adjust the display state of the switch control 317 to the first display state. When the display state of the switch control 317 is the first display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313, in response to the operation, the smart terminal 100 can turn off "humming recognition" and adjust the display state of the switch control 317 to the second display state. In this way, it is convenient for the user to turn the humming recognition function on or off.

FIG. 4B exemplarily shows another user interface 42 for setting the humming recognition function.

As shown in FIG. 4B, the user interface 42 includes a display area 410, which is used to display multiple settable options, such as "airplane mode", "Wi-Fi", "Bluetooth", and so on. The display area 410 also includes multiple switch controls and multiple jump controls. The switch control 412 and the jump control 416 are taken as examples to introduce the functions of the two types of controls. Specifically:

The switch control 412 can be used to receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the user's operation, the smart terminal 100 changes the on/off state of the function/service/application corresponding to the switch control 412 (that is, the humming recognition function). For example, before the user's operation is received, the display state of the switch control 412 is "ON", which indicates that the humming recognition function is on at this time. If the switch control 412 receives a user's operation, in response to the user's operation, the smart terminal 100 adjusts the display state of the switch control 412 to "OFF" and turns off the humming recognition function.

The jump control 416 can be used to receive a user's operation. In response to the user's operation, the smart terminal 100 jumps to the setting interface of the function/service/application corresponding to the jump control 416 (that is, the do not disturb mode). It should be noted that the setting interface can include multiple setting options for the "do not disturb mode" function, for example, turning the do not disturb mode on or off, setting the time at which the do not disturb mode is enabled, setting automatic replies in the do not disturb mode, and so on.

FIGS. 5A-5C exemplarily show some user interfaces for setting the humming recognition function.

As shown in FIG. 5A, the user interface 51 includes a display area 522, which is similar to the display area 410 included in the user interface 42. The display area 522 is used to display multiple settable options, such as "airplane mode", "Wi-Fi", "Bluetooth", and so on.

Different from the display area 410, the control corresponding to the humming recognition function here is a jump control 520, which can be used to jump from the current user interface to the "humming recognition" setting interface. As exemplarily shown in FIGS. 5A-5B, the jump control 520 receives a user's operation (for example, a click operation), and in response to the user's operation, the smart terminal 100 jumps from the user interface 51 to the "humming recognition" setting interface (that is, the user interface 52).

As shown in FIG. 5B, the user interface 52 includes a return key 530, a switch control 532, text information 534, a switch control 536, a control 538, a control 540, a control 552, a plurality of jump controls (for example, a jump control 554), and a switch control 556. Specifically:

The return key 530 can be used to receive a user's operation. In response to the user's operation, the smart terminal 100 returns to the previous interface of the current page, that is, the user interface 51 shown in FIG. 5A. Those skilled in the art should know that the previous interface of a given interface is determined when the application program is designed.

The function of the switch control 532 can refer to the function of the switch control 412 in FIG. 4B, which will not be repeated here.

The text information 534 can be used to describe the authority obtained by the smart terminal 100 after the humming recognition function is turned on, so that the user can determine whether to grant the smart terminal 100 the authority for humming recognition according to the description. The expression of the text information 534 can be changed as required, and there is no limitation here.

The switch control 536 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 sets an active time period for humming recognition. For example, if the display state of the switch control 536 is "OFF" before the user's operation is received, no active time period is set for the humming recognition operation, and the humming recognition operation can run at all times. Optionally, in this case, the smart terminal does not display the control 538 and the control 540. After receiving the user's operation on the switch control 536, in response to the user's operation, the smart terminal changes the display state of the switch control 536 to "ON" and displays the control 538 and the control 540. The control 538 is used to receive the start time of the humming recognition operation input by the user; in response to the user's operation, the smart terminal 100 performs the humming recognition operation provided in the embodiment of the present application after the start time. The control 540 is used to receive the end time of the humming recognition operation input by the user; in response to the user's operation, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiment of the present application after the end time. It should be noted that when the smart terminal 100 no longer performs the humming recognition operation provided in the embodiment of the present application, the user can still actively trigger humming recognition in the manner of the prior art.
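The active time period can be illustrated with a small check that decides whether automatic humming recognition may run at a given time of day. This is a sketch under assumptions: the function name is hypothetical, and the handling of a period that crosses midnight (start later than end) is an assumption not discussed in this application.

```python
from datetime import time


def recognition_active(now, schedule_enabled, start=None, end=None):
    """Decide whether automatic humming recognition may run at `now`.

    schedule_enabled mirrors the ON/OFF state of the time-period switch.
    With the switch OFF (or no times entered), recognition may always run.
    """
    if not schedule_enabled or start is None or end is None:
        return True
    if start <= end:
        return start <= now < end
    # Assumption: a period such as 22:00-06:00 wraps past midnight.
    return now >= start or now < end


# Example: recognition allowed between 08:00 and 22:00.
allowed = recognition_active(time(12, 0), True, time(8, 0), time(22, 0))
```

Outside the configured period the check returns False, which corresponds to the terminal no longer performing automatic recognition while manual triggering remains available.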

The control 552 can be used to add voiceprint information that can be used to enable humming recognition.

The jump control 554 can be used to receive a user's operation. In response to the user's operation, the smart terminal 100 jumps from the user interface 52 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deleting functions, and so on.

The switch control 556 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 sets the usable state of the humming recognition function. For example, if the display state of the switch control 556 is "OFF" before the user's operation is received, the humming recognition operation cannot be used when the smart terminal 100 is locked, that is, the smart terminal 100 does not perform the humming recognition operation while it is locked. After receiving the user's operation on the switch control 556, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 556 to "ON" and adjusts the usable state of the humming recognition function, that is, the humming recognition operation also runs while the smart terminal 100 is locked.
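The effect of the lock-screen switch on whether recognition runs can be summarized in a short predicate. This is an illustrative sketch only; the function and parameter names are hypothetical.

```python
def should_run_recognition(feature_on, device_locked, allow_when_locked):
    """Return True if automatic humming recognition should run.

    feature_on:        state of the main humming recognition switch.
    device_locked:     whether the terminal is currently locked.
    allow_when_locked: state of the "use when locked" switch.
    """
    if not feature_on:
        return False
    if device_locked and not allow_when_locked:
        return False
    return True
```

With the lock-screen switch OFF, the predicate is False only while the terminal is locked; with it ON, recognition runs in both the locked and unlocked states.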

The switch control 557 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 sets the usable state of the humming recognition function. When the switch control 557 is in the on state, the smart terminal can obtain its own position during the operation of the humming recognition function. The smart terminal 100 determines whether to stop collecting sounds in the external environment through the audio input module by determining whether the location where it is located is a preset location, or to determine whether to play the recognized audio file from the initial playback position. This determination method will be introduced in the follow-up content, and will not be specifically expanded here.

As exemplarily shown in FIGS. 5B-5C, the display screen 194 receives a user's operation (for example, an upward sliding operation), and in response to the user's operation, the smart terminal 100 displays the setting content of "humming recognition" under the switch control 556. As shown in FIG. 5C, the user interface 52 also includes content for setting the access authority of the humming recognition function. Specifically, the user interface 52 also includes a jump control 558 and a plurality of switch controls (such as the switch control 560). Wherein:

The jump control 558 can be used to set the type of wireless data that the humming recognition function allows to access, such as off, WLAN, WLAN and cellular mobile data.

The switch control 560 can be used to set a system function (for example, location services) that the humming recognition function is allowed to access. For example, if the display state of the switch control 560 is “OFF” before the user's operation is received, the position information of the smart terminal 100 cannot be obtained while the humming recognition function is running. After receiving the user's operation on the switch control 560, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 560 to “ON” and allows the humming recognition function to obtain the position information of the smart terminal 100. Similarly, access to the other system functions that the humming recognition function needs can also be set in the manner described above.

Figures 5D-5F exemplarily show some user interfaces for setting access rights for the humming recognition function.

As exemplarily shown in FIGS. 5D-5E, in response to the user's operation of the jump control 524, the smart terminal 100 jumps from the user interface 51 to the user interface 53, and the user interface 53 is used to display multiple system functions, for example, Bluetooth, location service, microphone, gallery, etc. Each system function corresponds to a jump control (for example, the system function "microphone" corresponds to the jump control 562).

As exemplarily shown in FIGS. 5E-5F, in response to the user's operation of the jump control 562, the smart terminal 100 jumps from the user interface 53 to the user interface 54, and the user interface 54 is used to display multiple applications that require access to the microphone. The user can control the permission of an application to access the microphone through the switch control corresponding to the application. For example, if the display state of the switch control 572 is "OFF" before the user's operation is received, the humming recognition function cannot access the microphone. After receiving the user's operation on the switch control 572, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 572 to "ON" and allows the humming recognition function to access the microphone. Similarly, the permissions of other applications to access system functions can also be set in the manner described above.

FIG. 5G exemplarily shows a user interface 55 for inputting voiceprint information.

In one embodiment, in response to the user's operation of the control 552 in the user interface 52, the smart terminal 100 jumps from the user interface 52 to the user interface 55 to enter the voiceprint information that the user wants to add. The user interface 55 includes an indicator 570, text information 572, and a control 574. Wherein:

The indicator 570 may be used to provide prompt information for the user to instruct the user to enter voiceprint information.

The text information 572 is the text content that the user needs to read. Optionally, the smart terminal can display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.

In a possible situation, the smart terminal can also instruct the user to sing several pieces of music to enter the voiceprint information. In this case, the content of the indicator 570 may be “please press and hold the button and sing the following song fragments to record voiceprint information”, and correspondingly, the text information 572 is a piece of lyrics.

The control 574 can be used to receive a user's operation (for example, a long press operation), and in response to the user's operation, the smart terminal 100 collects the sound signal input by the user through the microphone 170C. When detecting that the user's finger leaves the display screen 194, the smart terminal 100 stores the sound signal collected during this period, extracts the voiceprint information from the collected sound signal, and then stores the extracted voiceprint information.

The humming recognition operation provided in the embodiments of this application can also be applied to smart home devices (for example, smart speakers, televisions, etc.) and vehicle-mounted devices (for example, vehicle-mounted speakers); that is, the smart home devices or vehicle-mounted devices can execute the humming recognition operation provided in the embodiments of this application. In a possible situation, the smart home device or vehicle-mounted device is not equipped with a display screen (for example, a smart speaker or a vehicle audio system), and the user can set the humming recognition function of the smart home device or vehicle-mounted device through the smart terminal 100.

FIGS. 6A-6B exemplarily show some user interfaces for setting the humming recognition function. Optionally, these user interfaces may be interfaces in smart home applications.

As shown in FIG. 6A, the user interface 61 includes a display area 60. The display area 60 includes instruction information 600, reminder information 602, selection box 610, selection box 604, control 608, and display area 606, in which:

The instruction information 600 may be used to indicate the family information set by the user, and may also be text information such as "Annie's Home" and "Jack's Home".

The reminder information 602 may be used to remind the user of some abnormal situations that need to be paid attention to. The smart terminal 100 may generate corresponding reminder information according to the status of each smart home device. For example, if the anti-theft door has not been closed for a long time, the smart terminal 100 may display the reminding message 602. Or, if the remaining amount of the filter element of the air purifier is less than the preset value, the smart terminal 100 may display a reminder message "The filter element of the air purifier needs to be replaced", and so on.

The selection box 610 can display multiple optional home states for the user to choose from, such as "going home", "leaving home", "sleeping", "reading" and "more". For each home state, each home device may have a preset activation state. For example, if the user performs a selection operation on the "going home" option, in response to the selection operation, the smart terminal 100 controls the chandelier and the air conditioner in the living room to turn on. Optionally, the user can set the activation state of each home device in each home state, and can also customize more home states.

The selection box 604 can display multiple home spaces for the user to select, such as "all", "living room", "master bedroom", "second bedroom", and so on. The selection box 604 can receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the operation, the smart terminal 100 displays in the display area 606 the smart home devices corresponding to the selected home space. For example, if the selection box 604B receives the user's click operation, the smart terminal 100 displays the smart home devices contained in the "living room" in the display area 606.

The control 608 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays an interface for adding a smart home device. The user can enter the information of the new smart home device through the add interface.

The display area 606 may be used to display information of one or more smart home devices, and the information may include basic information such as pictures, names, and opening states. The display area 606 may also be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays the setting interface of the smart home device corresponding to the operation.

As exemplarily shown in FIGS. 6A-6B, the display area 606 receives a user's click operation, and in response to the user's click operation, the smart terminal 100 jumps from the user interface 61 to the user interface 62. The user interface 62 includes a return key 620, a switch control 622, a volume control 626, a switch control 628, a control 630, a switch control 620, a switch control 634, a control 636, a control 638, a control 640, and a jump control 642. Wherein:

The return key 620 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 returns to the previous page of the current page (ie, the user interface 61).

The switch control 622 may be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 controls the on or off state of the smart speaker. The electronic device can control the smart speaker by sending a control instruction to instruct the smart speaker to perform an operation corresponding to the control instruction.

The volume control 626 can be used to adjust the volume of the audio file being played. The volume control 626 may be used to receive a user's operation, and in response to the operation, the smart terminal 100 controls the smart speaker to adjust the volume of the audio file being played. Optionally, when the received user operation is a leftward slide, the smart terminal 100 controls the smart speaker to reduce the volume of the audio file being played; when the received user operation is a rightward slide, the smart terminal 100 controls the smart speaker to increase the volume of the audio file being played. In a possible implementation, the ratio of the distance between the volume control 626 and the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum playback volume of the smart speaker.
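The correspondence between the slider position and the volume can be sketched as follows. The text only states that the two ratios correspond; the linear mapping, the clamping of out-of-range positions, and the name `slider_to_volume` are assumptions made for illustration.

```python
def slider_to_volume(slider_x: float, track_length: float, max_volume: int) -> int:
    """Map the slider's distance from the left end of the track to a volume.

    Assumes a linear relationship: slider_x / track_length equals
    volume / max_volume (hypothetical; the source only says the two
    ratios correspond).
    """
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    # Clamp so that a drag past either end of the track stays in range.
    ratio = min(max(slider_x / track_length, 0.0), 1.0)
    return round(ratio * max_volume)
```

Under this assumption, sliding left reduces `slider_x` and therefore the volume, and sliding right increases it, which matches the behavior described above.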

The switch control 628 may be used to receive an operation input by a user, and in response to the user's operation, the smart terminal 100 controls the smart speaker to turn on the sound effect optimization function or turn off the sound effect optimization function.

The control 630 may be used to receive the user's input time operation, and in response to the user's operation, the smart terminal 100 controls the smart speaker to set the closing time as the time input by the user.

The switch control 620 can be used to receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the user's operation, the smart terminal 100 controls the smart speaker to change the on state of the humming recognition function. For example, before receiving the user's operation, the display state of the switch control 412 is "ON", which indicates that the humming recognition function of the smart speaker is in the on state at this time. If the switch control 412 receives a user's operation, in response to the user's operation, the smart terminal 100 adjusts the display state of the switch control 412 to "OFF" and controls the smart speaker to turn off the humming recognition function.

The switch control 634 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 controls the smart speaker to perform an operation of setting an active time period for humming recognition. For example, if the display state of the switch control 634 is "OFF" before the user's operation is received, no activation time is set for the humming recognition operation of the smart speaker, and the humming recognition operation may run at all times. Optionally, in this case, the smart terminal 100 does not display the control 636 and the control 638. After receiving the user's operation on the switch control 634, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 634 to “ON” and displays the control 636 and the control 638. Wherein, the control 636 is used to receive the activation time of the humming recognition operation input by the user; in response to the user's operation, the smart terminal 100 controls the smart speaker to set the activation time of humming recognition to the activation time input by the user. The control 638 is used to receive the end time of the humming recognition operation input by the user; in response to the user's operation, the smart terminal 100 controls the smart speaker to set the end time of the humming recognition operation to the end time input by the user.

In a possible implementation, the smart speaker itself may not be able to set the activation time of the humming recognition function. In this case, in response to the user's operation on the control 636, the smart terminal 100 sends an instruction to enable the humming recognition function to the smart speaker at the activation time, so as to control the smart speaker to turn on the humming recognition function; in response to the user's operation on the control 638, the smart terminal 100 sends an instruction to stop the humming recognition function to the smart speaker at the end time, so as to control the smart speaker to stop the humming recognition function.
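A minimal sketch of this terminal-side scheduling is given below, assuming the smart terminal periodically checks which instructions have become due. The command strings, the function `dispatch_due`, and the `send` callback (which stands in for whatever transport connects the terminal to the speaker) are hypothetical and are not defined in the application.

```python
from datetime import datetime
from typing import Callable, List, Tuple

# Hypothetical instruction names; the source does not define a wire format.
ENABLE = "enable_humming_recognition"
STOP = "stop_humming_recognition"


def dispatch_due(schedule: List[Tuple[datetime, str]],
                 last_check: datetime,
                 now: datetime,
                 send: Callable[[str], None]) -> None:
    """Send every instruction whose scheduled time lies in (last_check, now]."""
    for when, command in sorted(schedule):
        if last_check < when <= now:
            send(command)
```

Here the speaker keeps no schedule of its own: the terminal holds the activation time and end time and forwards the enable or stop instruction only when the corresponding moment arrives.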

The control 640 can be used to add voiceprint information that can be used to enable humming recognition.

The jump control 642 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 jumps from the user interface 62 to the setting interface of the voiceprint 1. The setting interface of the voiceprint 1 may include naming and deleting functions, and so on.

In a possible situation, the voiceprint information used for matching in the smart speaker is the voiceprint information stored in the smart terminal 100. Optionally, after the smart terminal 100 receives the voice signal input by the user, the voiceprint information extracted from the voice signal is sent to a smart speaker capable of humming recognition for storage. In this way, the smart speaker with humming recognition function can use the voiceprint information stored in the electronic device to match the sound signal. In this case, the user interface for entering voiceprint information can refer to Figure 5G.

In another possible situation, the voiceprint information used for matching in the smart speaker is the voiceprint information extracted from the user's voice signal re-entered by the smart speaker. Fig. 6C exemplarily shows yet another user interface 63 for inputting voiceprint information. In response to the user's operation of the control 640, the smart terminal 100 jumps from the user interface 62 to the user interface 63. The user interface 63 includes:

The instruction information 650 may be used to provide prompt information for the user to instruct the user to enter voiceprint information.

The text information 652 is text content that the user needs to read aloud. Optionally, the electronic device may display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.

In a possible situation, the electronic device can also instruct the user to sing several pieces of music to record the voiceprint information. In this case, the content of the instruction information 650 may be "please approach the smart speaker, press and hold the play button, and sing the following song fragments to record voiceprint information." Correspondingly, the text information 652 is a piece of lyrics. It should be noted that the play button refers to the play button of the smart speaker, and the play button can be a physical button or a virtual button.

It should be noted that, not limited to smart speakers, the humming recognition function of other smart home devices (including both smart home devices without a display screen and smart home devices with a display screen) can also be set in the manner described above. Similarly, the function setting of the in-vehicle device can also be performed on the smart terminal 100, in which case the above-mentioned method can also be referred to.

The above has introduced some user interfaces of the smart terminal 100 for setting the humming recognition function on the smart home device. In a possible implementation manner, a display screen is provided on the smart home device or the vehicle-mounted device, and its own humming recognition function can be set. The following introduces the setting interface for humming recognition on vehicle equipment.

Figures 7A-7B exemplarily show a user interface for displaying and setting the humming recognition function on the vehicle-mounted device.

FIG. 7A exemplarily shows a user interface 71 for displaying an application menu on the vehicle-mounted device. The user interface 71 may also be referred to as the main menu. As shown in FIG. 7A, the user interface 71 may include: a calendar widget 700, a status bar 702, a display area 708, and a control 706, among which:

The calendar widget 700 can be used to indicate the current time, such as date, day of the week, hour and minute information, etc.

The status bar 702 may include: a Bluetooth indicator 704, one or more signal strength indicators 705 of a wireless fidelity (Wi-Fi) signal, and a time indicator 703.

The display area 708 can be used to display multiple application icons, such as a navigation icon 708A, a phone icon 708B, a music icon 708C, a video icon 708D, a gallery icon 708E, a radio icon 708F, a driving recorder icon 708G, and a settings icon 708H.

The control 706 can be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps back from the current interface to the user interface 71 (that is, the main menu interface).

As shown in FIGS. 7A-7B, when a user's operation on the settings icon 708H is received, in response to the user's operation, the in-vehicle device jumps from the user interface 71 to the user interface 72. The user interface 72 is a user interface for displaying the setting menu. The user interface 72 includes multiple setting options, for example, "system setting 720", "user setting 722", "sound effect setting 724", "network setting 726", "time setting 728", and so on. The content displayed in the display area 716 is the setting content corresponding to the selected setting option. Optionally, "system setting 720" may be the setting option selected by default. In this case, the content displayed in the display area 716 is the setting content corresponding to the system setting. Optionally, if a setting option receives a user's operation, in response to the user's operation, the display area 716 displays the setting content corresponding to that setting option.

In a possible implementation manner, the display area 716 can receive a user's operation (for example, an upward or downward sliding operation), and in response to the operation, the display area 716 can display more settings. As shown in FIG. 7B, the content displayed in the display area 716 is the setting content of humming recognition.

The display area 716 may include a switch control 710, a control 712, and a control 714.

The switch control 710 can be used to turn on or turn off the humming recognition function.

The control 712 can be used to add voiceprint information that can be used to enable humming recognition. Optionally, the control 712 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps to a user interface for inputting voiceprint information, for example, the user interface 73 exemplarily shown in FIG. 7C. The user interface 73 will be described in more detail later and is not expanded on here.

The control 714 can be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps from the user interface 72 to the setting interface of voiceprint 1. The setting interface of the voiceprint 1 may include naming and deleting functions, and so on.

FIG. 7C exemplarily shows a user interface 73 for inputting voiceprint information.

In one embodiment, in response to the user's operation of the control 712 in the user interface 72, the in-vehicle device jumps from the user interface 72 to the user interface 73 to enter the voiceprint information that the user wants to add. The user interface 73 includes instruction information 730 and text information 732. Wherein:

The instruction information 730 may be used to provide prompt information for the user to instruct the user to enter voiceprint information. It should be noted that the play button is the play button of the speaker. In a possible situation, the play button of the speaker is a physical button around the display screen of the vehicle device.

The text information 732 is the text content that the user needs to read aloud. Optionally, the in-vehicle device can display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.

In a possible situation, the vehicle-mounted device may also instruct the user to sing several pieces of music to record the voiceprint information. In this case, the content of the instruction information 730 may be "please approach the speaker, press and hold the play button, and sing the following song fragments to record voiceprint information." Correspondingly, the text information 732 is a piece of lyrics.

After recording the voice signal input by the user, the vehicle-mounted device can extract the voiceprint information of the collected voice signal, and store the voiceprint information.

The above introduces the setting interface for humming recognition on the vehicle-mounted device. It should be noted that the setting interface is not limited to the user interfaces introduced above; for the setting interface for humming recognition on the vehicle-mounted device, reference may also be made to the user interface 51, the user interface 52, the user interface 53, and the user interface 54 of the smart terminal described above. Next, the user interface that displays the humming recognition result on the vehicle-mounted device is introduced.

FIG. 8A exemplarily shows a user interface 81 for displaying recognition results on a vehicle-mounted device. As shown in FIG. 8A, when the vehicle-mounted device recognizes an audio file according to the music clip hummed by the user, the vehicle-mounted device plays the audio file following the progress of the user's humming, and displays a notification window 842 on its current interface to show the user the recognition result for the hummed music segment. The notification window 842 may include: a humming recognition icon 840, a third display area 841, a playback control 843, and a control 844.

The humming recognition icon 840 is used to indicate the source of the notification window 842, in order to facilitate the user to quickly understand that the notification window 842 is the recognition result output by the humming recognition service (or called a function or application). It should be noted that the humming recognition icon 840 is only an example icon. In a specific implementation process, the humming recognition icon may also be other patterns, such as musical notes or icons of other styles, which are not limited in the embodiment of the present application.

The third display area 841 can be used to display the identification information of the recognized audio file. For example, the third display area 841 contains the name of the song "Across the Ocean to See You". The third display area 841 may also include operation instruction information, for example, "click to stop playback" included in the third display area 841, which can provide the user with an operation reminder and improve the convenience of the user's operation. Optionally, the third display area 841 may also contain singer information or lyrics information of the currently played audio file. In another possible situation, the vehicle-mounted device may also display the lyrics information of the currently playing audio file in the form of a floating window. The floating window is a movable window displayed floating in the display interface of the vehicle-mounted device.

The playback control 843 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device pauses or continues to play the audio file. Specifically, after the in-vehicle device recognizes the audio file corresponding to the music segment hummed by the user, it will play the audio file in accordance with the user's singing progress. At this time, the play control 843 displays the first state. In the case where the playback control 843 is displayed in the first state, if the playback control 843 receives a user's operation, the vehicle-mounted device pauses playing the audio file and displays the playback control 843 in the second state. In the case where the playback control 843 is displayed in the second state, if the playback control 843 receives the user's operation, the vehicle-mounted device continues to play the audio file and displays the playback control 843 in the first state.
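The two display states of the playback control can be sketched as a simple toggle. The class name `PlaybackControl`, the string labels "first" and "second", and the choice to model the control as a single boolean are assumptions made for illustration; the application does not specify an implementation.

```python
from dataclasses import dataclass


@dataclass
class PlaybackControl:
    """Hypothetical model of the playback control 843."""

    # The recognized audio file starts playing right after recognition,
    # so the control initially shows the first state.
    playing: bool = True

    @property
    def display_state(self) -> str:
        # First state: audio is playing. Second state: audio is paused.
        return "first" if self.playing else "second"

    def on_user_operation(self) -> str:
        """Pause if playing, resume if paused; return the new display state."""
        self.playing = not self.playing
        return self.display_state
```

Each user operation flips the control between the two states, so two consecutive operations return the device to playing the audio file.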

The control 844 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device pauses the audio file to re-acquire the user's sound signal, and performs humming recognition on the re-acquired sound signal.

In a possible implementation manner, the in-vehicle device does not display the control 844 and instead displays an instruction message prompting the user to say "re-recognize" in order to perform humming recognition again. In this case, if the in-vehicle device detects the voice input "re-recognize" from the user, the in-vehicle device pauses playing the audio file, performs humming recognition on the segment that the user hums again, and displays the notification window again according to the result. In this way, no manual operation by the user is required, which makes it convenient for the user to issue an instruction to perform humming recognition again while driving.

In a possible implementation manner, the notification window 842 disappears after displaying a preset time, and the preset time may be 4 seconds, 5 seconds, or the like. Or, when the notification window 842 receives the user's upward sliding operation, in response to the operation, the in-vehicle device no longer displays the notification window 842 in the user interface 81. Or, the notification window can disappear after the current song is played.

FIG. 8B exemplarily shows another user interface 82 for displaying recognition results on a vehicle-mounted device. As shown in FIG. 8B, when the vehicle-mounted device recognizes an audio file based on the music segment hummed by the user, the vehicle-mounted device plays the audio file following the progress of the user's humming, and displays the user interface 82 to show the user the recognition result for the hummed music segment. The user interface 82 may include: a humming recognition icon 840, a third display area 841, a playback control 843, a control 844, a volume control 851, a control 853, and a control 854.

Among them, the humming recognition icon 840, the third display area 841, the playback control 843, and the control 844 can all refer to the description in FIG. 8A, and will not be repeated here.

The volume control 851 can be used to adjust the volume of the audio file being played. The volume control 851 may be used to receive a user's operation, and in response to the operation, the vehicle-mounted device adjusts the volume of the audio file being played. Optionally, when the received user operation is a leftward slide, the vehicle-mounted device reduces the playback volume; when the received user operation is a rightward slide, the vehicle-mounted device increases the playback volume. In a possible implementation, the ratio of the distance between the volume control 851 and the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum playback volume of the system.

The control 853 can be used to add the recognized audio file to favorites. The control 853 can receive the user's operation, and in response to the user's operation, the vehicle-mounted device adds the identifier of the recognized audio file to a preset favorites folder (which may also be called a "favorite music" folder; this application does not limit the name), so that the user can conveniently search for or play the recognized audio file next time.

The control 854 can be used to download the recognized audio file. The control 854 can receive a user's operation, and in response to the user's operation, the in-vehicle device downloads the audio resource of the identified audio file from the network. Optionally, in response to the user's operation, the vehicle-mounted device displays a selection box, and the selection box contains sound quality options such as "standard quality", "high quality", and "lossless quality". The selection box is used to receive a user's selection operation on an option, and in response to the user's selection operation on an option, the vehicle-mounted device downloads audio resources of sound quality corresponding to the option.

The control 855 can be used to share the recognized audio file. The control 855 can receive a user's operation, and in response to the user's operation, the vehicle-mounted device displays a sharing frame, which contains multiple sharing objects, for example, one or more terminal devices connected to the vehicle-mounted device via Bluetooth. The sharing box is used to receive a user's selection operation of a sharing object, and in response to the user's selection operation of a sharing object, the vehicle-mounted device sends the audio file identifier or audio resource to the sharing object corresponding to the selection operation.

In a possible implementation manner, the notification window 842 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device displays the user interface 82.

In a possible implementation manner, the user interface 82 may receive a user's sliding operation, and in response to the sliding operation, the vehicle-mounted device displays the user interface that was recently displayed before the user interface 82.

It should be noted that the foregoing is not limited to in-vehicle devices. Other smart home devices (smart home devices equipped with display screens) can set the humming recognition function according to the method described in FIGS. 7A-7C above, and display the recognition result according to the method described in FIGS. 8A-8B above. In addition, for the user interface displaying the recognition result in the vehicle-mounted device and the smart home device, reference can also be made to the user interface 21, the user interface 31, the user interface 32, the user interface 33, and the user interface 34 in the smart terminal described above. However, since the functions of each device are not exactly the same, the interface elements contained in the user interface can be adjusted accordingly.

Based on the aforementioned UI embodiments, the following embodiment introduces the system architecture for executing the humming recognition method provided in this application. The system architecture includes an electronic device and a music recognition server, where:

The electronic device may be the smart terminal 100 exemplarily shown in FIG. 1A, which may specifically be a portable electronic device such as a mobile phone or a tablet computer, or a wearable device such as a smart watch or a smart bracelet. The electronic device may also be the smart home device 110 exemplarily shown in FIG. 1C or the in-vehicle device 120 exemplarily shown in FIG. 1D. Specifically, the electronic device may have an audio input module and an audio output module. The electronic device can collect the sound in the external environment through the audio input module and send the sound signal to the music recognition server for humming recognition. After that, the electronic device receives the recognized audio file and the playback position from the music recognition server, and then plays the recognized audio file from the playback position through the speaker module. In a possible implementation, the electronic device may also include a camera module used to obtain the user's mouth shape information. The electronic device may send the acquired mouth shape information to the music recognition server, so that the music recognition server combines the mouth shape information with the sound signal to perform humming recognition.

The music recognition server can perform feature extraction on the received sound signal, use the extracted features (for example, the fundamental frequency sequence) to search the pre-stored audio resource library (also called the feature database), and match the audio information that is most similar to the user's humming. Optionally, the music recognition server may be a single server, or may be composed of multiple servers. Optionally, the audio resource library may be stored in the music recognition server, or may be stored in another device (for example, a database server) that has a connection relationship with the music recognition server.

Refer to FIG. 9, which is a flowchart of a humming recognition method provided by an embodiment of the present application. The humming recognition method provided by the embodiment of the present application includes but is not limited to the following steps.

S901. The electronic device collects sounds in the external environment through the audio input module.

Optionally, before the electronic device collects sounds in the external environment through the audio input module, the electronic device needs to determine whether its own audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied (for example, by playing audio/video, making a phone call, or performing voice navigation), the electronic device does not collect sounds in the external environment through the audio input module for the humming recognition operation. It should be noted that this does not mean that the electronic device performs no sound collection through the audio input module in this case; rather, the purpose of the sound collection is not humming recognition. For example, during a call, an electronic device (for example, a mobile phone) needs to collect sounds in the external environment through the audio input module in order to obtain voice information input by the user and to obtain environmental sounds for noise reduction.

If the audio input module and/or audio output module of the electronic device is not occupied, the electronic device collects sounds in the external environment through the audio input module. Optionally, after the audio input module and/or audio output module of the electronic device is released (for example, the audio/video playback ends, the call is hung up, or the voice navigation ends), the electronic device can collect sounds in the external environment through the audio input module. In other words, the priority of the humming recognition operation provided in the embodiment of the present application is lower than the priority of the other operations in the electronic device that need to occupy audio resources.

S902: If it is determined that the voiceprint information of the sound is consistent with the prestored voiceprint information, the electronic device sends a first audio file to the music recognition server, and the first audio file contains the sound.

Specifically, before the electronic device sends the first audio file to the music recognition server, the electronic device matches the voiceprint information of the sound with the pre-stored voiceprint information. If the matching succeeds, that is, the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the first audio file to the music recognition server for humming recognition. If the matching fails, that is, the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information, the electronic device continues to collect sounds in the external environment through the audio input module. It should be noted that the pre-stored voiceprint information is voiceprint information that was extracted in advance from a voice signal input by the user and then stored. Specifically, the electronic device can receive the sound input by the user through the user interface 55, the user interface 63, or the user interface 73 exemplarily shown in the above-mentioned embodiments; the electronic device then performs voiceprint extraction on the collected sound and stores the extracted voiceprint information.

In addition, that the voiceprint information of the sound is consistent with the pre-stored voiceprint information does not mean that the two are exactly the same. When the similarity between the voiceprint information of the sound and the pre-stored voiceprint information is not less than a preset value (for example, 90% or 95%), it can be determined that the voiceprint information of the sound is consistent with the pre-stored voiceprint information. Specifically, the electronic device may match the voiceprint information of the sound with the pre-stored voiceprint information as follows: the electronic device extracts the voiceprint information from the sound signal and calculates the similarity between the extracted voiceprint information and the pre-stored voiceprint information. If the similarity is greater than or equal to the preset value, the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information; if the similarity is less than the preset value, the electronic device determines that the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information.
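The threshold comparison described above can be sketched as follows. This is only a minimal illustration, assuming that voiceprint information is represented as a fixed-length feature vector and that similarity is measured by cosine similarity; the application specifies neither. The names `cosine_similarity` and `voiceprint_matches` are hypothetical, and the default threshold of 0.90 is taken from the 90% example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)


def voiceprint_matches(extracted: Sequence[float],
                       prestored: Sequence[float],
                       threshold: float = 0.90) -> bool:
    """Consistent when the similarity is not less than the preset value."""
    return cosine_similarity(extracted, prestored) >= threshold
```

With this sketch, a matching result of `True` would correspond to sending the first audio file, and `False` to continuing to collect sound.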

S903. The music recognition server searches for the second audio file from the audio resource library according to the first audio file, and determines the initial playback position of the second audio file.

In an embodiment, the music recognition server may find the second audio file from the audio resource library according to the first audio file as follows: the music recognition server extracts features of the first audio file, uses the extracted features (for example, the fundamental frequency sequence) to search, and selects from the pre-stored audio resource library (also called the feature database) the second audio file that is most similar to the first audio file. That is, the similarity between the feature of the second audio file and the feature of the first audio file is higher than the similarity between the feature of a third audio file and the feature of the sound, where the third audio file is any audio file in the aforementioned audio resource library other than the second audio file.
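The selection of the most similar audio file can be illustrated with a small sketch. It assumes that each library entry is stored as a pitch sequence (for example, one value per frame) and that similarity is the inverse of the mean absolute pitch difference over the best-aligned window; both are assumptions made for illustration, and the names `window_distance` and `best_match` are hypothetical.

```python
from typing import Dict, List, Tuple


def window_distance(query: List[float], window: List[float]) -> float:
    """Mean absolute pitch difference between two equal-length sequences."""
    return sum(abs(q - w) for q, w in zip(query, window)) / len(query)


def best_match(query: List[float],
               library: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (file_id, distance) of the library entry closest to the query."""
    if not query:
        raise ValueError("query must not be empty")
    best_id, best_dist = None, float("inf")
    for file_id, pitches in library.items():
        # Slide the query over the reference and keep the closest window.
        for start in range(len(pitches) - len(query) + 1):
            d = window_distance(query, pitches[start:start + len(query)])
            if d < best_dist:
                best_id, best_dist = file_id, d
    if best_id is None:
        raise LookupError("no library entry is long enough to compare")
    return best_id, best_dist
```

A smaller distance here plays the role of a higher similarity, so the returned entry corresponds to the second audio file and every other entry to a third audio file.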

Optionally, the music recognition server can use automatic speech recognition (ASR) technology to convert the first audio file into text information, so as to determine the lyrics information corresponding to the first audio file. Further, the music recognition server can determine the progress of the user's humming music according to the recognized text information, and then determine the initial playback position of the second audio file. The start playback position of the second audio file corresponds to the end position of the first audio file. Therefore, the electronic device plays the second audio file from the start playback position to achieve the effect of playing audio following the user's humming progress.
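One way to picture how recognized text could yield a start playback position is sketched below. It assumes that the server holds time-stamped lyrics as `(start_seconds, text)` pairs and that the hummed passage ends at a line boundary; neither assumption comes from the application, and `start_position` is a hypothetical name. Text similarity is computed with the standard-library `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def start_position(recognized: str,
                   lyrics: List[Tuple[float, str]]) -> Optional[float]:
    """Return the start time (seconds) of the line after the best-matching line."""
    if not lyrics:
        return None
    scores = [
        SequenceMatcher(None, recognized.lower(), text.lower()).ratio()
        for _, text in lyrics
    ]
    best = max(range(len(lyrics)), key=scores.__getitem__)
    if best + 1 >= len(lyrics):
        return None  # the hum reached the last line; nothing left to continue
    return lyrics[best + 1][0]
```

The returned time approximates the end position of the user's humming, so playback that starts there continues from where the user stopped.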

S904. The music recognition server sends the second audio file and first indication information to the electronic device, where the first indication information indicates a starting playback position of the second audio file.

S905. After receiving the second audio file and the first instruction information sent by the music recognition server, the electronic device plays the second audio file from the starting playback position through the audio output module.

The specific implementation of each step in the above method will be further supplemented below.

In an embodiment, before the electronic device collects sounds in the external environment through the audio input module, the electronic device needs to determine whether its own humming recognition function is enabled. The electronic device may receive the user's setting of the humming recognition function through the user interface exemplarily shown in the user interface 41, the user interface 42, the user interface 51, the user interface 52, the user interface 62, and the user interface 72 in the foregoing embodiment. If the electronic device determines that its humming recognition function is enabled, the electronic device performs the step of collecting sounds in the external environment through the audio input module; if the electronic device determines that its humming recognition function is not enabled, the electronic device does not perform the step of collecting sounds in the external environment through the audio input module.

In yet another possible implementation manner, when it is detected that the electronic device is in a locked state, the electronic device stops collecting sounds in the external environment through the audio input module. It is understandable that after detecting that the electronic device is unlocked, the electronic device can collect sounds in the external environment through the audio input module. For this implementation manner, reference can be made to the introduction of the switch control 556 in the user interface 52 in the above embodiment, and the switch control 556 can be used to set the usable state of the humming recognition function. In this way, the collection of environmental sounds can be stopped when the electronic device is in the locked state, which can reduce power consumption and save the power of the electronic device.

In another possible implementation manner, when it is detected that the electronic device is at a preset location, the electronic device stops collecting sounds in the external environment through the audio input module. It is understandable that if it is detected that the electronic device is no longer located at the preset location, the electronic device can collect sounds in the external environment through the audio input module. The preset location may be a location set by the user (for example, the location of a company set by the user), or may be a location prestored in the electronic device (for example, a school, a hospital, or a theater). The electronic device can determine its own location through a global positioning system (GPS), Bluetooth (BT), or wireless local area networks (WLAN). For this possible implementation manner, refer to the introduction of the switch control 557 in the user interface 52 in the above embodiment. Specifically, when the "Environment Do Not Disturb" switch control (switch control 557) is in the on state, the electronic device detects in real time (or detects according to a preset period) whether it is located at a preset location, and if it detects that it is at the preset location, the electronic device stops collecting sounds in the external environment through the audio input module. It should be noted that the preset location is a location that is not suitable for playing the audio file. In this way, the problem of playing the second audio file in an inappropriate place can be avoided and the power of the electronic device can be saved.

In another possible implementation manner, when it is detected that the ambient light brightness has remained less than a preset value for longer than a preset time, the electronic device stops collecting sounds in the external environment through the audio input module. It is understandable that when it is detected that the ambient light brightness has remained greater than or equal to the preset value for longer than the preset time, the electronic device can collect sounds in the external environment through the audio input module. Optionally, the electronic device can sense the brightness of the ambient light through the ambient light sensor. It should be noted that ambient light brightness remaining less than the preset value for longer than the preset time may indicate that the electronic device is in the user's pocket or that the current time is night. In this case, the electronic device is not suitable for playing audio files. In this way, the problem of playing the second audio file in inappropriate places can be avoided, and the power of the electronic device can be saved.

In yet another possible implementation manner, the electronic device stops collecting sounds in the external environment through the audio input module within the first time period. Wherein, the first time period may be a preset time period (for example, 11 pm to 9 am), and the first time period may also be a time period determined according to the time information input by the user. The situation that the first time period is a time period determined according to the time information input by the user may correspond to the introduction of the switch control 536 in the user interface 52 in the foregoing embodiment. The user can input the start time and end time of the humming recognition function, and the first time period is the time period from the end time to the start time.
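Because a first time period such as 11 pm to 9 am crosses midnight, a plain range comparison is not enough. The following sketch shows one way to test membership; the function name `in_first_time_period` and the half-open interval convention are assumptions made for illustration.

```python
from datetime import time


def in_first_time_period(now: time, period_start: time, period_end: time) -> bool:
    """True if `now` falls in [period_start, period_end), allowing wrap past midnight."""
    if period_start <= period_end:
        # Same-day period, e.g. 12:00 to 14:00.
        return period_start <= now < period_end
    # Period wraps past midnight, e.g. 23:00 to 09:00.
    return now >= period_start or now < period_end
```

While this check returns `True`, the electronic device in this sketch would not collect sound for humming recognition.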

In one embodiment, in step S902, before the electronic device determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information, the electronic device may determine whether the sound is a human voice. If the electronic device determines that the sound is a human voice, the electronic device then determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information; if the electronic device determines that the sound is not a human voice, the electronic device continues to collect sounds in the external environment through the audio input module. The electronic device may determine whether the sound is a human voice as follows: the electronic device calculates the frequency of the sound, and if the frequency is within a preset frequency range, the electronic device determines that the sound is a human voice; if the frequency is not within the preset frequency range, the electronic device determines that the sound is not a human voice. The preset frequency range can be set according to requirements. For example, since the reference range of male voices is 64 Hz to 523 Hz and the reference range of female voices is 160 Hz to 1200 Hz, the preset frequency range may be 64 Hz to 1200 Hz.
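The frequency-range check can be sketched as follows. The range of 64 Hz to 1200 Hz comes from the example above; the zero-crossing estimator is only a crude stand-in chosen to keep the sketch self-contained, since the application does not say how the frequency is calculated. The names `estimate_frequency_hz` and `is_human_voice` are hypothetical.

```python
import math
from typing import Sequence


def estimate_frequency_hz(samples: Sequence[float], sample_rate: int) -> float:
    """Rough frequency estimate from the zero-crossing count (assumed method)."""
    if len(samples) < 2 or sample_rate <= 0:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev < 0) != (cur < 0)
    )
    duration_s = len(samples) / sample_rate
    # A periodic signal crosses zero twice per cycle.
    return crossings / (2.0 * duration_s)


def is_human_voice(frequency_hz: float,
                   low_hz: float = 64.0,
                   high_hz: float = 1200.0) -> bool:
    """True if the frequency lies within the preset frequency range."""
    return low_hz <= frequency_hz <= high_hz


# Usage: one second of a 440 Hz test tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
tone_is_voice = is_human_voice(estimate_frequency_hz(tone, 16000))
```

A zero-crossing count is sensitive to noise and harmonics, so a practical implementation would likely use a more robust pitch estimator; only the range comparison reflects the text above.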

In an embodiment, the electronic device may also obtain the user's mouth shape information through a camera. For example, the electronic device may receive the setting of the user's access authority to the camera through the user interface exemplarily shown in the user interface 52 and the user interface 53 in the foregoing embodiment. Optionally, the on state of the camera of the electronic device may be consistent with the on state of the humming recognition function. Optionally, before the electronic device obtains the user's mouth shape information through the camera, the electronic device determines whether the sound is a human voice. If it is determined that the sound is a human voice, the electronic device can obtain the user's mouth shape information through the camera. For the method of judging whether the sound is a human voice, refer to the above description, which will not be repeated here. In this way, the power consumption of the electronic device can be reduced, and the power of the electronic device can be saved.

In this case, the music recognition server can also receive the lip shape information sent by the electronic device. The music recognition server can determine text information based on the lip shape information, and combine the text information determined from the lip shape information with the first audio file to determine the final recognition result. That is, the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information. In this way, the accuracy of identifying the second audio file can be further improved.

In an embodiment, before the music recognition server finds the second audio file from the audio resource library according to the first audio file, the music recognition server determines whether the first audio file is a music segment. Optionally, the music recognition server may determine whether the first audio file is a music segment based on the text information corresponding to the first audio file and the multiple intervals between consecutive words in the first audio file. It should be noted that the music recognition server prestores the text information corresponding to each audio file (which can be understood as lyrics) and the multiple intervals between consecutive words in that audio file. If the similarity between the text information corresponding to the first audio file and the text information corresponding to one or more pre-stored audio files is not less than a preset value, and the similarity between the multiple intervals between consecutive words in the first audio file and the multiple intervals between consecutive words in the one or more audio files is not less than a preset value, it is determined that the first audio file is a music segment. Specifically, when the music recognition server determines that the first audio file is a music segment, the music recognition server finds the second audio file from the audio resource library according to the first audio file. Optionally, the second audio file is included in the one or more audio files. When the music recognition server determines that the first audio file is not a music segment, the music recognition server feeds back to the electronic device the result that the first audio file is not a music segment.
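The two-condition test for a music segment can be illustrated with a simplified sketch that compares the first audio file against a single pre-stored reference. The text metric (`difflib.SequenceMatcher`), the interval metric (one minus the mean relative difference), the shared threshold of 0.8, and all function names are assumptions made for illustration; the application does not define them.

```python
from difflib import SequenceMatcher
from typing import List


def text_similarity(a: str, b: str) -> float:
    """Similarity of two lyric strings in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def interval_similarity(a: List[float], b: List[float]) -> float:
    """1 minus the mean relative difference of word gaps, floored at 0."""
    if not a or len(a) != len(b):
        return 0.0
    diffs = [abs(x - y) / max(x, y, 1e-9) for x, y in zip(a, b)]
    return max(0.0, 1.0 - sum(diffs) / len(diffs))


def is_music_segment(text: str, gaps: List[float],
                     ref_text: str, ref_gaps: List[float],
                     threshold: float = 0.8) -> bool:
    """Both the lyrics and the timing between words must be similar enough."""
    return (text_similarity(text, ref_text) >= threshold
            and interval_similarity(gaps, ref_gaps) >= threshold)
```

Requiring both conditions means that reading lyrics aloud at speaking pace, with matching words but different gaps, would not be treated as a music segment in this sketch.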

In a possible implementation manner, the tag of the second audio file is included in the user tag of the first user. Wherein, the first user is a user who logs in in an electronic device, or is a user who uses the electronic device, and a user tag of the first user is pre-stored in the music recognition server. In this way, the second audio file can be made more in line with the user's preferences, and the user experience can be improved.

In an embodiment, before the electronic device plays the second audio file from the starting playback position through the audio output module, the electronic device needs to determine whether its location is consistent with the preset location. Specifically, if the electronic device determines that its location is inconsistent with the preset location, the electronic device plays the second audio file from the starting playback position through the audio output module. Optionally, if the electronic device determines that its location is consistent with the preset location, the electronic device may only display the humming recognition result without playing the audio file. For the user interface that displays the humming recognition result, refer to the user interface 21, the user interface 31, the user interface 32, the user interface 33, the user interface 34, the user interface 81, and the user interface 82 introduced in the above embodiments, which will not be repeated here. For the meaning of the preset location and the method of determining the preset location, refer to the content introduced above, which will not be repeated here. For this possible implementation manner, refer to the introduction of the switch control 557 in the user interface 52 in the above embodiment. Specifically, when the "Environment Do Not Disturb" switch control (switch control 557) is in the on state, the electronic device needs to determine that its location is not a preset location before playing the audio file. In this way, the problem of playing the second audio file in an inappropriate place can be avoided, and the power of the electronic device can be saved.

In another possible implementation manner, before the electronic device plays the second audio file, the environmental volume of the environment where the electronic device is located is determined, and the electronic device determines the volume at which the second audio file is played according to the environmental volume. Specifically, the greater the environmental volume, the greater the volume at which the electronic device plays the second audio file, and the lower the environmental volume, the lower the volume at which the electronic device plays the second audio file.
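The monotonic relationship between environmental volume and playback volume can be sketched as a clamped linear mapping. The decibel bounds, the volume range, and the name `playback_volume` are hypothetical values chosen for illustration; the application only requires that louder surroundings lead to louder playback.

```python
def playback_volume(ambient_db: float,
                    quiet_db: float = 30.0,
                    loud_db: float = 80.0,
                    min_volume: float = 0.2,
                    max_volume: float = 1.0) -> float:
    """Map ambient loudness to a playback volume in [min_volume, max_volume]."""
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    fraction = (ambient_db - quiet_db) / (loud_db - quiet_db)
    fraction = min(1.0, max(0.0, fraction))  # clamp outside the assumed range
    return min_volume + fraction * (max_volume - min_volume)
```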

In a possible implementation manner, after the electronic device plays the second audio file from the start playback position through the audio output module, the method further includes: the electronic device displays the identification information of the second audio file and a playback control, where the display state of the playback control is a first state, and the first state indicates that the second audio file is being played; if the electronic device detects a first user operation acting on the playback control in the first state, in response to the first user operation, the electronic device pauses playing the second audio file and sets the display state of the playback control to a second state, which indicates that playback of the second audio file is paused. Optionally, if the electronic device detects a second user operation acting on the playback control in the second state, in response to the second user operation, the electronic device continues to play the second audio file and sets the display state of the playback control to the first state. In addition, for the user interface on which the electronic device displays the identification information of the second audio file and the playback control, refer to the user interface 21, the user interface 31, the user interface 32, the user interface 33, the user interface 34, the user interface 81, and the user interface 82 introduced in the above embodiments, which are not repeated here.

In a possible implementation manner, within the time period from the moment when the electronic device starts to play the second audio file to a preset moment (for example, the 5th second or the 6th second), the volume at which the electronic device plays the second audio file gradually increases from low to high. For example, the volume gradually increases from the minimum volume value to the volume value set by the user, or gradually increases from 30% of the volume value set by the user to 100% of the volume value set by the user. Other ways of increasing the volume are also possible, which is not limited in the embodiment of this application.

In a possible implementation manner, after the electronic device plays the second audio file from the starting playback position through the audio output module, the electronic device may also detect whether the second audio file is stored in a pre-stored music folder. If so, the electronic device can play other audio files in the music folder after playing the second audio file.
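The gradual volume increase can be sketched as a linear ramp. The 5-second ramp and the 30% starting level come from the examples above; the linear shape and the name `fade_in_volume` are assumptions, since the application leaves the manner of increase open.

```python
def fade_in_volume(elapsed_s: float,
                   ramp_s: float = 5.0,
                   target: float = 1.0,
                   start_fraction: float = 0.3) -> float:
    """Volume at `elapsed_s` seconds after playback starts, ramping linearly."""
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target  # ramp finished: play at the user-set volume
    progress = max(0.0, elapsed_s) / ramp_s
    return target * (start_fraction + (1.0 - start_fraction) * progress)
```

Setting `start_fraction` to 0 would approximate the other example, in which the volume rises from the minimum value to the value set by the user.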

It is understandable that, for the specific implementation of each step of the method described in FIG. 9, reference may be made to the embodiments described in FIGS. 1A to 8B, which are not repeated here.

In an embodiment, the humming recognition method provided in this application can also be applied in an open platform. Specifically, the open platform obtains a first audio file, and the first audio file includes sounds in the external environment; if the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform searches for the second audio file from the audio resource library according to the first audio file and determines the initial playback position of the second audio file, where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of the third audio file and the features of the sound, the third audio file is an audio file in the above audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; the open platform plays the second audio file from the start playback position, or the open platform controls other applications of the electronic device to play the second audio file from the start playback position.

The open platform is a platform that provides an open application programming interface (API) or function. That is, the open platform may have the function of an application program provided with an API, or the function of a function. Optionally, the open platform can implement the method executed by the electronic device and the music recognition server in FIG. 9 by calling an API (or function). For example, the open platform may be a voice assistant platform, which may include only the voice assistant on the electronic device side, or may be a platform that directly associates the voice assistant on the electronic device side with the server side, or may be a platform only on the server side of the voice assistant; the embodiment of the present invention does not specifically limit it. The open platform may obtain the first audio file through the audio input module of the device where it is located, or the open platform may receive the first audio file sent by an electronic device connected to itself. Optionally, the electronic device may actively send the first audio file to the open platform, or the open platform may actively obtain the first audio file from the electronic device. After that, the open platform calls the API (or function) with the voiceprint recognition function to determine whether the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information. If it is determined that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform calls an API (or function) with a humming recognition function to find the second audio file from the audio resource library according to the first audio file.
After that, the open platform plays the second audio file from the start playback position through the audio output module of the device where it is located, or the open platform controls other applications of the electronic device to play the second audio file from the start playback position. Optionally, the open platform may send the second audio file and first instruction information to the electronic device, where the first instruction information includes the start playback position, and the first instruction information is used to instruct the electronic device to play the second audio file from the start playback position.

It should be noted that, for the manner in which the open platform executes the humming recognition provided in the embodiment of the present application, reference may be made to the specific implementation manner of each step of the method described in FIG. 9, which will not be repeated here.

The various embodiments of the present application can be combined arbitrarily to achieve different technical effects.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).

In short, the above descriptions are only examples of the technical solutions of the present application, and are not used to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, etc. made according to the disclosure of this application shall be included in the protection scope of this application.

Claims

A humming recognition method, which is characterized in that it comprises:

The electronic device collects the sound in the external environment through the audio input module;

If the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is used to find a second audio file from an audio resource library according to the first audio file and to determine the start playback position of the second audio file; wherein the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is an audio file other than the second audio file in the audio resource library, and the start playback position of the second audio file corresponds to the end position of the first audio file;

Receiving, by the electronic device, the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the starting playback position of the second audio file;

The electronic device plays the second audio file from the start playback position through the audio output module.
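
The relationship between the matched clip and the start playback position can be illustrated with a minimal sketch. It assumes that audio features are reduced to a list of per-frame numbers (for example, pitch values) and that similarity is a negative mean absolute difference; neither assumption comes from this application. The names `Match`, `window_similarity`, `recognize`, and `handle_sound` are hypothetical, and the voiceprint check is represented only by a boolean.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Match:
    track_id: str        # identifier of the "second audio file" (hypothetical)
    start_position: int  # frame index at which playback should begin
    score: float         # similarity of the best-aligned window


def window_similarity(query: List[float], window: List[float]) -> float:
    """Toy similarity: negative mean absolute difference of equal-length sequences."""
    return -sum(abs(q - w) for q, w in zip(query, window)) / len(query)


def recognize(query: List[float], library: Dict[str, List[float]]) -> Optional[Match]:
    """Return the library track whose best window is most similar to the query.

    The start playback position is the frame right after the matched window,
    so playback continues from where the hummed clip ended.
    """
    n = len(query)
    if n == 0:
        return None
    best: Optional[Match] = None
    for track_id, features in library.items():
        # Slide the query over every window of the same length in the track.
        for offset in range(len(features) - n + 1):
            score = window_similarity(query, features[offset:offset + n])
            if best is None or score > best.score:
                best = Match(track_id, offset + n, score)
    return best


def handle_sound(query: List[float], voiceprint_matches: bool,
                 library: Dict[str, List[float]]) -> Optional[Match]:
    """Gate recognition on the voiceprint check (assumed to be done elsewhere)."""
    if not voiceprint_matches:
        return None
    return recognize(query, library)
```

In this sketch, a query that aligns with frames 2 to 3 of a track yields a start position of 4, which stands in for "the start playback position corresponds to the end position of the first audio file". A real system would use a more robust melody-matching technique than this exhaustive sliding window.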
The method of claim 1, wherein the method further comprises:

The electronic device obtains the user's mouth shape information through a camera;

If the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the mouth shape information to the music recognition server;

Wherein, the music recognition server is further configured to convert the mouth shape information into text information, and the searching for the second audio file from the audio resource library according to the first audio file comprises:

Finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
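
As an illustration only, the text-similarity comparison in this claim could be sketched as follows. The sketch assumes that the mouth shape information has already been converted into text and that each candidate audio file has associated lyric text. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the application does not specify one. The function names and the `lyrics_by_track` mapping are hypothetical, and the audio-feature part of the search is omitted.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_by_text(mouth_text: str, lyrics_by_track: Dict[str, str]) -> Optional[str]:
    """Return the track whose lyric text is most similar to the mouth-shape text."""
    if not lyrics_by_track:
        return None
    return max(
        lyrics_by_track,
        key=lambda track: text_similarity(mouth_text, lyrics_by_track[track]),
    )
```

A caller might narrow the candidates by audio features first and then use `pick_by_text` to choose among the remaining tracks; that ordering is a design assumption, not something stated in the claims.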
The method according to claim 2, wherein the electronic device acquiring the user's mouth shape information through a camera comprises:

If the electronic device determines that the sound is a human voice, the user's mouth shape information is acquired through a camera.
The method according to any one of claims 1 to 3, wherein the electronic device collecting the sound in the external environment through the audio input module comprises:

If the electronic device determines that the audio input module and/or the audio output module is not occupied, the electronic device collects sounds in the external environment through the audio input module.
The method according to any one of claims 1 to 4, wherein the tag of the second audio file is included in the user tag of the first user.
The method according to any one of claims 1-5, wherein after the electronic device plays the second audio file from the starting playback position through an audio output module, the method further comprises:

The electronic device displays the identification information of the second audio file and the playback control;

Wherein, the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;

If the electronic device detects a first user operation acting on the playback control in the first state, in response to the first user operation, the electronic device pauses playing the second audio file and sets the display state of the playback control to a second state, where the second state indicates that the playback of the second audio file is paused.
The method according to any one of claims 1-6, wherein the method further comprises:

When detecting that the electronic device is in a locked state, the electronic device stops collecting sounds in the external environment through the audio input module.
The method according to any one of claims 1-7, wherein the method further comprises:

When detecting that the electronic device is at a preset location, the electronic device stops collecting sounds in the external environment through the audio input module.
The method according to any one of claims 1-7, wherein the electronic device playing the second audio file from the starting playback position through an audio output module comprises:

If the electronic device determines that the location of the electronic device is not consistent with the preset location, the electronic device plays the second audio file from the starting playback position through the audio output module.
The method according to any one of claims 1-9, wherein the method further comprises:

The electronic device stops collecting sounds in the external environment through the audio input module within the first time period.
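
The conditions under which sound collection is suspended in the preceding dependent claims can be combined in a small sketch. It assumes the conditions are applied together, which the claims present only as separate options, and it treats "occupied" as a single flag. The `DeviceStatus` fields and the `should_collect` function are hypothetical names, not part of this application.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    audio_io_occupied: bool   # audio input and/or output module in use
    locked: bool              # device is in a locked state
    at_preset_location: bool  # device is at a preset location
    in_quiet_period: bool     # current time falls within the first time period


def should_collect(status: DeviceStatus) -> bool:
    """Collect ambient sound only when none of the suspending conditions holds."""
    return not (
        status.audio_io_occupied
        or status.locked
        or status.at_preset_location
        or status.in_quiet_period
    )
```

Keeping the checks in one predicate makes it easy to enable or disable individual conditions per embodiment.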
An electronic device, characterized by comprising an audio input module, an audio output module, a processor, and a memory, wherein:

The memory is used to store program instructions;

The processor is configured to perform the following operations according to the program instructions:

Collect the sound in the external environment through the audio input module;

If it is determined that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, sending a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is used to find a second audio file from an audio resource library according to the first audio file and to determine the start playback position of the second audio file; wherein the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is an audio file other than the second audio file in the audio resource library, and the start playback position of the second audio file corresponds to the end position of the first audio file;

Receiving the second audio file and first indication information sent by the music recognition server, where the first indication information indicates a starting playback position of the second audio file;

Playing the second audio file from the starting playback position through an audio output module.
The electronic device according to claim 11, wherein the electronic device further comprises a camera, and the processor is further configured to perform the following operations according to the program instructions:

Obtain the user's mouth shape information through the camera;

If the voiceprint information of the sound is consistent with the pre-stored voiceprint information, sending the mouth shape information to the music recognition server;

Wherein, the music recognition server is further configured to convert the mouth shape information into text information;

The music recognition server is specifically configured to: find the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
The electronic device according to claim 12, wherein the processor is specifically configured to perform the following operations according to the program instructions:

If it is determined that the sound is a human voice, the user's mouth shape information is obtained through the camera.
The electronic device according to any one of claims 11-13, wherein the processor is specifically configured to perform the following operations according to the program instructions:

If it is determined that the audio input module and/or the audio output module are not occupied, the audio input module is used to collect sounds in the external environment.
The electronic device according to any one of claims 11-14, wherein the tag of the second audio file is included in the user tag of the first user.
The electronic device according to any one of claims 11-15, wherein the electronic device further comprises a display screen, and the processor is further configured to perform the following operations according to the program instructions:

Display the identification information of the second audio file and the playback controls on the display screen;

Wherein, the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;

If a first user operation acting on the playback control in the first state is detected, in response to the first user operation, pause the playback of the second audio file and set the display state of the playback control to a second state, where the second state indicates that the playback of the second audio file is paused.
The electronic device according to any one of claims 11-16, wherein the processor is further configured to perform the following operations according to the program instructions:

When detecting that the electronic device is in a locked state, stop collecting sounds in the external environment through the audio input module.
The electronic device according to any one of claims 11-17, wherein the processor is further configured to perform the following operations according to the program instructions:

When detecting that the electronic device is at a preset location, stop collecting sounds in the external environment through the audio input module.
The electronic device according to any one of claims 11-17, wherein the processor is specifically configured to perform the following operations according to the program instructions:

If it is determined that the location of the electronic device is inconsistent with the preset location, the second audio file is played from the starting playback position through the audio output module.
The electronic device according to any one of claims 11-19, wherein the processor is further configured to perform the following operations according to the program instructions:

Stop collecting sounds in the external environment through the audio input module in the first time period.
A humming recognition method, characterized in that it comprises:

The open platform acquires a first audio file, where the first audio file includes sounds in the external environment;

If the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform finds a second audio file from an audio resource library according to the first audio file and determines the start playback position of the second audio file; wherein the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is an audio file other than the second audio file in the audio resource library, and the start playback position of the second audio file corresponds to the end position of the first audio file;

The open platform plays the second audio file from the start playback position, or the open platform controls other applications of the electronic device to play the second audio file from the start playback position.
The method of claim 21, wherein the method further comprises:

The open platform obtains the user's mouth shape information through the electronic device;

If the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the mouth shape information into text information;

The searching for the second audio file from the audio resource library according to the first audio file comprises:

Finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
The method according to claim 22, wherein said open platform acquiring user's mouth shape information through said electronic device comprises:

If the open platform determines that the sound included in the first audio file is a human voice, the user's mouth shape information is acquired through the electronic device.
The method according to any one of claims 21-23, wherein the open platform obtaining the first audio file comprises:

If the audio input module and/or audio output module is not occupied by other applications, the open platform obtains the first audio file.
The method according to any one of claims 21-24, wherein the tag of the second audio file is included in the user tag of the first user.
The method according to any one of claims 21-25, wherein after the open platform plays the second audio file from the starting playback position, the method further comprises:

The open platform displays the identification information of the second audio file and the playback control through the electronic device;

Wherein, the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;

If the open platform detects a first user operation acting on the playback control in the first state, in response to the first user operation, the open platform pauses playing the second audio file, or controls other applications of the electronic device to pause the playback of the second audio file, and sets the display state of the playback control to a second state, where the second state indicates that the playback of the second audio file is paused.
The method according to any one of claims 21-26, wherein the method further comprises:

When detecting that the electronic device is in a locked state, the open platform stops acquiring the first audio file.
The method according to any one of claims 21-27, wherein the method further comprises:

When it is detected that the electronic device is at a preset location, the open platform stops acquiring the first audio file.
The method according to any one of claims 21-27, wherein the open platform playing the second audio file from the start playback position, or the open platform controlling other applications of the electronic device to play the second audio file from the start playback position, comprises:

If the open platform determines that the location of the electronic device is inconsistent with the preset location, the open platform plays the second audio file from the start playback position, or the open platform controls other applications of the electronic device to play the second audio file from the start playback position.
The method according to any one of claims 21-29, wherein the method further comprises:

The open platform stops acquiring the first audio file in the first time period.
A computer program product containing instructions, characterized in that, when the computer program product runs on an electronic device, the electronic device is caused to execute the method according to any one of claims 1-10 and 21-30.
A computer-readable storage medium, comprising instructions, characterized in that, when the instructions are executed on an electronic device, the electronic device is caused to execute the method according to any one of claims 1-10 and 21-30.