CN112015943A - Humming recognition method and related equipment - Google Patents

Humming recognition method and related equipment

Info

Publication number
CN112015943A
Authority
CN
China
Prior art keywords
audio file
audio
user
playing
electronic equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910472410.9A
Other languages
Chinese (zh)
Inventor
叶波
吴小进
周昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910472410.9A
Priority to PCT/CN2020/092802 (published as WO2020239001A1)
Publication of CN112015943A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F16/636 Filtering based on additional data by using biological or physiological data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval using metadata automatically derived from the content
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit

Abstract

In the humming recognition method, the electronic device continuously captures sound from the external environment, and when it determines that the sound was made by a preset user, it sends a first audio file containing the sound to a music recognition server for humming recognition. After receiving the recognized second audio file and its starting playing position from the server, the electronic device can play the second audio file from the point where the hummed sound ended; the starting playing position of the second audio file corresponds to the ending position of the first audio file. This method reduces the steps a user must perform to trigger humming recognition on the terminal, improves humming recognition efficiency, achieves the effect of playing audio that follows the user's humming, and improves the user experience.

Description

Humming recognition method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a humming recognition method and related devices.
Background
Humming recognition is currently a research focus in the field of audio retrieval. Unlike retrieving audio using text (e.g., a song name, singer, or lyrics) or using a clip of a recording that is being played, humming recognition retrieves audio from a piece of music that the user hums.
At present, a user mainly triggers a terminal to perform humming recognition in one of two ways. In the first way, the user must find an application with a humming recognition function, locate the control corresponding to humming recognition within that application, and then operate the control to trigger the terminal to perform humming recognition. In the second way, the user must first wake up an intelligent voice assistant (e.g., Siri) with a wake-up word and then input a voice command to trigger the terminal to perform humming recognition. In both cases, the way the user triggers the terminal to perform humming recognition is cumbersome.
Disclosure of Invention
The application provides a humming recognition method and related devices, which reduce the steps a user must perform to trigger humming recognition on a terminal, improve humming recognition efficiency, achieve the effect of playing audio that follows the user's humming, and improve the user experience.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, embodiments of the present application provide a humming recognition method, which may include: the electronic device collects sound in the external environment through an audio input module; if the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file to a music identification server, where the first audio file contains the sound, and the music identification server is configured to find a second audio file in an audio repository according to the first audio file and to determine the starting playing position of the second audio file; the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio repository other than the second audio file, and the starting playing position of the second audio file corresponds to the ending position of the first audio file; the electronic device receives the second audio file and first indication information sent by the music identification server, where the first indication information indicates the starting playing position of the second audio file; and the electronic device plays the second audio file from the starting playing position through an audio output module. This method reduces the steps required for the user to trigger humming recognition on the terminal, improves humming recognition efficiency, achieves the effect of playing audio that follows the user's humming, and improves the user experience.
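To make this first-aspect flow concrete, the sketch below outlines how a device might implement it. It is a minimal illustration under stated assumptions, not the patent's implementation: the server URL, the JSON field names, the cosine-similarity voiceprint check, the 0.8 threshold, and the play_from helper are all hypothetical, and the patent does not specify a transport protocol.

```python
import requests  # assumed HTTP transport; the patent does not specify a protocol

VOICEPRINT_THRESHOLD = 0.8  # hypothetical similarity threshold


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)


def on_sound_captured(first_audio_bytes, captured_embedding, stored_embedding):
    """Verify the speaker, then hand the recording to the music
    identification server and play the result from the indicated position."""
    # Step 1: proceed only if the voiceprint matches the enrolled user.
    if cosine_similarity(captured_embedding, stored_embedding) < VOICEPRINT_THRESHOLD:
        return
    # Step 2: send the first audio file for humming recognition
    # (endpoint and field names are hypothetical).
    resp = requests.post(
        "https://music-id.example.com/recognize",
        files={"audio": ("hum.wav", first_audio_bytes)},
    )
    result = resp.json()
    # Step 3: the server returns the second audio file and first indication
    # information; the start position corresponds to where the humming ended.
    play_from(result["second_audio_url"], start_ms=result["start_position_ms"])


def play_from(url, start_ms):
    """Placeholder for the audio output module: fetch and play from start_ms."""
    ...
```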
With reference to the first aspect, in a possible implementation manner, the method further includes: the electronic device acquires the user's mouth shape information through a camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the mouth shape information to the music identification server, which is further configured to convert the mouth shape information into text information. In this case, finding the second audio file in the audio repository according to the first audio file includes: finding the second audio file in the audio repository according to the first audio file and the text information corresponding to the mouth shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
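If the server also receives mouth shape information, it could fuse melody similarity with lyric-text similarity when ranking candidates. A minimal sketch follows, assuming a simple weighted sum; the weights and the candidate tuple layout are hypothetical, since the patent only requires that the second audio file score highest on both signals:

```python
def fused_score(audio_sim, lyric_sim, w_audio=0.7, w_lyric=0.3):
    # Illustrative linear fusion of melody similarity and the similarity
    # between candidate lyrics and the text decoded from mouth shapes.
    return w_audio * audio_sim + w_lyric * lyric_sim


def rank_candidates(candidates):
    """candidates: iterable of (audio_file, audio_sim, lyric_sim) tuples;
    returns the best-scoring entry, i.e. the 'second audio file'."""
    return max(candidates, key=lambda c: fused_score(c[1], c[2]))
```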
With reference to the first aspect, in a possible implementation manner, the electronic device acquiring the user's mouth shape information through the camera includes: if the electronic device determines that the sound is a human voice, acquiring the user's mouth shape information through the camera.
With reference to the first aspect, in a possible implementation manner, the electronic device collects sound in an external environment through an audio input module, and includes: if the electronic equipment judges that the audio input module and/or the audio output module are not occupied, the electronic equipment collects sound in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation manner, the tag of the second audio file is included in a user tag of the first user.
With reference to the first aspect, in a possible implementation manner, after the electronic device plays the second audio file from the start playing position through an audio output module, the method further includes: the electronic equipment displays the identification information of the second audio file and a playing control; the display state of the playing control is a first state, and the first state represents that the second audio file is being played; and if the electronic equipment detects a first user operation acting on the playing control in the first state, responding to the first user operation, pausing the playing of the second audio file by the electronic equipment, and setting the display state of the playing control to be a second state, wherein the second state represents that the playing of the second audio file is paused.
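A minimal sketch of this two-state play control follows; the PlayControl class and the player interface are hypothetical, since the patent describes only the observable behavior:

```python
class PlayControl:
    """Two-state play control: the first state means the second audio file
    is playing; the second state means playback is paused."""

    def __init__(self, player):
        self.player = player
        self.playing = True  # playback starts in the first state

    def on_tap(self):
        # A tap while playing pauses playback and flips the displayed state;
        # a tap while paused resumes playback.
        if self.playing:
            self.player.pause()
        else:
            self.player.resume()
        self.playing = not self.playing
```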
With reference to the first aspect, in a possible implementation manner, the method further includes: when the electronic equipment is detected to be in the locked state, the electronic equipment stops collecting the sound in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation manner, the method further includes: when the electronic equipment is detected to be in a preset place, the electronic equipment stops collecting sound in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation manner, the electronic device playing the second audio file from the starting playing position through the audio output module includes: if the electronic device determines that its location does not match a preset place, playing the second audio file from the starting playing position through the audio output module.
With reference to the first aspect, in a possible implementation manner, the method further includes: the electronic equipment stops collecting the sound in the external environment through the audio input module within a first time period.
With reference to the first aspect, in a possible implementation manner, the electronic device collects sound in an external environment through an audio input module, and includes: if the electronic device judges that the humming recognition function of the electronic device is started, the electronic device collects sounds in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation manner, the method further includes: when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, the electronic device stops collecting sound in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation manner, the music identification server is further configured to find out a second audio file from an audio repository according to the first audio file when the music identification server determines that the sound signal is a music piece.
With reference to the first aspect, in a possible implementation manner, during the period from the moment the electronic device starts playing the second audio file until a preset moment (for example, the 5th or 6th second), the playback volume of the second audio file gradually increases from low to high.
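One way to realize this fade-in is a linear volume ramp. The sketch below assumes a 5-second ramp to match the example above, though the actual curve and duration are not specified by the patent:

```python
def fade_in_volume(elapsed_ms, ramp_ms=5000, max_volume=1.0):
    """Linear fade-in: volume rises from 0 to max_volume over the first
    ramp_ms of playback (e.g. the 5-second example above), then holds."""
    if elapsed_ms >= ramp_ms:
        return max_volume
    return max_volume * elapsed_ms / ramp_ms
```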
With reference to the first aspect, in a possible implementation manner, after the electronic device plays the second audio file from the start playing position through the audio output module, the electronic device may further detect whether the second audio file is stored in a pre-stored music folder, and if so, the electronic device may play other audio files in the music folder after playing the second audio file.
In a second aspect, an embodiment of the present application provides an electronic device, which includes an audio input module, an audio output module, a processor, and a memory, where the memory is configured to store program instructions and the processor is configured to perform the following operations according to the program instructions: collecting sound in the external environment through the audio input module; if the voiceprint information of the sound is determined to be consistent with pre-stored voiceprint information, sending a first audio file to a music identification server, where the first audio file contains the sound, and the music identification server is configured to find a second audio file in an audio repository according to the first audio file and to determine the starting playing position of the second audio file; the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio repository other than the second audio file, and the starting playing position of the second audio file corresponds to the ending position of the first audio file; receiving the second audio file and first indication information sent by the music identification server, where the first indication information indicates the starting playing position of the second audio file; and playing the second audio file from the starting playing position through the audio output module. This electronic device reduces the steps required for the user to trigger humming recognition, improves humming recognition efficiency, achieves the effect of playing audio that follows the user's humming, and improves the user experience.
With reference to the second aspect, in a possible implementation manner, the electronic device further includes a camera, and the processor is further configured to perform the following operations according to the program instructions: acquiring the mouth shape information of a user through a camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the mouth shape information is sent to a music identification server; the music recognition server is also used for converting the mouth shape information into text information; the music recognition server is further specifically configured to: and searching a second audio file from an audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operations according to the program instructions: if the sound is determined to be a human voice, acquiring the user's mouth shape information through the camera.
With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operations according to the program instructions: and if the audio input module and/or the audio output module are not occupied, acquiring the sound in the external environment through the audio input module.
With reference to the second aspect, in one possible implementation manner, the tag of the second audio file is included in a user tag of the first user.
With reference to the second aspect, in a possible implementation manner, the electronic device further includes a display screen, and the processor is further configured to perform the following operations according to the program instructions: displaying the identification information of the second audio file and the playing control through the display screen; the display state of the playing control is a first state, and the first state represents that the second audio file is being played; and if detecting a first user operation acting on the playing control in the first state, responding to the first user operation, pausing the playing of the second audio file, and setting the display state of the playing control to be a second state, wherein the second state represents that the playing of the second audio file is paused.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: and when the electronic equipment is detected to be in the locked state, stopping collecting the sound in the external environment through the audio input module.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: and when the electronic equipment is detected to be in a preset place, stopping collecting the sound in the external environment through the audio input module.
With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operations according to the program instructions: and if the position of the electronic equipment is judged to be inconsistent with the preset place, playing the second audio file from the initial playing position through an audio output module.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: stopping the collection of sound in the external environment through the audio input module for a first period of time.
With reference to the second aspect, in a possible implementation manner, the processor is specifically configured to perform the following operations according to the program instructions: if the humming recognition function is judged to be started, the sound in the external environment is collected through the audio input module.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, stopping collecting sound in the external environment through the audio input module.
With reference to the second aspect, in a possible implementation manner, the music identification server is further configured to find a second audio file from the audio repository according to the first audio file when the music identification server determines that the sound signal is a music piece.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: during the period from the moment the second audio file starts to be played until a preset moment (for example, the 5th or 6th second), gradually increasing the playback volume of the second audio file from low to high.
With reference to the second aspect, in one possible implementation manner, the processor is further configured to perform the following operations according to the program instructions: and detecting whether the second audio file is stored in a pre-stored music folder, if so, playing other audio files in the music folder after the second audio file is played.
In a third aspect, an embodiment of the present application provides a humming recognition method, including: an open platform obtains a first audio file, where the first audio file contains sound from the external environment; if the open platform determines that the voiceprint information of the first audio file is consistent with pre-stored voiceprint information, the open platform finds a second audio file in an audio repository according to the first audio file and determines the starting playing position of the second audio file; the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio repository other than the second audio file, and the starting playing position of the second audio file corresponds to the ending position of the first audio file; and the open platform plays the second audio file from the starting playing position, or the open platform controls another application program of the electronic device to play the second audio file from the starting playing position. This method reduces the steps required for the user to trigger humming recognition, improves humming recognition efficiency, achieves the effect of playing audio that follows the user's humming, and improves the user experience.
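As an illustration of what the matching on the open platform (or server) might look like, the sketch below pairs a naive sliding-window alignment with a derived start position. The feature representation, the similarity measure, and the repository layout are all assumptions standing in for the patent's unspecified matching algorithm:

```python
def similarity(a, b):
    # Toy frame-wise similarity: negative mean absolute difference,
    # so larger values mean a closer match.
    return -sum(abs(x - y) for x, y in zip(a, b)) / max(1, len(a))


def best_alignment(query, reference):
    """Slide the hummed fragment's feature sequence over a candidate's
    feature sequence; return the best similarity and its frame offset."""
    best_sim, best_off = float("-inf"), 0
    for off in range(max(1, len(reference) - len(query) + 1)):
        sim = similarity(query, reference[off:off + len(query)])
        if sim > best_sim:
            best_sim, best_off = sim, off
    return best_sim, best_off


def find_second_audio_file(first_features, repository):
    """Pick the repository entry most similar to the hummed fragment and
    derive the starting playing position as the frame where the matched
    fragment ends, mirroring 'the start position corresponds to the end
    position of the first audio file'."""
    best_entry, best_sim, best_off = None, float("-inf"), 0
    for entry in repository:  # each entry holds a per-frame feature sequence
        sim, off = best_alignment(first_features, entry["features"])
        if sim > best_sim:
            best_entry, best_sim, best_off = entry, sim, off
    start_position = best_off + len(first_features)  # resume where humming ended
    return best_entry, start_position
```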
With reference to the third aspect, in a possible implementation manner, the method further includes: the open platform acquires the mouth shape information of a user through the electronic equipment; if the open platform judges that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the mouth shape information into text information; the finding out a second audio file from an audio repository according to the first audio file comprises: and searching a second audio file from an audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
With reference to the third aspect, in a possible implementation manner, the open platform acquiring the user's mouth shape information includes: if the open platform determines that the sound included in the first audio file is a human voice, acquiring the user's mouth shape information through the electronic device.
With reference to the third aspect, in a possible implementation manner, the acquiring, by the open platform, the first audio file includes: and if the open platform judges that the audio input module and/or the audio output module is not occupied by other applications, the open platform acquires a first audio file.
With reference to the third aspect, in one possible implementation manner, the tag of the second audio file is included in a user tag of the first user.
With reference to the third aspect, in a possible implementation manner, after the open platform plays the second audio file from the starting playing position, the method further includes: the open platform displays, through the electronic device, the identification information of the second audio file and a play control, where the display state of the play control is a first state indicating that the second audio file is being played; and if the open platform detects a first user operation acting on the play control in the first state, then in response to the first user operation, the open platform pauses playing the second audio file, or controls the other application program of the electronic device to pause playing it, and sets the display state of the play control to a second state indicating that playback of the second audio file is paused.
With reference to the third aspect, in a possible implementation manner, the method further includes: and when the electronic equipment is detected to be in a locked state, the open platform stops acquiring the first audio file.
With reference to the third aspect, in a possible implementation manner, the method further includes: and when the electronic equipment is detected to be in a preset place, the open platform stops acquiring the first audio file.
With reference to the third aspect, in a possible implementation manner, the open platform playing the second audio file from the starting playing position, or the open platform controlling another application program of the electronic device to play the second audio file from the starting playing position, includes: if the open platform determines that the location of the electronic device does not match a preset place, the open platform plays the second audio file from the starting playing position, or controls the other application program of the electronic device to play the second audio file from the starting playing position.
With reference to the third aspect, in a possible implementation manner, the method further includes: the open platform stops acquiring the first audio file within a first time period.
With reference to the third aspect, in a possible implementation manner, the acquiring, by the open platform, a first audio file includes: and if the humming recognition function of the electronic equipment is started, the open platform acquires a first audio file.
With reference to the third aspect, in a possible implementation manner, the method further includes: when it is detected that the ambient light brightness at the electronic device has remained below a preset value for longer than a preset time, the open platform stops acquiring the first audio file.
With reference to the third aspect, in a possible implementation manner, the open platform is further configured to find a second audio file from an audio repository according to the first audio file when the first audio file is determined to be a music piece.
With reference to the third aspect, in a possible implementation manner, during the period from the moment the open platform starts playing the second audio file until a preset moment (for example, the 5th or 6th second), the playback volume of the second audio file gradually increases from low to high.
With reference to the third aspect, in a possible implementation manner, after the open platform controls the other application program of the electronic device to play the second audio file from the starting playing position, the open platform may further detect whether the second audio file is stored in a music folder pre-stored on the electronic device; if so, the open platform may control the application program to play the other audio files in that folder once the second audio file has finished playing.
In a fourth aspect, an embodiment of the present application provides a computer program product including instructions, which, when run on an electronic device, causes the electronic device to perform the method according to any one of the possible implementations of the first aspect, or when run on an open platform, causes the open platform to perform the method according to any one of the possible implementations of the third aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions which, when executed on an electronic device, cause the electronic device to perform the method according to any one of the possible implementations of the first aspect, or, when executed on an open platform, cause the open platform to perform the method according to any one of the possible implementations of the third aspect.
In the humming recognition method provided by the present application, the electronic device may continuously capture sound from the external environment, and when it determines that the sound was made by a preset user, it sends a first audio file containing the sound to the music recognition server for humming recognition. After receiving the recognized second audio file and its starting playing position from the server, the electronic device can play the second audio file from the point where the hummed sound ended; the starting playing position of the second audio file corresponds to the ending position of the first audio file. This method reduces the steps required for the user to trigger humming recognition on the terminal, improves humming recognition efficiency, achieves the effect of playing audio that follows the user's humming, and improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the background art, the drawings needed in describing the embodiments or the background art are briefly introduced below.
FIG. 1A is a schematic structural diagram of an intelligent terminal according to an embodiment of the present application;
FIG. 1B is a block diagram of the software structure of an intelligent terminal according to an embodiment of the present application;
FIG. 1C is a schematic structural diagram of an intelligent home device according to an embodiment of the present application;
FIG. 1D is a schematic structural diagram of an in-vehicle device according to an embodiment of the present application;
FIG. 2 is a user interface for displaying an application menu on an intelligent terminal according to an embodiment of the present application;
FIGS. 3A-3B are user interfaces for displaying recognition results according to embodiments of the present application;
FIG. 3C is a user interface displayed when the intelligent terminal is in a locked state according to an embodiment of the present application;
FIGS. 3D-3F are further user interfaces for displaying recognition results according to embodiments of the present application;
FIG. 3G is a user interface for humming recognition according to an embodiment of the present application;
FIGS. 4A-4B are user interfaces for setting the humming recognition function according to embodiments of the present application;
FIGS. 5A-5C are still other user interfaces for setting the humming recognition function according to embodiments of the present application;
FIGS. 5D-5F are user interfaces for setting access rights for the humming recognition function according to embodiments of the present application;
FIG. 5G is a user interface for entering voiceprint information according to an embodiment of the present application;
FIGS. 6A-6B are still other user interfaces for setting the humming recognition function according to embodiments of the present application;
FIG. 6C is yet another user interface for entering voiceprint information according to an embodiment of the present application;
FIGS. 7A-7B are user interfaces for setting the humming recognition function in some embodiments of the present application;
FIG. 7C is yet another user interface for entering voiceprint information according to an embodiment of the present application;
FIGS. 8A-8B are user interfaces for displaying recognition results on some vehicle-mounted devices according to embodiments of the present application;
FIG. 9 is a flow chart of a humming recognition method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
In the description of the embodiments herein, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: only A exists, both A and B exist, or only B exists. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.
First, some concepts related to the present application will be specifically described.
Humming recognition is a way of retrieving audio from a piece of music hummed by the user. It works as follows: the electronic device captures the music piece hummed by the user and sends it to a server; the server finds, by similarity, the audio file that best matches the hummed piece and feeds that audio file back to the electronic device. Optionally, the server matches the most similar audio file from a pre-stored audio repository by extracting features (e.g., a fundamental frequency sequence) from the music piece and then using those features for retrieval. Because the fragment hummed by the user is unlikely to match a fragment of an actual audio file in the library exactly, humming recognition is a fuzzy match. For fuzzy matching, recognition accuracy can be improved using techniques such as string edit distance and the Dynamic Time Warping (DTW) algorithm.
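As an example of one of the techniques named above, a minimal dynamic time warping implementation over two fundamental-frequency sequences might look like the following. This is a textbook DTW sketch, not code from the patent:

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping over two pitch (fundamental-frequency)
    sequences; a smaller distance means a closer melodic match."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Because DTW tolerates local stretching and compression of time, it absorbs the tempo drift that makes a hummed melody an inexact copy of the original recording.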
The user interface (UI) is a medium for interaction and information exchange between an application program or an operating system and a user; it converts between the internal form of information and a form acceptable to the user. The user interface of an application program is source code written in a specific computer language such as Java or extensible markup language (XML); the interface source code is parsed and rendered on the electronic device and finally presented as content the user can recognize, such as pictures, text, and buttons. A control is a basic element of a user interface; typical controls include buttons, widgets, toolbars, menu bars, text boxes, scroll bars, pictures, and text. The properties and contents of the controls in an interface are defined by tags or nodes; for example, XML defines the controls contained in an interface through nodes such as <Textview>, <ImgView>, and <VideoView>. A node corresponds to a control or an attribute in the interface, and after parsing and rendering the node is presented as content visible to the user. In addition, the interfaces of many applications, such as hybrid applications, usually include web pages. A web page, also called a page, can be understood as a special control embedded in an application program interface. A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS); the web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with browser-like functionality. The specific content contained in a web page is likewise defined by tags or nodes in its source code; for example, HTML defines the elements and attributes of a web page through <p>, <img>, <video>, and <canvas>.
A commonly used presentation form of the user interface is a Graphical User Interface (GUI), which refers to a user interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display screen of the electronic device.
The following embodiments of the application provide a humming recognition method and an electronic device that enable the electronic device, while the user hums a piece of music, to play the audio file corresponding to that piece following the humming, reducing the steps the user must take to trigger humming recognition on the terminal and improving humming recognition efficiency.
In the following embodiments of the application, an electronic device (e.g., a smart terminal, an intelligent home device, or a vehicle-mounted device) performs the humming recognition operation provided by the embodiments of the application on the condition that it has humming recognition permission. The flow of the humming recognition operation may be implemented with reference to the following steps. First, the electronic device collects sound in the external environment through an audio input module (e.g., a microphone). Then, if the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, it sends a first audio file containing the sound to a music recognition server for humming recognition, so as to identify an audio file that matches the music piece hummed by the user and to determine the starting playing position of that audio file, where this starting playing position corresponds to the ending position of the first audio file. After the electronic device receives the audio file fed back by the music recognition server and the information containing the starting playing position, it can play the audio file from that position, thereby achieving the goal of playing the audio file following the user's humming. The system architecture and implementation flow for performing the humming recognition operation are described further below.
In the following embodiments of the present application, before performing the humming recognition operation provided in the embodiments of the present application, the electronic device needs to determine whether its audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied, for example by audio/video playback, a phone call, or voice navigation, the electronic device does not perform the humming recognition operation; if neither module is occupied, the electronic device performs it. Optionally, once the audio input module and/or audio output module is released, for example when audio/video playback ends, the call is hung up, or voice navigation ends, the electronic device may perform the humming recognition operation provided by the embodiments of the present application. It should also be understood that the humming recognition operation has a lower priority than any other operation in the electronic device that occupies the audio input module and/or the audio output module.
In the following embodiments of the present application, if, while the electronic device is performing the humming recognition operation, it detects another request that needs to occupy the audio input module and/or the audio output module, the electronic device calls those modules to perform the operation corresponding to that request. As an exception, if the request occupies the audio output module for less than a predetermined time (e.g., 1 second), for example a notification tone (an SMS tone or an application push tone), the humming recognition operation may keep the audio input module while the requested operation occupies the audio output module.
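A minimal sketch of this arbitration logic follows; the priority enum, the duration threshold, and the return values are hypothetical, since the patent states only the behavioral rules:

```python
from enum import IntEnum


class AudioUse(IntEnum):
    # Illustrative priority ordering: humming recognition yields to all
    # other occupants of the audio modules.
    HUMMING_RECOGNITION = 0
    NOTIFICATION_TONE = 1
    MEDIA_PLAYBACK = 2
    VOICE_CALL = 3


def may_start_humming_recognition(input_busy, output_busy):
    """Humming recognition starts only when neither module is occupied."""
    return not input_busy and not output_busy


def arbitrate(incoming, expected_duration_s, short_limit_s=1.0):
    """Decide what happens when a new audio request arrives while humming
    recognition is running: a short output-only request (e.g. a notification
    tone under ~1 s) shares the device; anything else preempts recognition."""
    if incoming is AudioUse.NOTIFICATION_TONE and expected_duration_s < short_limit_s:
        return "share"    # recognition keeps the input module
    return "preempt"      # the incoming request takes the modules
```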
In the following embodiments of the present application, the humming recognition operation provided by the embodiments of the present application may be implemented by a system application or a third-party application of the electronic device. In one possible implementation, the system application or third-party application may be dedicated to performing the humming recognition operation; in another possible implementation, the application may also provide other services (or functions), with the humming recognition operation integrated into it merely as one service (or function).
It should be understood that "humming recognition" is merely the term used in the embodiments herein; its meaning is as described in the embodiments, and the name itself should not be construed as limiting. For example, in some possible implementations, "humming recognition" may also go by names such as "song recognition" or "humming retrieval".
In the embodiment of the application, the electronic device performing the humming recognition operation may be an intelligent terminal, an intelligent home device, or a vehicle-mounted device. Next, an exemplary smart terminal 100 provided in the following embodiments of the present application will be first described.
Fig. 1A shows a schematic structural diagram of the smart terminal 100.
The smart terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an acceleration sensor 180C, a distance sensor 180D, a proximity light sensor 180E, a fingerprint sensor 180F, a touch sensor 180G, an ambient light sensor 180H, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the intelligent terminal 100. In other embodiments of the present application, the smart terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a neural-Network Processing Unit (NPU), a modem processor, an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and the like. The different processing units may be separate devices or may be integrated into one or more processors. In some embodiments, the smart terminal 100 may also include one or more processors 110.
The controller may be a neural center and a command center of the intelligent terminal 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the intelligent terminal 100.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180G, the charger, the flash, the camera module 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180G through an I2C interface, so that the processor 110 and the touch sensor 180G communicate through the I2C bus interface, thereby implementing the touch function of the intelligent terminal 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 via the I2S interface, enabling identified audio files to be played through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also pass audio signals to the wireless communication module 160 through the PCM interface, likewise enabling identified audio files to be played through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel forms. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example, the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through a UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, enabling identified audio files to be played through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 with the peripheral devices such as the display screen 194 and the camera module 193. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, the processor 110 and the camera module 193 communicate through a CSI interface to implement the camera function of the smart terminal 100, so as to obtain the mouth shape information of the user. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the intelligent terminal 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera module 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the intelligent terminal 100, to transmit data between the intelligent terminal 100 and peripheral devices, or to connect earphones and play audio through them. The interface may also be used to connect other intelligent terminals, such as AR devices.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only an exemplary illustration, and does not form a structural limitation on the intelligent terminal 100. In other embodiments, the intelligent terminal 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive the wireless charging input through a wireless charging coil of the smart terminal 100. The charging management module 140 may also supply power to the intelligent terminal through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera module 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the smart terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the smart terminal 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the smart terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the smart terminal 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves. Illustratively, the wireless communication module 160 may include a Bluetooth module, a Wi-Fi module, and the like. In one possible implementation, the smart terminal may determine its location through the wireless communication module 160.
In some embodiments, the antenna 1 of the smart terminal 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the smart terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The intelligent terminal 100 can realize the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute instructions to generate or change display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the smart terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The intelligent terminal 100 can realize the camera function through the camera module 193, the ISP, the video codec, the GPU, the display screen 194, the application processor AP, the neural network processor NPU, and the like.
The camera module 193 can be used to collect color image data of a photographic subject. The ISP can be used to process color image data collected by the camera module 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera module 193.
In some embodiments, the light sensing element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
In some embodiments, the smart terminal 100 may include 1 or N camera modules 193, where N is a positive integer greater than 1. Specifically, the intelligent terminal 100 may include 1 front camera module 193 and 1 rear camera module 193. The front camera module 193 can generally be used to collect color image data of the photographer, who faces the display screen 194, and the rear camera module 193 can be used to collect color image data of the photographed object (such as people, scenery, etc.) that the photographer faces.
The digital signal processor is used for processing digital signals, and can process digital image signals as well as other digital signals. For example, when the intelligent terminal 100 selects a frequency point, the digital signal processor is used for performing a Fourier transform and the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The intelligent terminal 100 may support one or more video codecs. In this way, the intelligent terminal 100 can play or record videos in a plurality of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by drawing on the structure of biological neural networks, for example the transfer mode between neurons of a human brain, and can also continuously learn by itself. Applications such as intelligent cognition of the intelligent terminal 100, for example image recognition, face recognition, speech recognition, and text understanding, can be realized through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the smart terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, data such as audio files, photos, videos, and the like are saved in the external memory card.
Internal memory 121 may be used to store one or more computer programs, including instructions. The processor 110 may execute the above instructions stored in the internal memory 121, so as to enable the smart terminal 100 to execute the humming recognition method provided in some embodiments of the present application, as well as various functional applications and data processing. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, and may also store one or more application programs (e.g., gallery, contacts, etc.). The data storage area may store data (e.g., photos, contacts, etc.) created during use of the smart terminal 100. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
The intelligent terminal 100 can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The audio output module 170A, also called "speaker" or "loudspeaker", is used to convert an audio electrical signal into a sound signal. The smart terminal 100 may play music or conduct a hands-free call through the speaker 170A.
The audio output module 170B, also called "receiver" or "earphone", is used to convert the audio electrical signal into a sound signal. When the smart terminal 100 receives a call or voice information, it can receive voice by bringing the receiver 170B close to the ear of the person.
The audio input module 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a sound signal by speaking with the mouth close to the microphone 170C. The smart terminal 100 may be provided with at least one microphone 170C. In other embodiments, the smart terminal 100 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the intelligent terminal 100 may further include three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement a directional recording function. In the embodiment of the present application, when the intelligent terminal 100 has turned on the "humming recognition" function, the microphone 170C may collect sound signals near the intelligent terminal 100.
In some embodiments, a CPU, a digital signal processor, or an audio processor in the processor 110 may process the sound collected by the microphone 170C. In an embodiment, when the processor 110 determines that the sound collected within the preset time is a human voice, the processor 110 extracts voiceprint information from the sound; if the voiceprint information of the sound is consistent with pre-stored voiceprint information, the processor 110 sends a first audio file containing the sound to the music recognition server through the mobile communication module 150 or the wireless communication module 160.
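By way of illustration only, a minimal Java sketch of this gating logic might look as follows; isHumanVoice(), extractVoiceprint(), matches(), and sendFirstAudioFile() are hypothetical placeholders standing in for the voice detection, voiceprint extraction, comparison, and transmission steps, not the claimed algorithms.

    public class HummingTrigger {
        private final float[] storedVoiceprint; // pre-stored voiceprint of the preset user

        public HummingTrigger(float[] storedVoiceprint) {
            this.storedVoiceprint = storedVoiceprint;
        }

        // Called with the sound collected within the preset time window.
        public void onSoundCollected(byte[] pcm) {
            if (!isHumanVoice(pcm)) {
                return; // not a human voice; keep listening
            }
            float[] voiceprint = extractVoiceprint(pcm);
            if (matches(voiceprint, storedVoiceprint)) {
                // transmit via mobile communication module 150 or wireless communication module 160
                sendFirstAudioFile(pcm);
            }
        }

        // Hypothetical placeholders; the actual detection, extraction,
        // comparison, and transmission logic is not specified here.
        private boolean isHumanVoice(byte[] pcm) { return true; }
        private float[] extractVoiceprint(byte[] pcm) { return new float[0]; }
        private boolean matches(float[] a, float[] b) { return false; }
        private void sendFirstAudioFile(byte[] pcm) { }
    }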
In some embodiments, processor 110 includes a user representation module that may capture user information of the user using the smart terminal, which may include the user's attributes (age, gender, occupation, etc.), habits, user behaviors, and the like. In a possible implementation manner, the intelligent terminal can abstract the user information into a user tag and send the user tag to the server for storage. In yet another possible implementation manner, the intelligent terminal may send the user information to the server, and the server analyzes the user information, forms a user tag, and stores the user tag. The user tag corresponds to a user account (or user ID) of the user using the smart terminal. In the embodiment of the application, tags can be abstracted from the user's habits or preferences in playing audio files, such as rock, ballad, and pop, and tags for favorite singers, such as plum blossom, Liangshui, and Chengyou, can be formed from playback records. In a possible implementation manner, the tag of the second audio file identified from the first audio file is included in the user tags of a first user, where the first user may be a user using the smart terminal, or may be the user corresponding to the user account logged in on the smart terminal.
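As a sketch only, such a user representation could be held in a record like the following; all field and method names here are assumptions made for illustration, not the patent's data model.

    import java.util.LinkedHashSet;
    import java.util.Set;

    public class UserProfile {
        public final String userId;                                   // user account (user ID)
        public final Set<String> genreTags  = new LinkedHashSet<>();  // e.g. "rock", "ballad", "pop"
        public final Set<String> singerTags = new LinkedHashSet<>();  // favorite singers

        public UserProfile(String userId) {
            this.userId = userId;
        }

        // Abstract one playback event into tags, as the user representation module might.
        public void recordPlayback(String genre, String singer) {
            genreTags.add(genre);
            singerTags.add(singer);
        }
    }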
The headphone interface 170D is used to connect a wired headphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. Pressure sensors 180A come in many types, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. A capacitive pressure sensor may comprise at least two parallel plates made of an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the smart terminal 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation is applied to the display screen 194, the smart terminal 100 detects the intensity of the touch operation through the pressure sensor 180A. The smart terminal 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that are applied to the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
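A minimal sketch of such intensity-dependent dispatch is given below; the threshold value and the action names are illustrative assumptions, not values defined by the patent.

    public class PressureDispatcher {
        private static final float FIRST_PRESSURE_THRESHOLD = 0.5f; // assumed 0..1 scale

        // Same touch position (the short message icon), different intensity.
        public String dispatch(float touchIntensity) {
            if (touchIntensity < FIRST_PRESSURE_THRESHOLD) {
                return "VIEW_MESSAGE"; // view the short message
            }
            return "NEW_MESSAGE";      // create a new short message
        }
    }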
The gyro sensor 180B may be used to determine the motion posture of the smart terminal 100. In some embodiments, the angular velocity of the smart terminal 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the intelligent terminal 100, calculates the distance to be compensated for by the lens module according to the shake angle, and lets the lens counteract the shake of the intelligent terminal 100 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation and somatosensory gaming scenarios. In some embodiments, the smart terminal 100 may determine its own moving direction through the gyro sensor 180B to improve the accuracy of determining its own position.
The acceleration sensor 180C may detect the magnitude of acceleration of the smart terminal 100 in various directions (generally along three axes). The magnitude and direction of gravity may be detected when the smart terminal 100 is stationary. The acceleration sensor 180C may also be used to recognize the posture of the intelligent terminal 100, and be applied to landscape/portrait switching, pedometers, and the like. In some possible implementations, the user interfaces illustrated in the following embodiments may switch between landscape and portrait modes as the posture of the smart terminal changes.
A distance sensor 180D for measuring distance. The smart terminal 100 may measure the distance by infrared or laser. In some embodiments, when a scene is shot, the intelligent terminal 100 may utilize the distance sensor 180D to measure distance to achieve fast focusing, so as to improve the accuracy of the acquired mouth shape information.
The proximity light sensor 180E may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The smart terminal 100 emits infrared light to the outside through the light emitting diode. The smart terminal 100 detects infrared light reflected from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the smart terminal 100. When insufficient reflected light is detected, the smart terminal 100 may determine that there is no object near the smart terminal 100. The intelligent terminal 100 can use the proximity light sensor 180E to detect that the user is holding the intelligent terminal 100 close to the ear for a call, so as to automatically turn off the screen to save power. The proximity light sensor 180E may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180F is used to sense the ambient light level. The smart terminal 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180F may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180F may also cooperate with the proximity light sensor 180E to detect whether the smart terminal 100 is in a pocket, so as to prevent accidental touches. In one possible implementation manner, when the intelligent terminal detects through the ambient light sensor 180F that the ambient light brightness has remained below a preset value for longer than a preset time, the intelligent terminal stops collecting sound in the external environment through the audio input module. The fingerprint sensor 180G is used to collect a fingerprint. The smart terminal 100 may use the collected fingerprint characteristics to implement fingerprint unlocking, so as to release the locked state of the smart terminal 100.
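As an illustration only, the stop condition for sound collection could be tracked as in the sketch below; the preset brightness and preset duration values are assumptions, since the patent leaves them unspecified.

    public class DarknessMonitor {
        private static final float PRESET_LUX = 5.0f;          // assumed preset brightness
        private static final long PRESET_DURATION_MS = 60_000; // assumed preset duration
        private long darkSinceMs = -1;

        // Feed ambient-light samples; returns true when sound collection should stop.
        public boolean onLightSample(float lux, long nowMs) {
            if (lux >= PRESET_LUX) {
                darkSinceMs = -1; // brightness recovered, reset the timer
                return false;
            }
            if (darkSinceMs < 0) {
                darkSinceMs = nowMs; // darkness just began
            }
            return nowMs - darkSinceMs > PRESET_DURATION_MS;
        }
    }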
Touch sensor 180H is also referred to as a touch panel or touch-sensitive surface. The touch sensor 180H may be disposed on the display screen 194, and the touch sensor 180H and the display screen 194 together form what is commonly called a "touch screen". The touch sensor 180H is used to detect a touch operation applied to or near it. The touch sensor can pass the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180H may be disposed on the surface of the smart terminal 100 at a position different from the display screen 194.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The smart terminal 100 may receive key input and generate key signal input related to user settings and function control of the smart terminal 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming-call vibration cues as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminding, receiving information, alarm clocks, games, and the like) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the smart terminal 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The smart terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. The same SIM card interface 195 can be inserted with multiple cards at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The intelligent terminal 100 interacts with the network through the SIM card to realize functions such as communication and data communication. In some embodiments, the intelligent terminal 100 employs eSIM, namely: an embedded SIM card. The eSIM card may be embedded in the smart terminal 100 and cannot be separated from the smart terminal 100.
The smart terminal 100 exemplarily illustrated in fig. 1A may display various user interfaces described in various embodiments below through the display screen 194. The smart terminal 100 may detect a touch operation in each user interface through the touch sensor 180H, such as a click operation in each user interface (e.g., a touch operation on an icon, a double-click operation), and an upward or downward sliding operation in each user interface, or an operation of performing a circle-drawing gesture, and so on. In some embodiments, the smart terminal 100 may detect a motion gesture performed by the user holding the smart terminal 100 by the gyroscope sensor 180B, the acceleration sensor 180C, or the like, for example, shaking the smart terminal. In some embodiments, the smart terminal 100 may detect the non-touch gesture operation through the camera module 193 (e.g., 3D camera, depth camera).
The software system of the intelligent terminal 100 may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the intelligent terminal 100.
Fig. 1B is a block diagram of a software structure of an intelligent terminal 100 according to an embodiment of the present disclosure.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 1B, the application package may include applications such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 1B, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication function of the smart terminal 100, such as management of the call status (including connected, disconnected, and the like).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar and can be used to convey notification-type messages, which disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify of download completion, message alerts, and the like. The notification manager may also present notifications in the top status bar of the system in the form of a chart or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the intelligent terminal vibrates, or the indicator light blinks.
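For concreteness, a sketch of posting such a status-bar notification through the standard Android notification manager follows; the channel id, icon, title, and text shown are assumptions for illustration, not the patent's implementation.

    import android.app.Notification;
    import android.app.NotificationChannel;
    import android.app.NotificationManager;
    import android.content.Context;

    public final class RecognitionNotifier {
        // Post a status-bar notification carrying a recognition result (API 26+).
        static void notifyResult(Context context, String songName) {
            NotificationManager nm =
                    (NotificationManager) context.getSystemService(Context.NOTIFICATION_SERVICE);
            NotificationChannel channel = new NotificationChannel(
                    "humming", "Humming recognition", NotificationManager.IMPORTANCE_DEFAULT);
            nm.createNotificationChannel(channel);
            Notification notification = new Notification.Builder(context, "humming")
                    .setSmallIcon(android.R.drawable.ic_media_play)
                    .setContentTitle("Humming recognition")
                    .setContentText(songName) // e.g. the identified song name
                    .setAutoCancel(true)      // dismissed without further user interaction
                    .build();
            nm.notify(1, notification);
        }
    }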
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part consists of functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used for performing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files, among others. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The software system shown in fig. 1B involves application presentation (e.g., gallery, file manager) that uses the sharing capability, an instant sharing module that provides the sharing capability, a print service and a print background service (print spooler) that provide the printing capability, an application framework layer that provides a print framework, a WLAN service, and a Bluetooth service, and a kernel and underlying layers that provide the WLAN/Bluetooth capability and basic communication protocols.
The following describes an exemplary workflow of the software and hardware of the intelligent terminal 100 in conjunction with a scenario of enabling the humming recognition function.
When the touch sensor 180H receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including touch coordinates, a time stamp of the touch operation, and other information). The raw input event is stored at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the example that the touch operation is a tap and the corresponding control is the switch control of the humming recognition function, the humming recognition application calls an interface of the application framework layer to start the humming recognition application, then starts the microphone driver by calling the kernel layer, and collects sound in the external environment through the microphone 170C.
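As a sketch only, the sound collection step could use the standard Android AudioRecord API as below; the sampling rate and buffer sizing are assumptions, and the RECORD_AUDIO permission is required.

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    public class SoundCollector {
        private static final int SAMPLE_RATE = 16_000; // assumed sampling rate

        // Collect roughly `millis` of 16-bit mono PCM from the microphone.
        public byte[] collect(int millis) {
            int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                    SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);
            byte[] pcm = new byte[SAMPLE_RATE * 2 * millis / 1000]; // 2 bytes per sample
            recorder.startRecording();
            int offset = 0;
            while (offset < pcm.length) {
                int n = recorder.read(pcm, offset, pcm.length - offset);
                if (n <= 0) break; // read error or end of stream
                offset += n;
            }
            recorder.stop();
            recorder.release();
            return pcm;
        }
    }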
Fig. 1C schematically illustrates a structural diagram of the smart home device 110 provided in the embodiment of the present application.
For example, the smart home device may be a smart sound box or a smart television. As shown in fig. 1C, the smart home device 110 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, an RJ45 communication processing module 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components may be connected by a bus. Wherein:
the processor 102 is operable to read and execute computer-readable instructions. In a specific implementation, the processor 102 may mainly include a controller, an arithmetic unit, and registers. The controller is mainly responsible for instruction decoding and for sending out control signals for the operations corresponding to the instructions. The arithmetic unit is mainly responsible for executing fixed-point or floating-point arithmetic operations, shift operations, logic operations, and the like, and can also execute address operations and conversions. The registers are mainly responsible for temporarily storing register operands, intermediate operation results, and the like during instruction execution. In a specific implementation, the hardware architecture of the processor 102 may be an application-specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, an NP architecture, or the like.
In some embodiments, the processor 102 may be configured to parse a signal received by the wireless communication processing module 104, for example, a request for modifying setting information sent by the intelligent terminal 100, an identified audio file sent by the music recognition server, and indication information for indicating a start playing position, and the like. The processor 102 may be configured to perform corresponding processing operations according to the parsing result, such as modifying the setting information of the smart home device 110 according to the request, and playing the identified audio file from the playing position, for example.
In some embodiments, the processor 102 may also be configured to process sounds collected by the smart home device 110 in the external environment. For example, the processor 102 may extract voiceprint information in the sound, and if the processor 102 determines that the voiceprint information of the sound matches pre-stored voiceprint information, the first audio file including the sound is sent to the music recognition server through the wireless communication module 104.
In some embodiments, the processor 102 may also be configured to generate a signal sent by the wireless communication processing module 104, such as a signal sent to the intelligent terminal 100 for feeding back an identification status (e.g., identification success, identification failure, etc.).
Memory 103 is coupled to processor 102 for storing various software programs and/or sets of instructions. In particular implementations, memory 103 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, such as an embedded operating system, e.g., DuEROS, AliGenie, etc. The memory 103 may also store communication programs that may be used to communicate with the intelligent terminal 100, one or more servers (e.g., music recognition servers), or additional devices.
The wireless communication processing module 104 may include one or more of a Bluetooth (BT) communication processing module 104A and a WLAN communication processing module 104B.
In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may listen to signals transmitted by another device (e.g., the smart terminal 100), such as a play request or a request for modifying setting information, and may transmit response signals, such as a request response, so that the other device (e.g., the smart terminal 100) can discover the smart home device 110, establish a wireless communication connection with it, and communicate with it through one or more wireless communication technologies among Bluetooth and WLAN.
In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may also transmit signals, such as broadcast bluetooth signals and beacon signals, so that other devices (e.g., the smart terminal 100) may discover the smart home device 110 and establish wireless communication connections with other devices (e.g., the smart terminal 100) to communicate with other devices (e.g., the smart terminal 100) through one or more wireless communication technologies in bluetooth or WLAN.
The wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown). The cellular mobile communication processing module may communicate with other devices, such as servers, via cellular mobile communication technology.
The power switch 105 may be used to control the power supply of the power source to the smart home device 110.
The RJ45 communication processing module 106 may be used to process data received or transmitted over the RJ45 interface. The RJ45 interface is used primarily to connect to a modem.
The USB interface 107 may be used to communicate with other devices (e.g., computers, laptops, etc.) via a data line.
The audio input module 108 may be used to capture sound in the external environment and convert the sound into electrical signals. In a possible implementation manner, the smart home device 110 may receive a voice instruction input by a user through the audio input module 108, and in response to the voice instruction, the smart home device executes an operation corresponding to the voice instruction.
The audio output module 109 is configured to convert an audio electrical signal into a sound signal, and the smart home device 110 may play the sound signal through the audio output module 109.
In a possible implementation manner, the smart home device 110 may further include a display screen 110 (not shown), and the display screen 110 may be used to display images, videos, and the like. The display screen 110 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the smart home device 110 may include 1 or N display screens 110, where N is a positive integer greater than 1.
It is understood that the structure illustrated in fig. 1C does not constitute a specific limitation to the smart home device 110. In other embodiments of the present application, the smart home device 110 may include more or fewer components than those shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 1D schematically shows a structure of the in-vehicle device 120 provided in the present application.
For example, the vehicle-mounted device may be a vehicle-mounted sound box or a vehicle-mounted computer. As shown in fig. 1D, the in-vehicle device 120 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, a display screen 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components may be connected by a bus. Wherein:
the processor 102 is operable to read and execute computer-readable instructions. In a specific implementation, the processor 102 may mainly include a controller, an arithmetic unit, and registers. The controller is mainly responsible for instruction decoding and for sending out control signals for the operations corresponding to the instructions. The arithmetic unit is mainly responsible for executing fixed-point or floating-point arithmetic operations, shift operations, logic operations, and the like, and can also execute address operations and conversions. The registers are mainly responsible for temporarily storing register operands, intermediate operation results, and the like during instruction execution. In a specific implementation, the hardware architecture of the processor 102 may be an application-specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, an NP architecture, or the like.
In some embodiments, the processor 102 may be configured to parse signals received by the wireless communication processing module 104, for example, a request for modifying setting information sent by the intelligent terminal 100, an identified audio file sent by the music recognition server, and indication information indicating a start playing position. The processor 102 may be configured to perform corresponding processing operations according to the parsing result, such as modifying the setting information of the in-vehicle device 120 according to the request, or playing the identified audio file from the indicated playing position.
In some embodiments, the processor 102 may also be used to process sounds in the external environment captured by the in-vehicle device 120. For example, the processor 102 may extract voiceprint information in the sound, and if the processor 102 determines that the voiceprint information of the sound matches pre-stored voiceprint information, the first audio file including the sound is sent to the music recognition server through the wireless communication module 104.
In some embodiments, the processor 102 may also be configured to generate a signal sent by the wireless communication processing module 104, such as a signal sent to the intelligent terminal 100 for feeding back an identification status (e.g., identification success, identification failure, etc.).
Memory 103 is coupled to processor 102 for storing various software programs and/or sets of instructions. In particular implementations, memory 103 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, such as an embedded operating system like uCLinux, GENIVI, ecos, etc. The memory 103 may also store communication programs that may be used to communicate with the intelligent terminal 100, one or more servers (e.g., music recognition servers), or additional devices.
The wireless communication processing module 104 may include one or more of a Bluetooth (BT) communication processing module 104A and a WLAN communication processing module 104B.
In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may listen to a signal transmitted by another device (the smart terminal 100), such as a play request, a request for changing setting information, and the like, and may transmit a response signal, such as a request response, and the like, so that the other device (e.g., the smart terminal 100) may discover the in-vehicle device 120 and establish a wireless communication connection with the other device to communicate with the other device through one or more wireless communication technologies in bluetooth or WLAN.
In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may also transmit signals, such as broadcast bluetooth signals and beacon signals, so that other devices (e.g., the smart terminal 100) may discover the in-vehicle device 120 and establish wireless communication connections with other devices (e.g., the smart terminal 100) to communicate with other devices (e.g., the smart terminal 100) through one or more wireless communication technologies in bluetooth or WLAN.
The wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown). The cellular mobile communication processing module may communicate with other devices, such as servers, via cellular mobile communication technology.
The power switch 105 may be used to control the power supply of the power source to the in-vehicle device 120.
The display screen 106 may be used to display images, video, and the like. The display screen 106 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the in-vehicle device 120 may include 1 or N display screens 106, where N is a positive integer greater than 1.
The USB interface 107 may be used to communicate with other devices such as a display, the smart terminal 100, or an audio player device through a data line.
The audio input module 108 may be used to capture sound in the external environment and convert the sound into electrical signals. In one possible implementation manner, the in-vehicle device 120 may receive a voice instruction input by a user through the audio input module 108, and in response to the voice instruction, the in-vehicle device executes an operation corresponding to the voice instruction.
The audio output module 109 is configured to convert the audio electrical signal into a sound signal, and the in-vehicle device 120 may play the sound signal through the audio output module 109.
In some embodiments, the in-vehicle device 120 may also include a serial interface such as an RS-232 interface. The serial interface can be connected to other devices, such as audio playback devices like loudspeaker boxes, so that the audio playback devices cooperatively play the identified audio file.
It is to be understood that the configuration illustrated in fig. 1D does not constitute a specific limitation to the in-vehicle apparatus 120. In other embodiments of the present application, the in-vehicle device 120 may include more or fewer components than shown, or combine certain components, or split certain components, or have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
An exemplary user interface for displaying an application menu on the smart terminal 100 is described below.
Fig. 2 illustrates a user interface 21 for the smart terminal 100 to display an application menu. As shown in fig. 2, the user interface 21 may include: a status bar 201, a tray 217 with common application icons, a calendar widget 213, a weather widget 215, and other application icons. Wherein:
the status bar 201 may include: one or more signal strength indicators 203 for mobile communication signals (which may also be referred to as cellular signals), one or more signal strength indicators 205 for wireless fidelity (Wi-Fi) signals, a battery status indicator 209, and a time indicator 211.
The calendar gadget 213 may be used to indicate the current time, e.g., date, day of week, time division information, etc.
The weather gadget 215 may be used to indicate a weather type, such as cloudy sunny, light rain, etc., and may also be used to indicate information such as temperature, etc.
The tray 217 with common application icons may show: phone icon 219, contact icon 221, text message icon 223, camera icon 225.
Other application icons may be, for example: a WeChat icon 227, a QQ icon 229, a Twitter icon 231, a Facebook icon 233, a mailbox icon 235, a cloud sharing icon 237, a memo icon 239, an Alipay icon 221, a gallery icon 225, and a Settings icon 227. The user interface 21 may also include a page indicator 229. Other application icons may be distributed on other pages, and the page indicator 229 may be used to indicate the number of pages and which page the user is currently browsing. For example, if the page indicator 229 shows 3 dots, with the 2nd dot black and the other two dots white, this indicates that the current handset includes 3 pages and the user is browsing the 2nd page. In addition, the user may slide left and right on the current page to browse application icons on other pages. In some embodiments, the user interface 21 exemplarily shown in fig. 2 may be one of the main interfaces (Home screens).
In other embodiments, the intelligent terminal 100 may also include a home screen key. The home screen key may be a physical key or a virtual key. The home screen key may be used to receive a user's instruction and return the currently displayed UI to the home interface in response to the user's instruction, which may facilitate the user in viewing the home screen at any time. The instruction may be an operation instruction for the user to press the home screen key once, an operation instruction for the user to press the home screen key twice in a short time, or an operation instruction for the user to press the home screen key for a long time. In other embodiments of the present application, the home screen key may also incorporate a fingerprint recognizer for fingerprint acquisition and recognition therewith when the home screen key is pressed.
It is understood that fig. 2 is only an exemplary illustration of the user interface on the intelligent terminal 100, and should not be construed as a limitation to the embodiments of the present application.
Next, some embodiments of humming recognition provided by embodiments of the present application will be described.
In the embodiment of the present application, the intelligent terminal 100 may play the recognized audio file following the humming progress of the user, and in addition, the intelligent terminal 100 may display the recognition result through the display screen 194. The recognition result may be displayed when the intelligent terminal 100 is in a use state, or may be displayed when the intelligent terminal 100 is in a locked state; embodiments in these two application scenarios are further described below. It should be noted that, in the embodiment of the present application, the intelligent terminal 100 being in the use state means that the intelligent terminal 100 is being used by a user, its display screen 194 is in the on state, and the display screen 194 may display user interfaces such as the desktop, an application interface, the pull-down notification bar, and the leftmost home screen. The intelligent terminal 100 being in the locked state means that the screen of the intelligent terminal is locked; in most cases, after the intelligent terminal 100 is locked, it can be unlocked only after receiving a password input by the user or verifying another unlocking manner (for example, fingerprint unlocking, facial unlocking, and the like). In general, a user may cause the smart terminal 100 to turn off the screen and enter the locked state by pressing the power key of the smart terminal 100 or tapping a "lock screen" virtual control. In addition, the screen locking interface is the user interface displayed by the intelligent terminal 100 after it enters the locked state and before it is unlocked. While the intelligent terminal 100 is in the locked state, it may display the screen locking interface or may be in a screen-off (black screen) state.
First, an embodiment in which the intelligent terminal 100 displays the recognition result in the use state will be described.
Fig. 3A exemplarily shows a user interface 31 that displays the recognition result in the use interface of an application program of the smart terminal 100. It should be noted that the application program performing the humming recognition operation and the application program in use may be the same application program or different application programs; the embodiments of the present application are not limited thereto. In addition, the application program in use is not limited in the present application, and may be an application program such as WeChat, QQ, Weibo, or a mailbox; fig. 3A takes a WeChat chat interface as an example. As shown in fig. 3A, the user interface 31 may include: a display area 318, an input area 319, and a notification window 315. Wherein:
the display area 318 may be used to display chat content, which may include text/voice messages exchanged between the user of the intelligent terminal 100 and the user of another social account.
An input area 319 may be used to input chat content, and the input area 319 may include a first control 319A, a second control 319B, a third control 319C, and a fourth control 319D. The first control 319A is configured to receive an operation of a user; in response to the operation, the intelligent terminal 100 displays a voice input button, and typically the user may input voice information by long-pressing the voice input button. It should be noted that, when the voice input button receives the operation of the user, the intelligent terminal 100 needs to collect the voice information input by the user, the audio input module is occupied by the voice input service of the social application, and the intelligent terminal 100 does not perform the humming recognition operation provided in the embodiment of the present application. The second control 319B is configured to receive an operation of the user; in response to the operation, the intelligent terminal 100 displays a keyboard/tablet, and generally the intelligent terminal 100 may receive text information input by the user through the keyboard/tablet. The third control 319C is configured to receive an operation of the user; in response to the operation, the intelligent terminal 100 displays a plurality of emoticons/images for the user to select. The fourth control 319D is configured to receive an operation of the user; in response to the operation, the intelligent terminal 100 displays selection boxes of multiple input types, such as pictures, photographing, documents, red packets, and video calls, for the user to select. Similarly, when the "photographing" or "video call" selection box receives an operation of the user, the intelligent terminal 100 needs to acquire audio and video information input by the user, the audio input module and/or the audio output module is occupied by the audio and video input service of the social application program, and the intelligent terminal 100 does not perform the humming recognition operation provided by the embodiment of the present application.
A notification window 315 for displaying the recognition result of the piece of music hummed by the user, wherein the notification window 315 may include: a humming recognition icon 316, a first display area 314, a play control 310, and a control 312.
The humming recognition icon 316 is used to indicate the source of the notification window 315, so that the user can quickly and conveniently know that the notification window 315 is the recognition result output by the humming recognition service (or called a function or an application). It should be noted that the humming recognition icon 316 is only an exemplary icon, and in a specific implementation, the humming recognition icon may be another pattern, such as an icon of a note or other style, and the embodiment of the present invention is not limited thereto.
The first display area 314 may be used to display identification information of the identified audio file, which can provide the user with more information about the identified audio file. The identification information of the audio file may be the name of the song, the lyrics, the name of the singer, the name of the album, the picture of the album cover, the poster of the singer, and the like of the audio file. As shown, the first display area 314 includes the name of the song, "Drift ocean see you". The first display area 314 may further include operation indication information (e.g., the "click to stop play" included in the first display area 314), which can remind the user how to operate, thereby improving convenience of operation. Optionally, the first display area 314 may also contain singer information or lyric information of the currently playing audio file. In yet another possible case, the smart terminal 100 may also display the lyric information of the currently played audio file in the form of a floating window. The floating window is a movable window that is displayed in a floating manner on the display interface of the intelligent terminal 100.
The play control 310 may be configured to receive a user operation, and in response to the user operation, the intelligent terminal 100 suspends playing or continues playing the audio file. Specifically, after the intelligent terminal 100 recognizes the audio file corresponding to the music piece hummed by the user, the audio file will be played following the singing progress of the user, and the playing control 310 at this time displays a first state, where the first state indicates that the audio file is being played. Optionally, during the playing process of the recognized audio file, the intelligent terminal 100 does not perform the humming recognition operation provided by the embodiment of the present application. Under the condition that the playing control 310 is displayed in the first state, if the playing control 310 receives an operation of a user, the intelligent terminal 100 suspends playing the audio file, and displays the playing control 310 in the second state, where the second state represents that the playing of the audio file is suspended. It is understood that, in the case that the playing control 310 is displayed in the second state, if the playing control 310 receives the operation of the user, the intelligent terminal 100 continues to play the audio file, and displays the playing control 310 in the first state.
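A minimal sketch of this two-state control follows; the state names and toggle behavior mirror the first and second states described above, while everything else is an illustrative assumption.

    public class PlayControl {
        enum State { PLAYING, PAUSED } // the first state / the second state

        private State state = State.PLAYING; // playback starts once the song is recognized

        // Invoked when the control receives the user's operation; toggles playback.
        public State onTap() {
            state = (state == State.PLAYING) ? State.PAUSED : State.PLAYING;
            return state; // the caller pauses/resumes the player and updates the icon
        }
    }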
The control 312 may be configured to receive a user operation; in response to the user operation, the intelligent terminal 100 suspends playing the audio file, re-collects the user's sound signal, and performs humming recognition on the re-collected sound signal. In one possible scenario, during the process of re-collecting the sound signal, the intelligent terminal 100 may display prompt information (e.g., "Recognizing…") indicating that the intelligent terminal 100 is collecting the sound signal for humming recognition. In yet another possible scenario, in response to the user's operation on the control 312, the intelligent terminal 100 pauses playing the audio file and jumps to the user interface 35 for displaying humming recognition. The user interface 35 will be described in detail later and is not described here.
In one possible implementation, the notification window 315 disappears after being displayed for a preset time, which may be 4 seconds, 5 seconds, or another time value. Alternatively, when the notification window 315 receives an upward sliding operation of the user, the smart terminal 100 no longer displays the notification window 315 in the user interface 31 in response to the operation.
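By way of illustration, the timed dismissal could be scheduled as in the sketch below, assuming the window is an Android View; the 4-second value is just one of the possible preset times mentioned above.

    import android.os.Handler;
    import android.os.Looper;
    import android.view.View;

    public final class AutoDismiss {
        // Hide the notification window after `delayMs`, on the main thread.
        static void schedule(final View notificationWindow, long delayMs) {
            new Handler(Looper.getMainLooper()).postDelayed(
                    () -> notificationWindow.setVisibility(View.GONE), delayMs);
        }
    }
    // e.g. AutoDismiss.schedule(window, 4_000); // 4 seconds, one possible preset time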
In one possible implementation, the notification window 315 may also be displayed in a drop-down notification bar, as shown in FIG. 3B. As shown in fig. 3B, when the downward sliding gesture on the status bar 201 is detected, in response to the gesture, the intelligent terminal 100 may display a drop-down notification bar 318 on the user interface 21, where the drop-down notification bar 318 includes a notification window 315 and a control window 313, where:
the notification window 315 can be as described with reference to fig. 3A and will not be described in detail herein.
The control window 313 may display a plurality of switch controls, such as a switch control 317 for "humming recognition" and a switch control for other functions (e.g., Wi-Fi, Bluetooth, flashlight, etc.). The control window 313 is described in detail in the following setup interface for introducing humming recognition, and is not specifically expanded here.
In one possible case, during the opening of the humming recognition function, a humming recognition icon 311 is displayed in the status bar 201. It is understood that the status bar 201 may be included in a plurality of display interfaces of the smart terminal 100. In this way, the user can know the starting state of the humming recognition function through a plurality of display interfaces of the intelligent terminal 100 conveniently.
Next, an embodiment in which the smart terminal 100 displays the recognition result in the locked state will be described.
Fig. 3C illustrates the user interface 32 displayed when the smart terminal 100 is in the locked state, and the user interface 32 may also be referred to as a lock screen interface. As shown in FIG. 3C, user interface 32 includes status bar 201, calendar gadget 213, and lock screen wallpaper 523. Wherein:
the status bar 201 can refer to the description in fig. 2, and is not described in detail here. Specifically, the status bar 201 herein includes a humming recognition icon 311 and a lock icon 323, wherein the humming recognition icon 311 is used for indicating that the humming recognition function is in an on state, and the lock icon 323 is used for indicating that the intelligent terminal 100 is in a lock state.
The calendar gadget 213 may be as described with reference to FIG. 2 and will not be described in further detail herein. Optionally, the user interface 32 may also include a weather widget 215.
The screen locking wallpaper 523 may be a picture set by a user, a preset picture of the intelligent terminal 100, or a picture downloaded from the network by the intelligent terminal 100.
Fig. 3D illustrates yet another user interface 32 that displays the recognition result.
As shown in fig. 3D, when the intelligent terminal 100 recognizes the audio file of the music piece hummed by the user while the intelligent terminal is in the locked state, the intelligent terminal 100 displays a notification window 324 above the user interface 32, and the notification window 324 may include: a humming recognition icon 316, a second display area 322, a play control 310, a control 312, and a volume control 328.
The humming recognition icon 316, the play control 310 and the control 312 can be described with reference to FIG. 3A, and are not described herein again.
The second display area 322 serves the same purpose as the first display area 314 in fig. 3A and may display identification information of the recognized audio file. The difference is that the second display area 322 here includes not only the name "see you through sea in the floating ocean" of the audio file, but also the singer "lie prosperity" of the audio file and the lyric information "strange city o, familiar corner … …" of the currently played audio file, where the bolded part of the lyric information, "regardless of the future face", is the lyric line currently being sung by the user. It is understood that the lyric information may change as the audio file plays, so that the lyric information remains synchronized with the playing of the audio file.
In a possible implementation, the tags of the second audio file are included in the user tags of the first user; the meaning of these tags can be referred to above. Different intelligent terminals may produce different recognition results for humming fragments of the same song. For example, when user 1 sings "see you through sea in the floating ocean", the audio file recognized by user 1's intelligent terminal may be the version sung by "plum blossom"; when user 2 sings the same song, the audio file recognized by user 2's intelligent terminal may be the version sung by "Liangjing". It will be appreciated that the recognition results differ because the user tags of user 1 and user 2 differ.
Volume control 328 may be used to adjust the volume at which the audio file is played. Volume control 328 may receive a user operation, and in response, the smart terminal 100 adjusts the volume at which the audio file is played. Optionally, when the received user operation is a leftward sliding operation, the intelligent terminal 100 decreases the playing volume; when the received user operation is a rightward sliding operation, the intelligent terminal 100 increases the playing volume. In one possible implementation, the ratio of the distance from the volume control 328 to the left end point of its line segment to the length of that line segment corresponds to the ratio of the current volume to the maximum system playing volume.
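As a non-limiting illustration of this position-to-volume correspondence, the following Python sketch maps a touch position on the slider's line segment to a volume value; the function name and parameters are hypothetical and not part of the original disclosure.

```python
def slider_to_volume(touch_x: float, seg_left: float, seg_length: float,
                     max_volume: int) -> int:
    """Map a touch position on the volume slider to a volume value: the
    ratio (distance from the left end / segment length) equals the ratio
    (current volume / maximum system volume)."""
    distance = min(max(touch_x - seg_left, 0.0), seg_length)  # clamp to the segment
    return round(distance / seg_length * max_volume)

# Example: a 200-px slider touched 150 px from its left end, max volume 15
print(slider_to_volume(150.0, 0.0, 200.0, 15))  # -> 11
```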
In a possible implementation manner, in the period from the moment the audio file starts playing to a preset time (e.g., the 5th second, the 6th second, etc.), the intelligent terminal 100 gradually increases the playing volume from low to high. For example, the volume may rise gradually from the minimum volume to the volume value set by the user, or from 30% of the user-set volume value to 100% of it. It should be noted that the user-set volume value is the volume value indicated by the volume control 328; optionally, it is the value resulting from the user's most recent volume adjustment.
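A minimal sketch of this fade-in, assuming a linear ramp (the embodiment does not fix the curve); all names are illustrative.

```python
def fade_in_volume(t: float, fade_seconds: float, user_volume: float,
                   start_fraction: float = 0.3) -> float:
    """Playing volume at t seconds after playback starts: rises linearly
    from start_fraction * user_volume (e.g., 30%) to the user-set volume
    over fade_seconds, then holds the user-set value."""
    if t >= fade_seconds:
        return user_volume
    return user_volume * (start_fraction + (1.0 - start_fraction) * t / fade_seconds)

# Example: 5-second fade to a user-set volume of 10
print(fade_in_volume(0.0, 5.0, 10.0))  # -> 3.0
print(fade_in_volume(2.5, 5.0, 10.0))  # -> 6.5
print(fade_in_volume(5.0, 5.0, 10.0))  # -> 10.0
```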
In one possible implementation, the notification window 324 disappears after the identified audio file has been played.
In one possible implementation, the intelligent terminal 100 may also display the content of the notification window 324 above the user interface 32 in the form of a user interface, which may be shown as the user interface 33 in fig. 3E. In fig. 3E, the user interface 33 includes the content contained in the notification window 324, such as the humming recognition icon 316, the second display area 322, the play control 310, the control 312, and the volume control 328. Optionally, the user interface 33 may also display a background picture, such as a poster of the song's singer, a picture of the album on which the identified audio file was recorded, and so on. Alternatively, when the user interface 33 receives an operation (e.g., a rightward sliding operation) from the user, the intelligent terminal 100 displays the user interface 32 (i.e., the lock screen interface) beneath the user interface 33.
Fig. 3F illustrates yet another user interface 34 that displays the recognition result.
In one embodiment, in the user interface 31 or the user interface 21, if an operation (e.g., a click operation, a long press operation, a press operation, etc.) input by the user with respect to the notification window 315 is detected, or in the user interface 32, if an operation (e.g., a click operation, a long press operation, a press operation, etc.) input by the user with respect to the notification window 324 is detected, the intelligent terminal 100 displays the user interface 34. Optionally, before the user interface 32 jumps to the user interface 34, the intelligent terminal 100 receives an unlocking operation (e.g., fingerprint unlocking, password unlocking, face unlocking, etc.) input by the user, and in case of successful unlocking, the intelligent terminal 100 performs an operation of jumping from the user interface 32 to the user interface 34.
The user interface 34 includes: the humming recognition icon 316, the second display area 322, the play control 310, the control 312, the volume control 328, the control 330, the control 332, and the control 334, wherein:
the humming recognition icon 316, the second display area 322, the play control 310, the control 312, and the volume control 328 can be referred to the above descriptions, and are not repeated herein.
Control 330 may be used to add the identified audio file to favorites. The control 330 may receive an operation of a user, and in response, the intelligent terminal 100 adds an identifier of the identified audio file to a preset favorites list (or a folder called "favorite music"; this application is not limited in this respect), making it easy for the user to find or play the identified audio file next time.
Control 332 may be used to download the identified audio file. The control 332 may receive an operation by a user, and in response, the intelligent terminal 100 downloads the audio resource of the identified audio file from the network. Optionally, in response to the operation, the intelligent terminal 100 displays a selection box containing sound quality options such as "standard quality", "high quality", and "lossless quality". The selection box is configured to receive the user's selection of an option, and in response, the intelligent terminal 100 downloads the audio resource with the sound quality corresponding to that option.
A control 334 may be used to share the identified audio file. The control 334 may receive an operation of a user, and in response, the smart terminal 100 displays a sharing box containing a plurality of sharing targets, for example, QQ, WeChat, Weibo, Twitter, and the like. The sharing box receives the user's selection of a sharing target, and in response, the intelligent terminal 100 sends the identifier of the audio file, or the audio resource itself, to the selected sharing target.
FIG. 3G illustrates a user interface 35 for humming recognition.
In one embodiment, in the user interface 21, the user interface 31, the user interface 32, the user interface 33 and the user interface 34, if the user's operation (e.g., clicking operation) on the control 312 is detected, the intelligent terminal 100 displays the user interface 35 for humming recognition.
The humming recognition icon 316, the indicator 350, the controls 352 and the controls 354 are included in the user interface 35. Wherein:
the humming recognition icon 316 can be referred to above and will not be described further herein.
The indicator 350 may indicate time information of the music piece the user hums; this time information grows with the duration of the user's humming and remains synchronized with it. The indicator 350 may also present operation prompt information for entering the sound signal (e.g., the text "humming more phrases makes recognition more accurate" contained in the indicator 350), reminding the user how to operate in order to improve the accuracy of humming recognition. The operation prompt information may be other content; for example, when the user's voice is detected to be quiet, prompt information such as "raising the volume (or moving closer to the device) makes recognition more accurate" may be displayed.
The control 352 may be configured to receive a user operation (e.g., a long press operation), and in response to the user operation, the smart terminal 100 collects a voice signal input by the user through the microphone 170C. When the finger of the user is detected to leave the display screen 194, the intelligent terminal 100 performs humming recognition according to the collected sound signal. Alternatively, when the smart terminal 100 receives the recognized audio file from the music recognition server, the smart terminal 100 may display the user interface 34 for displaying the recognition result.
In the embodiments of the present application, before the intelligent terminal 100 performs the humming recognition function, the user can turn the function on or off through the setting interfaces of the intelligent terminal 100. Some interfaces for setting humming recognition are described below.
FIG. 4A illustrates a user interface 41 for setting the humming recognition functionality.
Similar to the manner in which the drop-down notification bar 318 is displayed as described above, when a slide-down gesture on the status bar 201 is detected, in response to the gesture, the smart terminal 100 may display a drop-down notification bar 401 on the user interface 41, the drop-down notification bar 401 including a control window 313 therein, wherein:
The control window 313 may display a plurality of switch controls, such as a switch control 317 for "humming recognition" and switch controls for other functions (e.g., Wi-Fi, Bluetooth, flashlight, etc.). The switch control 317 has two display states: a first display state (also referred to as the "ON" state) indicating that the humming recognition function is turned on, and a second display state (also referred to as the "OFF" state) indicating that the humming recognition function is turned off. When the display state of the switch control 317 is the second display state and the smart terminal 100 detects an operation (e.g., a touch operation) on the switch control 317 in the control window 313, in response the smart terminal 100 may turn on "humming recognition" and adjust the display state of the switch control 317 to the first display state. When the display state of the switch control 317 is the first display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313, in response the smart terminal 100 may turn off "humming recognition" and adjust the display state of the switch control 317 to the second display state. In this way, the user can conveniently turn the humming recognition function on or off.
In one possible case, during the opening of the humming recognition function, a humming recognition icon 311 is displayed in the status bar 201. It is understood that the status bar 201 may be included in a plurality of display interfaces of the smart terminal 100. In this way, the user can know the starting state of the humming recognition function through a plurality of display interfaces of the intelligent terminal 100 conveniently.
FIG. 4B illustrates yet another user interface 42 for setting the humming recognition functionality.
As shown in FIG. 4B, user interface 42 includes a display area 410, display area 410 being used to display a plurality of settable options, such as "flight mode," "Wi-Fi," "Bluetooth," and so forth. Display area 410 also includes a plurality of switch controls and a plurality of jump controls, the roles of which are described by switch control 412 and jump control 416, where:
the switch control 412 may be configured to receive a user operation (e.g., a click operation, a slide operation, etc.), and in response to the user operation, the intelligent terminal 100 changes the on state of the function/service/application (i.e., humming recognition function) corresponding to the switch control 412. For example, before the user's operation is received, the display state of the switch control 412 is "ON", which indicates that the humming recognition function is in the ON state. If the switch control 412 receives the user's operation, in response to the user's operation, the intelligent terminal 100 adjusts the display state of the switch control 412 to "OFF" and turns OFF the humming recognition function.
A jump control 416 may be used to receive a user operation; in response, the intelligent terminal 100 jumps to the setting interface of the function/service/application (here, the do-not-disturb mode) corresponding to the jump control 416. It should be noted that this setting interface may include a plurality of setting options for the "do-not-disturb mode" function, for example, adjusting the on state of the do-not-disturb mode, setting its active time, setting auto-reply in the do-not-disturb mode, and so on.
FIGS. 5A-5C illustrate yet another example user interface for setting up humming recognition functionality.
As shown in FIG. 5A, user interface 51 includes a display area 522, similar to display area 410 included in user interface 41, display area 522 for displaying a plurality of configurable options, such as "flight mode," "Wi-Fi," "Bluetooth," and so forth.
Unlike the display area 410, the control corresponding to the humming recognition function here is a jump control 520, which can be used to jump to the "humming recognition" setting interface. As exemplarily shown in fig. 5A-5B, when the jump control 520 receives a user operation (e.g., a click operation), in response the intelligent terminal 100 jumps from the user interface 51 to the setting interface of "humming recognition" (i.e., the user interface 52).
As shown in fig. 5B, user interface 52 includes a return key 530, a switch control 532, textual information 534, a switch control 536, a control 538, a control 540, a control 552, a plurality of jump controls (e.g., jump control 554), a switch control 556, and a switch control 557. Wherein:
A return key 530 may be used to receive an operation by the user; in response, the intelligent terminal 100 returns to the interface preceding the current page, i.e., the user interface 51 shown in fig. 5A. As will be appreciated by those skilled in the art, the interface preceding a given interface is determined when the application is designed.
The functionality of switch control 532 can refer to the functionality of switch control 412 in FIG. 4B and is not described in detail herein.
The text information 534 may be used to explain the authority obtained by the intelligent terminal 100 after the humming recognition function is turned on, so as to facilitate the user to determine whether to give the intelligent terminal 100 the authority for humming recognition according to the explanation. The representation of the textual information 534 may be modified as desired and is not limited herein.
The switch control 536 may be used to receive a user operation; in response, the intelligent terminal 100 performs the operation of setting an enabled time period for humming recognition. For example, if the display state of the switch control 536 is "OFF" before the user operation is received, the humming recognition operation has no restricted enabled period and may run at any time. Optionally, in this case the smart terminal does not display the controls 538 and 540. After receiving the user's operation on the switch control 536, the smart terminal switches the display state of the switch control 536 to "ON" and displays the controls 538 and 540. The control 538 is configured to receive the start time of the humming recognition operation entered by the user; in response, the intelligent terminal 100 performs the humming recognition operation provided by the embodiments of this application after that start time. The control 540 is configured to receive the end time of the humming recognition operation entered by the user; in response, the intelligent terminal 100 no longer performs the humming recognition operation provided by the embodiments of this application after that end time. It should be noted that, while the intelligent terminal 100 is not performing the humming recognition operation provided by the embodiments of this application, the user can still actively trigger humming recognition in the prior-art manner.
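The enabled-period logic of the controls 536, 538, and 540 can be sketched as follows (Python; the overnight-window handling is an assumption, since the text does not say how a period crossing midnight is treated):

```python
from datetime import datetime, time

def humming_enabled(now: datetime, period_on: bool, start: time, end: time) -> bool:
    """Whether the humming recognition operation may run at `now`.
    With the period switch off there is no time restriction; otherwise the
    operation runs only inside [start, end), including windows that wrap
    past midnight (e.g., 20:00-06:00)."""
    if not period_on:
        return True
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight

# Example: period 20:00-06:00, checked at 23:00 -> True
print(humming_enabled(datetime(2019, 5, 31, 23, 0), True, time(20, 0), time(6, 0)))
```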
Control 552 may be used to add voiceprint information that can be used for humming recognition.
A jump control 554 may be used to receive a user operation; in response, the intelligent terminal 100 jumps from the user interface 52 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deletion functions, and the like.
A switch control 556 may be used to receive a user operation; in response, the intelligent terminal 100 sets the availability of the humming recognition function. For example, if the display state of the switch control 556 is "OFF" before the user operation is received, the humming recognition operation is unavailable while the intelligent terminal 100 is locked, i.e., the intelligent terminal 100 does not perform the humming recognition operation when locked. After receiving the user's operation on the switch control 556, in response the intelligent terminal 100 switches the display state of the switch control 556 to "ON" and adjusts the availability of the humming recognition function, so that the humming recognition operation also runs while the intelligent terminal 100 is locked.
The switch control 557 may be used to receive a user operation; in response, the intelligent terminal 100 sets whether the humming recognition function may use location information. When the switch control 557 is in the on state, the intelligent terminal may obtain its own position while the humming recognition function is running. By determining whether its position is a preset place, the smart terminal 100 decides whether to stop collecting sound in the external environment through the audio input module, or whether to play the identified audio file from the starting play position. This determination is described later and is not detailed here.
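The text does not specify how the terminal compares its position with the preset place; one plausible reading is a simple geofence test, sketched below with a haversine distance and a hypothetical radius.

```python
import math

def near_preset_place(lat: float, lon: float, place_lat: float, place_lon: float,
                      radius_m: float = 200.0) -> bool:
    """Return True if the terminal is within radius_m metres of the preset
    place (haversine great-circle distance)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat), math.radians(place_lat)
    dphi = math.radians(place_lat - lat)
    dlmb = math.radians(place_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= radius_m
```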
As exemplarily shown in fig. 5B-5C, the display screen 194 receives a user's operation (e.g., a slide-up operation), and in response to the user's operation, the intelligent terminal 100 displays the setting contents of "humming recognition" under the switch control 556. As shown in FIG. 5C, the user interface 52 also includes content for setting access rights for the humming recognition function, and in particular, the user interface 52 also includes a skip control 558 and a plurality of switch controls (e.g., switch control 560). Wherein:
A jump control 558 may be used to set the type of wireless data the humming recognition function is allowed to access, such as off, WLAN only, or WLAN and cellular data.
A switch control 560 may be used to set a system function (here, location services) that the humming recognition function is allowed to access. For example, if the display state of the switch control 560 is "OFF" before the user operation is received, the humming recognition function cannot obtain the position information of the intelligent terminal 100 while running. After receiving the user's operation on the switch control 560, in response the intelligent terminal 100 switches the display state of the switch control 560 to "ON" and allows the humming recognition function to acquire the position information of the intelligent terminal 100. Similarly, the other system functions that the humming recognition function accesses can be set in the same manner.
FIGS. 5D-5F illustrate some user interfaces for setting access to the humming recognition functionality.
As exemplarily shown in fig. 5D-5E, in response to the user's operation on the jump control 524, the intelligent terminal 100 jumps from the user interface 51 to the user interface 53, which is used to display a plurality of system functions, such as Bluetooth, location services, microphone, gallery, and the like. Each system service corresponds to one jump control (e.g., the "microphone" system service corresponds to the jump control 562).
As exemplarily shown in fig. 5E-5F, in response to the user's operation on the jump control 562, the smart terminal 100 jumps from the user interface 53 to the user interface 54, which is used to display a plurality of applications that require access to the microphone. The user can control an application's permission to access the microphone through the switch control corresponding to that application. For example, if the display state of the switch control 572 is "OFF" before the user operation is received, the humming recognition function does not have access to the microphone. After receiving the user's operation on the switch control 572, in response the intelligent terminal 100 switches the display state of the switch control 572 to "ON" and allows the humming recognition function to access the microphone. Similarly, other applications' access to system functions can be set in the manner described above.
Fig. 5G illustrates a user interface 55 for entering voiceprint information.
In one embodiment, in response to a user operation on a control 552 in the user interface 52, the smart terminal 100 jumps from the user interface 52 to the user interface 55 to enter voiceprint information that the user wants to add. In the user interface 55, an indicator 570, text information 572, and a control 574 are included. Wherein:
indicator 570 may be used to provide a prompt to the user to instruct the user to enter voiceprint information.
Text information 572 is the text the user is asked to read aloud. Optionally, the intelligent terminal may display different text messages for the user to read. In this way, more voice signals of the user can be entered, which improves the accuracy of the voiceprint information.
In a possible case, the intelligent terminal can also instruct the user to sing several music pieces to perform the input of the voiceprint information. In this case, the content of the indicator 570 may be "please press the button for a long time and sing the following song segment to enter the voiceprint information", and correspondingly, the text information 572 is a piece of lyrics.
The control 574, which may be configured to receive a user operation (e.g., a long press operation), in response to the user operation, the smart terminal 100 collects a voice signal input by the user through the microphone 170C. When it is detected that the finger of the user leaves the display screen 194, the smart terminal 100 stores the collected voice signal during the period, extracts the voiceprint information of the collected voice signal, and then stores the extracted voiceprint information.
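The embodiment does not disclose the voiceprint features themselves. A minimal enrollment sketch, assuming librosa is available and using MFCC statistics as a stand-in for a real speaker-embedding model:

```python
import numpy as np
import librosa  # assumption: available; MFCCs stand in for a real voiceprint model

def extract_voiceprint(wav_path: str) -> np.ndarray:
    """Turn an enrollment recording into a fixed-length voiceprint vector
    (mean and standard deviation of MFCC frames)."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def enroll_voiceprint(wav_path: str, store_path: str = "voiceprint1.npy") -> None:
    """Store the extracted voiceprint for later matching (see step S902)."""
    np.save(store_path, extract_voiceprint(wav_path))
```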
The humming recognition operation provided by the embodiment of the application can also be applied to smart home devices (e.g., smart speakers, televisions, etc.) and vehicle-mounted devices (e.g., vehicle-mounted speakers), and the smart home devices or the vehicle-mounted devices can execute the humming recognition operation provided by the embodiment of the application. In a possible case, the smart home device or the vehicle-mounted device is not configured with a display screen (e.g., a smart speaker, a vehicle-mounted speaker), and the user may set the humming recognition function on the smart home device or the vehicle-mounted device through the smart terminal 100.
FIGS. 6A-6B illustrate still other user interfaces for setting the humming recognition functionality. Optionally, the user interfaces may be interfaces in smart home applications.
As shown in fig. 6A, the user interface 61 includes a display area 60, and the display area 60 includes indication information 600, reminder information 602, a selection box 610, a selection box 604, a control 608, and a display area 606, wherein:
The indication information 600 may be used to indicate the home information set by the user, and may be text such as "Annie's family" or "Jack's family".
The reminding information 602 may be used to prompt the user about some abnormal situations that need to be noticed, and the intelligent terminal 100 may generate corresponding reminding information according to the state of each intelligent home device. For example, if the security door is not closed for a long time, the smart terminal 100 may display the reminder 602. Or, if the remaining amount of the filter element of the air purifier is less than the preset value, the intelligent terminal 100 may display a reminder message "the filter element of the air purifier needs to be replaced", and the like.
Selection box 610 may display a plurality of selectable home states for the user to choose from, such as "go home", "leave home", "sleep", "read", and "more". Each home state may correspond to preset on/off states of the home devices. For example, if the user performs a selection operation on the "go home" option, in response to the selection operation the smart terminal 100 turns on the living-room ceiling lamp and the air conditioner. Optionally, the user may set the on state of each home device under each home state, and may also customize more home states.
Selection box 604 may display multiple home spaces for the user to select, such as "all", "living room", "master bedroom", "secondary bedroom", and so on. The selection box 604 may receive an operation (e.g., a click operation, a slide operation, etc.) from the user, and in response, the smart terminal 100 displays the smart home devices belonging to the selected home space in the display area 606. For example, when the selection box 604B receives a click operation from the user, the smart terminal 100 displays the smart home devices contained in the "living room" in the display area 606.
And a control 608, which is used for receiving the operation of the user, and responding to the operation of the user, the intelligent terminal 100 displays the intelligent home device adding interface. The user can enter the information of the new intelligent household equipment through the adding interface.
The display area 606 may be used to display information of one or more smart home devices, where the information may include basic information such as a picture, a name, and an on state. The display area 606 may also be used to receive an operation by a user, and in response to the operation by the user, the smart terminal 100 displays a setting interface of the smart home device corresponding to the operation.
As exemplarily shown in fig. 6A-6B, the display area 606 receives a click operation from the user, and in response, the smart terminal 100 jumps from the user interface 61 to the user interface 62. Included in the user interface 62 are a return key 620, a switch control 622, a volume control 626, a switch control 628, a control 630, a switch control 632, a switch control 634, a control 636, a control 638, a control 640, and a jump control 642. Wherein:
the return key 620 may be used to receive a user operation, and in response to the user operation, the smart terminal 100 returns to the previous page (i.e., the user interface 61) of the current page.
The switch control 622 may be used to receive a user operation; in response, the smart terminal 100 controls the on or off state of the smart speaker. The electronic device may control the smart speaker by sending a control instruction that instructs the smart speaker to execute the corresponding operation.
Volume control 626 may be used to adjust the volume at which the audio file is played. Volume control 626 may receive a user operation, and in response, the smart terminal 100 controls the smart speaker to adjust the playing volume. Optionally, when the received user operation is a leftward sliding operation, the intelligent terminal 100 controls the smart speaker to decrease the playing volume; when the received user operation is a rightward sliding operation, the intelligent terminal 100 controls the smart speaker to increase the playing volume. In a possible implementation manner, the ratio of the distance from the volume control 626 to the left end point of its line segment to the length of that line segment corresponds to the ratio of the current volume to the maximum playing volume of the smart speaker.
The switch control 628 may be configured to receive an operation input by a user, and in response to the operation of the user, the smart terminal 100 controls the smart speaker to turn on the sound effect optimization function or turn off the sound effect optimization function.
The control 630 may be configured to receive an operation of inputting time by a user, and in response to the operation by the user, the intelligent terminal 100 controls the intelligent sound box to set the off time as the time input by the user.
The switch control 632 may be used to receive a user operation (e.g., a click operation, a slide operation, etc.); in response, the smart terminal 100 controls the smart speaker to change the on state of the humming recognition function. For example, before the user operation is received, the display state of the switch control 632 is "ON", indicating that the humming recognition function of the smart speaker is turned on. If the switch control 632 receives the user's operation, in response the intelligent terminal 100 adjusts the display state of the switch control 632 to "OFF" and controls the smart speaker to stop the humming recognition function.
A switch control 634 may be used to receive a user operation; in response, the smart terminal 100 controls the smart speaker to set an enabled time period for humming recognition. For example, if the display state of the switch control 634 is "OFF" before the user operation is received, the humming recognition operation of the smart speaker has no restricted enabled period and may run at any time. Optionally, in this case the smart terminal 100 does not display the controls 636 and 638. After receiving the user's operation on the switch control 634, the smart terminal 100 switches the display state of the switch control 634 to "ON" and displays the controls 636 and 638. The control 636 is configured to receive the start time of the humming recognition operation entered by the user; in response, the intelligent terminal 100 controls the smart speaker to set the start time of humming recognition to the time entered by the user. The control 638 is configured to receive the end time of the humming recognition operation entered by the user; in response, the intelligent terminal 100 controls the smart speaker to set the end time of the humming recognition operation to the time entered by the user.
In a possible implementation manner, the smart speaker may not be able to schedule the start time of the humming recognition function by itself. In this case, in response to the user's operation on the control 636, the smart terminal 100 sends an instruction to start the humming recognition function to the smart speaker at the start time, so as to control the smart speaker to start the function; in response to the user's operation on the control 638, the smart terminal 100 sends an instruction to stop the humming recognition function to the smart speaker at the end time, so as to control the smart speaker to stop the function.
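A sketch of that terminal-side scheduling, where `send` is a hypothetical callable delivering a control instruction to the smart speaker and the timestamps come from the times entered via the controls 636 and 638:

```python
import sched
import time

def schedule_humming_window(send, start_ts: float, end_ts: float) -> None:
    """Send a 'start' instruction at start_ts and a 'stop' instruction at
    end_ts (epoch seconds). Runs synchronously; a real terminal would run
    this on a background thread."""
    s = sched.scheduler(time.time, time.sleep)
    s.enterabs(start_ts, 1, send, argument=("START_HUMMING_RECOGNITION",))
    s.enterabs(end_ts, 1, send, argument=("STOP_HUMMING_RECOGNITION",))
    s.run()
```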
Control 640 may be used to add voiceprint information that can be used for humming recognition.
A jump control 642 may be used to receive a user operation; in response, the intelligent terminal 100 jumps from the user interface 62 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deletion functions, and the like.
In one possible case, the voiceprint information used for matching in the smart speaker is the voiceprint information stored in the smart terminal 100. Optionally, after the intelligent terminal 100 receives the voice signal input by the user, it sends the voiceprint information extracted from that voice signal to the smart speaker capable of humming recognition for storage. In this way, the smart speaker with the humming recognition function can use the voiceprint information stored on the electronic device to match sound signals. The user interface for entering voiceprint information in this case can be referred to fig. 5G.
In another possible case, the voiceprint information used for matching in the smart speaker is extracted from the user's voice signal re-entered by the smart speaker. Fig. 6C illustrates yet another user interface 63 for entering voiceprint information. In response to user operation of control 640, smart terminal 100 jumps from user interface 62 to user interface 63. The user interface 63 includes:
the indication information 650 may be used to provide prompt information to the user to instruct the user to enter voiceprint information.
Text information 652 is the text the user is asked to read aloud. Optionally, the electronic device may display different text messages for the user to read. In this way, more voice signals of the user can be entered, which improves the accuracy of the voiceprint information.
In one possible scenario, the electronic device may also instruct the user to sing several pieces of music for voiceprint information entry. In this case, the content of the indicator 650 may be "please get close to the smart speaker, press the play key for a long time, and sing the following song segment to enter the voiceprint information", and correspondingly, the text information 652 is a piece of lyrics. It should be noted that the play key refers to a play key of the smart sound box, and the play key may be a physical key or a virtual key.
It should be noted that, not limited to smart speakers, other smart home devices (whether or not configured with a display screen) can set the humming recognition function in the manner described above. Similarly, the functions of the vehicle-mounted device may also be set on the smart terminal 100, in which case the manner described above also applies.
The user interfaces of the intelligent terminal 100 for setting the humming recognition function of the intelligent home device are introduced above. In a possible implementation manner, a display screen is arranged on the intelligent home equipment or the vehicle-mounted equipment, and the humming recognition function of the intelligent home equipment or the vehicle-mounted equipment can be set. The following describes a setting interface for humming recognition on a vehicle-mounted device.
FIGS. 7A-7B illustrate a user interface on a vehicle device for displaying the set humming recognition functionality.
Fig. 7A exemplarily shows a user interface 71 for displaying an application menu on the in-vehicle apparatus. The user interface 71 may also be referred to as a main menu. As shown in fig. 7A, the user interface 71 may include: calendar gadget 700, status bar 702, display area 708, control 706, where:
the calendar gadget 700 may be used to indicate the current time, such as the date, day of the week, time division information, and the like.
The status bar 702 may include: a Bluetooth indicator 704, one or more signal strength indicators 705 for wireless fidelity (Wi-Fi) signals, and a time indicator 703.
A display area 708 may be used to display a plurality of application icons, such as a navigation icon 708A, a phone icon 708B, a music icon 708C, a video icon 708D, a gallery icon 708E, a radio icon 708F, a driving recorder icon 708G, and a settings icon 708H.
And a control 706, which is used for receiving the operation of the user, and responding to the operation of the user, the vehicle-mounted equipment jumps back to the user interface 71 (namely the main menu interface) from the current interface.
As shown in fig. 7A-7B, when an operation of the set icon 708H by the user is received, the in-vehicle apparatus jumps from the user interface 71 to the user interface 72 in response to the operation by the user. The user interface 72 is a user interface for displaying a setting menu, which includes a plurality of setting options, such as "system settings 720", "user settings 722", "sound effects settings 724", "network settings 726", "time settings 728", and so forth. The content displayed in the display area 716 is setting content corresponding to the setting option. Alternatively, "system settings 720" may be a default selected setting option, in which case the content displayed in the display area 716 sets the corresponding setting content for the system. Alternatively, if one setting option receives an operation by the user, the display area 716 displays the setting content corresponding to the one setting option in response to the operation by the user.
In one possible implementation, the display area 716 may receive an operation (e.g., a slide-up or slide-down operation) by a user, and in response to the operation, the display area 716 may display more setting contents. As shown in FIG. 7B, the content displayed in the display area 716 is the setting content of the humming recognition.
The display area 716 may include a switch control 710, a control 712, and a jump control 714.
A switch control 710 may be used to turn on or off the humming recognition function.
Control 712 may be used to add voiceprint information that can be used for humming recognition. Optionally, the control 712 may receive a user operation, in response to which the in-vehicle device jumps to a user interface for entering voiceprint information, such as the exemplarily illustrated user interface 73. The user interface 73 is described in more detail later and is not detailed here.
A jump control 714 may be used to receive a user operation; in response, the vehicle-mounted device jumps from the user interface 72 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deletion functions, and the like.
Fig. 7C illustrates a user interface 73 for entering voiceprint information.
In one embodiment, in response to a user operation on the control 712 in the user interface 72, the in-vehicle device jumps from the user interface 72 to the user interface 73 to enter the voiceprint information that the user wants to add. The user interface 73 includes indication information 730 and text information 732. Wherein:
The indication information 730 may be used to provide prompt information instructing the user to enter the voiceprint information. It should be noted that the play key here is a play key of the speaker; in one possible case, the play key of the speaker is a physical key around the display screen of the vehicle-mounted device.
Text information 732 is the text the user is asked to read aloud. Optionally, the vehicle-mounted device may display different text messages for the user to read. In this way, more voice signals of the user can be entered, which improves the accuracy of the voiceprint information.
In one possible case, the vehicle-mounted device may also instruct the user to sing several pieces of music for the entry of voiceprint information. In this case, the content of the indicator 730 may be "please get close to the speaker, press the play key for a long time, and sing the following song segment to enter the voiceprint information", and correspondingly, the text information 732 is a piece of lyrics.
After the user's voice signal has been entered, the vehicle-mounted device can extract the voiceprint information from the collected voice signal and store it.
The setting interface for humming recognition on the vehicle-mounted device is described above, and it should be noted that, not limited to the above-described user interface, the setting interface for humming recognition on the vehicle-mounted device may also refer to the user interface 51, the user interface 52, the user interface 53, and the user interface 54 in the above-described intelligent terminal. A further description of the user interface in the vehicle-mounted device that displays the humming recognition result is presented next.
Fig. 8A illustrates a user interface 81 for displaying the recognition result on the in-vehicle apparatus. As shown in FIG. 8A, when the vehicle-mounted device recognizes the audio file according to the music piece hummed by the user, the vehicle-mounted device plays the audio file following the progress of the humming by the user and displays a notification window 842 on a current interface thereof for displaying the recognition result of the music piece hummed by the user, and the notification window 842 may include: humming recognition icon 840, third display area 841, play control 843, and control 844.
The humming recognition icon 840 is used to indicate the source of the notification window 842, so that the user can quickly and conveniently know that the notification window 842 is the recognition result output by the humming recognition service (or called function, application). It should be noted that the humming recognition icon 840 is only an exemplary icon, and in a specific implementation, the humming recognition icon may be another pattern, such as an icon of a note or other style, and the embodiment of the present invention is not limited thereto.
The third display region 841 may be used to display identification information of the recognized audio file. For example, the third display region 841 contains the song title "see you through sea in the floating ocean". The third display region 841 may further include operation prompt information; for example, the "click to stop playing" text included in the third display region 841 reminds the user of the available operation, improving convenience. Optionally, the third display region 841 may also contain singer information or lyric information of the currently played audio file. In yet another possible scenario, the in-vehicle device may also display the lyric information of the currently playing audio file in the form of a floating window, i.e., a movable window floating above the display interface of the vehicle-mounted device.
The play control 843 may be used to receive a user operation; in response, the in-vehicle device pauses or continues playing the audio file. Specifically, after the vehicle-mounted device recognizes the audio file corresponding to the music piece hummed by the user, the audio file is played following the user's singing progress, and the play control 843 displays the first state. When the play control 843 is displayed in the first state and receives a user operation, the vehicle-mounted device pauses playing the audio file and displays the play control 843 in the second state. When the play control 843 is displayed in the second state and receives a user operation, the vehicle-mounted device continues playing the audio file and displays the play control 843 in the first state.
And a control 844, configured to receive an operation of a user, and in response to the operation of the user, the vehicle-mounted device suspends playing the audio file to reacquire the sound signal of the user, and performs humming recognition on the reacquired sound signal.
In one possible implementation, the control 844 is not displayed on the vehicle-mounted device; instead, indication information such as "say 're-recognize'" may be displayed, so that humming recognition can be triggered again by voice. In this case, if the in-vehicle device detects that the speech information input by the user is "re-recognize", it pauses playing the audio file, performs humming recognition on the segment the user subsequently hums, and then displays the notification window again according to the new recognition result. In this way, no manual operation is needed, making it convenient to issue a re-recognition instruction while driving.
In one possible implementation, the notification window 842 disappears after being displayed for a preset time, which may be 4 seconds, 5 seconds, or another time value. Alternatively, when the notification window 842 receives an upward sliding operation from the user, the in-vehicle apparatus no longer displays the notification window 842 in the user interface 81 in response to the operation. Alternatively, the notification window may disappear after the current song finishes playing.
Fig. 8B illustrates a user interface 82 for displaying the recognition result on yet another vehicle-mounted device. As shown in fig. 8B, when the vehicle-mounted device recognizes the audio file from the music piece hummed by the user, it plays the audio file following the progress of the user's humming and displays the user interface 82 on its current interface to show the recognition result. The user interface 82 may include: a humming recognition icon 840, a third display area 841, a play control 843, a control 844, a volume control 851, a control 853, a control 854, and a control 855.
The humming recognition icon 840, the third display area 841, the playing control 843, and the control 844 can all refer to the description in FIG. 8A, and are not repeated herein.
A volume control 851 may be used to adjust the volume at which the audio file is played. The volume control 851 may receive a user operation, and in response, the in-vehicle apparatus adjusts the playing volume. Optionally, when the received user operation is a leftward slide, the vehicle-mounted device decreases the playing volume; when the received user operation is a rightward slide, the vehicle-mounted device increases the playing volume. In a possible implementation manner, the ratio of the distance from the volume control 851 to the left end point of its line segment to the length of that line segment corresponds to the ratio of the current volume to the maximum system playing volume.
Control 853 may be used to add the identified audio file to favorites. The control 853 may receive a user operation, and in response, the vehicle-mounted device adds an identifier of the identified audio file to a preset favorites list (or a folder called "favorite music"; this application is not limited in this respect), so that the user can find or play the identified audio file next time.
A control 854 can be used to download the identified audio file. The control 854 may receive a user action in response to which the in-vehicle device downloads the audio asset of the identified audio file from the network. Optionally, in response to the operation of the user, the vehicle-mounted device displays a selection box, and the selection box comprises sound quality options such as "standard quality", "high quality", "lossless quality", and the like. The selection box is used for receiving the selection operation of a user on one option, and in response to the selection operation of the user on one option, the vehicle-mounted equipment downloads the audio resource with the tone quality corresponding to the option.
A control 855 that can be utilized to share the identified audio file. The control 855 may receive an operation of a user, and in response to the operation of the user, the vehicle-mounted device displays a sharing frame, where the sharing frame includes a plurality of sharing objects, for example, one or more terminal devices connected to the vehicle-mounted device via bluetooth. The sharing frame is used for receiving selection operation of a user on a sharing object, and in response to the selection operation of the user on the sharing object, the vehicle-mounted equipment sends the identification of the audio file or the audio resource to the sharing object corresponding to the selection operation.
In one possible implementation, the notification window 842 is operable to receive user actions in response to which the in-vehicle device displays the user interface 82.
In one possible implementation, the user interface 82 may receive a sliding operation by the user, in response to which the in-vehicle device displays the user interface that was most recently displayed before the user interface 82.
It should be noted that, not limited to the vehicle-mounted device, other smart home devices (smart home devices configured with a display screen) may set the humming recognition function according to the method described in the foregoing fig. 7A to 7C, and display the recognition result according to the method described in the foregoing fig. 8A to 8B. In addition, the user interfaces for displaying the recognition results in the vehicle-mounted device and the smart home device may refer to the user interface 21, the user interface 31, the user interface 32, the user interface 33, and the user interface 34 in the smart terminal described above. However, since the functions of the devices are not exactly the same, the interface elements included in the user interface may be adjusted accordingly.
Based on the aforementioned UI embodiments, the following embodiments describe a system architecture for implementing the humming recognition method provided by the present application. The system architecture comprises an electronic device and a music recognition server. Wherein:
The electronic device may be the smart terminal 100 exemplarily shown in fig. 1A, specifically a portable electronic device such as a mobile phone or a tablet computer, or a wearable device such as a smart watch or a smart bracelet; the electronic device may also be the smart home device 110 exemplarily shown in fig. 1C or the vehicle-mounted device 120 exemplarily shown in fig. 1D. Specifically, the electronic device may have an audio input module and an audio output module. The electronic device can collect sound in the external environment through the audio input module, send the sound signal to the music recognition server for humming recognition, receive the recognized audio file and a playing position from the music recognition server, and play the recognized audio file from that position through the audio output module. In a possible implementation manner, the electronic device may further include a camera module configured to acquire the user's mouth shape information; the electronic device may send the acquired mouth shape information to the music recognition server, which performs humming recognition by combining the mouth shape information with the sound signal.
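The client-side flow just described can be summarized in a short sketch; all callables are injected placeholders for the device's audio, voiceprint, and network components, not a disclosed API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RecognitionResult:
    audio_id: str           # identifies the second (recognized) audio file
    start_position_ms: int  # playing position matching the end of the user's sound

def humming_client_pass(capture: Callable[[], bytes],
                        is_preset_user: Callable[[bytes], bool],
                        recognize: Callable[[bytes], Optional[RecognitionResult]],
                        play_from: Callable[[str, int], None]) -> Optional[RecognitionResult]:
    """One pass of the device loop: capture sound, gate on the voiceprint,
    send the first audio file to the server, then play the result."""
    clip = capture()                 # sound collected via the audio input module
    if not is_preset_user(clip):     # voiceprint check (see step S902)
        return None
    result = recognize(clip)         # humming recognition on the server
    if result is not None:
        play_from(result.audio_id, result.start_position_ms)
    return result
```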
The music recognition server may perform feature extraction on the received sound signal, perform retrieval using the extracted features (e.g., the sequence of fundamental frequencies), and match audio information that is most similar to the user humming piece from a pre-stored audio resource library (alternatively referred to as a feature database). Alternatively, the music recognition server may be a single server, and the music recognition server may be composed of a plurality of servers. Optionally, the audio resource library may be stored in the music recognition server, and the audio resource library may also be stored in other devices (for example, a database server) that have a connection relationship with the music recognition server.
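The retrieval step is described only as matching extracted features such as the fundamental-frequency sequence. One common realization, shown here purely as an illustration, is dynamic time warping (DTW) over pitch sequences:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Length-normalized DTW distance between two fundamental-frequency
    sequences (e.g., in semitones), tolerant of tempo variation in humming."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return float(d[n, m]) / (n + m)

def best_match(query_f0: np.ndarray, library: dict) -> str:
    """Return the id of the library melody most similar to the hummed query;
    a production server would index melodic fragments rather than scan linearly."""
    return min(library, key=lambda k: dtw_distance(query_f0, library[k]))
```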
Referring to FIG. 9, which shows a flow chart of a humming recognition method provided by an embodiment of the present application, the method includes, but is not limited to, the following steps.
S901, the electronic equipment collects sound in an external environment through an audio input module.
Optionally, before collecting sound in the external environment through the audio input module, the electronic device needs to determine whether its audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied, for example by playing audio/video, making a call, or performing voice navigation, the electronic device does not collect sound in the external environment through the audio input module for the humming recognition operation. It should be noted that this does not mean the electronic device performs no sound collection at all in such cases; rather, the sound is acquired for purposes other than humming recognition. For example, during a call, an electronic device (e.g., a mobile phone) needs to acquire sound in the external environment through the audio input module in order to obtain the voice information input by the user and to capture environmental sound for noise reduction.
If the audio input module and/or the audio output module of the electronic device is not occupied, the electronic device collects sound in the external environment through the audio input module. Optionally, after the audio input module and/or the audio output module of the electronic device is released, for example, when the audio/video playing ends, the call is hung up, or the voice navigation ends, the electronic device may collect sound in the external environment through the audio input module. It should also be understood that the humming recognition operation provided by the embodiments of the present application has a lower priority than other operations of the electronic device that require audio resources.
S902, if the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends a first audio file to a music identification server, where the first audio file contains the sound.
Specifically, before sending the first audio file to the music recognition server, the electronic device matches the voiceprint information of the sound against the pre-stored voiceprint information. If the matching succeeds, that is, the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the first audio file to the music recognition server for humming recognition; if the matching fails, that is, the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information, the electronic device continues to collect sound in the external environment through the audio input module. It should be noted that the pre-stored voiceprint information is voiceprint information extracted in advance from a sound signal input by the user. Specifically, the electronic device may receive the sound input by the user through the user interfaces 55, 63, and 73 in the above embodiments; the electronic device then performs voiceprint extraction on the collected sound and stores the extracted voiceprint information.
In addition, that the voiceprint information of the sound is consistent with the pre-stored voiceprint information does not require the two to be exactly the same; when the similarity between the voiceprint information of the sound and the pre-stored voiceprint information is not less than a preset value (e.g., 90% or 95%), the voiceprint information of the sound can be determined to be consistent with the pre-stored voiceprint information. Specifically, the electronic device may perform the matching as follows: the electronic device extracts voiceprint information from the sound signal and calculates the similarity between the extracted voiceprint information and the pre-stored voiceprint information. If the similarity is greater than or equal to the preset value, the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information; if the similarity is less than the preset value, the electronic device determines that the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information.
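As an illustration of the matching described above, the following is a minimal sketch that assumes voiceprint information is represented as a fixed-length embedding vector and that cosine similarity stands in for the unspecified similarity measure; the function name and the 90% threshold are examples only.

```python
import numpy as np

def voiceprints_consistent(extracted: np.ndarray,
                           pre_stored: np.ndarray,
                           threshold: float = 0.90) -> bool:
    """Return True when the similarity between the extracted voiceprint
    and the pre-stored voiceprint is not less than the preset value."""
    # Cosine similarity is one common choice for fixed-length voiceprint
    # embeddings; the description does not prescribe a specific measure.
    sim = float(np.dot(extracted, pre_stored) /
                (np.linalg.norm(extracted) * np.linalg.norm(pre_stored) + 1e-12))
    return sim >= threshold
```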
S903, the music identification server searches a second audio file from the audio resource library according to the first audio file, and determines the initial playing position of the second audio file.
In one embodiment, the music recognition server may find the second audio file from the audio resource library according to the first audio file in the following manner: perform feature extraction on the first audio file, perform retrieval using the extracted features (e.g., the fundamental frequency sequence), and select from the pre-stored audio resource library (also referred to as a feature database) the second audio file that is most similar to the first audio file. That is, the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio resource library other than the second audio file.
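The retrieval step can be pictured with the sketch below, which assumes each library entry is indexed by a pre-computed fundamental frequency (pitch) sequence and uses dynamic time warping so that the comparison tolerates tempo differences between the humming and the original; the actual retrieval algorithm used by the music recognition server is not specified here.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two pitch sequences."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j],      # insertion
                                 d[i, j - 1],      # deletion
                                 d[i - 1, j - 1])  # match
    return float(d[n, m])

def most_similar_audio(query_f0: np.ndarray,
                       library: dict[str, np.ndarray]) -> str:
    """Return the identifier of the audio file whose stored pitch
    sequence is closest to the hummed query."""
    return min(library, key=lambda song_id: dtw_distance(query_f0, library[song_id]))
```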
Optionally, the music recognition server may convert the first audio file into text information by using an Automatic Speech Recognition (ASR) technique, so as to determine the lyric information corresponding to the first audio file. Further, the music recognition server may determine the progress of the user's humming according to the recognized text information, and thereby determine the starting playing position of the second audio file. The starting playing position of the second audio file corresponds to the ending position of the first audio file, so that when the electronic device plays the second audio file from the starting playing position, the effect of playing audio following the user's humming progress can be achieved.
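The sketch below illustrates one way the starting playing position could be derived from the ASR output, assuming the server stores time-aligned lyrics as (line text, line end time) pairs; the matching heuristic and all names are illustrative assumptions, not the server's prescribed method.

```python
from difflib import SequenceMatcher

def start_play_position(recognized_text: str,
                        timed_lyrics: list[tuple[str, float]]) -> float:
    """Find the lyric line that best matches the tail of the recognized
    humming text and return that line's end time in seconds, so that
    playback resumes where the user's humming left off."""
    best_time, best_score = 0.0, 0.0
    for line_text, end_time in timed_lyrics:
        tail = recognized_text[-len(line_text):]  # compare against the end of the humming
        score = SequenceMatcher(None, tail, line_text).ratio()
        if score > best_score:
            best_time, best_score = end_time, score
    return best_time
```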
S904, the music identification server sends the second audio file and first indication information to the electronic equipment, and the first indication information indicates the initial playing position of the second audio file.
S905, after receiving the second audio file and the first indication information sent by the music identification server, the electronic device plays the second audio file from the initial playing position through the audio output module.
The following provides further supplementary description of implementations of the respective steps in the above method.
In one embodiment, before the electronic device collects sound in the external environment through the audio input module, the electronic device needs to determine whether the humming recognition function is turned on. The electronic device may receive the user's settings for the humming recognition function through the user interface 41, the user interface 42, the user interface 51, the user interface 52, the user interface 62, and the user interface 72 exemplarily shown in the above embodiments. If the electronic device determines that its humming recognition function is turned on, the electronic device performs the step of collecting sound in the external environment through the audio input module; if the electronic device determines that the humming recognition function is not turned on, the electronic device does not perform that step.
In yet another possible implementation, when the electronic device is detected to be in the locked state, the electronic device stops collecting sound in the external environment through the audio input module. It can be understood that the electronic device may collect sound in the external environment through the audio input module after it is detected that the electronic device has been unlocked. This implementation corresponds to the description of the switch control 556 in the user interface 52 in the above embodiment; the switch control 556 can be used to set the available state of the humming recognition function. In this way, collection of environmental sound stops while the electronic device is locked, which reduces power consumption and saves the electric quantity of the electronic device.
In yet another possible implementation manner, when the electronic device is detected to be at a preset location, the electronic device stops collecting sound in the external environment through the audio input module. It can be understood that, if it is detected that the electronic device is no longer at the preset location, the electronic device may again collect sound in the external environment through the audio input module. The preset location may be a location set by the user (e.g., the user's company address) or a location pre-stored in the electronic device (e.g., a school, a hospital, or a theater). The electronic device may determine its location through the Global Positioning System (GPS), Bluetooth (BT), or a Wireless Local Area Network (WLAN). This possible implementation corresponds to the description of the switch control 557 in the user interface 52 in the above embodiment. Specifically, when the "do not disturb the environment" switch control (switch control 557) is in the on state, the electronic device detects in real time (or according to a preset period) whether it is at a preset location; if it is detected to be at the preset location, the electronic device stops collecting sound in the external environment through the audio input module. It should be noted that the preset location is a location unsuitable for playing audio files; in this way, the problem of playing the second audio file in an unsuitable place can be avoided, and the electric quantity of the electronic device can be saved.
In yet another possible implementation manner, when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, the electronic device stops collecting sound in the external environment through the audio input module. It can be understood that, when it is detected that the ambient light brightness has remained at or above the preset value for longer than the preset time, the electronic device may collect sound in the external environment through the audio input module. Optionally, the electronic device may sense the ambient light brightness through an ambient light sensor. It should be noted that the ambient light brightness remaining below the preset value for longer than the preset time may indicate that the electronic device is in the user's pocket or that it is currently nighttime; in either case, the electronic device is not suited to playing audio files.
In yet another possible implementation, the electronic device stops collecting sound in the external environment through the audio input module within a first time period. The first time period may be a preset time period (for example, 11 p.m. to 9 a.m.), or a time period determined according to time information input by the user. The latter case corresponds to the introduction of the switch control 536 in the user interface 52 in the above embodiment: the user may enter a start time and an end time for the humming recognition function, and the first time period is then the period from the end time to the start time. The gating conditions described in the above implementations can be combined, as illustrated by the sketch below.
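The following is a minimal sketch combining these gating conditions; every accessor on `device` (occupancy, lock state, location, darkness duration, quiet hours) is a hypothetical placeholder for whatever the platform actually exposes.

```python
import datetime

def may_collect_for_humming(device) -> bool:
    """True only when none of the conditions above forbids collecting
    sound for humming recognition."""
    if device.audio_input_busy() or device.audio_output_busy():
        return False  # audio modules occupied, e.g. a call or playback
    if device.is_locked():
        return False  # locked state
    if device.at_preset_location():
        return False  # e.g. school, hospital, theater
    if device.dark_duration_seconds() > device.dark_threshold_seconds:
        return False  # likely in a pocket, or nighttime
    start, end = device.quiet_hours  # e.g. (time(23, 0), time(9, 0))
    now = datetime.datetime.now().time()
    in_quiet = (now >= start or now <= end) if start > end else (start <= now <= end)
    return not in_quiet
```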
In one embodiment, in step S902, before the electronic device determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information, the electronic device may determine whether the sound signal is a human voice. If the electronic device determines that the sound is a human voice, the electronic device determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information; if the electronic device determines that the sound is not a human voice, the electronic device continues to collect sound in the external environment through the audio input module. One way for the electronic device to determine whether the sound is a human voice is as follows: the electronic device calculates the frequency of the sound; if the frequency is within a preset frequency range, the electronic device determines that the sound is a human voice; if the frequency is not within the preset frequency range, the electronic device determines that the sound is not a human voice. The preset frequency range can be set as required; for example, since the reference register of a male voice is 64 Hz to 523 Hz and the reference register of a female voice is 160 Hz to 1200 Hz, the preset frequency range may be 64 Hz to 1200 Hz.
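As a sketch of this check, the fragment below estimates the fundamental frequency of a short audio frame by autocorrelation and tests it against the 64 Hz to 1200 Hz range; the 0.3 periodicity threshold is an assumed value for rejecting unvoiced noise.

```python
import numpy as np

def is_human_voice(frame: np.ndarray, sample_rate: int,
                   f_min: float = 64.0, f_max: float = 1200.0) -> bool:
    """Autocorrelation-based pitch check against the preset range."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = max(1, int(sample_rate / f_max))  # shortest period of interest
    lag_max = int(sample_rate / f_min)          # longest period of interest
    if lag_max >= len(corr) or corr[0] <= 0:
        return False
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    voiced = corr[lag] > 0.3 * corr[0]  # crude periodicity test
    f0 = sample_rate / lag
    return voiced and f_min <= f0 <= f_max
```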
In one embodiment, the electronic device may further obtain the mouth shape information of the user through a camera. For example, the electronic device may receive the user's setting of the camera access right through the user interface 52 and the user interface 53 exemplarily shown in the above embodiments. Optionally, the on state of the camera of the electronic device may be kept consistent with the on state of the humming recognition function. Optionally, before obtaining the mouth shape information of the user through the camera, the electronic device determines whether the sound is a human voice; if the sound is determined to be a human voice, the electronic device may obtain the mouth shape information of the user through the camera. The manner of determining whether the sound is a human voice is described above and is not repeated here. In this way, the power consumption of the electronic device can be reduced and its electric quantity saved.
In this case, the music recognition server may further receive the mouth shape information sent by the electronic device; the music recognition server may determine text information according to the mouth shape information, and determine the final recognition result by combining the text information determined from the mouth shape with the first audio file. That is, the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information. In this way, the accuracy of identifying the second audio file can be further improved.
In one embodiment, before the music recognition server searches the second audio file from the audio resource library according to the first audio file, the music recognition server determines whether the first audio file is a music piece. Optionally, the music recognition server may make this determination according to the text information corresponding to the first audio file and the interval times between consecutive texts in the audio file. It should be noted that the music recognition server pre-stores, for each audio file, the corresponding text information (which may be understood as lyrics) and the interval times between consecutive texts. If the similarity between the text information corresponding to the first audio file and the pre-stored text information corresponding to one or more audio files is not less than a preset value, and the similarity between the interval times between consecutive texts in the first audio file and those in the one or more audio files is likewise not less than a preset value, the first audio file is determined to be a music piece. Specifically, when the music recognition server determines that the sound signal is a music piece, the music recognition server searches the second audio file from the audio resource library according to the first audio file; optionally, the second audio file is included in the one or more audio files. When the music recognition server determines that the sound signal is not a music piece, the music recognition server feeds back to the electronic device the result that the sound signal is not a music piece.
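One way to realize this music-piece test is sketched below: both the lyric-text similarity and the timing-interval similarity must reach their preset values. The similarity measures and the 0.8 thresholds are assumptions; the description only requires that both similarities be not less than a preset value.

```python
from difflib import SequenceMatcher

def is_music_piece(recognized_text: str,
                   recognized_gaps: list[float],
                   stored_songs: list[tuple[str, list[float]]],
                   text_threshold: float = 0.8,
                   gap_threshold: float = 0.8) -> bool:
    """stored_songs holds (lyrics, interval_times) per audio file."""
    for lyrics, gaps in stored_songs:
        text_sim = SequenceMatcher(None, recognized_text, lyrics).ratio()
        n = min(len(recognized_gaps), len(gaps))
        if n == 0:
            continue
        # Average per-interval agreement between the two gap sequences.
        gap_sim = sum(1.0 - abs(a - b) / max(a, b, 1e-6)
                      for a, b in zip(recognized_gaps[:n], gaps[:n])) / n
        if text_sim >= text_threshold and gap_sim >= gap_threshold:
            return True
    return False
```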
In one possible implementation, the tag of the second audio file is included in the user tag of the first user. The first user is a user logged in the electronic device or a user using the electronic device, and the music identification server is pre-stored with a user tag of the first user. By the method, the second audio file can better accord with the preference of the user, and the user experience is improved.
In one embodiment, before the electronic device plays the second audio file from the starting playing position through the audio output module, the electronic device needs to determine whether its position is consistent with a preset location. Specifically, if the electronic device determines that its position is inconsistent with the preset location, the electronic device plays the second audio file from the starting playing position through the audio output module. Optionally, if the electronic device determines that its position is consistent with the preset location, the electronic device may only display the humming recognition result without playing the audio file; the user interface for displaying the humming recognition result may refer to the user interface 21, the user interface 31, the user interface 32, the user interface 33, the user interface 34, the user interface 81, and the user interface 82 described in the foregoing embodiments, and details are not repeated here. The meaning of the preset location and the manner of determining it are described above and are not repeated here. This possible implementation corresponds to the description of the switch control 557 in the user interface 52 in the above embodiment. Specifically, when the "do not disturb the environment" switch control (switch control 557) is in the on state, the electronic device needs to determine, before playing the audio file, that its position is not the preset location. In this way, the problem of playing the second audio file in an unsuitable place can be avoided, and the electric quantity of the electronic device can be saved.
In another possible implementation manner, before playing the second audio file, the electronic device determines the environmental volume of the environment where it is located, and determines the volume for playing the second audio file according to the environmental volume. Specifically, the greater the environmental volume, the greater the volume at which the electronic device plays the second audio file; the smaller the environmental volume, the smaller that volume.
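A simple monotone mapping of this kind might look like the sketch below; the decibel anchor points and volume bounds are illustrative choices, not values from the description.

```python
def playback_volume(ambient_db: float,
                    min_vol: float = 0.2, max_vol: float = 1.0,
                    quiet_db: float = 30.0, loud_db: float = 80.0) -> float:
    """Louder environment -> louder playback, clamped to [min_vol, max_vol]."""
    if ambient_db <= quiet_db:
        return min_vol
    if ambient_db >= loud_db:
        return max_vol
    frac = (ambient_db - quiet_db) / (loud_db - quiet_db)
    return min_vol + frac * (max_vol - min_vol)
```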
In one possible implementation manner, after the electronic device plays the second audio file from the start playing position through an audio output module, the method further includes: the electronic equipment displays the identification information of the second audio file and a playing control; the display state of the playing control is a first state, and the first state represents that the second audio file is being played; if the electronic equipment detects a first user operation acting on the playing control in the first state, the electronic equipment pauses playing the second audio file in response to the first user operation, and sets the display state of the playing control to a second state, wherein the second state represents that the playing of the second audio file is paused. Optionally, if the electronic device detects a second user operation acting on the play control in the second state, in response to the second user operation, the electronic device continues to play the second audio file, and sets the display state of the play control to the first state. In addition, the electronic device displays the identification information of the second audio file and the user interface of the play control, which may refer to the user interface 21, the user interface 31, the user interface 32, the user interface 33, the user interface 34, the user interface 81, and the user interface 82 described in the above embodiments, and details are not repeated here.
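The two display states of the playing control behave as a small toggle, as in the sketch below; `player.pause()` and `player.resume()` are placeholder calls for whatever playback interface the device provides.

```python
from enum import Enum

class PlayState(Enum):
    PLAYING = 1  # first state: the second audio file is being played
    PAUSED = 2   # second state: playback of the second audio file is paused

def on_play_control_tapped(state: PlayState, player) -> PlayState:
    """Toggle between the first and second display states on a user tap."""
    if state is PlayState.PLAYING:
        player.pause()
        return PlayState.PAUSED
    player.resume()
    return PlayState.PLAYING
```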
In a possible implementation manner, during the time period from the moment the electronic device starts playing the second audio file to a preset moment (e.g., the 5th or 6th second), the electronic device gradually increases the playing volume of the second audio file from low to high. For example, the volume may be increased gradually from the minimum volume to the volume value set by the user, or from 30% of the user-set volume value to 100% of the user-set volume value.
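The fade-in can be expressed as a volume ramp. The sketch below follows the second example in the text (30% of the user-set volume up to 100% over the preset window), with the 5-second window as an assumed default.

```python
def fade_in_volume(elapsed_s: float, user_volume: float,
                   ramp_s: float = 5.0, floor_frac: float = 0.3) -> float:
    """Linear ramp from floor_frac * user_volume to user_volume
    over the first ramp_s seconds of playback."""
    if elapsed_s >= ramp_s:
        return user_volume
    return user_volume * (floor_frac + (1.0 - floor_frac) * elapsed_s / ramp_s)
```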
In a possible implementation manner, after the electronic device plays the second audio file from the initial playing position through the audio output module, the electronic device may further detect whether the second audio file is stored in a pre-stored music folder, and if so, the electronic device may play other audio files in the music folder after playing the second audio file.
It can be understood that, regarding a specific implementation manner of each step of the method described in fig. 9, reference may be made to the embodiments described in fig. 1A to fig. 8B, which are not described herein again.
In an embodiment, the humming recognition method provided by the present application can also be applied to an open platform. Specifically, the open platform acquires a first audio file, where the first audio file contains sound in the external environment. If the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform searches a second audio file from an audio resource library according to the first audio file and determines the starting playing position of the second audio file; the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file being any audio file in the audio resource library other than the second audio file, and the starting playing position of the second audio file corresponds to the ending position of the first audio file. The open platform then plays the second audio file from the starting playing position, or the open platform controls another application program of the electronic device to play the second audio file from the starting playing position.
The open platform is a platform that provides an open Application Programming Interface (API) or function. That is, the open platform may take the form of an application program that provides an API or a function. Optionally, the open platform may implement the method performed by the electronic device and the music recognition server in fig. 9 by calling the API (or function). For example, the open platform may be a voice assistant platform; it may include only the voice assistant on the electronic device side, may be a platform in which both the electronic device side and the server side are associated with the voice assistant, or may be only a platform in which the server side is associated with the voice assistant, which is not specifically limited in this embodiment of the present application. The open platform may acquire the first audio file through the audio input module of the device where the open platform is located, or may receive the first audio file sent by an electronic device that has a connection relationship with the open platform. Optionally, the electronic device may actively send the first audio file to the open platform, or the open platform may actively acquire the first audio file from the electronic device. Then, the open platform calls an API (or function) with a voiceprint recognition function to determine whether the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information; if it determines that the two are consistent, the open platform calls an API (or function) with a humming recognition function to search the second audio file from the audio resource library according to the first audio file. Then, the open platform plays the second audio file from the starting playing position through the audio output module of the device where the open platform is located, or the open platform controls another application program of the electronic device to play the second audio file from the starting playing position. Optionally, the open platform may send the second audio file and first indication information to the electronic device, where the first indication information includes the starting playing position and is used to instruct the electronic device to play the second audio file from the starting playing position.
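The open-platform flow can be summarized with the sketch below; `verify_voiceprint`, `recognize_humming`, `has_audio_output`, `play_from`, and `delegate_playback` are hypothetical names standing in for the voiceprint-recognition and humming-recognition APIs (or functions) the platform is assumed to expose.

```python
def open_platform_humming_flow(platform, first_audio_file) -> None:
    """Illustrative control flow only; not an actual platform API."""
    if not platform.verify_voiceprint(first_audio_file):
        return  # not the preset user's voice: keep listening
    second_audio, start_pos = platform.recognize_humming(first_audio_file)
    if platform.has_audio_output():
        # Play through the audio output module of the platform's own device.
        platform.play_from(second_audio, start_pos)
    else:
        # Or control another application of the electronic device to play it.
        platform.delegate_playback(second_audio, start_pos)
```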
It should be noted that the manner in which the open platform performs humming recognition provided by the embodiments of the present application can refer to the specific implementation manner of each step of the method described in fig. 9, and will not be described herein again.
The embodiments of the present application can be combined arbitrarily to achieve different technical effects.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, or digital subscriber line) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk), among others.
In short, the above description is only an example of the technical solution of the present application, and is not intended to limit the protection scope of the present application. Any modification, equivalent replacement, or improvement made in accordance with the disclosure of the present application shall fall within the protection scope of the present application.

Claims (32)

1. A humming recognition method, comprising:
the electronic equipment collects sound in an external environment through the audio input module;
if the electronic equipment judges that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic equipment sends a first audio file to a music identification server, wherein the first audio file contains the sound, and the music identification server is used for searching a second audio file from an audio resource library according to the first audio file and determining the initial playing position of the second audio file; the similarity between the characteristics of the second audio file and the characteristics of the first audio file is higher than the similarity between the characteristics of a third audio file and the characteristics of the sound, the third audio file is the audio file except the second audio file in the audio resource library, and the starting playing position of the second audio file corresponds to the ending position of the first audio file;
the electronic equipment receives the second audio file and first indication information sent by the music identification server, wherein the first indication information indicates the initial playing position of the second audio file;
and the electronic equipment plays the second audio file from the initial playing position through an audio output module.
2. The method of claim 1, further comprising:
the electronic equipment acquires the mouth shape information of a user through a camera;
if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic equipment sends the mouth shape information to a music identification server;
wherein, the music identification server is further configured to convert the mouth shape information into text information, and the finding out a second audio file from an audio repository according to the first audio file includes:
and searching a second audio file from an audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
3. The method of claim 2, wherein the electronic device obtains the mouth shape information of the user through a camera, and the method comprises:
and if the electronic equipment judges that the voice is the voice, acquiring the mouth shape information of the user through a camera.
4. The method of any of claims 1-3, wherein the electronic device captures sounds in the external environment through an audio input module, comprising:
if the electronic equipment judges that the audio input module and/or the audio output module are not occupied, the electronic equipment collects sound in the external environment through the audio input module.
5. The method of any of claims 1-4, wherein the tag of the second audio file is included in a user tag of the first user.
6. The method of any of claims 1-5, wherein after the electronic device plays the second audio file from the start play position through an audio output module, the method further comprises:
the electronic equipment displays the identification information of the second audio file and a playing control;
the display state of the playing control is a first state, and the first state represents that the second audio file is being played;
and if the electronic equipment detects a first user operation acting on the playing control in the first state, responding to the first user operation, pausing the playing of the second audio file by the electronic equipment, and setting the display state of the playing control to be a second state, wherein the second state represents that the playing of the second audio file is paused.
7. The method according to any one of claims 1-6, further comprising:
when the electronic equipment is detected to be in the locked state, the electronic equipment stops collecting the sound in the external environment through the audio input module.
8. The method according to any one of claims 1-7, further comprising:
when the electronic equipment is detected to be in a preset place, the electronic equipment stops collecting sound in the external environment through the audio input module.
9. The method of any of claims 1-7, wherein the electronic device playing the second audio file from the start play position via an audio output module, comprises:
and if the electronic equipment judges that the position of the electronic equipment is inconsistent with a preset place, the electronic equipment plays the second audio file from the initial playing position through the audio output module.
10. The method according to any one of claims 1-9, further comprising:
the electronic equipment stops collecting the sound in the external environment through the audio input module within a first time period.
11. An electronic device comprising an audio input module, an audio output module, a processor, and a memory, wherein:
the memory is to store program instructions;
the processor is configured to perform the following operations in accordance with the program instructions:
collecting sound in an external environment through an audio input module;
if the voiceprint information of the sound is judged to be consistent with the pre-stored voiceprint information, sending a first audio file to a music identification server, wherein the first audio file comprises the sound, and the music identification server is used for searching a second audio file from an audio resource library according to the first audio file and determining the initial playing position of the second audio file; the similarity between the characteristics of the second audio file and the characteristics of the first audio file is higher than the similarity between the characteristics of a third audio file and the characteristics of the sound, the third audio file is the audio file except the second audio file in the audio resource library, and the starting playing position of the second audio file corresponds to the ending position of the first audio file;
receiving the second audio file and first indication information sent by the music identification server, wherein the first indication information indicates an initial playing position of the second audio file;
and playing the second audio file from the initial playing position through an audio output module.
12. The electronic device of claim 11, further comprising a camera, wherein the processor is further configured to perform the following operations according to the program instructions:
acquiring the mouth shape information of a user through a camera;
if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the mouth shape information is sent to a music identification server;
the music recognition server is also used for converting the mouth shape information into text information;
the music recognition server is further specifically configured to: and searching a second audio file from an audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
13. The electronic device of claim 12, wherein the processor is specifically configured to perform the following operations in accordance with the program instructions:
and if the voice is judged to be the voice, acquiring the mouth shape information of the user through the camera.
14. The electronic device according to any of claims 11-13, wherein the processor is specifically configured to perform the following operations according to the program instructions:
and if the audio input module and/or the audio output module are not occupied, acquiring the sound in the external environment through the audio input module.
15. The electronic device of any of claims 11-14, wherein the tag of the second audio file is included in a user tag of the first user.
16. The electronic device of any of claims 11-15, further comprising a display screen, wherein the processor is further configured to perform the following operations according to the program instructions:
displaying the identification information of the second audio file and the playing control through the display screen;
the display state of the playing control is a first state, and the first state represents that the second audio file is being played;
and if detecting a first user operation acting on the playing control in the first state, responding to the first user operation, pausing the playing of the second audio file, and setting the display state of the playing control to be a second state, wherein the second state represents that the playing of the second audio file is paused.
17. The electronic device of any of claims 11-16, wherein the processor is further configured to perform the following operations in accordance with the program instructions:
and when the electronic equipment is detected to be in the locked state, stopping collecting the sound in the external environment through the audio input module.
18. The electronic device of any of claims 11-17, wherein the processor is further configured to perform the following operations in accordance with the program instructions:
and when the electronic equipment is detected to be in a preset place, stopping collecting the sound in the external environment through the audio input module.
19. The electronic device according to any of claims 11-17, wherein the processor is specifically configured to perform the following operations according to the program instructions:
and if the position of the electronic equipment is judged to be inconsistent with the preset place, playing the second audio file from the initial playing position through an audio output module.
20. The electronic device of any of claims 11-19, wherein the processor is further configured to perform the following operations in accordance with the program instructions:
stopping the collection of sound in the external environment through the audio input module for a first period of time.
21. A humming recognition method, comprising:
the method comprises the steps that an open platform obtains a first audio file, wherein the first audio file comprises sound in an external environment;
if the open platform judges that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform searches a second audio file from an audio resource library according to the first audio file and determines the initial playing position of the second audio file; the similarity between the characteristics of the second audio file and the characteristics of the first audio file is higher than the similarity between the characteristics of a third audio file and the characteristics of the sound, the third audio file is the audio file except the second audio file in the audio resource library, and the starting playing position of the second audio file corresponds to the ending position of the first audio file;
and the open platform plays the second audio file from the initial playing position, or the open platform controls other application programs of the electronic equipment to play the second audio file from the initial playing position.
22. The method of claim 21, further comprising:
the open platform acquires the mouth shape information of a user through the electronic equipment;
if the open platform judges that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the mouth shape information into text information;
the finding out a second audio file from an audio repository according to the first audio file comprises:
and searching a second audio file from an audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
23. The method of claim 22, wherein the open platform obtains the mouth shape information of the user through the electronic device, and comprises:
and if the open platform judges that the sound included in the first audio file is the voice, acquiring the mouth shape information of the user through the electronic equipment.
24. The method of any of claims 21-23, wherein the open platform acquiring the first audio file comprises:
and if the audio input module and/or the audio output module are not occupied by other applications, the open platform acquires a first audio file.
25. The method of any of claims 21-24, wherein the tag of the second audio file is included in a user tag of the first user.
26. The method of any of claims 21-25, wherein after the open platform plays the second audio file from the start play position, the method further comprises:
the open platform displays the identification information of the second audio file and a playing control through the electronic equipment;
the display state of the playing control is a first state, and the first state represents that the second audio file is being played;
if the open platform detects a first user operation acting on the play control in the first state, responding to the first user operation, the open platform pauses playing the second audio file, or controls other application programs of the electronic equipment to pause playing the second audio file, and sets the display state of the play control to be a second state, wherein the second state represents that the second audio file pauses playing.
27. The method according to any one of claims 21-26, further comprising:
and when the electronic equipment is detected to be in a locked state, the open platform stops acquiring the first audio file.
28. The method according to any one of claims 21-27, further comprising:
and when the electronic equipment is detected to be in a preset place, the open platform stops acquiring the first audio file.
29. The method of any of claims 21-27, wherein the open platform playing the second audio file from the start playing location, or the open platform controlling other applications of the electronic device to play the second audio file from the start playing location, comprises:
and if the open platform judges that the position of the electronic equipment is inconsistent with a preset place, the open platform plays the second audio file from the initial playing position, or the open platform controls other application programs of the electronic equipment to play the second audio file from the initial playing position.
30. The method according to any one of claims 21-29, further comprising:
the open platform stops acquiring the first audio file within a first time period.
31. A computer program product comprising instructions for causing an electronic device to perform the method of any of claims 1-10, 21-30 when the computer program product is run on the electronic device.
32. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-10, 21-30.
CN201910472410.9A 2019-05-31 2019-05-31 Humming recognition method and related equipment Pending CN112015943A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910472410.9A CN112015943A (en) 2019-05-31 2019-05-31 Humming recognition method and related equipment
PCT/CN2020/092802 WO2020239001A1 (en) 2019-05-31 2020-05-28 Humming recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910472410.9A CN112015943A (en) 2019-05-31 2019-05-31 Humming recognition method and related equipment

Publications (1)

Publication Number Publication Date
CN112015943A true CN112015943A (en) 2020-12-01

Family ID=73506279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472410.9A Pending CN112015943A (en) 2019-05-31 2019-05-31 Humming recognition method and related equipment

Country Status (2)

Country Link
CN (1) CN112015943A (en)
WO (1) WO2020239001A1 (en)

Also Published As

Publication number Publication date
WO2020239001A1 (en) 2020-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination