WO2020239001A1 - Humming recognition method and related device - Google Patents

Humming recognition method and related device

Info

Publication number
WO2020239001A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio file
electronic device
audio
user
information
Prior art date
Application number
PCT/CN2020/092802
Other languages
French (fr)
Chinese (zh)
Inventor
叶波
吴小进
周昕宇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020239001A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F16/636 Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit

Definitions

  • This application relates to the field of computer technology, in particular to a humming recognition method and related equipment.
  • Humming recognition is currently a research hotspot in the field of audio retrieval. Unlike retrieving audio with text (for example, a song name, singer, or lyrics), and unlike retrieving audio with a piece of recorded music, humming recognition retrieves audio based on a music segment hummed by the user.
  • Currently, the user triggers the terminal to perform humming recognition mainly in the following two ways. In the first way, the user first needs to find an application with a humming recognition function, then find the humming recognition control in that application, and then operate the control to trigger the terminal to perform humming recognition.
  • In the second way, the user first needs to wake up an intelligent voice assistant (for example, Siri or Tmall Genie) with a wake-up word, and then input a voice command to trigger the terminal to perform humming recognition.
  • In both cases, the manner in which the user triggers the terminal to perform humming recognition is relatively complicated.
  • This application provides a humming recognition method and related equipment, which can reduce the operation steps for a user to trigger a terminal to perform humming recognition, improve the efficiency of humming recognition, and at the same time achieve the effect of playing audio that follows the user's humming, improving the user experience.
  • According to a first aspect, an embodiment of the present application provides a humming recognition method, which may include the following. An electronic device collects sound in the external environment through an audio input module. If the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file containing the sound to a music recognition server. The music recognition server is configured to find a second audio file in an audio resource library according to the first audio file and to determine a start playback position of the second audio file. The similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file. The electronic device receives the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file. The electronic device then plays the second audio file from the start playback position through an audio output module.
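  • The selection rule above (the second audio file is the library entry most similar to the first audio file, and playback starts where the hummed clip ended) can be illustrated with a minimal sketch. The `Match` record, its field names, and the idea that the server already has a per-candidate similarity score and aligned end position are assumptions made for illustration; the source does not specify how these values are computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Match:
    """Hypothetical result of comparing the first audio file with one library entry."""
    track_id: str         # identifier of a candidate audio file in the resource library
    similarity: float     # feature similarity to the first audio file (higher is closer)
    aligned_end_s: float  # position in the candidate that lines up with the end of the hummed clip


def pick_second_audio_file(candidates: List[Match]) -> Optional[Tuple[str, float]]:
    """Return (track_id, start_playback_position_s) for the most similar candidate."""
    if not candidates:
        return None
    best = max(candidates, key=lambda m: m.similarity)
    # Playback starts at the point corresponding to the end of the first audio file.
    return best.track_id, best.aligned_end_s
```

  • Under these assumptions, every other candidate plays the role of a "third audio file": its similarity is lower than that of the chosen entry.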
  • The method further includes: the electronic device obtains the user's lip shape information through a camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the lip shape information to the music recognition server, which is further configured to convert the lip shape information into text information. In this case, finding the second audio file from the audio resource library according to the first audio file includes: finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the lip shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.
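  • One way to picture how lip-derived text could refine the match is sketched below. The weighted sum of audio and text similarity, the `text_weight` parameter, and the use of Python's `difflib.SequenceMatcher` as the text similarity measure are all assumptions for illustration; the source only states that text similarity is taken into account.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_with_lip_text(
    candidates: List[Tuple[str, float, str]],  # (track_id, audio_similarity, lyrics)
    lip_text: str,
    text_weight: float = 0.5,                  # hypothetical weighting, not from the source
) -> str:
    """Return the track_id with the best combined audio and text score."""
    def score(candidate: Tuple[str, float, str]) -> float:
        _, audio_sim, lyrics = candidate
        return (1 - text_weight) * audio_sim + text_weight * text_similarity(lyrics, lip_text)

    return max(candidates, key=score)[0]
```

  • With `text_weight=0.0` the ranking falls back to audio similarity alone, which makes the contribution of the lip-derived text easy to compare.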
  • The electronic device obtaining the user's lip shape information through a camera includes: if the electronic device determines that the sound is a human voice, obtaining the user's lip shape information through the camera.
  • The electronic device collecting sound in the external environment through an audio input module includes: if the electronic device determines that the audio input module and/or the audio output module is not occupied, the electronic device collects sound in the external environment through the audio input module.
  • The tag of the second audio file is included in the user tag of the first user.
  • The method further includes: the electronic device displays identification information of the second audio file and a playback control, where the display state of the playback control is a first state indicating that the second audio file is being played; if the electronic device detects a first user operation acting on the playback control in the first state, then in response to the first user operation the electronic device pauses playback of the second audio file and sets the display state of the playback control to a second state indicating that the second audio file is paused.
  • The method further includes: when detecting that the electronic device is in a locked state, the electronic device stops collecting sound in the external environment through the audio input module.
  • The method further includes: when it is detected that the electronic device is at a preset location, the electronic device stops collecting sound in the external environment through the audio input module.
  • The electronic device playing the second audio file from the start playback position through the audio output module includes: if the electronic device determines that its location is not the preset location, the electronic device plays the second audio file from the start playback position through the audio output module.
  • The method further includes: the electronic device stops collecting sound in the external environment through the audio input module within a first time period.
  • The electronic device collecting sound in the external environment through the audio input module includes: if the electronic device determines that its humming recognition function is enabled, the electronic device collects sound in the external environment through the audio input module.
  • The method further includes: when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, the electronic device stops collecting sound in the external environment through the audio input module.
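  • The conditions above under which the device does or does not collect sound can be combined into a single check, sketched below. The source presents these as separate optional implementations, so evaluating them together is an assumption, as are the `DeviceState` fields and the default `dark_limit_s` value.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of the conditions named in the text."""
    humming_enabled: bool      # humming recognition function switched on
    audio_io_occupied: bool    # audio input and/or output module in use by another operation
    locked: bool               # device is in a locked state
    at_preset_location: bool   # device is at a preset location
    dark_duration_s: float     # how long ambient brightness has been below the preset value


def should_collect_sound(state: DeviceState, dark_limit_s: float = 600.0) -> bool:
    """Return True only if none of the stop conditions applies."""
    if not state.humming_enabled:
        return False
    if state.audio_io_occupied:
        return False
    if state.locked or state.at_preset_location:
        return False
    if state.dark_duration_s > dark_limit_s:
        return False
    return True
```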
  • The music recognition server is further configured to find the second audio file from the audio resource library according to the first audio file when the music recognition server determines that the sound is a music fragment.
  • During the time period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), the electronic device gradually increases the playback volume of the second audio file from low to high.
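  • The gradual volume increase can be sketched as a gain ramp. The linear shape of the ramp and the parameter names are assumptions; the source only says that the volume rises from low to high until a preset moment such as the 5th second.

```python
def fade_in_gain(elapsed_s: float, ramp_s: float = 5.0,
                 start_gain: float = 0.0, target_gain: float = 1.0) -> float:
    """Playback gain at `elapsed_s` seconds after the second audio file starts playing."""
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target_gain   # ramp finished: play at full target volume
    if elapsed_s <= 0:
        return start_gain    # playback has just started
    # Linear interpolation between the start gain and the target gain.
    return start_gain + (target_gain - start_gain) * (elapsed_s / ramp_s)
```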
  • The electronic device may also detect whether the second audio file is stored in a pre-stored music folder; if so, the electronic device can play other audio files in the music folder after playing the second audio file.
  • According to a second aspect, an embodiment of the present application provides an electronic device.
  • The electronic device includes an audio input module, an audio output module, a processor, and a memory.
  • The memory is used to store program instructions, and the processor is configured to perform the following operations according to the program instructions: collect sound in the external environment through the audio input module; if it is determined that the voiceprint information of the sound is consistent with pre-stored voiceprint information, send a first audio file containing the sound to a music recognition server, where the music recognition server is configured to find a second audio file from an audio resource library according to the first audio file and to determine the start playback position of the second audio file, the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; receive the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file; and play the second audio file from the start playback position through the audio output module.
  • The operation steps for the user to trigger humming recognition can be reduced, and the efficiency of humming recognition can be improved.
  • The effect of playing audio that follows the user's humming can be achieved, and the user experience can be improved.
  • The electronic device further includes a camera, and the processor is further configured to perform the following operations according to the program instructions: obtain the user's lip shape information through the camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, send the lip shape information to the music recognition server. The music recognition server is further configured to convert the lip shape information into text information, and is specifically configured to find the second audio file from the audio resource library according to the first audio file and the text information corresponding to the lip shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.
  • The processor is specifically configured to perform the following operations according to the program instructions: if it is determined that the sound is a human voice, obtain the user's lip shape information through the camera.
  • The processor is specifically configured to perform the following operations according to the program instructions: if it is determined that the audio input module and/or the audio output module is not occupied, collect sound in the external environment through the audio input module.
  • The tag of the second audio file is included in the user tag of the first user.
  • The electronic device further includes a display screen, and the processor is further configured to perform the following operations according to the program instructions: display identification information of the second audio file and a playback control on the display screen, where the display state of the playback control is a first state indicating that the second audio file is being played; if a first user operation acting on the playback control in the first state is detected, then in response to the first user operation, pause playback of the second audio file and set the display state of the playback control to a second state indicating that the second audio file is paused.
  • The processor is further configured to perform the following operations according to the program instructions: when detecting that the electronic device is in a locked state, stop collecting sound in the external environment through the audio input module.
  • The processor is further configured to perform the following operations according to the program instructions: when it is detected that the electronic device is at a preset location, stop collecting sound in the external environment through the audio input module.
  • The processor is specifically configured to perform the following operations according to the program instructions: if it is determined that the location of the electronic device is not the preset location, play the second audio file from the start playback position through the audio output module.
  • The processor is further configured to perform the following operations according to the program instructions: stop collecting sound in the external environment through the audio input module within a first time period.
  • The processor is specifically configured to perform the following operations according to the program instructions: if it is determined that the humming recognition function of the electronic device is turned on, collect sound in the external environment through the audio input module.
  • The processor is further configured to perform the following operations according to the program instructions: when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, stop collecting sound in the external environment through the audio input module.
  • The music recognition server is further configured to find the second audio file from the audio resource library according to the first audio file when the music recognition server determines that the sound is a music fragment.
  • The processor is further configured to perform the following operations according to the program instructions: during the time period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), gradually increase the playback volume of the second audio file from low to high.
  • The processor is further configured to perform the following operations according to the program instructions: detect whether the second audio file is stored in a pre-stored music folder, and if so, play other audio files in the music folder after the second audio file has been played.
  • According to a third aspect, an embodiment of the present application provides yet another humming recognition method.
  • The method includes: an open platform obtains a first audio file that includes sound in the external environment; if the open platform determines that the voiceprint information of the first audio file is consistent with pre-stored voiceprint information, the open platform finds a second audio file from an audio resource library according to the first audio file and determines the start playback position of the second audio file, where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; the open platform plays the second audio file from the start playback position, or the open platform controls another application of the electronic device to play the second audio file from the start playback position.
  • The operation steps for the user to trigger humming recognition can be reduced, and the efficiency of humming recognition can be improved.
  • The effect of playing audio that follows the user's humming can be achieved, and the user experience can be improved.
  • The method further includes: the open platform obtains the user's lip shape information through the electronic device; if the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the lip shape information into text information. In this case, finding the second audio file from the audio resource library according to the first audio file includes: finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the lip shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the lip shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the lip shape information.
  • The open platform obtaining the user's lip shape information includes: if the open platform determines that the sound included in the first audio file is a human voice, obtaining the user's lip shape information through the electronic device.
  • The open platform acquiring the first audio file includes: if the open platform determines that the audio input module and/or the audio output module is not occupied by other applications, the open platform acquires the first audio file.
  • The tag of the second audio file is included in the user tag of the first user.
  • The method further includes: the open platform displays, through the electronic device, identification information of the second audio file and a playback control, where the display state of the playback control is a first state indicating that the second audio file is being played; if the open platform detects a first user operation acting on the playback control in the first state, then in response to the first user operation the open platform pauses playback of the second audio file, or controls another application of the electronic device to pause playback of the second audio file, and sets the display state of the playback control to a second state indicating that the second audio file is paused.
  • The method further includes: when detecting that the electronic device is in a locked state, the open platform stops acquiring the first audio file.
  • The method further includes: when it is detected that the electronic device is at a preset location, the open platform stops acquiring the first audio file.
  • The open platform playing the second audio file from the start playback position, or controlling another application of the electronic device to play the second audio file from the start playback position, includes: if the open platform determines that the location of the electronic device is not the preset location, the open platform plays the second audio file from the start playback position, or controls another application of the electronic device to play the second audio file from the start playback position.
  • The method further includes: the open platform stops acquiring the first audio file within a first time period.
  • The open platform acquiring the first audio file includes: if the humming recognition function of the electronic device is enabled, the open platform acquires the first audio file.
  • The method further includes: when it is detected that the ambient light brightness of the electronic device has remained below a preset value for longer than a preset time, the open platform stops acquiring the first audio file.
  • The open platform is further configured to find the second audio file from the audio resource library according to the first audio file when it determines that the first audio file is a music fragment.
  • During the time period from the moment the second audio file starts playing to a preset moment (for example, the 5th second or the 6th second), the open platform gradually increases the playback volume of the second audio file from low to high.
  • The open platform may also detect whether the second audio file is stored in a pre-stored music folder of the electronic device. If so, the open platform can control another application of the electronic device to play other audio files in the music folder after that application has finished playing the second audio file.
  • An embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on an electronic device, the electronic device is caused to execute any possible implementation of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions run on an electronic device, the electronic device is caused to execute any possible implementation of the first aspect; when the instructions run on an open platform, the open platform is caused to execute any possible implementation of the third aspect.
  • The electronic device can continuously acquire sound in the external environment.
  • The electronic device sends the first audio file containing the sound to the music recognition server for humming recognition.
  • After the electronic device receives the recognized second audio file and its start playback position from the music recognition server, it can start playing the second audio file from the position at which the sound ended.
  • The start playback position of the second audio file corresponds to the end position of the first audio file.
  • FIG. 1A is a schematic structural diagram of a smart terminal provided by an embodiment of the present application.
  • FIG. 1B is a software structure block diagram of a smart terminal provided by an embodiment of the present application.
  • FIG. 1C is a schematic structural diagram of a smart home device provided by an embodiment of the present application.
  • FIG. 1D is a schematic structural diagram of a vehicle-mounted device provided by an embodiment of the present application.
  • FIG. 2 is a user interface for displaying application menus on a smart terminal provided by an embodiment of the present application.
  • FIGS. 3A-3B are some user interfaces that display recognition results provided by embodiments of the present application.
  • FIG. 3C is a user interface displayed when a smart terminal is in a locked state according to an embodiment of the present application.
  • FIGS. 3D-3F are other user interfaces that display recognition results provided by embodiments of the present application.
  • FIG. 3G is a user interface for humming recognition provided by an embodiment of the present application.
  • FIGS. 4A-4B are some user interfaces for setting the humming recognition function provided by embodiments of the present application.
  • FIGS. 5A-5C are some other user interfaces for setting the humming recognition function provided by the embodiments of the present application.
  • FIGS. 5D-5F are some user interfaces for setting access rights for the humming recognition function provided by embodiments of the present application.
  • FIG. 5G is a user interface for entering voiceprint information provided by an embodiment of the present application.
  • FIGS. 6A-6B are other user interfaces for setting the humming recognition function provided by the embodiments of the present application.
  • FIG. 6C is another user interface for entering voiceprint information provided by an embodiment of the present application.
  • FIGS. 7A-7B are user interfaces for setting the humming recognition function on some vehicle-mounted devices provided by the embodiments of the present application.
  • FIG. 7C is another user interface for entering voiceprint information provided by an embodiment of the present application.
  • FIGS. 8A-8B are user interfaces for displaying recognition results on some vehicle-mounted devices provided by embodiments of the present application.
  • FIG. 9 is a flowchart of a humming recognition method provided by an embodiment of the present application.
  • The terms “first” and “second” are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, features defined with “first” and “second” may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present application, unless otherwise specified, “plurality” means two or more.
  • Humming recognition is a way to perform audio retrieval through music fragments hummed by users.
  • The working principle of humming recognition is as follows: the electronic device obtains a music segment hummed by the user and sends it to the server.
  • The server finds, by similarity, the audio file most similar to the user's hummed segment, and then feeds that audio file back to the electronic device.
  • the server extracts a feature (for example, a fundamental frequency sequence) from a music segment, and then uses the feature to perform a search, and matches an audio file that is most similar to the user's humming segment from a pre-stored audio resource library. Since the user's humming segment cannot be completely similar to the actual audio file segment in the library, humming recognition is a fuzzy match. For fuzzy matching, string edit distance and dynamic time warping (DTW) algorithms can be used to improve the accuracy of recognition.
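  • As an illustration of the fuzzy matching mentioned above, the following is a minimal sketch of the classic dynamic time warping distance between two feature sequences. Treating the fundamental frequency sequence as a list of pitch values and using the absolute difference as the local cost are assumptions; the source names DTW but does not give its parameters.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two pitch sequences (lower is more similar)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current value
                cost[i][j - 1],      # b advances, a repeats its current value
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

  • Because DTW allows one sequence to be stretched against the other, a user who hums the right notes at an uneven tempo can still align closely with the reference melody.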
  • A user interface is a medium for interaction and information exchange between an application or operating system and the user. It converts between the internal form of information and a form acceptable to the user.
  • The user interface of an application is source code written in a specific computer language such as Java or extensible markup language (XML).
  • The interface source code is parsed and rendered on the electronic device 300 and finally presented as content the user can recognize, such as pictures, text, buttons, and other controls.
  • Controls are the basic elements of a user interface. Typical controls include buttons, widgets, toolbars, menu bars, text boxes, scroll bars, pictures, and text.
  • The attributes and content of the controls in the interface are defined by tags or nodes.
  • For example, XML specifies the controls contained in the interface through nodes such as <Textview>, <ImgView>, and <VideoView>.
  • A node corresponds to a control or attribute in the interface, and the node is parsed and rendered as user-visible content.
  • Some applications, such as hybrid applications, usually also include web pages in their interfaces.
  • A web page, also called a page, can be understood as a special control embedded in the application interface.
  • A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS).
  • Web page source code can be loaded and displayed as user-recognizable content by a browser or a web page display component with similar functions.
  • The specific content contained in a web page is also defined by tags or nodes in the source code of the web page. For example, HTML defines the elements and attributes of the web page through <p>, <img>, <video>, and <canvas>.
  • The following embodiments of the present application provide a humming recognition method and an electronic device that enable the electronic device to follow the user's humming and play the audio file corresponding to a music fragment while the user is humming it, thereby reducing the operation steps for the user to trigger humming recognition on the terminal and improving the efficiency of humming recognition.
  • electronic devices for example, smart terminals, smart homes, in-vehicle devices, etc.
  • the implementation process of the humming recognition operation can refer to the following steps: first, the electronic device collects the sound in the external environment through the audio input module (for example, a microphone); then, if the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends a first audio file containing this sound to the music recognition server for humming recognition, so as to identify the audio file that matches the music segment hummed by the user and determine the start playback position of that audio file.
  • the start playback position of the recognized audio file corresponds to the end position of the first audio file.
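The relationship between the hummed clip and the start playback position can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the `MatchResult` structure, its fields, and the assumption that the server reports where the hummed segment begins inside the matched track are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical server response for a recognized track."""
    track_id: str
    match_start_s: float  # assumed: offset in the track where the hummed segment begins


def start_playback_position(match: MatchResult,
                            clip_duration_s: float,
                            track_duration_s: float) -> float:
    """Return the offset (in seconds) that corresponds to the end of the hummed clip."""
    position = match.match_start_s + clip_duration_s
    # Clamp so that playback never seeks before the start or past the end of the track.
    return min(max(position, 0.0), track_duration_s)
```

For example, if an 8-second clip is matched at 42 seconds into a 240-second track, playback would start at the 50-second mark, so the device picks up where the user's humming left off.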
  • before the electronic device performs the humming recognition operation provided by the embodiments of the present application, it needs to determine whether its own audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied (for example, to play audio/video, make a call, or perform voice navigation), the electronic device does not perform the humming recognition operation; if the audio input module and/or audio output module is not occupied, the electronic device performs the humming recognition operation.
  • the electronic device can execute the humming recognition operation.
  • the priority of the humming recognition operation provided in the embodiments of the present application is lower than the priority of the operations of the electronic device other than the humming recognition operation that need to occupy the audio input module and/or the audio output module.
  • when the electronic device is performing the humming recognition operation provided by the embodiments of the present application, if it detects another request for audio resources that needs to occupy the audio input module and/or audio output module, the electronic device invokes the audio input module and/or audio output module to perform the operation corresponding to that request.
  • if the time for which the request needs to occupy the audio output module is less than a preset value (for example, 1 second), as with a notification sound (for example, a short message sound or an application push sound), the humming recognition operation can continue to occupy the audio input module while the operation corresponding to the request occupies the audio output module.
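The audio resource rules described above can be sketched as two small checks. This is only an illustration under stated assumptions: the `AudioRequest` type and both function names are hypothetical, either module being occupied is treated as blocking a new recognition, and the 1-second limit is the example value given in the text.

```python
from dataclasses import dataclass

SHORT_OUTPUT_LIMIT_S = 1.0  # preset value; the text gives 1 second as an example


@dataclass
class AudioRequest:
    """Hypothetical description of a competing request for audio resources."""
    needs_input: bool
    needs_output: bool
    output_duration_s: float = 0.0


def may_start_humming_recognition(input_busy: bool, output_busy: bool) -> bool:
    """Humming recognition has the lowest priority: start only if nothing is occupied."""
    return not (input_busy or output_busy)


def humming_continues(request: AudioRequest) -> bool:
    """Decide whether a running recognition keeps the microphone when a request arrives."""
    if request.needs_input:
        return False  # the other operation takes over the audio input module
    if request.needs_output and request.output_duration_s >= SHORT_OUTPUT_LIMIT_S:
        return False  # long playback (call, navigation, media) preempts recognition
    return True       # short notification sound: recognition keeps the microphone
```

A notification sound lasting 0.5 seconds would leave the recognition running, while a voice navigation prompt or a call would preempt it.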
  • the humming recognition operation provided by the embodiments of the present application may be performed by a system application or a third-party application of the electronic device.
  • in one possible implementation, the system application or the third-party application may be dedicated to performing the humming recognition operation provided in the embodiments of the present application; in another possible implementation, the system application or the third-party application can also execute other services (or functions), and the humming recognition operation is only integrated into it as one service (or function).
  • humming recognition is only a name used in this embodiment, and its representative meaning has been recorded in this embodiment, and its name does not constitute any limitation to this embodiment.
  • “humming recognition” may also be referred to as “listening to song recognition”, “humming retrieval” and other names.
  • the electronic device that performs the humming recognition operation may be a smart terminal, a smart home device, or a vehicle-mounted device.
  • FIG. 1A shows a schematic diagram of the structure of the smart terminal 100.
  • the smart terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an acceleration sensor 180C, a distance sensor 180D, a proximity light sensor 180E, a fingerprint sensor 180F, a touch sensor 180G, an ambient light sensor 180H, and so on.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the smart terminal 100.
  • the smart terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a modem processor, an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the smart terminal 100 may also include one or more processors 110.
  • the controller may be the nerve center and command center of the smart terminal 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the smart terminal 100.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may couple the touch sensor 180K, charger, flash, camera module 193, etc., through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the smart terminal 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
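The sample, quantize, and encode steps associated with PCM can be illustrated with a short sketch. It is a generic example of 16-bit linear PCM, not a description of the PCM bus of the smart terminal 100; the function name `pcm_encode` and the little-endian 16-bit format are assumptions chosen for illustration.

```python
import math
import struct


def pcm_encode(signal, sample_rate_hz: int, duration_s: float) -> bytes:
    """Sample signal(t), quantize to signed 16-bit codes, and pack them little-endian."""
    max_code = 2 ** 15 - 1
    n = round(sample_rate_hz * duration_s)
    codes = []
    for i in range(n):
        t = i / sample_rate_hz                  # sampling: evaluate at discrete instants
        x = max(-1.0, min(1.0, signal(t)))      # clip to the representable range [-1, 1]
        codes.append(round(x * max_code))       # quantization: map to an integer code
    return struct.pack(f"<{n}h", *codes)        # encoding: 2 bytes per sample


# Usage: 10 ms of a 440 Hz tone sampled at 8 kHz yields 80 samples (160 bytes).
tone = pcm_encode(lambda t: math.sin(2 * math.pi * 440 * t), 8000, 0.01)
```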
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing the recognized audio file through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera module 193 and other peripheral devices.
  • the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 and the camera module 193 communicate through a CSI interface to implement the camera function of the smart terminal 100, so as to obtain the user's mouth shape information.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the smart terminal 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera module 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the smart terminal 100, and can also be used to transfer data between the smart terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. This interface can also be used to connect to other smart terminals, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the smart terminal 100.
  • the smart terminal 100 may also adopt interface connection modes different from those in the above-mentioned embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive the wireless charging input through the wireless charging coil of the smart terminal 100. While the charging management module 140 charges the battery 142, it can also supply power to the smart terminal through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera module 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the smart terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the smart terminal 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the smart terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and radiate it as electromagnetic waves through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (including but not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the smart terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
  • the wireless communication module 160 may include a Bluetooth module, a Wi-Fi module, and the like.
  • the smart terminal can determine its own location through the wireless communication module 160.
  • the antenna 1 of the smart terminal 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the smart terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the smart terminal 100 can implement a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphical rendering.
  • the processor 110 may include one or more GPUs, which execute instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), etc.
  • the smart terminal 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the smart terminal 100 can realize a camera function through a camera module 193, an ISP, a video codec, a GPU, a display screen 194, an application processor AP, a neural network processor NPU, and the like.
  • the camera module 193 can be used to collect color image data of the subject.
  • the ISP can be used to process the color image data collected by the camera module 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera module 193.
  • the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats.
  • the smart terminal 100 may include 1 or N camera modules 193, and N is a positive integer greater than 1.
  • the smart terminal 100 may include a front camera module 193 and a rear camera module 193.
  • the front camera module 193 can usually be used to collect color image data of the photographer facing the display screen 194, and the rear camera module 193 can be used to collect color image data of the objects (such as people, landscapes, etc.) that the photographer is facing.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the smart terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
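As a rough illustration of evaluating the energy at a selected frequency point, the sketch below computes a single-bin discrete Fourier transform over a block of samples. The function name `bin_energy` and the normalization by block length are assumptions made for this example; the text does not specify how the digital signal processor performs the computation.

```python
import cmath
import math


def bin_energy(samples, sample_rate_hz: float, freq_hz: float) -> float:
    """Energy of `samples` at one frequency point, from a single-bin DFT."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(samples)
    )
    return abs(acc) ** 2 / n


# Usage: a 1 kHz tone sampled at 8 kHz concentrates its energy at 1 kHz.
tone_1k = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(800)]
energy_at_1k = bin_energy(tone_1k, 8000, 1000)
energy_at_2k = bin_energy(tone_1k, 8000, 2000)
```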
  • Video codecs are used to compress or decompress digital video.
  • the smart terminal 100 may support one or more video codecs. In this way, the smart terminal 100 can play or record videos in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
  • the NPU is a neural-network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the smart terminal 100 can be realized, for example image recognition, face recognition, voice recognition, text understanding, etc.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the smart terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save audio files, photos, videos and other data in an external memory card.
  • the internal memory 121 may be used to store one or more computer programs, and the one or more computer programs include instructions.
  • the processor 110 can run the above-mentioned instructions stored in the internal memory 121 to enable the smart terminal 100 to execute the humming recognition method provided in some embodiments of the present application, as well as various functional applications and data processing.
  • the internal memory 121 may include a storage program area and a storage data area. Among them, the storage program area can store the operating system; the storage program area can also store one or more application programs (such as a gallery, contacts, etc.) and so on.
  • the data storage area can store data (such as photos, contacts, etc.) created during the use of the smart terminal 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the smart terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the audio output module 170A, also called a “speaker” or “loudspeaker”, is used to convert audio electrical signals into sound signals.
  • the user can listen to music or to a hands-free call through the speaker 170A of the smart terminal 100.
  • the audio output module 170B, also called a “receiver” or “earpiece”, is used to convert audio electrical signals into sound signals.
  • when the smart terminal 100 answers a call or plays a voice message, the user can hear the voice by bringing the receiver 170B close to the ear.
  • the audio input module 170C, also called a “microphone” or “mic”, is used to convert sound signals into electrical signals.
  • the user can make a sound with the mouth close to the microphone 170C, so that the sound signal is input to the microphone 170C.
  • the smart terminal 100 may be provided with at least one microphone 170C.
  • the smart terminal 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals.
  • the smart terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the microphone 170C can collect sound signals near the smart terminal 100.
  • the CPU or digital processor or audio processor in the processor 110 may process the sound collected by the microphone 170C.
  • if the processor 110 determines that the sound collected within the preset time is a human voice, the processor 110 extracts voiceprint information from the sound. If the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the first audio file containing the sound is sent to the music recognition server through the mobile communication module 150 or the wireless communication module 160.
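The gating logic above (human voice check, voiceprint comparison, then upload) can be sketched as follows. The text does not say how voiceprints are represented or compared; this sketch assumes, purely for illustration, that a voiceprint is a numeric feature vector and that "consistent" means cosine similarity above a hypothetical threshold.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_send_for_recognition(is_human_voice: bool,
                                voiceprint,
                                enrolled_voiceprint,
                                threshold: float = 0.8) -> bool:
    """Send the first audio file only for a human voice that matches the stored voiceprint."""
    if not is_human_voice:
        return False
    # Assumed consistency test: similarity at or above a hypothetical threshold.
    return cosine_similarity(voiceprint, enrolled_voiceprint) >= threshold
```

Sound that is not a human voice, or a voice whose features differ from the enrolled ones, is never uploaded, which matches the intent of recognizing only the device owner's humming.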
  • the processor 110 includes a user portrait module.
  • the user portrait module can collect user information of the user who uses the smart terminal.
  • the user information may include the user's attributes (age, gender, occupation, etc.), life habits, user behavior, and other information.
  • the smart terminal can abstract the user information into a user tag and send the tag to the server for storage.
  • the smart terminal may send the user information to the server, and the server analyzes the user information to form a user tag and store it.
  • the user tag has a corresponding relationship with the user account (or called user ID) of the user who uses the smart terminal.
  • users can be abstracted into labels based on their habits or preferences in playing audio files, such as rock, folk songs, pop, etc., and favorite singers can also be recorded to form labels, for example, Li Zongsheng, Liang Jingru, Eason Chan, and so on.
  • the tag of the recognized second audio file is included in the user tags of the first user, where the first user may be the user who uses the smart terminal, or the user corresponding to the user account logged in on the smart terminal.
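One way to picture the tag condition is as a filter over candidate tracks returned by recognition. The candidate tuple layout, the scores, and the function name below are hypothetical; the sketch only shows keeping candidates whose tags overlap the first user's tags.

```python
def filter_by_user_tags(candidates, user_tags):
    """Keep candidates sharing at least one tag with the user, best score first.

    candidates: iterable of (track_id, score, tags) tuples (hypothetical layout).
    user_tags:  tags stored for the first user, e.g. genres and favorite singers.
    """
    wanted = set(user_tags)
    matching = [c for c in candidates if wanted & set(c[2])]
    return sorted(matching, key=lambda c: c[1], reverse=True)


# Usage: only tracks whose tags intersect the user's tags are kept.
candidates = [
    ("track-a", 0.91, {"rock"}),
    ("track-b", 0.88, {"pop", "Eason Chan"}),
    ("track-c", 0.95, {"jazz"}),
    ("track-d", 0.80, {"folk songs"}),
]
selected = filter_by_user_tags(candidates, {"pop", "folk songs"})
```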
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the smart terminal 100 determines the intensity of the pressure according to the change in capacitance.
  • the smart terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the smart terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch location but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
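The short message example above amounts to a threshold test on touch intensity. In the sketch below, the threshold value and the returned action names are hypothetical placeholders; only the comparison against the first pressure threshold comes from the text.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized intensity


def short_message_icon_action(touch_intensity: float) -> str:
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"       # light press: view the short message
    return "create_new_short_message"     # press at or above the threshold: create a new one
```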
  • the gyro sensor 180B may be used to determine the movement posture of the smart terminal 100.
  • the angular velocity of the smart terminal 100 around three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the smart terminal 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the smart terminal 100 through a reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the smart terminal 100 may determine its own moving direction through the gyro sensor 180B, so as to improve the accuracy of determining its own position.
  • the acceleration sensor 180C can detect the magnitude of the acceleration of the smart terminal 100 in various directions (generally three-axis). When the smart terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the smart terminal 100, and be used in applications such as horizontal and vertical screen switching, pedometers, etc. In some possible implementation manners, the user interface exemplified in the following embodiments may switch between horizontal and vertical screens as the posture of the smart terminal changes.
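Horizontal and vertical screen switching from the gravity reading can be sketched with a simple comparison. The axis convention (x along the short edge of the device, y along the long edge) and the function name are assumptions for this example; a real implementation would typically also add hysteresis to avoid flickering near 45 degrees.

```python
def screen_orientation(ax: float, ay: float) -> str:
    """Classify device posture from the gravity components along x and y.

    Assumed convention: x runs along the short edge, y along the long edge.
    """
    # Gravity mostly along the long edge means the device is held upright.
    return "landscape" if abs(ax) > abs(ay) else "portrait"
```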
  • the smart terminal 100 can measure the distance by infrared or laser. In some embodiments, when shooting scenes, the smart terminal 100 may use the distance sensor 180D to measure distances to achieve rapid focusing and improve the accuracy of the acquired lip information.
  • the proximity light sensor 180E may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the smart terminal 100 emits infrared light to the outside through the light emitting diode.
  • the smart terminal 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the smart terminal 100. When insufficient reflected light is detected, the smart terminal 100 can determine that there is no object near the smart terminal 100.
  • the smart terminal 100 may use the proximity light sensor 180E to detect that the user holds the smart terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180E can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180F is used to sense the brightness of the ambient light.
  • the smart terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180F can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180F can also cooperate with the proximity light sensor 180E to detect whether the smart terminal 100 is in a pocket to prevent accidental touch.
  • the smart terminal stops collecting sound in the external environment through the audio input module.
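The proximity and pocket detection described above can be sketched as follows. This is a minimal illustration only: the threshold values, the `SensorReadings` type, and the rule that "in a pocket" means "object nearby and dark surroundings" are assumptions, not values or logic stated in the embodiment.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source gives no numeric values.
REFLECTED_LIGHT_THRESHOLD = 0.6   # normalized photodiode reading
DARK_AMBIENT_LUX = 5.0            # ambient brightness treated as "dark"


@dataclass
class SensorReadings:
    reflected_ir: float   # normalized infrared reflection seen by the photodiode (0..1)
    ambient_lux: float    # brightness reported by the ambient light sensor


def object_nearby(r: SensorReadings) -> bool:
    """Sufficient reflected light means an object is near the terminal."""
    return r.reflected_ir >= REFLECTED_LIGHT_THRESHOLD


def in_pocket(r: SensorReadings) -> bool:
    """Assumed rule: an object is near AND the surroundings are dark."""
    return object_nearby(r) and r.ambient_lux <= DARK_AMBIENT_LUX


def should_collect_sound(r: SensorReadings) -> bool:
    """Assumed policy: stop collecting external sound while the terminal seems to be in a pocket."""
    return not in_pocket(r)
```

Under these assumptions, a terminal held to the ear in a bright room is treated as "object nearby" but not "in a pocket", so sound collection continues.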
  • the fingerprint sensor 180G is used to collect fingerprints.
  • the smart terminal 100 can use the collected fingerprint characteristics to realize fingerprint unlocking to release the locked state of the smart terminal 100.
  • the touch sensor 180H can also be called a touch panel or a touch-sensitive surface.
  • the touch sensor 180H may be disposed on the display screen 194; together, the touch sensor 180H and the display screen 194 form what is also called a “touch screen”.
  • the touch sensor 180H is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180H may also be disposed on the surface of the smart terminal 100, which is different from the position of the display screen 194.
  • the button 190 includes a power button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the smart terminal 100 may receive key input, and generate key signal input related to user settings and function control of the smart terminal 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the smart terminal 100.
  • the smart terminal 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 may also be compatible with external memory cards.
  • the smart terminal 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the smart terminal 100 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the smart terminal 100 and cannot be separated from the smart terminal 100.
  • the smart terminal 100 exemplarily shown in FIG. 1A can display various user interfaces described in the following embodiments through a display screen 194.
  • the smart terminal 100 can detect touch operations in each user interface through the touch sensor 180H, such as a click operation in each user interface (such as a touch operation on an icon, or a double-click operation), an upward or downward swipe in each user interface, or a circle-drawing gesture, etc.
  • the smart terminal 100 may detect, through the gyroscope sensor 180B, the acceleration sensor 180C, etc., a motion gesture performed by the user while holding the smart terminal 100, for example, shaking the smart terminal.
  • the smart terminal 100 can detect non-touch gesture operations through the camera module 193 (such as a 3D camera, a depth camera).
  • the software system of the smart terminal 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the smart terminal 100 by way of example.
  • FIG. 1B is a software structure block diagram of a smart terminal 100 provided by an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text and controls that display pictures.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the smart terminal 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can disappear automatically after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also present a notification in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is played, the smart terminal vibrates, or the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the software system shown in Figure 1B involves the presentation of applications that use sharing capabilities (such as gallery and file manager), instant sharing modules that provide sharing capabilities, and print services and print spooler that provide printing capabilities.
  • the application framework layer provides the printing framework, WLAN service, and Bluetooth service, and the kernel and lower layers provide WLAN and Bluetooth capabilities and basic communication protocols.
  • when the touch sensor 180H receives a touch operation, the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, etc.).
  • the original input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer, and identifies the control corresponding to the input event. Taking as an example a case in which the touch operation is a tap operation and the control corresponding to the tap operation is the switch control of the humming recognition function, the humming recognition application calls the interface of the application framework layer to start the humming recognition application, and then starts the microphone driver by calling the kernel layer, so that the sound in the external environment is collected through the microphone 170C.
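The path from a touch to microphone capture can be sketched as below. This is an illustrative model, not Android code: `RawInputEvent`, `Control`, `kernel_layer`, `framework_dispatch`, and `start_humming_recognition` are hypothetical names, and hit-testing by bounding rectangle is an assumed way of identifying the control corresponding to an input event.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RawInputEvent:
    # Fields named in the text: touch coordinates and a time stamp.
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    name: str
    left: int
    top: int
    right: int
    bottom: int
    on_tap: Callable[[], str]

    def contains(self, e: RawInputEvent) -> bool:
        return self.left <= e.x < self.right and self.top <= e.y < self.bottom


def kernel_layer(x: int, y: int, timestamp_ms: int) -> RawInputEvent:
    """Stand-in for the kernel layer turning a hardware interrupt into a raw input event."""
    return RawInputEvent(x, y, timestamp_ms)


def framework_dispatch(event: RawInputEvent, controls: List[Control]) -> Optional[str]:
    """Stand-in for the framework layer: find the control under the touch and invoke it."""
    for control in controls:
        if control.contains(event):
            return control.on_tap()
    return None


def start_humming_recognition() -> str:
    # Placeholder for starting the microphone driver and collecting sound.
    return "microphone capturing"
```

For example, with a single switch control occupying the rectangle (0, 0) to (100, 50), a tap at (10, 20) reaches `start_humming_recognition`, while a tap outside the rectangle is not dispatched to any control.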
  • FIG. 1C exemplarily shows a schematic structural diagram of a smart home device 110 provided in an embodiment of the present application.
  • the smart home device may be a device such as a smart speaker or a smart TV.
  • the smart home device 110 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, an RJ45 communication processing module 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components can be connected via a bus. Among them:
  • the processor 102 can be used to read and execute computer readable instructions.
  • the processor 102 may mainly include a controller, an arithmetic unit, and a register.
  • the controller is mainly responsible for instruction decoding, and sends out control signals for the operation corresponding to the instruction.
  • the arithmetic unit is mainly responsible for performing fixed-point or floating-point arithmetic operations, shift operations and logical operations, etc., and can also perform address operations and conversions.
  • the register is mainly responsible for storing the register operands and intermediate operation results temporarily stored during the execution of the instruction.
  • the hardware architecture of the processor 102 may be an application specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, or an NP architecture, etc.
  • the processor 102 may be used to parse the signal received by the wireless communication processing module 104, for example, a request to modify setting information sent by the smart terminal 100, or the recognized audio file and the indication information indicating the starting playback position sent by the music recognition server, etc.
  • the processor 102 may be configured to perform corresponding processing operations according to the analysis result, such as modifying the setting information of the smart home device 110 according to the request, or playing the recognized audio file from the playback position, and so on.
  • the processor 102 may also be used to process sounds in the external environment collected by the smart home device 110. For example, the processor 102 may extract the voiceprint information of the sound. If the processor 102 determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the audio file of the sound is sent to the music recognition server through the wireless communication processing module 104.
  • the processor 102 may also be used to generate a signal sent by the wireless communication processing module 104, such as a signal sent to the smart terminal 100 for feedback of the recognition status (such as successful recognition, recognition failure, etc.).
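The voiceprint check that gates the upload to the music recognition server can be sketched as follows. The embodiment only says the voiceprint must be "consistent with" the pre-stored one; representing voiceprints as feature vectors, comparing them by cosine similarity, and the 0.85 threshold are all assumptions made for illustration, as are the function names.

```python
import math
from typing import Callable, Sequence

SIMILARITY_THRESHOLD = 0.85  # hypothetical; the source gives no value


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_matches(candidate: Sequence[float], enrolled: Sequence[float]) -> bool:
    """Assumed notion of 'consistent': similarity at or above the threshold."""
    return cosine_similarity(candidate, enrolled) >= SIMILARITY_THRESHOLD


def maybe_send_for_recognition(
    audio: bytes,
    voiceprint: Sequence[float],
    enrolled: Sequence[float],
    send: Callable[[bytes], None],
) -> bool:
    """Send the audio to the recognition server only when the voiceprint matches."""
    if voiceprint_matches(voiceprint, enrolled):
        send(audio)
        return True
    return False
```

Here `send` stands in for transmission through the wireless communication processing module 104; passing it as a parameter keeps the sketch independent of any particular transport.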
  • the memory 103 is coupled with the processor 102, and is used to store various software programs and/or multiple sets of instructions.
  • the memory 103 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 103 may store an operating system, such as an embedded operating system such as DuerOS and AliGenie.
  • the memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, a music recognition server), or additional devices.
  • the wireless communication processing module 104 may include one or more of the Bluetooth (BT) communication processing module 104A and the WLAN communication processing module 104B.
  • one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can monitor signals transmitted by other devices (such as the smart terminal 100), such as a playback request or a request to change setting information, and can send response signals, such as a request response, so that other devices (such as the smart terminal 100) can discover the smart home device 110, establish a wireless communication connection with the other devices, and communicate with the other devices through one or more wireless communication technologies in Bluetooth or WLAN.
  • one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can also transmit signals, such as broadcast Bluetooth signals and beacon signals, so that other devices (such as the smart terminal 100) can discover the smart home device 110, establish a wireless communication connection with other devices (such as the smart terminal 100), and communicate with other devices (such as the smart terminal 100) through one or more wireless communication technologies in Bluetooth or WLAN.
  • the wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown).
  • the cellular mobile communication processing module can communicate with other devices (such as servers) through cellular mobile communication technology.
  • the power switch 105 can be used to control the power supply to the smart home device 110.
  • the RJ45 communication processing module 106 may be used to process data received or sent through the RJ45 interface.
  • the RJ45 interface is mainly used to connect a modem.
  • the USB interface 107 can be used to communicate with other devices (for example, a computer, a notebook computer, etc.) through a data cable.
  • the audio input module 108 can be used to collect sounds in the external environment and convert the sounds into electrical signals.
  • the smart home device 110 may receive a voice command input by the user through the audio input module 108, and in response to the voice command, the smart home device performs an operation corresponding to the voice command.
  • the audio output module 109 is used to convert audio electrical signals into sound signals, and the smart home device 110 can play the sound signals through the audio output module 109.
  • the smart home device 110 may further include a display screen 110 (not shown), and the display screen 110 may be used to display images, videos, and the like.
  • the display screen 110 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc.
  • the smart home device 110 may include 1 or N display screens 110, and N is a positive integer greater than 1.
  • FIG. 1C does not constitute a specific limitation on the smart home device 110.
  • the smart home device 110 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • FIG. 1D exemplarily shows a schematic structural diagram of a vehicle-mounted device 120 provided in the present application.
  • the vehicle-mounted device may be a vehicle-mounted speaker or a vehicle-mounted computer.
  • the vehicle-mounted device 120 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, a display screen 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components can be connected via a bus. Among them:
  • the processor 102 can be used to read and execute computer readable instructions.
  • the processor 102 may mainly include a controller, an arithmetic unit, and a register.
  • the controller is mainly responsible for instruction decoding, and sends out control signals for the operation corresponding to the instruction.
  • the arithmetic unit is mainly responsible for performing fixed-point or floating-point arithmetic operations, shift operations and logical operations, etc., and can also perform address operations and conversions.
  • the register is mainly responsible for storing the register operands and intermediate operation results temporarily stored during the execution of the instruction.
  • the hardware architecture of the processor 102 may be an application specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, or an NP architecture, etc.
  • the processor 102 may be used to parse the signal received by the wireless communication processing module 104, for example, a request to modify setting information sent by the smart terminal 100, or the recognized audio file and the indication information indicating the starting playback position sent by the music recognition server, etc.
  • the processor 102 may be configured to perform corresponding processing operations according to the analysis result, such as modifying the setting information of the vehicle-mounted device 120 according to the request, or playing the recognized audio file from the playback position, and so on.
  • the processor 102 may also be used to process the sound in the external environment collected by the vehicle-mounted device 120. For example, the processor 102 may extract the voiceprint information of the sound. If the processor 102 determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the audio file of the sound is sent to the music recognition server through the wireless communication processing module 104.
  • the processor 102 may also be used to generate a signal sent by the wireless communication processing module 104, such as a signal sent to the smart terminal 100 for feedback of the recognition status (such as successful recognition, recognition failure, etc.).
  • the memory 103 is coupled with the processor 102, and is used to store various software programs and/or multiple sets of instructions.
  • the memory 103 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 103 may store an operating system, such as embedded operating systems such as uCLinux, GENIVI, and ecos.
  • the memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, a music recognition server), or additional devices.
  • the wireless communication processing module 104 may include one or more of the Bluetooth (BT) communication processing module 104A and the WLAN communication processing module 104B.
  • one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can monitor signals transmitted by other devices (such as the smart terminal 100), such as a playback request or a request to change setting information, and can send response signals, such as a request response, so that other devices (such as the smart terminal 100) can discover the vehicle-mounted device 120, establish a wireless communication connection with the other devices, and communicate with the other devices through one or more wireless communication technologies in Bluetooth or WLAN.
  • one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module can also transmit signals, such as broadcast Bluetooth signals and beacon signals, so that other devices (such as the smart terminal 100) can discover the in-vehicle device 120, establish a wireless communication connection with other devices (such as the smart terminal 100), and communicate with other devices (such as the smart terminal 100) through one or more wireless communication technologies in Bluetooth or WLAN.
  • the wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown).
  • the cellular mobile communication processing module can communicate with other devices (such as servers) through cellular mobile communication technology.
  • the power switch 105 can be used to control the power supply to the vehicle-mounted device 120 from the power source.
  • the display screen 106 can be used to display images, videos, etc.
  • the display screen 106 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc.
  • the vehicle-mounted device 120 may include 1 or N display screens 106, and N is a positive integer greater than 1.
  • the USB interface 107 can be used to communicate with other devices such as a display, the smart terminal 100 or an audio external device through a data line.
  • the audio input module 108 can be used to collect sounds in the external environment and convert the sounds into electrical signals.
  • the in-vehicle device 120 may receive a voice instruction input by the user through the audio input module 108, and in response to the voice instruction, the in-vehicle device performs an operation corresponding to the voice instruction.
  • the audio output module 109 is used to convert audio electrical signals into sound signals, and the vehicle-mounted device 120 can play the sound signals through the audio output module 109.
  • the in-vehicle device 120 may also include a serial interface such as an RS-232 interface.
  • the serial interface can be connected to other devices, such as speakers and other audio playback devices, so that the audio playback devices cooperate to play the recognized audio files.
  • the structure illustrated in FIG. 1D does not constitute a specific limitation on the in-vehicle device 120.
  • the in-vehicle device 120 may include more or fewer components than shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the following describes an exemplary user interface on the smart terminal 100 for displaying application menus.
  • FIG. 2 exemplarily shows a user interface 21 of the smart terminal 100 for displaying an application menu.
  • the user interface 21 may include: a status bar 201, a tray 217 with icons of commonly used applications, a calendar widget 213, a weather widget 215, and other application icons. Among them:
  • the status bar 201 may include: one or more signal strength indicators 203 for mobile communication signals (also called cellular signals), one or more signal strength indicators 205 for wireless fidelity (Wi-Fi) signals, a battery status indicator 209, and a time indicator 211.
  • the calendar widget 213 can be used to indicate the current time, such as date, day of the week, hour and minute information, etc.
  • the weather widget 215 can be used to indicate the type of weather, such as cloudy to clear, light rain, etc., and can also be used to indicate information such as temperature.
  • the tray 217 with icons of commonly used application programs can display: a phone icon 219, a contact icon 221, a short message icon 223, and a camera icon 225.
  • the user interface 21 may also include a page indicator 229.
  • Other application icons can be distributed on other pages.
  • the page indicator 229 can be used to indicate the number of pages and which page the user is currently browsing. For example, the page indicator 229 displays 3 small dots, and the second dot is black while the other two dots are white, indicating that the mobile phone currently includes 3 pages and the user is browsing the second page.
  • users can swipe left and right on the current page to browse application icons on other pages.
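The page indicator behavior in the example above can be sketched with a small helper. The function name and the use of filled and hollow dot characters are illustrative choices, not part of the embodiment.

```python
def render_page_indicator(page_count: int, current_page: int) -> str:
    """Return one dot per page; the filled dot marks the page being browsed (1-based)."""
    if not 1 <= current_page <= page_count:
        raise ValueError("current_page must be between 1 and page_count")
    return " ".join(
        "●" if page == current_page else "○"
        for page in range(1, page_count + 1)
    )
```

With three pages and the user on the second page, the middle dot is the filled one, matching the example in the text.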
  • the user interface 21 exemplarily shown in FIG. 2 may be a user interface in the home screen.
  • the smart terminal 100 may also include a home screen key.
  • the home screen key can be a physical key or a virtual key.
  • the home screen key can be used to receive a user's instruction, and in response to the user's instruction, return the currently displayed UI to the main interface, so that the user can view the home screen at any time.
  • the above instruction can be an operation instruction in which the user presses the home screen key once, an operation instruction in which the user presses the home screen key twice in a short period of time, or an operation instruction in which the user long-presses the home screen key for a predetermined time.
  • the home screen key can also be integrated with a fingerprint recognizer, so that when the home screen key is pressed, fingerprints are collected and recognized.
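The three kinds of home screen key instructions (single press, double press within a short period, long press for a predetermined time) can be told apart from press timing, as in the sketch below. The two timing constants and the function name are hypothetical; the embodiment does not specify durations.

```python
from typing import List, Tuple

DOUBLE_PRESS_WINDOW_MS = 300   # hypothetical "short period of time"
LONG_PRESS_MS = 800            # hypothetical "predetermined time"


def classify_home_key(presses: List[Tuple[int, int]]) -> str:
    """Classify a gesture from (down_ms, up_ms) pairs given in chronological order."""
    if not presses:
        return "none"
    down, up = presses[0]
    # A first press held long enough is a long press regardless of what follows.
    if up - down >= LONG_PRESS_MS:
        return "long_press"
    # A second press that begins soon after the first is released is a double press.
    if len(presses) >= 2 and presses[1][0] - up <= DOUBLE_PRESS_WINDOW_MS:
        return "double_press"
    return "single_press"
```

A real implementation would typically wait for the double-press window to expire before reporting a single press; this sketch classifies a completed list of presses instead.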
  • FIG. 2 only exemplarily shows the user interface on the smart terminal 100, and should not constitute a limitation to the embodiment of the present application.
  • the smart terminal 100 can play the recognized audio file following the user's humming progress.
  • the smart terminal 100 can display the recognition result through the display screen 194.
  • the recognition result may be displayed when the smart terminal 100 is in use, or may be displayed when the smart terminal 100 is in a locked state.
  • the display screen 194 can display user interfaces such as the desktop, an application program interface, the pull-down notification bar, and the negative one screen.
  • when the smart terminal 100 is in the locked state, it means that the screen of the smart terminal is locked. In most cases, the smart terminal 100 needs to receive a password input by the user or verify another unlocking method (for example, fingerprint unlocking, face unlocking, etc.) before unlocking. Generally, the user can turn off the screen of the smart terminal 100 and enter the locked state by clicking the power button of the smart terminal 100 or clicking the virtual control of "lock screen".
  • the lock screen interface refers to a user interface displayed by the smart terminal 100 after the smart terminal 100 enters the locked state and before the lock is unlocked. While the smart terminal 100 is in the locked state, the smart terminal may display a lock screen interface, or it may be in the off-screen (or referred to as black screen) state.
  • FIG. 3A exemplarily shows the user interface 31 displaying the recognition result in the use interface of the application program of the smart terminal 100.
  • the application program that performs the humming recognition operation and the application program in use may be the same application program or different application programs, which is not limited in the embodiment of the present application.
  • this application does not limit the application in use, which can be WeChat, QQ, Weibo, mailbox and other applications.
  • the chat interface during the use of WeChat is taken as an example in FIG. 3A.
  • the user interface 31 may include: a display area 318, an input area 319, and a notification window 315. Among them:
  • the display area 318 may be used to display chat content, and the chat content may include text/voice communication content between the user using the smart terminal 100 and the user of another social account.
  • the input area 319 can be used to input chat content.
  • the input area 319 can include a first control 319A, a second control 319B, a third control 319C, and a fourth control 319D.
  • the first control 319A is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays a voice input button.
  • the user can input voice information by long pressing the voice input button.
  • when the voice input button receives the user's operation, the smart terminal 100 needs to collect the voice information input by the user. In this case, the audio input module is occupied by the voice input service of the social application, and the smart terminal 100 does not perform the humming recognition operation provided by the embodiment of this application.
  • the second control 319B is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays a keyboard/handwriting pad. Generally, the smart terminal 100 can receive text information input by the user through the keyboard/handwriting pad.
  • the third control 319C is used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays a plurality of emoticons/motion pictures for the user to select.
  • The fourth control 319D is used to receive a user's operation. In response to the user's operation, the smart terminal 100 displays multiple input type selection boxes, such as pictures, shooting, documents, red envelopes, video calls, etc., for the user to choose from.
  • When the smart terminal 100 needs to collect audio and video information input by the user, the audio input module and/or audio output module is occupied by the video input service of the social application, and the smart terminal 100 does not perform the humming recognition operation provided in the embodiments of this application.
  • the notification window 315 is used to display the recognition result of the music segment hummed by the user.
  • the notification window 315 may include: a humming recognition icon 316, a first display area 314, a playback control 310, and a control 312.
  • The humming recognition icon 316 is used to indicate the source of the notification window 315, so that the user can quickly understand that the notification window 315 is the recognition result output by the humming recognition service (also called a function or application). It should be noted that the humming recognition icon 316 is only an example icon. In a specific implementation, the humming recognition icon may also use other patterns, such as musical notes or icons of other styles, which is not limited in the embodiments of this application.
  • the first display area 314 can be used to display the identification information of the recognized audio file, and can provide the user with more information about the recognized audio file.
  • The identification information of the audio file may be the song name, lyrics, artist name, album name, album cover picture, artist poster, etc. of the audio file (for example, the first display area 314 contains the song name "Across the Ocean to See You").
  • the first display area 314 may also include operation instruction information (for example, "click to stop playback" contained in the first display area 314), which may provide the user with operation reminders and improve the convenience of user operations.
  • the first display area 314 may also contain singer information or lyrics information of the currently played audio file.
  • the smart terminal 100 may also display the lyrics information of the currently played audio file in the form of a floating window.
  • the floating window is a movable window displayed floating in the display interface of the smart terminal 100.
  • The playback control 310 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 pauses or continues playing the audio file. Specifically, after the smart terminal 100 recognizes the audio file corresponding to the music clip hummed by the user, it plays the audio file following the user's singing progress. At this time, the playback control 310 is displayed in the first state, which indicates that the audio file is being played. Optionally, during the playback of the recognized audio file, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiments of this application.
  • When the playback control 310 is displayed in the first state, if the playback control 310 receives a user's operation, the smart terminal 100 pauses playing the audio file and displays the playback control 310 in the second state, which indicates that the audio file is paused. Likewise, when the playback control 310 is displayed in the second state, if it receives a user's operation, the smart terminal 100 continues playing the audio file and displays the playback control 310 in the first state.
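  • The two display states of the playback control 310 behave like a simple toggle. The following is a minimal sketch of that behavior; the names `PlaybackState` and `on_playback_control_operated` are hypothetical and are not taken from this application.

```python
from enum import Enum


class PlaybackState(Enum):
    # Hypothetical names for the two display states of playback control 310.
    PLAYING = "first state"   # the audio file is being played
    PAUSED = "second state"   # the audio file is paused


def on_playback_control_operated(state: PlaybackState) -> PlaybackState:
    """Return the state shown after the user operates the playback control."""
    if state is PlaybackState.PLAYING:
        return PlaybackState.PAUSED   # pause playback
    return PlaybackState.PLAYING      # continue playback
```

  • Operating the control twice returns it to the state it started in.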
  • The control 312 may be used to receive a user's operation. In response to the user's operation, the smart terminal 100 pauses playing the audio file, re-acquires the user's sound signal, and performs humming recognition on the re-acquired sound signal.
  • The smart terminal 100 may display a prompt message (for example, "recognizing") to indicate that the smart terminal 100 is re-acquiring the sound signal for humming recognition.
  • Alternatively, the smart terminal 100 pauses playing the audio file and jumps to the user interface 35 for humming recognition.
  • The user interface 35 will be introduced in detail later and is not expanded on here.
  • The notification window 315 disappears after being displayed for a preset time, which may be 4 seconds, 5 seconds, and so on. Alternatively, when the notification window 315 receives the user's upward sliding operation, the smart terminal 100 no longer displays the notification window 315 in the user interface 31 in response to the operation.
  • the notification window 315 may also be displayed in a drop-down notification bar.
  • the smart terminal 100 may display a pull-down notification bar 318 on the user interface 21, and the pull-down notification bar 318 includes a notification window 315.
  • the control window 313 may display multiple switch controls, for example, the switch control 317 displaying "humming recognition", and may also display switch controls with other functions (such as Wi-Fi, Bluetooth, flashlight, etc.).
  • the control window 313 will be described in detail in the subsequent introduction to the setting interface of humming recognition, and will not be specifically expanded here.
  • an icon 311 for humming recognition is displayed in the status bar 201.
  • the status bar 201 may be included in multiple display interfaces of the smart terminal 100. In this way, it is convenient for the user to know the on state of the humming recognition function through multiple display interfaces of the smart terminal 100.
  • FIG. 3C exemplarily shows the user interface 32 displayed when the smart terminal 100 is in the locked state.
  • the user interface 32 may also be referred to as a lock screen interface.
  • The user interface 32 includes a status bar 201, a calendar widget 213, and a lock screen wallpaper 523, where:
  • the status bar 201 can refer to the description in FIG. 2, which will not be repeated here.
  • the status bar 201 here includes a humming recognition icon 311 and a lock icon 323.
  • the humming recognition icon 311 is used to indicate that the humming recognition function is on, and the lock icon 323 is used to indicate that the smart terminal 100 is in a locked state.
  • the calendar widget 213 can refer to the description in FIG. 2, which will not be repeated here.
  • the user interface 32 may also include a weather widget 215.
  • the lock screen wallpaper 523 may be a picture set by the user, or a picture preset by the smart terminal 100, or a picture downloaded by the smart terminal 100 from the network.
  • FIG. 3D exemplarily shows yet another user interface 32 displaying the recognition result.
  • When the smart terminal 100 is in the locked state and recognizes the audio file of the music clip hummed by the user, the smart terminal 100 displays a notification window 324 above the user interface 32. The notification window 324 may include: a humming recognition icon 316, a second display area 322, a playback control 310, a control 312, and a volume control 328.
  • the second display area 322 has the same function as the first display area 314 in FIG. 3A, and both can display the identification information of the recognized audio file.
  • The second display area 322 here includes not only the name of the audio file, "Across the Ocean to See You", but also the singer of the audio file, "Li Zongsheng", and the lyric information of the currently playing audio file, "A strange city, in a familiar corner". The bold part of the lyric information, "No matter what you will face", is the lyric the user is currently singing. The lyric information changes with the playing progress of the audio file, so that the lyric information stays synchronized with the playback of the audio file.
  • The tag of the second audio file is included in the user tag of the first user; for the meaning of the first user, refer to the above introduction.
  • Different smart terminals may produce different recognition results for the hummed section of the same song. For example, when user 1 sings "Across the Ocean to See You", the audio file recognized by the smart terminal of user 1 may be the version sung by Li Zongsheng, whereas when user 2 sings "Across the Ocean to See You", the audio file recognized by the smart terminal of user 2 may be the version sung by Jingru Liang. The recognition results differ because the user tags of user 1 and user 2 differ.
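  • The effect of user tags on the recognition result can be illustrated with a small sketch. It assumes that each candidate version of a song carries a set of tags and that a version is preferred when at least one of its tags appears in the user's tags. The type `AudioVersion`, the function `pick_version`, the tag values, and the fallback to the first candidate are all hypothetical; this application does not specify how tags are matched.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class AudioVersion:
    song: str
    singer: str
    tags: FrozenSet[str]  # hypothetical tags attached to this audio file


def pick_version(candidates: Sequence[AudioVersion],
                 user_tags: FrozenSet[str]) -> Optional[AudioVersion]:
    """Prefer the first candidate that shares a tag with the user's tags."""
    for version in candidates:
        if version.tags & user_tags:
            return version
    # Assumption: fall back to the first candidate when no tag matches.
    return candidates[0] if candidates else None


VERSIONS = [
    AudioVersion("Across the Ocean to See You", "Li Zongsheng",
                 frozenset({"Li Zongsheng"})),
    AudioVersion("Across the Ocean to See You", "Jingru Liang",
                 frozenset({"Jingru Liang"})),
]
```

  • With these example candidates, a user whose tags include "Jingru Liang" is given the second version, while a user whose tags include "Li Zongsheng" is given the first.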
  • the volume control 328 can be used to adjust the volume of the playing audio file.
  • the volume control 328 may be used to receive a user's operation, and in response to the operation, the smart terminal 100 adjusts the volume of the played audio file.
  • When the received user operation is a slide to the left, the smart terminal 100 reduces the volume at which the audio file is played; when the received user operation is a slide to the right, the smart terminal 100 increases the volume at which the audio file is played.
  • The ratio of the distance from the volume control 328 to the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum playback volume of the system.
  • The smart terminal 100 may make the volume of the played audio file gradually increase from low to high between the time it starts playing the audio file and a preset time (for example, the 5th second, the 6th second, etc.). For example, the volume may gradually increase from the minimum volume value to the volume value set by the user, or from 30% of the volume value set by the user to 100% of the volume value set by the user. Other ways of increasing the volume are also possible, which is not limited in the embodiments of this application.
  • The volume value set by the user is the volume value indicated by the volume control 328, that is, the volume value most recently adjusted by the user.
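  • The slider-to-volume correspondence and the gradual volume increase can be sketched as follows. This is only an illustration under stated assumptions: the correspondence between the two ratios is taken to be linear, the increase is taken to be a linear ramp, and the function names `slider_to_volume` and `fade_in_volume` are hypothetical.

```python
def slider_to_volume(distance_from_left: float,
                     segment_length: float,
                     max_volume: float) -> float:
    """Map the position of volume control 328 on its line segment to a volume.

    Assumes a linear correspondence between the position ratio and the
    volume ratio; the position is clamped to the segment.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    ratio = min(max(distance_from_left / segment_length, 0.0), 1.0)
    return ratio * max_volume


def fade_in_volume(elapsed_s: float,
                   ramp_s: float,
                   user_volume: float,
                   start_fraction: float = 0.3) -> float:
    """Volume to use elapsed_s seconds after playback starts.

    Ramps linearly from start_fraction * user_volume up to user_volume over
    ramp_s seconds (for example 5 or 6), then stays at user_volume.
    A start_fraction of 0.0 ramps from silence instead of from 30%.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return user_volume
    progress = max(elapsed_s, 0.0) / ramp_s
    return user_volume * (start_fraction + (1.0 - start_fraction) * progress)
```

  • For instance, with a 5-second ramp and the default 30% starting point, playback begins at 30% of the user's volume and reaches the full user-set volume at the 5th second.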
  • the notification window 324 disappears after the recognized audio file is played.
  • the smart terminal 100 may also display the content of the notification window 324 above the user interface 32 in the form of a user interface, and the user interface may be as shown in the user interface 33 of FIG. 3E.
  • the user interface 33 contains the content contained in the notification window 324, for example, the humming recognition icon 316, the second display area 322, the playback control 310, the control 312, and the volume control 328.
  • the user interface 32 may also display a background picture.
  • the background picture may be a poster of a song singer, a picture of an album included in the recognized audio file, and so on.
  • the smart terminal 100 displays the user interface 32 below the user interface 33 (that is, the lock screen interface).
  • FIG. 3F exemplarily shows yet another user interface 34 displaying the recognition result.
  • When a user operation on the notification window 315 is received (for example, a click operation, a long press operation, a press operation, etc.), the smart terminal 100 displays the user interface 34.
  • The smart terminal 100 receives an unlocking operation input by the user (for example, fingerprint unlocking, password unlocking, face unlocking, etc.). If the unlocking succeeds, the smart terminal 100 jumps from the user interface 32 to the user interface 34.
  • The user interface 34 includes: a humming recognition icon 316, a second display area 322, a playback control 310, a control 312, a volume control 328, a control 330, a control 332, and a control 334, where:
  • the humming recognition icon 316, the second display area 322, the playback control 310, the control 312, and the volume control 328 can refer to the above description, and will not be repeated here.
  • The control 330 can be used to add the recognized audio file to favorites.
  • The control 330 can receive a user's operation.
  • In response to the user's operation, the smart terminal 100 adds the identifier of the recognized audio file to a preset favorites folder (which may also be called a "favorite music" folder; this is not limited in this application), so that the user can conveniently find or play the recognized audio file next time.
  • the control 332 can be used to download the recognized audio file.
  • the control 332 can receive a user's operation, and in response to the user's operation, the smart terminal 100 downloads the audio resource of the identified audio file from the network.
  • Optionally, the smart terminal 100 displays a selection box that contains sound quality options such as "standard quality", "high quality", and "lossless quality".
  • the selection box is used to receive a user's selection operation of an option, and in response to the user's selection operation of an option, the smart terminal 100 downloads audio resources of sound quality corresponding to the option.
  • the control 334 can be used to share the recognized audio file.
  • the control 334 can receive a user's operation, and in response to the user's operation, the smart terminal 100 displays a sharing box, which contains multiple sharing objects, such as QQ, WeChat, Weibo, Twitter, and so on.
  • the sharing box is used to receive a user's selection operation of a sharing object.
  • In response to the user's selection operation, the smart terminal 100 sends the audio file identifier or audio resource to the sharing object corresponding to the selection operation.
  • FIG. 3G exemplarily shows a user interface 35 for humming recognition.
  • The smart terminal 100 displays the user interface 35 for humming recognition.
  • The user interface 35 includes a humming recognition icon 316, an indicator 350, a control 352, and a control 354, where:
  • the humming recognition icon 316 can refer to the above description, and will not be repeated here.
  • the indicator 350 may indicate the time information of the music segment that the user has hummed.
  • the time information changes as the time the user hums the audio file increases, and is synchronized with the time length of the user humming.
  • The indicator 350 may also contain operation prompt information instructing the user to input the sound signal (for example, "hum a few more lines for more accurate recognition" contained in the indicator 350), which provides the user with operation reminders and improves the accuracy of humming recognition.
  • The operation prompt information may also be other content. For example, when it is detected that the volume of the user's voice is low, operation prompt information such as "increase the volume (or make the sound closer to the device) for more accurate recognition" may be displayed.
  • the control 352 may be used to receive a user's operation (for example, a long press operation), and in response to the user's operation, the smart terminal 100 collects the user's input sound signal through the microphone 170C. When detecting that the user's finger leaves the display screen 194, the smart terminal 100 performs humming recognition according to the collected sound signal. Optionally, when the smart terminal 100 receives the recognized audio file from the music recognition server, the smart terminal 100 may display the user interface 34 for displaying the recognition result.
  • the above introduces some user interfaces for displaying recognition results and performing humming recognition in the smart terminal 100.
  • Before the smart terminal 100 can implement the humming recognition function, the user can turn the humming recognition function on or off through a setting interface of the smart terminal 100.
  • the following will introduce some setting interfaces of humming recognition.
  • FIG. 4A exemplarily shows a user interface 41 for setting the humming recognition function.
  • The smart terminal 100 can display the pull-down notification bar 401 on the user interface 41, and the pull-down notification bar 401 includes a control window 313, where:
  • the control window 313 may display multiple switch controls, for example, the switch control 317 displaying "humming recognition", and may also display switch controls with other functions (such as Wi-Fi, Bluetooth, flashlight, etc.).
  • The switch control 317 has two display states. The first display state (also known as the "ON" state) indicates that the humming recognition function is turned on, and the second display state (also known as the "OFF" state) indicates that the humming recognition function is turned off.
  • When the display state of the switch control 317 is the second display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313 (such as a touch operation on the switch control 317), in response to the operation, the smart terminal 100 can turn on "humming recognition" and adjust the display state of the switch control 317 to the first display state.
  • When the display state of the switch control 317 is the first display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313, in response to the operation, the smart terminal 100 can turn off "humming recognition" and adjust the display state of the switch control 317 to the second display state. In this way, it is convenient for users to turn the humming recognition function on or off.
  • an icon 311 for humming recognition is displayed in the status bar 201.
  • the status bar 201 may be included in multiple display interfaces of the smart terminal 100. In this way, it is convenient for the user to know the on state of the humming recognition function through multiple display interfaces of the smart terminal 100.
  • FIG. 4B exemplarily shows another user interface 42 for setting the humming recognition function.
  • the user interface 42 includes a display area 410, which is used to display multiple settable options, such as "airplane mode", “Wi-Fi”, “Bluetooth” and so on.
  • the display area 410 also includes multiple switch controls and multiple jump controls.
  • The following uses the switch control 412 and the jump control 416 as examples to introduce the functions of the two kinds of controls:
  • The switch control 412 can be used to receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the user's operation, the smart terminal 100 changes the on/off state of the function/service/application corresponding to the switch control 412 (i.e., the humming recognition function). For example, before receiving the user's operation, the display state of the switch control 412 is "ON", which indicates that the humming recognition function is in the on state at this time. If the switch control 412 receives a user's operation, in response to the user's operation, the smart terminal 100 adjusts the display state of the switch control 412 to "OFF" and turns off the humming recognition function.
  • the jump control 416 can be used to receive a user's operation.
  • In response to the user's operation, the smart terminal 100 jumps to the setting interface of the function/service/application (i.e., Do Not Disturb mode) corresponding to the jump control 416.
  • It should be noted that the setting interface can include multiple setting options for the "Do Not Disturb mode" function, for example, adjusting the on/off state of Do Not Disturb mode, setting the time at which Do Not Disturb mode turns on, setting automatic replies in Do Not Disturb mode, and so on.
  • FIGS. 5A-5C exemplarily show some user interfaces for setting the humming recognition function.
  • The user interface 51 includes a display area 522, which is similar to the display area 410 included in the user interface 42.
  • The display area 522 is used to display multiple settable options, such as "airplane mode", "Wi-Fi", "Bluetooth" and so on.
  • the control corresponding to the humming recognition function is a jump control 520, which can be used to jump the user interface to the "humming recognition" setting interface.
  • The jump control 520 receives a user's operation (for example, a click operation), and in response to the user's operation, the smart terminal 100 jumps from the user interface 51 to the "humming recognition" setting interface (i.e., user interface 52).
  • The user interface 52 includes a return key 530, a switch control 532, text information 534, a switch control 536, a control 538, a control 540, a control 552, a plurality of jump controls (for example, a jump control 554), and a switch control 556, where:
  • the return key 530 can be used to receive a user's operation.
  • In response to the user's operation, the smart terminal 100 returns to the previous interface of the current page, that is, the user interface 51 shown in FIG. 5A.
  • the previous interface of an interface is determined when the application program is set.
  • the function of the switch control 532 can refer to the function of the switch control 412 in FIG. 4B, which will not be repeated here.
  • the text information 534 can be used to describe the authority obtained by the smart terminal 100 after the humming recognition function is turned on, so that the user can determine whether to grant the smart terminal 100 the authority for humming recognition according to the description.
  • the expression of the text information 534 can be changed as required, and there is no limitation here.
  • The switch control 536 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 performs an operation of setting an active time period for humming recognition. For example, if the display state of the switch control 536 is "OFF" before receiving the user's operation, no active time period is set for the humming recognition operation, and the humming recognition operation can always run. Optionally, in this case, the smart terminal does not display the control 538 and the control 540. After receiving the user's operation on the switch control 536, in response to the user's operation, the smart terminal changes the display state of the switch control 536 to "ON" and displays the control 538 and the control 540.
  • The control 538 is used to receive the activation time of the humming recognition operation input by the user. In response to the user's operation, the smart terminal 100 performs the humming recognition operation provided in the embodiments of this application after the activation time.
  • The control 540 is used to receive the end time of the humming recognition operation input by the user. In response to the user's operation, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiments of this application after the end time. It should be noted that when the smart terminal 100 no longer performs the humming recognition operation provided in the embodiments of this application, the user can still actively trigger humming recognition in the manner of the prior art.
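  • The active time period set through the switch control 536, the control 538, and the control 540 can be sketched as a simple check. The function name `is_recognition_active` is hypothetical, and the handling of a period that wraps past midnight (activation time later than end time) is an assumption; this application does not describe that case.

```python
from datetime import time


def is_recognition_active(now: time,
                          period_enabled: bool,
                          activation: time,
                          end: time) -> bool:
    """Decide whether automatic humming recognition may run at `now`.

    period_enabled mirrors switch control 536: when it is "OFF", no active
    time period is set and recognition may always run.
    """
    if not period_enabled:
        return True
    if activation <= end:
        # Ordinary period within one day, e.g. 08:00 to 22:00.
        return activation <= now < end
    # Assumption: a period such as 22:00 to 06:00 wraps past midnight.
    return now >= activation or now < end
```

  • For example, with an active period from 08:00 to 22:00, recognition runs at 09:00 but not at 23:00; with the switch off, it may run at any time.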
  • The control 552 can be used to add voiceprint information that can be used to enable humming recognition.
  • The jump control 554 can be used to receive a user's operation.
  • In response to the user's operation, the smart terminal 100 jumps from the user interface 52 to the setting interface of "voiceprint 1".
  • The setting interface of "voiceprint 1" may include naming and deleting functions, and so on.
  • The switch control 556 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 sets the usable state of the humming recognition function. For example, if the display state of the switch control 556 is "OFF" before receiving the user's operation, the humming recognition operation cannot be used when the smart terminal 100 is locked, that is, the smart terminal 100 does not perform the humming recognition operation while it is locked. After receiving the user's operation on the switch control 556, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 556 to "ON" and adjusts the usable state of the humming recognition function, that is, the humming recognition operation also runs while the smart terminal 100 is locked.
  • The switch control 557 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 sets the usable state of the humming recognition function. When the switch control 557 is in the on state, the smart terminal can obtain its own position while the humming recognition function is running. By determining whether its current location is a preset location, the smart terminal 100 determines whether to stop collecting sounds in the external environment through the audio input module, or whether to play the recognized audio file from the initial playback position. This determination method will be introduced later and is not expanded on here.
  • The display screen 194 receives a user's operation (for example, an upward sliding operation), and in response to the user's operation, the smart terminal 100 displays the setting content of "humming recognition" under the switch control 556.
  • the user interface 52 also includes content for setting the access authority of the humming recognition function.
  • The user interface 52 also includes a jump control 558 and a plurality of switch controls (such as the switch control 560), where:
  • the jump control 558 can be used to set the type of wireless data that the humming recognition function allows to access, such as off, WLAN, WLAN and cellular mobile data.
  • The switch control 560 can be used to set the system functions (i.e., location services) that the humming recognition function is allowed to access. For example, if the display state of the switch control 560 is "OFF" before receiving the user's operation, the position information of the smart terminal 100 cannot be obtained while the humming recognition function is running. After receiving the user's operation on the switch control 560, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 560 to "ON" and allows the humming recognition function to obtain the position information of the smart terminal 100. Similarly, other system functions that the humming recognition function needs to access can also be set by referring to the above method.
  • Figures 5D-5F exemplarily show some user interfaces for setting access rights for the humming recognition function.
  • In response to the user's operation of the jump control 524, the smart terminal 100 jumps from the user interface 51 to the user interface 53, and the user interface 53 is used to display multiple system functions, for example, Bluetooth, location service, microphone, gallery, etc.
  • one system service corresponds to a jump control (for example, the system service "microphone" corresponds to the jump control 562).
  • In response to the user's operation of the jump control 562, the smart terminal 100 jumps from the user interface 53 to the user interface 54, and the user interface 54 is used to display multiple applications that require access to the microphone.
  • the user can control the permission of the application to access the microphone through the switch control corresponding to the application. For example, if the display state of the switch control 572 is "OFF" before receiving the user's operation, it indicates that the humming recognition function cannot access the microphone. After receiving the user's operation on the switch control 572, in response to the user's operation, the smart terminal 100 changes the display state of the switch control 572 to "ON" and allows the humming recognition function to access the microphone. Similarly, the way other applications access system functions can also refer to the above way.
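  • Several conditions described in this section determine whether the smart terminal 100 performs the humming recognition operation: the humming recognition switch, the microphone permission, whether the audio input module is occupied by another service, whether a recognized audio file is being played, and whether recognition is allowed on the lock screen. The following sketch combines them into one check. The type `TerminalState`, its field names, and the function `may_run_humming_recognition` are hypothetical, and treating the conditions as a single conjunction is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TerminalState:
    recognition_enabled: bool      # humming recognition switch is "ON"
    microphone_allowed: bool       # microphone permission switch is "ON"
    audio_input_busy: bool         # e.g. voice input or video call in progress
    playing_recognized_file: bool  # a recognized audio file is being played
    locked: bool                   # the terminal is in the locked state
    allow_when_locked: bool        # lock-screen switch (switch control 556)


def may_run_humming_recognition(s: TerminalState) -> bool:
    """Return True if automatic humming recognition may run in state `s`."""
    if not s.recognition_enabled or not s.microphone_allowed:
        return False
    if s.audio_input_busy or s.playing_recognized_file:
        return False
    if s.locked and not s.allow_when_locked:
        return False
    return True
```

  • Checking the cheapest, user-controlled switches first keeps the decision easy to follow; a real implementation could order or combine the conditions differently.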
  • FIG. 5G exemplarily shows a user interface 55 for inputting voiceprint information.
  • In response to the user's operation of the control 552 in the user interface 52, the smart terminal 100 jumps from the user interface 52 to the user interface 55 to enter the voiceprint information that the user wants to add.
  • The user interface 55 includes an indicator 570, text information 572, and a control 574, where:
  • the indicator 570 may be used to provide prompt information for the user to instruct the user to enter voiceprint information.
  • the text information 572 is the text content that the user needs to read.
  • the smart terminal can display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.
  • the smart terminal can also instruct the user to sing several pieces of music to enter the voiceprint information.
  • the content of the indicator 570 may be “please press and hold the button and sing the following song fragments to record voiceprint information”, and correspondingly, the text information 572 is a piece of lyrics.
  • the control 574 can be used to receive a user's operation (for example, a long press operation), and in response to the user's operation, the smart terminal 100 collects the user's input sound signal through the microphone 170C.
  • The smart terminal 100 stores the sound signal collected during this period, extracts the voiceprint information of the collected sound signal, and then stores the extracted voiceprint information.
  • The humming recognition operation provided in the embodiments of this application can also be applied to smart home devices (for example, smart speakers, televisions, etc.) and vehicle-mounted devices (for example, vehicle-mounted speakers), and the smart home devices or vehicle-mounted devices can execute the humming recognition operation provided in the embodiments of this application.
  • In a possible situation, the smart home device or vehicle-mounted device is not equipped with a display screen (for example, a smart speaker or vehicle audio system), and the user can set the humming recognition function of the smart home device or vehicle-mounted device through the smart terminal 100.
  • FIGS. 6A-6B exemplarily show some user interfaces for setting the humming recognition function.
  • these user interfaces may be interfaces in smart home applications.
  • the user interface 61 includes a display area 60.
  • the display area 60 includes instruction information 600, reminder information 602, selection box 610, selection box 604, control 608, and display area 606, in which:
  • the instruction information 600 may be used to indicate the family information set by the user, and may also be text information such as "Annie's Home” and "Jack's Home”.
  • the reminder information 602 may be used to remind the user of some abnormal situations that need to be paid attention to.
  • the smart terminal 100 may generate corresponding reminder information according to the status of each smart home device. For example, if the anti-theft door has not been closed for a long time, the smart terminal 100 may display the reminding message 602. Or, if the remaining amount of the filter element of the air purifier is less than the preset value, the smart terminal 100 may display a reminder message "The filter element of the air purifier needs to be replaced", and so on.
  • the selection box 610 can display multiple optional home statuses for the user to choose, such as "going home”, “leaving home”, “sleeping”, “reading” and “more”.
  • each home device may have a preset activation state in each home status. For example, if the user performs a selection operation on the selection box of "going home", in response to the selection operation, the smart terminal 100 controls the chandelier and the air conditioner in the living room to turn on.
  • the user can set the activation state of each home device in each home state, and can also customize more home states.
  • the selection box 604 can display multiple home spaces for the user to select, such as "all", “living room”, “master bedroom”, “second bedroom”, and so on.
  • the selection box 604 can receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the operation, the smart terminal 100 displays in the display area 606 the smart home devices corresponding to the selected home space. For example, if the selection box 604B receives the user's click operation, the smart terminal 100 displays the smart home devices contained in the "living room" in the display area 606.
  • the control 608 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays an interface for adding a smart home device. The user can enter the information of the new smart home device through the add interface.
  • the display area 606 may be used to display information of one or more smart home devices, and the information may include basic information such as pictures, names, and opening states.
  • the display area 606 may also be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 displays the setting interface of the smart home device corresponding to the operation.
  • the display area 606 receives a user's click operation, and in response to the user's click operation, the smart terminal 100 jumps from the user interface 61 to the user interface 62.
  • the user interface 62 includes a return key 620, a switch control 622, a volume control 626, a switch control 628, a control 630, a switch control 632, a switch control 634, a control 636, a control 638, a control 640, and a jump control 642. Among them:
  • the return key 620 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 returns to the previous page of the current page (ie, the user interface 61).
  • the switch control 622 may be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 controls the on or off state of the smart speaker.
  • the electronic device can control the smart speaker by sending a control instruction to instruct the smart speaker to perform an operation corresponding to the control instruction.
  • the volume control 626 can be used to adjust the volume of the audio file being played.
  • the volume control 626 may be used to receive a user's operation, and in response to the operation, the smart terminal 100 controls the smart speaker to adjust the volume of the audio file played.
  • for example, when the received user operation is a slide to the left, the smart terminal 100 controls the smart speaker to reduce the volume of the audio file being played; when the received user operation is a slide to the right, the smart terminal 100 controls the smart speaker to increase the volume of the audio file being played.
  • the ratio of the distance from the volume control 626 to the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum volume played by the smart speaker.
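The correspondence between the slider position and the volume can be illustrated with a minimal sketch. The function name `slider_to_volume`, its parameters, and the linear mapping are assumptions made for illustration; the embodiment only states that the two ratios correspond to each other.

```python
def slider_to_volume(handle_x: float, left_x: float, length: float, max_volume: int) -> int:
    """Map the position of the volume control on its line segment to a volume level.

    Assumes a linear relationship: (handle_x - left_x) / length == volume / max_volume.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    ratio = (handle_x - left_x) / length
    # Clamp so that dragging past either end of the segment stays in range.
    ratio = min(1.0, max(0.0, ratio))
    return round(ratio * max_volume)
```

Under this assumption, sliding the control to the left lowers the ratio and therefore the volume, and sliding it to the right raises both.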
  • the switch control 628 may be used to receive an operation input by a user, and in response to the user's operation, the smart terminal 100 controls the smart speaker to turn on the sound effect optimization function or turn off the sound effect optimization function.
  • the control 630 may be used to receive a time input by the user, and in response to the user's operation, the smart terminal 100 controls the smart speaker to set its closing time to the time input by the user.
  • the switch control 632 can be used to receive a user's operation (for example, a click operation, a sliding operation, etc.), and in response to the user's operation, the smart terminal 100 controls the smart speaker to change the on/off state of the humming recognition function. For example, before receiving the user's operation, the display state of the switch control 632 is "ON", which indicates that the humming recognition function of the smart speaker is in the on state at this time. If the switch control 632 receives a user's operation, in response to the user's operation, the smart terminal 100 adjusts the display state of the switch control 632 to "OFF" and controls the smart speaker to turn off the humming recognition function.
  • the switch control 634 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 controls the smart speaker to set an active time period for humming recognition. For example, if the display state of the switch control 634 is "OFF" before receiving the user's operation, it indicates that no activation time has been set for the humming recognition operation of the smart speaker, and the humming recognition operation may be running all the time. Optionally, in this case, the smart terminal 100 does not display the control 636 and the control 638. After receiving the user's operation on the switch control 634, in response to the user's operation, the electronic device changes the display state of the switch control 634 to "ON" and displays the control 636 and the control 638.
  • the control 636 is used to receive the activation time of the humming recognition operation input by the user.
  • in response, the smart terminal 100 controls the smart speaker to set the activation time of the humming recognition operation to the activation time input by the user.
  • the control 638 is used to receive the end time of the humming recognition operation input by the user.
  • in response, the smart terminal 100 controls the smart speaker to set the end time of the humming recognition operation to the end time input by the user.
  • the smart speaker itself may not be able to set the activation time of the humming recognition function.
  • in this case, in response to the user's operation on the control 636, the smart terminal 100 sends an instruction to enable the humming recognition function to the smart speaker at the activation time, so as to control the smart speaker to turn on the humming recognition function; in response to the user's operation on the control 638, the smart terminal 100 sends an instruction to stop the humming recognition function to the smart speaker at the end time, so as to control the smart speaker to stop the humming recognition function.
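The terminal-side scheduling described above can be sketched as follows. The function names and the instruction strings `"ENABLE_HUMMING_RECOGNITION"` and `"DISABLE_HUMMING_RECOGNITION"` are hypothetical; the embodiment does not define a message format. The sketch also assumes that an active period may span midnight (for example, 22:00 to 06:00), which the source does not discuss.

```python
from datetime import time
from typing import Optional


def is_within_active_period(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the active period [start, end)."""
    if start <= end:
        return start <= now < end
    # The period spans midnight, e.g. 22:00 -> 06:00.
    return now >= start or now < end


def instruction_for(now: time, start: time, end: time, currently_enabled: bool) -> Optional[str]:
    """Decide which instruction (if any) the terminal should send to the speaker."""
    active = is_within_active_period(now, start, end)
    if active and not currently_enabled:
        return "ENABLE_HUMMING_RECOGNITION"   # hypothetical instruction name
    if not active and currently_enabled:
        return "DISABLE_HUMMING_RECOGNITION"  # hypothetical instruction name
    return None  # speaker state already matches the schedule
```

A terminal could evaluate `instruction_for` periodically, or only at the activation time and the end time, and forward any non-empty result to the smart speaker.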
  • the control 640 can be used to add voiceprint information that can be used to enable humming recognition.
  • the jump control 642 can be used to receive a user's operation, and in response to the user's operation, the smart terminal 100 jumps from the user interface 62 to the setting interface of the voiceprint 1.
  • the setting interface of the voiceprint 1 may include naming and deleting functions, and so on.
  • the voiceprint information used for matching in the smart speaker is the voiceprint information stored in the smart terminal 100.
  • the voiceprint information extracted from the voice signal is sent to a smart speaker capable of humming recognition for storage.
  • the smart speaker with humming recognition function can use the voiceprint information stored in the electronic device to match the sound signal.
  • the user interface for entering voiceprint information can refer to Figure 5G.
  • the voiceprint information used for matching in the smart speaker is the voiceprint information extracted from the user's voice signal re-entered by the smart speaker.
  • Fig. 6C exemplarily shows yet another user interface 63 for inputting voiceprint information.
  • the smart terminal 100 jumps from the user interface 62 to the user interface 63.
  • the user interface 63 includes:
  • the instruction information 650 may be used to provide prompt information for the user to instruct the user to enter voiceprint information.
  • the text information 652 is text content that the user needs to read aloud.
  • the electronic device may display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.
  • the electronic device can also instruct the user to sing several pieces of music to record the voiceprint information.
  • the content of the indicator 650 may be "please approach the smart speaker, press and hold the play button, and sing the following song fragments to record voiceprint information.”
  • the text information 652 is a piece of lyrics.
  • the play button refers to the play button of the smart speaker, and the play button can be a physical button or a virtual button.
  • in addition to smart speakers, the humming recognition function of other smart home devices (not limited to smart home devices without a display screen, but also smart home devices with a display screen) can be set in the manner described above.
  • the function setting of the in-vehicle device can also be performed on the smart terminal 100. In this case, the above-mentioned method can also be referred to.
  • the above has introduced some user interfaces of the smart terminal 100 for setting the humming recognition function on the smart home device.
  • when a display screen is provided on the smart home device or the vehicle-mounted device, the device can set its own humming recognition function.
  • the following introduces the setting interface for humming recognition on vehicle equipment.
  • Figures 7A-7B exemplarily show a user interface for displaying and setting the humming recognition function on the vehicle-mounted device.
  • FIG. 7A exemplarily shows a user interface 71 for displaying an application menu on the vehicle-mounted device.
  • the user interface 71 may also be referred to as the main menu.
  • the user interface 71 may include: a calendar widget 700, a status bar 702, a display area 708, and a control 706, among which:
  • the calendar widget 700 can be used to indicate the current time, such as date, day of the week, hour and minute information, etc.
  • the status bar 702 may include: a Bluetooth indicator 704, one or more signal strength indicators 705 of a wireless fidelity (Wi-Fi) signal, and a time indicator 703.
  • the display area 708 can be used to display multiple application icons, such as a navigation icon 708A, a phone icon 708B, a music icon 708C, a video icon 708D, a gallery icon 708E, a radio icon 708F, a driving recorder icon 708G, and a settings icon 708H.
  • the control 706 can be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps back from the current interface to the user interface 71 (that is, the main menu interface).
  • the user interface 72 is a user interface for displaying the setting menu.
  • the user interface includes multiple setting options, for example, "system setting 720", "user setting 722", "sound effect setting 724", "network setting 726", "time setting 728", and so on.
  • the content displayed in the display area 716 is the setting content corresponding to the setting option.
  • “system setting 720" may be a setting option selected by default.
  • the content displayed in the display area 716 is the setting content corresponding to the system setting.
  • the display area 716 displays the setting content corresponding to the one setting option.
  • the display area 716 can receive a user's operation (for example, an upward or downward sliding operation), and in response to the operation, the display area 716 can display more settings.
  • the content displayed in the display area 716 is the setting content of humming recognition.
  • the display area 716 may include a switch control 710, a control 712, and a control 714.
  • the switch control 710 can be used to turn on or turn off the humming recognition function.
  • the control 712 can be used to add voiceprint information that can be used to enable humming recognition.
  • the control 712 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps to a user interface for inputting voiceprint information, for example, the user interface 73.
  • the user interface 73 will be described in more detail later, which is not specifically expanded here.
  • the control 714 can be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device jumps from the user interface 72 to the setting interface of voiceprint 1.
  • the setting interface of the voiceprint 1 may include naming and deleting functions, and so on.
  • FIG. 7C exemplarily shows a user interface 73 for inputting voiceprint information.
  • in response to the user's operation on the control 712 in the user interface 72, the in-vehicle device jumps from the user interface 72 to the user interface 73 to enter the voiceprint information that the user wants to add.
  • instruction information 730 and text information 732 are included. Among them:
  • the instruction information 730 may be used to provide prompt information for the user to instruct the user to enter voiceprint information.
  • the play button is the play button of the speaker. In a possible situation, the play button of the speaker is a physical button around the display screen of the vehicle device.
  • the text information 732 is the text content that the user needs to read aloud.
  • the in-vehicle device can display different text information multiple times for the user to read. In this way, more voice signals of users can be recorded to improve the accuracy of voiceprint information.
  • the vehicle-mounted device may also instruct the user to sing several pieces of music to record the voiceprint information.
  • the content of the indicator 730 may be "please approach the speaker, press and hold the play button, and sing the following song fragments to record voiceprint information.”
  • the text information 732 is a piece of lyrics.
  • after recording the voice signal input by the user, the vehicle-mounted device can extract the voiceprint information from the collected voice signal and store the voiceprint information.
  • the above introduces the setting interface for humming recognition on the vehicle-mounted device. It should be noted that the setting interface is not limited to the user interfaces introduced above.
  • for the setting interface for humming recognition on the vehicle-mounted device, reference may also be made to the setting interfaces of the smart terminal described above.
  • the following further introduces the user interface that displays the humming recognition result on the vehicle-mounted device.
  • FIG. 8A exemplarily shows a user interface 81 for displaying recognition results on a vehicle-mounted device.
  • when the vehicle-mounted device recognizes an audio file according to the music clip hummed by the user, the vehicle-mounted device plays the audio file following the progress of the user's humming, and displays a notification window 842 on its current interface to show the recognition result to the user.
  • the notification window 842 may include: a humming recognition icon 840, a third display area 841, a playback control 843, and a control 844.
  • the humming recognition icon 840 is used to indicate the source of the notification window 842, in order to facilitate the user to quickly understand that the notification window 842 is the recognition result output by the humming recognition service (or called a function or application). It should be noted that the humming recognition icon 840 is only an example icon. In a specific implementation process, the humming recognition icon may also be other patterns, such as musical notes or icons of other styles, which are not limited in the embodiment of the present application.
  • the third display area 841 can be used to display the identification information of the recognized audio file.
  • the third display area 841 contains the name of the song "Across the Ocean to See You".
  • the third display area 841 may also include operation instruction information, for example, "click to stop playback" included in the third display area 841, which can provide the user with an operation reminder and improve the convenience of the user's operation.
  • the third display area 841 may also contain singer information or lyrics information of the currently played audio file.
  • the vehicle-mounted device may also display the lyrics information of the currently playing audio file in the form of a floating window.
  • the floating window is a movable window displayed floating in the display interface of the vehicle-mounted device.
  • the playback control 843 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device pauses or continues to play the audio file. Specifically, after the in-vehicle device recognizes the audio file corresponding to the music segment hummed by the user, it will play the audio file in accordance with the user's singing progress. At this time, the play control 843 displays the first state. In the case where the playback control 843 is displayed in the first state, if the playback control 843 receives a user's operation, the vehicle-mounted device pauses playing the audio file and displays the playback control 843 in the second state. In the case where the playback control 843 is displayed in the second state, if the playback control 843 receives the user's operation, the vehicle-mounted device continues to play the audio file and displays the playback control 843 in the first state.
  • the control 844 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device pauses the audio file to re-acquire the user's sound signal, and performs humming recognition on the re-acquired sound signal.
  • the control 844 may not be displayed on the in-vehicle device; instead, instruction information such as "say 're-recognize' to perform humming recognition again" may be displayed.
  • when the in-vehicle device detects the voice information "re-recognize" input by the user, the in-vehicle device pauses playing the audio file, performs humming recognition on the segment that the user hums again, and displays the notification window again with the new recognition result. In this way, no manual operation by the user is required, which makes it convenient for the user to issue an instruction to perform humming recognition again while driving.
  • the notification window 842 disappears after being displayed for a preset time, and the preset time may be 4 seconds, 5 seconds, or the like. Alternatively, when the notification window 842 receives the user's upward sliding operation, in response to the operation, the in-vehicle device no longer displays the notification window 842 in the user interface 81. Alternatively, the notification window can disappear after the current song finishes playing.
  • FIG. 8B exemplarily shows a user interface 82 for displaying recognition results on another vehicle-mounted device.
  • when the vehicle-mounted device recognizes an audio file based on the music segment hummed by the user, the vehicle-mounted device plays the audio file following the progress of the user's humming, and displays a user interface 82 for showing information about the recognition result to the user.
  • the user interface 82 may include: a humming recognition icon 840, a third display area 841, a playback control 843, a control 844, a volume control 851, a control 853, and a control 854.
  • the humming recognition icon 840, the third display area 841, the playback control 843, and the control 844 can all refer to the description in FIG. 8A, and will not be repeated here.
  • the volume control 851 can be used to adjust the volume of playing audio files.
  • the volume control 851 may be used to receive a user's operation, and in response to the operation, the vehicle-mounted device adjusts the volume of the played audio file.
  • for example, when the received user operation is a slide to the left, the vehicle-mounted device reduces the volume of the audio file being played; when the received user operation is a slide to the right, the vehicle-mounted device increases the volume of the audio file being played.
  • the ratio of the distance from the volume control 851 to the left end of the line segment to the length of the line segment corresponds to the ratio of the current volume to the maximum volume played by the system.
  • the control 853 can be used to add the recognized audio file to favorites.
  • the control 853 can receive the user's operation.
  • in response to the user's operation, the vehicle-mounted device adds the identifier of the recognized audio file to a preset favorites folder (or a folder called "favorite music"; the name is not limited by this application), so that the user can conveniently search for or play the recognized audio file next time.
  • the control 854 can be used to download the recognized audio file.
  • the control 854 can receive a user's operation, and in response to the user's operation, the in-vehicle device downloads the audio resource of the identified audio file from the network.
  • the vehicle-mounted device displays a selection box, and the selection box contains sound quality options such as "standard quality", "high quality", and "lossless quality".
  • the selection box is used to receive a user's selection operation on an option, and in response to the user's selection operation on an option, the vehicle-mounted device downloads audio resources of sound quality corresponding to the option.
  • the control 855 can be used to share the recognized audio file.
  • the control 855 can receive a user's operation, and in response to the user's operation, the vehicle-mounted device displays a sharing frame, which contains multiple sharing objects, for example, one or more terminal devices connected to the vehicle-mounted device via Bluetooth.
  • the sharing box is used to receive a user's selection operation of a sharing object, and in response to the user's selection operation of a sharing object, the vehicle-mounted device sends the audio file identifier or audio resource to the sharing object corresponding to the selection operation.
  • the notification window 842 may be used to receive a user's operation, and in response to the user's operation, the vehicle-mounted device displays the user interface 82.
  • the user interface 82 may receive a user's sliding operation, and in response to the sliding operation, the vehicle-mounted device displays the user interface that was recently displayed before the user interface 82.
  • the system architecture includes an electronic device and a music recognition server. Among them:
  • the electronic device may be the smart terminal 100 exemplarily shown in FIG. 1A; specifically, it may be a portable electronic device such as a mobile phone or a tablet computer, or a wearable device such as a smart watch or a smart bracelet. The electronic device may also be the smart home device 110 exemplarily shown in FIG. 1C or the in-vehicle device 120 exemplarily shown in FIG. 1D.
  • the electronic device may have an audio input module and an audio output module.
  • the electronic device can collect the sound in the external environment through the audio input module, and send the sound signal to the music recognition server for humming recognition. After that, the electronic device receives the recognized audio file and the playback position from the music recognition server, and then plays the recognized audio file from the playback position through the speaker module.
  • the electronic device may also include a camera module, which is used to obtain the user's mouth shape information. The electronic device may send the acquired mouth shape information to the music recognition server, so that the music recognition server combines the mouth shape information with the sound signal to perform humming recognition.
  • the music recognition server can perform feature extraction on the received sound signal, use the extracted features (for example, the fundamental frequency sequence) to search the pre-stored audio resource library (also called the feature database), and match the audio information that is most similar to the melody hummed by the user.
  • the music recognition server may be a separate server, and the music recognition server may also be composed of multiple servers.
  • the audio resource library may be stored in the music recognition server, and the audio resource library may also be stored in another device (for example, a database server) that has a connection relationship with the music recognition server.
  • FIG. 9 is a flowchart of a humming recognition method provided by an embodiment of the present application.
  • the humming recognition method provided by the embodiment of the present application includes but is not limited to the following steps.
  • the electronic device collects sounds in the external environment through the audio input module.
  • the electronic device needs to determine whether its own audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied (for example, the device is playing audio or video, making a phone call, or performing voice navigation), the electronic device does not collect sounds in the external environment through the audio input module for the humming recognition operation.
  • it should be noted that this does not mean that the electronic device performs no sound collection through the audio input module in this case; rather, the sound it acquires is not used for humming recognition. For example, during a call, an electronic device (for example, a mobile phone) needs to collect sounds in the external environment through the audio input module in order to obtain the voice information input by the user and to obtain environmental sounds for noise reduction.
  • if the audio input module and/or audio output module of the electronic device is not occupied, the electronic device can collect sounds in the external environment through the audio input module for the humming recognition operation.
  • the priority of the humming recognition operation provided in the embodiment of the present application is lower than the priority of other operations in the electronic device that need to occupy audio resources except the humming recognition operation.
  • the electronic device matches the voiceprint information of the sound with the pre-stored voiceprint information. If the matching succeeds, that is, the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the first audio file to the music recognition server for humming recognition; if the matching fails, that is, the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information, the electronic device continues to collect sounds in the external environment through the audio input module.
  • the pre-stored voiceprint information is the pre-stored voiceprint information extracted from the voice signal input by the user.
  • the electronic device can receive the sound input by the user through the user interface 55, the user interface 63, and the user interface 73 exemplarily shown in the above-mentioned embodiment; then, the electronic device performs voiceprint extraction on the collected sound, and stores the extracted voiceprint information.
  • that the voiceprint information of the sound is consistent with the pre-stored voiceprint information does not mean that the two are exactly the same; when the similarity between the voiceprint information of the sound and the pre-stored voiceprint information is not less than a preset value (for example, 90% or 95%), it can be determined that the voiceprint information of the sound is consistent with the pre-stored voiceprint information.
  • the electronic device may match the voiceprint information of the sound with the pre-stored voiceprint information as follows: the electronic device extracts the voiceprint information from the sound signal, and calculates the similarity between the extracted voiceprint information and the pre-stored voiceprint information.
  • if the similarity is not less than the preset value, the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information; if the similarity is less than the preset value, the electronic device determines that the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information.
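The threshold comparison above can be sketched as follows. The embodiment does not specify how voiceprint information is represented or how similarity is computed; this sketch assumes fixed-length feature vectors compared by cosine similarity, and the function names are hypothetical. The default threshold of 0.9 mirrors the 90% example given above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)


def voiceprint_matches(extracted: Sequence[float], stored: Sequence[float],
                       threshold: float = 0.9) -> bool:
    """Consistent if the similarity is not less than the preset value."""
    return cosine_similarity(extracted, stored) >= threshold
```

Note that the comparison is "not less than", so a similarity exactly equal to the preset value counts as a match.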
  • the music recognition server searches for the second audio file from the audio resource library according to the first audio file, and determines the initial playback position of the second audio file.
  • the music recognition server may find the second audio file from the audio resource library according to the first audio file as follows: the server extracts features of the first audio file, uses the extracted features (for example, the fundamental frequency sequence) to search the pre-stored audio resource library (also called the feature database), and selects the second audio file that is most similar to the first audio file. That is, the similarity between the feature of the second audio file and the feature of the first audio file is higher than the similarity between the feature of a third audio file and the feature of the first audio file, where the third audio file is any audio file in the aforementioned audio resource library other than the second audio file.
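A minimal sketch of such a search over fundamental frequency sequences is given below. The embodiment does not name a similarity measure; this sketch substitutes dynamic time warping (DTW) over mean-normalized semitone contours, a common choice for query-by-humming, and compares whole sequences rather than locating a fragment inside a full song. All function names and the dictionary-based library are hypothetical.

```python
import math
from typing import Dict, List


def to_semitone_contour(f0_hz: List[float]) -> List[float]:
    """Convert F0 values (Hz) to semitones, centered on the mean so that
    humming in a different key yields the same contour. Unvoiced frames
    (f0 <= 0) are dropped."""
    voiced = [12.0 * math.log2(f / 440.0) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 sequence")
    mean = sum(voiced) / len(voiced)
    return [s - mean for s in voiced]


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Length-normalized DTW distance; tolerates tempo differences."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def find_most_similar(query_f0: List[float], library: Dict[str, List[float]]) -> str:
    """Return the library key whose contour is closest to the query."""
    query = to_semitone_contour(query_f0)
    return min(library,
               key=lambda name: dtw_distance(query, to_semitone_contour(library[name])))
```

In this sketch the smallest distance plays the role of the highest similarity: the returned entry corresponds to the second audio file, and every other entry is a third audio file.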
  • the music recognition server can use automatic speech recognition (ASR) technology to convert the first audio file into text information, so as to determine the lyrics information corresponding to the first audio file. Further, the music recognition server can determine the user's humming progress within the music according to the recognized text information, and then determine the start playback position of the second audio file. The start playback position of the second audio file corresponds to the end position of the first audio file. Therefore, the electronic device plays the second audio file from the start playback position, achieving the effect of playing audio that follows the user's humming progress.
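One way to picture how the start playback position follows from the recognized text is the sketch below. It assumes that the server holds time-stamped lyric lines for the second audio file and that the recognized text is matched against them with a generic string similarity from the Python standard library (`difflib.SequenceMatcher`); neither detail comes from the text, and `start_playback_position` is a hypothetical name.

```python
from difflib import SequenceMatcher


def start_playback_position(recognized_text, timed_lyrics):
    """Return the end time (seconds) of the lyric line that best matches the
    tail of the recognized text, or None if nothing matches at all.

    timed_lyrics: list of (start_s, end_s, line_text) tuples (assumed format).
    """
    best_end, best_score = None, 0.0
    for _start_s, end_s, line in timed_lyrics:
        # Compare each lyric line with the last len(line) recognized characters.
        tail = recognized_text[-len(line):]
        score = SequenceMatcher(None, tail.lower(), line.lower()).ratio()
        if score > best_score:
            best_end, best_score = end_s, score
    return best_end
```

If the user hummed the first two lines of a song, the best-matching tail is the second line, so playback would start where that line ends.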
  • the music recognition server sends the second audio file and first indication information to the electronic device, where the first indication information indicates a starting playback position of the second audio file.
  • after receiving the second audio file and the first indication information sent by the music recognition server, the electronic device plays the second audio file from the start playback position through the audio output module.
  • before collecting sounds in the external environment through the audio input module, the electronic device needs to determine whether its humming recognition function is enabled.
  • the electronic device may receive the user's setting of the humming recognition function through the user interfaces exemplarily shown as user interface 41, user interface 42, user interface 51, user interface 52, user interface 62, and user interface 72 in the foregoing embodiment. If the electronic device determines that its humming recognition function is enabled, it performs the step of collecting sounds in the external environment through the audio input module; if it determines that its humming recognition function is not enabled, it does not perform the step of collecting sounds in the external environment through the audio input module.
  • when it is detected that the electronic device is in a locked state, the electronic device stops collecting sounds in the external environment through the audio input module. It can be understood that after it is detected that the electronic device is unlocked, the electronic device can collect sounds in the external environment through the audio input module.
  • the switch control 556 can be used to set the usable state of the humming recognition function. In this way, the collection of environmental sounds can be stopped when the electronic device is in the locked state, which can reduce power consumption and save the power of the electronic device.
  • when it is detected that the electronic device is at a preset location, the electronic device stops collecting sounds in the external environment through the audio input module. It can be understood that if it is detected that the electronic device is no longer at the preset location, the electronic device can collect sounds in the external environment through the audio input module.
  • the preset location may be a location set by the user (for example, the location of a company set by the user, etc.), and the preset location may also be a location prestored in the electronic device (for example, a school, a hospital, a theater, etc.).
  • the electronic device can determine its own location through a global positioning system (GPS), Bluetooth (BT) or wireless local area networks (WLAN).
  • for switch control 557, refer to the description of switch control 557 in user interface 52 in the foregoing embodiment.
  • when the "Environment Do Not Disturb" switch control (switch control 557) is on, the electronic device detects in real time (or at a preset period) whether it is at a preset location, and if it detects that it is at a preset location, it stops collecting sounds in the external environment through the audio input module.
  • the preset location is a location that is not suitable for playing the audio file. In this way, the problem of playing the second audio file in an inappropriate place can be avoided and the power of the electronic device can be saved.
  • when it is detected that the ambient light brightness has remained below a preset value for longer than a preset time, the electronic device stops collecting sounds in the external environment through the audio input module. It can be understood that when it is detected that the ambient light brightness has remained greater than or equal to the preset value for longer than the preset time, the electronic device can collect sounds in the external environment through the audio input module.
  • the electronic device can sense the ambient light brightness through an ambient light sensor. It should be noted that ambient light brightness remaining below the preset value for longer than the preset time may indicate that the electronic device is in the user's pocket or that it is currently night. In this case, the electronic device is not suitable for playing audio files. In this way, playing the second audio file on inappropriate occasions can be avoided, and the power of the electronic device can be saved.
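The duration check on ambient light can be sketched as follows. The sample format (a chronological list of timestamp and brightness pairs), the units, and the name `dark_for_too_long` are assumptions made for illustration; the text only states that brightness below a preset value for longer than a preset time stops sound collection.

```python
def dark_for_too_long(samples, lux_threshold, max_dark_seconds):
    """Return True when the most recent run of readings below `lux_threshold`
    has lasted longer than `max_dark_seconds`.

    samples: chronological list of (timestamp_s, lux) readings (assumed format).
    """
    dark_since = None
    for timestamp_s, lux in samples:
        if lux < lux_threshold:
            if dark_since is None:
                dark_since = timestamp_s  # start of the current dark run
        else:
            dark_since = None  # a bright reading resets the run
    if dark_since is None:
        return False
    return samples[-1][0] - dark_since > max_dark_seconds
```

A device that has reported low brightness continuously for 40 seconds with a 30-second limit would stop collecting sound, while a brief dark reading would not trigger the stop.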
  • the electronic device stops collecting sounds in the external environment through the audio input module within the first time period.
  • the first time period may be a preset time period (for example, 11 pm to 9 am), and the first time period may also be a time period determined according to the time information input by the user.
  • the situation that the first time period is a time period determined according to the time information input by the user may correspond to the introduction of the switch control 536 in the user interface 52 in the foregoing embodiment.
  • the user can input the start time and end time of the humming recognition function, and the first time period is the time period from the end time to the start time.
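Because the first time period runs from the end time to the start time of the humming recognition function, it usually wraps around midnight. The sketch below shows one way to test membership in such a period. The half-open boundary handling and the name `in_first_time_period` are assumptions, not details from the text.

```python
from datetime import time


def in_first_time_period(now, function_start, function_end):
    """Return True when `now` falls in the period from `function_end` to
    `function_start`, during which sound collection is stopped."""
    if function_start <= function_end:
        # The function runs within one day (e.g. 09:00 to 23:00), so the
        # first time period wraps around midnight.
        return not (function_start <= now < function_end)
    # The function itself wraps around midnight (e.g. 20:00 to 02:00), so
    # the first time period lies within one day.
    return function_end <= now < function_start
```

With a start time of 09:00 and an end time of 23:00, a call at 23:30 falls inside the first time period, and a call at noon does not.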
  • in step S902, before the electronic device determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information, the electronic device may determine whether the sound is a human voice. If the electronic device determines that the sound is a human voice, it then determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information; if the electronic device determines that the sound is not a human voice, it continues to collect sounds in the external environment through the audio input module.
  • the electronic device may determine whether the sound is a human voice as follows: the electronic device calculates the frequency of the sound; if the frequency is within a preset frequency range, the electronic device determines that the sound is a human voice; if the frequency is not within the preset frequency range, the electronic device determines that the sound is not a human voice.
  • the preset frequency range can be set according to requirements. For example, since the reference range of male voices is 64 Hz to 523 Hz and the reference range of female voices is 160 Hz to 1200 Hz, the preset frequency range may be 64 Hz to 1200 Hz.
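The frequency range check can be sketched as below. The 64 Hz to 1200 Hz range comes from the text; the zero-crossing frequency estimate is only a simple stand-in for whatever frequency calculation the device actually uses, and the function names are hypothetical.

```python
import math

# Preset frequency range from the text: 64 Hz to 1200 Hz.
VOICE_RANGE_HZ = (64.0, 1200.0)


def estimate_frequency(samples, sample_rate):
    """Rough frequency estimate: upward zero crossings per second."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev < 0.0 <= cur
    )
    duration_s = len(samples) / sample_rate
    return crossings / duration_s if duration_s > 0 else 0.0


def is_human_voice(samples, sample_rate, freq_range=VOICE_RANGE_HZ):
    """Return True when the estimated frequency lies in the preset range."""
    low, high = freq_range
    return low <= estimate_frequency(samples, sample_rate) <= high
```

A pure 220 Hz tone would pass the check and a 3000 Hz tone would not. Real recordings are noisier, so a practical system would likely need a more robust pitch estimator than this sketch.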
  • the electronic device may also obtain the user's mouth shape information through a camera.
  • the electronic device may receive the setting of the user's access authority to the camera through the user interface exemplarily shown in the user interface 52 and the user interface 53 in the foregoing embodiment.
  • the on state of the camera of the electronic device may be consistent with the on state of the humming recognition function.
  • the electronic device determines whether the sound is a human voice. If it is determined that the sound is a human voice, the electronic device can obtain the user's mouth shape information through the camera. For the method of judging whether the sound is a human voice, refer to the above description, which will not be repeated here. In this way, the power consumption of the electronic device can be reduced, and the power of the electronic device can be saved.
  • the music recognition server can also receive the mouth shape information sent by the electronic device.
  • the music recognition server can determine text information based on the mouth shape information, and combine the text information determined from the mouth shape with the first audio file to determine the final recognition result. That is, the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information. In this way, the accuracy of identifying the second audio file can be further improved.
  • the music recognition server determines whether the first audio file is a music fragment.
  • the music recognition server may determine whether the first audio file is a music segment based on the text information corresponding to the first audio file and the multiple intervals between consecutive words in the first audio file. It should be noted that the music recognition server pre-stores the text information corresponding to each audio file (which can be understood as lyrics) and the multiple intervals between consecutive words in that audio file.
  • if the similarity between the text information corresponding to the first audio file and the text information corresponding to one or more pre-stored audio files is not less than a preset value, and the similarity between the multiple intervals between consecutive words in the first audio file and the multiple intervals between consecutive words in the one or more audio files is not less than a preset value, it is determined that the first audio file is a music segment.
  • if the music recognition server determines that the sound signal is a music segment, the music recognition server finds the second audio file from the audio resource library according to the first audio file.
  • the second audio file is included in the one or more audio files.
  • if the music recognition server determines that the sound signal is not a music segment, the music recognition server feeds back this result to the electronic device.
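The two-condition test for a music segment can be sketched as follows. The similarity measures are assumptions: text similarity uses `difflib.SequenceMatcher` from the Python standard library, and interval similarity is defined here as one minus the mean relative difference between paired intervals. The preset value of 0.8 and all function names are likewise hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def interval_similarity(a, b):
    """Similarity in [0, 1] between two lists of non-negative word intervals
    (seconds), compared pairwise over their common length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    error = sum(
        abs(x - y) / max(x, y) for x, y in zip(a, b) if max(x, y) > 0
    ) / n
    return 1.0 - error


def is_music_segment(hum_text, hum_intervals, library, preset=0.8):
    """library: list of (lyrics_text, word_intervals) pairs (assumed format).
    Both similarities must be not less than the preset value."""
    return any(
        text_similarity(hum_text, lyrics) >= preset
        and interval_similarity(hum_intervals, intervals) >= preset
        for lyrics, intervals in library
    )
```

The interval condition is what separates singing from speech in this sketch: someone reading the lyrics aloud produces matching text but a very different rhythm, so the second similarity falls below the preset value.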
  • the tag of the second audio file is included in the user tag of the first user.
  • the first user is a user who is logged in on the electronic device, or a user who uses the electronic device, and the user tag of the first user is pre-stored in the music recognition server.
  • in this way, the second audio file better matches the user's preferences, and the user experience can be improved.
  • before the electronic device plays the second audio file from the start playback position through the audio output module, it needs to determine whether its location is consistent with the preset location. Specifically, if the electronic device determines that its location is inconsistent with the preset location, it plays the second audio file from the start playback position through the audio output module.
  • the electronic device may only display the humming recognition result without playing the audio file; for the user interface that displays the humming recognition result, refer to user interface 21, user interface 31, user interface 32, user interface 33, user interface 34, user interface 81, and user interface 82 introduced in the foregoing embodiment, which are not repeated here.
  • for this implementation, refer to the description of switch control 557 in user interface 52 in the foregoing embodiment. When the "Environment Do Not Disturb" switch control (switch control 557) is on, the electronic device needs to determine that its location is not a preset location before playing the audio file. In this way, playing the second audio file in an inappropriate place can be avoided, and the power of the electronic device can be saved.
  • the environmental volume of the environment where the electronic device is located is determined, and the electronic device determines the volume at which the second audio file is played according to the environmental volume. Specifically, the greater the environmental volume, the greater the volume at which the electronic device plays the second audio file, and the lower the environmental volume, the lower the volume at which the electronic device plays the second audio file.
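The relationship between environmental volume and playback volume only needs to be monotonic: louder surroundings lead to louder playback. A minimal sketch is a clamped linear mapping. The decibel bounds, the volume scale from 0 to 1, and the name `playback_volume` are assumptions for illustration.

```python
def playback_volume(ambient_db, quiet_db=30.0, loud_db=80.0,
                    min_volume=0.2, max_volume=1.0):
    """Map the environmental volume (dB) to a playback volume in
    [min_volume, max_volume]; a louder environment gives louder playback."""
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    ratio = (ambient_db - quiet_db) / (loud_db - quiet_db)
    ratio = min(1.0, max(0.0, ratio))  # clamp outside the expected range
    return min_volume + ratio * (max_volume - min_volume)
```

Any other non-decreasing mapping, such as a stepped lookup table, would satisfy the description equally well.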
  • the method further includes: the electronic device displays the identification information of the second audio file and a playback control, where the display state of the playback control is a first state, and the first state indicates that the second audio file is being played; if the electronic device detects a first user operation acting on the playback control in the first state, then in response to the first user operation, the electronic device pauses playing the second audio file and sets the display state of the playback control to a second state, where the second state indicates that playback of the second audio file is paused.
  • if the electronic device detects a second user operation acting on the playback control in the second state, then in response to the second user operation, the electronic device continues to play the second audio file and sets the display state of the playback control to the first state.
  • for the user interface in which the electronic device displays the identification information of the second audio file and the playback control, refer to user interface 21, user interface 31, user interface 32, user interface 33, user interface 34, user interface 81, and user interface 82 introduced in the foregoing embodiment, which are not repeated here.
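The playback control behaves like a two-state toggle. The sketch below models only the state transitions; the class and method names are hypothetical, and real playback and rendering calls are omitted.

```python
from enum import Enum


class ControlState(Enum):
    PLAYING = "first state"   # the second audio file is being played
    PAUSED = "second state"   # playback of the second audio file is paused


class PlaybackControl:
    """Tracks the display state of the playback control."""

    def __init__(self):
        # The control is shown in the first state once playback has started.
        self.state = ControlState.PLAYING

    def on_user_operation(self):
        """Handle a user operation on the control and return the action
        the player should take."""
        if self.state is ControlState.PLAYING:
            self.state = ControlState.PAUSED
            return "pause"
        self.state = ControlState.PLAYING
        return "resume"
```

The first user operation pauses playback and moves the control to the second state; the second user operation resumes playback and returns it to the first state.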
  • within the time period from the moment it starts playing the second audio file to a preset moment (for example, the 5th second or the 6th second), the electronic device gradually increases the volume at which it plays the second audio file from low to high. For example, the volume may gradually increase from the minimum volume value to the volume value set by the user, or from 30% of the volume value set by the user to 100% of that value; other ways of increasing the volume are also possible and are not limited in the embodiments of this application.
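The gradual volume increase can be sketched as a linear ramp. The 5-second ramp and the 30% starting fraction are the examples given in the text; the linear shape and the name `fade_in_volume` are assumptions, since the text leaves the manner of increase open.

```python
def fade_in_volume(elapsed_s, ramp_s=5.0, target=1.0, start_fraction=0.3):
    """Playback volume `elapsed_s` seconds after playback starts.

    The volume rises linearly from start_fraction * target to target over
    ramp_s seconds and stays at target afterwards.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target
    progress = max(0.0, elapsed_s) / ramp_s
    start = target * start_fraction
    return start + (target - start) * progress
```

Setting `start_fraction` to 0 gives the other example from the text, a ramp from the minimum volume value up to the volume set by the user.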
  • the electronic device may also detect whether the second audio file is stored in a pre-stored music folder; if so, the electronic device can play other audio files in the music folder after it finishes playing the second audio file.
  • the humming recognition method provided in this application can also be applied in an open platform.
  • the open platform obtains a first audio file, where the first audio file includes sound in the external environment; if the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform finds a second audio file from the audio resource library according to the first audio file and determines the start playback position of the second audio file, where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is an audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; the open platform plays the second audio file from the start playback position, or the open platform controls another application of the electronic device to play the second audio file from the start playback position.
  • the open platform is a platform that provides open application programming interfaces (APIs) or functions. That is, the open platform may offer the functionality of an application that exposes an API, or the functionality of a function.
  • the open platform can implement the method executed by the electronic device and the music recognition server in FIG. 9 by calling an API (or function).
  • the open platform may be a voice assistant platform, which may include only the voice assistant on the electronic device side, or the voice assistant on the electronic device side together with the platform on the server side directly associated with it, or only the voice assistant platform on the server side; this is not specifically limited in the embodiments of the present invention.
  • the open platform may obtain the first audio file through the audio input module of the device where it is located, or the open platform may receive the first audio file sent by an electronic device connected to it.
  • the electronic device may actively send the first audio file to the open platform, or the open platform may actively obtain the first audio file from the electronic device.
  • the open platform calls the API (or function) with the voiceprint recognition function to determine whether the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information.
  • the open platform calls an API (or function) with a humming recognition function to find the second audio file from the audio resource library according to the first audio file.
  • the open platform plays the second audio file from the start playback position through the audio output module of the device where it is located, or the open platform controls another application of the electronic device to play the second audio file from the start playback position.
  • the open platform may send the second audio file and first indication information to the electronic device, where the first indication information includes the start playback position and is used to instruct the electronic device to play the second audio file from the start playback position.
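The open platform flow can be summarized as three calls in sequence. In the sketch below, the voiceprint check, the humming recognition, and the playback are passed in as callables, because the actual APIs (or functions) exposed by the open platform are not named in the text; every name here is hypothetical.

```python
from typing import Callable, Optional, Tuple


def open_platform_handle(
    first_audio: bytes,
    voiceprint_matches: Callable[[bytes], bool],
    find_song: Callable[[bytes], Tuple[str, float]],
    play: Callable[[str, float], None],
) -> Optional[Tuple[str, float]]:
    """Run the flow: voiceprint check, humming recognition, then playback
    from the start playback position. Returns None if the voiceprint
    does not match."""
    if not voiceprint_matches(first_audio):
        return None
    second_audio, start_position_s = find_song(first_audio)
    play(second_audio, start_position_s)
    return second_audio, start_position_s
```

Passing the steps in as callables reflects that playback may be performed either by the open platform itself or by another application that it controls.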
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed in the present application are a humming recognition method and a related device. In the humming recognition method, an electronic device can continuously acquire voice in the external environment, and upon determining that the voice is produced by a preset user, the electronic device sends to a music recognition server a first audio file comprising the voice to perform humming recognition. Upon receiving a recognized second audio file and its start playback position sent by the music recognition server, the electronic device can start playing back the second audio file from the end position of the voice, wherein the start playback position of the second audio file corresponds to the end position of the first audio file. In this way, the steps of operations conducted by a user to trigger a terminal to perform humming recognition can be reduced, and the efficiency of humming recognition can be improved; in addition, the audio can be played back subsequently to the humming of the user, so that the user experience can be improved.

Description

Humming recognition method and related device
This application claims priority to Chinese patent application No. 201910472410.9, entitled "A humming recognition method and related device", filed with the China National Intellectual Property Administration on May 31, 2019, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a humming recognition method and related device.
Background
Humming recognition is currently a research hotspot in the field of audio retrieval. Unlike retrieving audio by text (for example, a song title, singer, or lyrics), and unlike retrieving audio by a piece of music that is being played, humming recognition retrieves audio from a music fragment hummed by the user.
At present, there are two main ways for a user to trigger a terminal to perform humming recognition. In the first way, the user first needs to find an application that has a humming recognition function, then find the function control corresponding to humming recognition in that application, and then operate the function control to trigger the terminal to perform humming recognition. In the second way, the user first needs to wake up an intelligent voice assistant (for example, Siri or Tmall Genie) with a wake-up word, and then input a voice command to trigger the terminal to perform humming recognition. It can be seen that, in the prior art, the way a user triggers a terminal to perform humming recognition is cumbersome.
Summary of the invention
This application provides a humming recognition method and related device, which can reduce the operation steps required for a user to trigger a terminal to perform humming recognition and improve the efficiency of humming recognition, and can also achieve the effect of playing audio that follows the user's humming, improving the user experience.
The above and other objectives are achieved by the features of the independent claims. Further implementations are set out in the dependent claims, the description, and the drawings.
In a first aspect, an embodiment of this application provides a humming recognition method. The method may include: an electronic device collects sound in the external environment through an audio input module; if the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is configured to find a second audio file from an audio resource library according to the first audio file and to determine a start playback position of the second audio file; where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is an audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; the electronic device receives the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file; and the electronic device plays the second audio file from the start playback position through an audio output module. In this way, the operation steps required for the user to trigger the terminal to perform humming recognition can be reduced and the efficiency of humming recognition can be improved; at the same time, the effect of playing audio that follows the user's humming can be achieved, improving the user experience.
With reference to the first aspect, in a possible implementation, the method further includes: the electronic device obtains the user's mouth shape information through a camera; if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the mouth shape information to the music recognition server, where the music recognition server is further configured to convert the mouth shape information into text information; and finding the second audio file from the audio resource library according to the first audio file includes: finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
With reference to the first aspect, in a possible implementation, the electronic device obtaining the user's mouth shape information through a camera includes: if the electronic device determines that the sound is a human voice, obtaining the user's mouth shape information through the camera.
With reference to the first aspect, in a possible implementation, the electronic device collecting sound in the external environment through the audio input module includes: if the electronic device determines that the audio input module and/or the audio output module is not occupied, the electronic device collects sound in the external environment through the audio input module.
With reference to the first aspect, in a possible implementation, the tag of the second audio file is included in the user tag of a first user.
With reference to the first aspect, in a possible implementation, after the electronic device plays the second audio file from the start playback position through the audio output module, the method further includes: the electronic device displays the identification information of the second audio file and a playback control, where the display state of the playback control is a first state, and the first state indicates that the second audio file is being played; if the electronic device detects a first user operation acting on the playback control in the first state, then in response to the first user operation, the electronic device pauses playing the second audio file and sets the display state of the playback control to a second state, where the second state indicates that playback of the second audio file is paused.
With reference to the first aspect, in a possible implementation, the method further includes: when detecting that the electronic device is in a locked state, the electronic device stops collecting sound from the external environment through the audio input module.
With reference to the first aspect, in a possible implementation, the method further includes: when detecting that the electronic device is at a preset location, the electronic device stops collecting sound from the external environment through the audio input module.
With reference to the first aspect, in a possible implementation, that the electronic device plays the second audio file from the start playback position through the audio output module includes: if the electronic device determines that its location does not match a preset location, the electronic device plays the second audio file from the start playback position through the audio output module.
With reference to the first aspect, in a possible implementation, the method further includes: the electronic device stops collecting sound from the external environment through the audio input module during a first time period.
With reference to the first aspect, in a possible implementation, that the electronic device collects sound from the external environment through the audio input module includes: if the electronic device determines that its humming recognition function is enabled, the electronic device collects sound from the external environment through the audio input module.
With reference to the first aspect, in a possible implementation, the method further includes: when detecting that the ambient light brightness has remained below a preset value for longer than a preset duration, the electronic device stops collecting sound from the external environment through the audio input module.
With reference to the first aspect, in a possible implementation, the music recognition server is further configured to, when the music recognition server determines that the sound signal is a music fragment, find the second audio file from the audio resource library according to the first audio file.
With reference to the first aspect, in a possible implementation, during the period from the moment playback of the second audio file starts to a preset moment (for example, the 5th second or the 6th second), the electronic device gradually increases the playback volume of the second audio file from low to high.
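The gradual volume increase described above can be illustrated with a minimal sketch. The linear ramp shape, the function name `fade_in_gain`, and the parameters `ramp_s` and `floor` are assumptions made for illustration; the application only states that the volume rises from low to high until a preset moment.

```python
def fade_in_gain(elapsed_s: float, ramp_s: float = 5.0, floor: float = 0.0) -> float:
    """Return a playback gain in [floor, 1.0] for a linear fade-in.

    elapsed_s: seconds since playback of the second audio file started.
    ramp_s:    hypothetical preset moment (e.g. the 5th second) at which
               full volume is reached.
    floor:     hypothetical starting gain.
    """
    if ramp_s <= 0:
        return 1.0  # no ramp configured: play at full volume immediately
    # Clamp progress to [0, 1] so the gain never exceeds full volume.
    progress = min(max(elapsed_s / ramp_s, 0.0), 1.0)
    return floor + (1.0 - floor) * progress
```

A player would multiply each output sample, or set the output volume, by this gain while `elapsed_s` is below `ramp_s`.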
With reference to the first aspect, in a possible implementation, after the electronic device plays the second audio file from the start playback position through the audio output module, the electronic device may further detect whether the second audio file is stored in a pre-stored music folder; if so, the electronic device may, after finishing playing the second audio file, play other audio files in that music folder.
In a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes an audio input module, an audio output module, a processor, and a memory, where the memory is configured to store program instructions, and the processor is configured to perform the following operations according to the program instructions: collecting sound from the external environment through the audio input module; if it is determined that the voiceprint information of the sound matches pre-stored voiceprint information, sending a first audio file to a music recognition server, where the first audio file contains the sound, and the music recognition server is configured to find a second audio file from an audio resource library according to the first audio file and to determine a start playback position of the second audio file; where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; receiving the second audio file and first indication information sent by the music recognition server, where the first indication information indicates the start playback position of the second audio file; and playing the second audio file from the start playback position through the audio output module. With this electronic device, the operation steps required for the user to trigger humming recognition can be reduced and the efficiency of humming recognition improved; at the same time, audio can be played following the user's humming, which improves the user experience.
With reference to the second aspect, in a possible implementation, the electronic device further includes a camera, and the processor is further configured to perform the following operations according to the program instructions: obtaining the user's mouth shape information through the camera; and if the voiceprint information of the sound matches the pre-stored voiceprint information, sending the mouth shape information to the music recognition server; where the music recognition server is further configured to convert the mouth shape information into text information, and is specifically configured to find the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
With reference to the second aspect, in a possible implementation, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the sound is a human voice, obtaining the user's mouth shape information through the camera.
With reference to the second aspect, in a possible implementation, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the audio input module and/or the audio output module is not occupied, collecting sound from the external environment through the audio input module.
With reference to the second aspect, in a possible implementation, the tag of the second audio file is included in the user tags of a first user.
With reference to the second aspect, in a possible implementation, the electronic device further includes a display screen, and the processor is further configured to perform the following operations according to the program instructions: displaying, on the display screen, identification information of the second audio file and a playback control, where the display state of the playback control is a first state, which indicates that the second audio file is being played; and if a first user operation acting on the playback control in the first state is detected, in response to the first user operation, pausing playback of the second audio file and setting the display state of the playback control to a second state, which indicates that playback of the second audio file is paused.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operation according to the program instructions: when detecting that the electronic device is in a locked state, stopping collecting sound from the external environment through the audio input module.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operation according to the program instructions: when detecting that the electronic device is at a preset location, stopping collecting sound from the external environment through the audio input module.
With reference to the second aspect, in a possible implementation, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the location of the electronic device does not match a preset location, playing the second audio file from the start playback position through the audio output module.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operation according to the program instructions: stopping collecting sound from the external environment through the audio input module during a first time period.
With reference to the second aspect, in a possible implementation, the processor is specifically configured to perform the following operation according to the program instructions: if it is determined that the humming recognition function of the electronic device is enabled, collecting sound from the external environment through the audio input module.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operation according to the program instructions: when detecting that the ambient light brightness has remained below a preset value for longer than a preset duration, stopping collecting sound from the external environment through the audio input module.
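The conditions under which the processor collects or stops collecting sound can be combined into a single check, sketched below. This is only an illustration: the `DeviceState` fields, the function name `should_capture`, and the default `dark_limit_s` value are hypothetical, and the application presents each condition as a separate optional implementation rather than as one combined rule.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    # All field names are hypothetical; they mirror the conditions in the text.
    humming_enabled: bool = True       # humming recognition function is enabled
    audio_in_use: bool = False         # audio input/output module is occupied
    locked: bool = False               # device is in a locked state
    at_preset_location: bool = False   # device is at a preset location
    in_blocked_period: bool = False    # current time is within the first time period
    dark_duration_s: float = 0.0       # how long ambient light has been below the preset value


def should_capture(state: DeviceState, dark_limit_s: float = 600.0) -> bool:
    """Return True if sound may be collected through the audio input module."""
    if not state.humming_enabled or state.audio_in_use:
        return False
    if state.locked or state.at_preset_location or state.in_blocked_period:
        return False
    # Stop capturing once it has been dark for longer than the preset duration.
    if state.dark_duration_s > dark_limit_s:
        return False
    return True
```

Evaluating every condition in one place keeps the capture loop simple: the device polls `should_capture` before each recording window and skips the window when it returns False.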
With reference to the second aspect, in a possible implementation, the music recognition server is further configured to, when the music recognition server determines that the sound signal is a music fragment, find the second audio file from the audio resource library according to the first audio file.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operation according to the program instructions: during the period from the moment playback of the second audio file starts to a preset moment (for example, the 5th second or the 6th second), gradually increasing the playback volume of the second audio file from low to high.
With reference to the second aspect, in a possible implementation, the processor is further configured to perform the following operations according to the program instructions: detecting whether the second audio file is stored in a pre-stored music folder, and if so, after finishing playing the second audio file, playing other audio files in that music folder.
In a third aspect, an embodiment of the present application provides another humming recognition method. The method includes: an open platform obtains a first audio file, where the first audio file includes sound from the external environment; if the open platform determines that the voiceprint information of the first audio file matches pre-stored voiceprint information, the open platform finds a second audio file from an audio resource library according to the first audio file and determines a start playback position of the second audio file; where the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, the third audio file is any audio file in the audio resource library other than the second audio file, and the start playback position of the second audio file corresponds to the end position of the first audio file; and the open platform plays the second audio file from the start playback position, or the open platform controls another application on the electronic device to play the second audio file from the start playback position. In this way, the operation steps required for the user to trigger humming recognition can be reduced and the efficiency of humming recognition improved; at the same time, audio can be played following the user's humming, which improves the user experience.
With reference to the third aspect, in a possible implementation, the method further includes: the open platform obtains the user's mouth shape information through the electronic device; and if the open platform determines that the voiceprint information of the first audio file matches the pre-stored voiceprint information, the open platform converts the mouth shape information into text information. Finding the second audio file from the audio resource library according to the first audio file includes: finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, where the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
With reference to the third aspect, in a possible implementation, that the open platform obtains the user's mouth shape information includes: if the open platform determines that the sound included in the first audio file is a human voice, the open platform obtains the user's mouth shape information through the electronic device.
With reference to the third aspect, in a possible implementation, that the open platform obtains the first audio file includes: if the open platform determines that the audio input module and/or the audio output module is not occupied by another application, the open platform obtains the first audio file.
With reference to the third aspect, in a possible implementation, the tag of the second audio file is included in the user tags of a first user.
With reference to the third aspect, in a possible implementation, after the open platform plays the second audio file from the start playback position, the method further includes: the open platform displays, through the electronic device, identification information of the second audio file and a playback control, where the display state of the playback control is a first state, which indicates that the second audio file is being played; and if the open platform detects a first user operation acting on the playback control in the first state, in response to the first user operation, the open platform pauses playback of the second audio file, or controls another application on the electronic device to pause playback of the second audio file, and sets the display state of the playback control to a second state, which indicates that playback of the second audio file is paused.
With reference to the third aspect, in a possible implementation, the method further includes: when detecting that the electronic device is in a locked state, the open platform stops obtaining the first audio file.
With reference to the third aspect, in a possible implementation, the method further includes: when detecting that the electronic device is at a preset location, the open platform stops obtaining the first audio file.
With reference to the third aspect, in a possible implementation, that the open platform plays the second audio file from the start playback position, or that the open platform controls another application on the electronic device to play the second audio file from the start playback position, includes: if the open platform determines that the location of the electronic device does not match a preset location, the open platform plays the second audio file from the start playback position, or the open platform controls another application on the electronic device to play the second audio file from the start playback position.
With reference to the third aspect, in a possible implementation, the method further includes: the open platform stops obtaining the first audio file during a first time period.
With reference to the third aspect, in a possible implementation, that the open platform obtains the first audio file includes: if the humming recognition function of the electronic device is enabled, the open platform obtains the first audio file.
With reference to the third aspect, in a possible implementation, the method further includes: when detecting that the ambient light brightness of the electronic device has remained below a preset value for longer than a preset duration, the open platform stops obtaining the first audio file.
With reference to the third aspect, in a possible implementation, the open platform is further configured to, when determining that the first audio file is a music fragment, find the second audio file from the audio resource library according to the first audio file.
With reference to the third aspect, in a possible implementation, during the period from the moment playback of the second audio file starts to a preset moment (for example, the 5th second or the 6th second), the open platform gradually increases the playback volume of the second audio file from low to high.
With reference to the third aspect, in a possible implementation, after the open platform controls another application on the electronic device to play the second audio file from the start playback position, the open platform may further detect whether the second audio file is stored in a pre-stored music folder on the electronic device; if so, the open platform may, after that application finishes playing the second audio file, control it to play other audio files in that music folder.
In a fourth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on an electronic device, it causes the electronic device to perform the method in any possible implementation of the first aspect; or, when the computer program product runs on an open platform, it causes the open platform to perform the method in any possible implementation of the third aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions, characterized in that, when the instructions run on an electronic device, they cause the electronic device to perform the method in any possible implementation of the first aspect; or, when the instructions run on an open platform, they cause the open platform to perform the method in any possible implementation of the third aspect.
In the humming recognition method provided in the present application, the electronic device can continuously acquire sound from the external environment. When it determines that the sound was produced by a preset user, the electronic device sends a first audio file containing the sound to the music recognition server for humming recognition. After the electronic device receives the recognized second audio file and its start playback position from the music recognition server, it can play the second audio file starting from the point where the sound ended, because the start playback position of the second audio file corresponds to the end position of the first audio file. In this way, the operation steps required for the user to trigger the terminal to perform humming recognition can be reduced and the efficiency of humming recognition improved; at the same time, audio can be played following the user's humming, which improves the user experience.
Description of the drawings
To describe the technical solutions in the embodiments of the present application or in the background art more clearly, the drawings used in the embodiments or the background art are described below.
FIG. 1A is a schematic structural diagram of a smart terminal provided by an embodiment of the present application;
FIG. 1B is a block diagram of the software structure of a smart terminal provided by an embodiment of the present application;
FIG. 1C is a schematic structural diagram of a smart home device provided by an embodiment of the present application;
FIG. 1D is a schematic structural diagram of a vehicle-mounted device provided by an embodiment of the present application;
FIG. 2 shows a user interface for displaying an application menu on a smart terminal provided by an embodiment of the present application;
FIG. 3A and FIG. 3B show some user interfaces for displaying recognition results provided by an embodiment of the present application;
FIG. 3C shows a user interface displayed when a smart terminal is in a locked state, provided by an embodiment of the present application;
FIG. 3D to FIG. 3F show further user interfaces for displaying recognition results provided by an embodiment of the present application;
FIG. 3G shows a user interface for humming recognition provided by an embodiment of the present application;
FIG. 4A and FIG. 4B show some user interfaces for setting the humming recognition function provided by an embodiment of the present application;
FIG. 5A to FIG. 5C show further user interfaces for setting the humming recognition function provided by an embodiment of the present application;
FIG. 5D to FIG. 5F show some user interfaces for setting access permissions for the humming recognition function provided by an embodiment of the present application;
FIG. 5G shows a user interface for entering voiceprint information provided by an embodiment of the present application;
FIG. 6A and FIG. 6B show further user interfaces for setting the humming recognition function provided by an embodiment of the present application;
FIG. 6C shows another user interface for entering voiceprint information provided by an embodiment of the present application;
FIG. 7A and FIG. 7B show some user interfaces for setting the humming recognition function on a vehicle-mounted device provided by an embodiment of the present application;
FIG. 7C shows another user interface for entering voiceprint information provided by an embodiment of the present application;
FIG. 8A and FIG. 8B show some user interfaces for displaying recognition results on a vehicle-mounted device provided by an embodiment of the present application;
FIG. 9 is a flowchart of a humming recognition method provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise specified, "a plurality of" means two or more.
First, some concepts involved in the present application are introduced.
Humming recognition is a form of audio retrieval based on a music fragment hummed by the user. It works as follows: the electronic device obtains a music fragment hummed by the user and sends it to a server; the server uses similarity matching to find the audio file most similar to the hummed fragment and then returns that audio file to the electronic device. Optionally, the server extracts a feature (for example, a fundamental frequency sequence) from the music fragment, uses that feature for retrieval, and matches, from a pre-stored audio resource library, the audio file most similar to the hummed fragment. Because the fragment hummed by the user cannot be exactly the same as the corresponding fragment of the actual audio file in the library, humming recognition is a form of fuzzy matching. For fuzzy matching, string edit distance, the dynamic time warping (DTW) algorithm, and similar techniques can be used to improve recognition accuracy.
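As an illustration of the fuzzy matching mentioned above, the following is a minimal sketch of the classic dynamic time warping distance applied to two pitch sequences. It is a textbook formulation, not the server implementation described in this application; representing the fundamental frequency sequence as a list of numbers (for example, MIDI note numbers) and the helper name `best_match` are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, library):
    """Return the key of the library entry closest to the query (hypothetical helper).

    library maps an audio file identifier to its pitch sequence.
    """
    return min(library, key=lambda name: dtw_distance(query, library[name]))
```

Because DTW allows one element to align with several consecutive elements of the other sequence, a fragment hummed faster or slower than the reference still yields a small distance, which is why it suits humming input.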
A user interface (UI) is a medium interface for interaction and information exchange between an application or operating system and a user; it converts between the internal form of information and a form the user can accept. The user interface of an application is source code written in a specific computer language such as Java or extensible markup language (XML). The interface source code is parsed and rendered on the electronic device 300 and is finally presented as content the user can recognize, such as pictures, text, buttons, and other controls. A control is a basic element of the user interface; typical controls include buttons, widgets, toolbars, menu bars, text boxes, scrollbars, pictures, and text. The attributes and content of the controls in an interface are defined by tags or nodes; for example, XML specifies the controls contained in an interface through nodes such as <Textview>, <ImgView>, and <VideoView>. One node corresponds to one control or attribute in the interface, and after being parsed and rendered the node is presented as content visible to the user. In addition, the interfaces of many applications, such as hybrid applications, usually also contain web pages. A web page, also called a page, can be understood as a special control embedded in the application interface. A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS); web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with browser-like functionality. The specific content of a web page is likewise defined by tags or nodes in the web page source code; for example, HTML defines the elements and attributes of a web page through <p>, <img>, <video>, and <canvas>.
A common form of user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed graphically. It may be an interface element such as an icon, a window, or a control displayed on the display screen of the electronic device.
The following embodiments of the present application provide a humming recognition method and an electronic device. While a user hums a music segment, the electronic device can follow the user's humming and play the audio file corresponding to that music segment. This reduces the operation steps the user must take to trigger humming recognition on the terminal and improves the efficiency of humming recognition.
In the following embodiments of the present application, an electronic device (for example, a smart terminal, a smart home device, or a vehicle-mounted device) performs the humming recognition operation provided in the embodiments of the present application when it has permission for humming recognition. The humming recognition operation may proceed as follows. First, the electronic device collects sound from the external environment through an audio input module (for example, a microphone). Next, if the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file containing the sound to a music recognition server for humming recognition, so as to identify the audio file that matches the music segment hummed by the user and to determine the start playback position of that audio file. The start playback position of the identified audio file corresponds to the end position of the first audio file. After the electronic device receives the audio file and the information containing the start playback position fed back by the music recognition server, the electronic device can play the audio file from the start playback position, thereby playing the audio file in step with the user's humming. The system architecture and implementation flow for performing the humming recognition operation are described further below and are not expanded on here.
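The flow described above can be sketched as follows. This is a minimal illustration only, not the implementation of the application: the names `RecognitionResult`, `humming_recognition`, `voiceprint_of`, `recognize`, and `play` are hypothetical, the voiceprint check is reduced to a simple equality comparison, and the music recognition server is modeled as a callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    audio_file: str          # identifier of the matched audio file (hypothetical)
    start_position_s: float  # start playback position, in seconds


def humming_recognition(
    captured_sound: bytes,
    voiceprint_of: Callable[[bytes], str],
    stored_voiceprint: str,
    recognize: Callable[[bytes], Optional[RecognitionResult]],
    play: Callable[[str, float], None],
) -> bool:
    """Run one pass of the flow; return True if playback was started."""
    # The sound has already been collected by the audio input module.
    # Continue only if its voiceprint is consistent with the pre-stored one.
    if voiceprint_of(captured_sound) != stored_voiceprint:
        return False
    # Send the first audio file to the (assumed) music recognition server.
    result = recognize(captured_sound)
    if result is None:
        return False
    # Play from the position that corresponds to the end of the hummed clip.
    play(result.audio_file, result.start_position_s)
    return True
```

Passing the voiceprint extractor, the server call, and the player as callables keeps the sketch independent of any particular audio or network library.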
In the following embodiments of the present application, before the electronic device performs the humming recognition operation provided in the embodiments of the present application, it needs to determine whether its audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied, for example by audio/video playback, a phone call, or voice navigation, the electronic device does not perform the humming recognition operation. If the audio input module and/or audio output module is not occupied, the electronic device performs the humming recognition operation. Optionally, after the audio input module and/or audio output module of the electronic device is released, for example when audio/video playback ends, the phone call is hung up, or voice navigation ends, the electronic device may perform the humming recognition operation. In other words, the priority of the humming recognition operation provided in the embodiments of the present application is lower than the priority of every other operation on the electronic device that needs to occupy the audio input module and/or audio output module.
In the following embodiments of the present application, if, while performing the humming recognition operation, the electronic device detects a request for audio resources from another operation that needs to occupy the audio input module and/or audio output module, the electronic device calls the audio input module and/or audio output module to perform the operation corresponding to the request. As a special case, if the request needs to occupy the audio output module for less than a preset value (for example, 1 second), such as a request to play a notification tone (for example, a short message tone or an application push tone), the humming recognition operation may continue to occupy the audio input module while the operation corresponding to the request occupies the audio output module.
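The arbitration rules for the audio modules can be summarized in a short sketch. It is an illustration under stated assumptions: `AudioRequest`, `may_start_humming_recognition`, and `humming_continues` are hypothetical names, and the 1-second threshold is the example preset value given in the text.

```python
from dataclasses import dataclass

SHORT_OUTPUT_THRESHOLD_S = 1.0  # example preset value from the text


@dataclass
class AudioRequest:
    needs_input: bool                # request occupies the audio input module
    needs_output: bool               # request occupies the audio output module
    output_duration_s: float = 0.0   # how long the output module is needed


def may_start_humming_recognition(input_busy: bool, output_busy: bool) -> bool:
    """Recognition starts only when neither audio module is occupied."""
    return not (input_busy or output_busy)


def humming_continues(request: AudioRequest) -> bool:
    """Whether recognition keeps the input module while the request is served."""
    if request.needs_input:
        # Higher-priority request takes the input module.
        return False
    if request.needs_output:
        # Short outputs (e.g. a notification tone) can coexist with recognition.
        return request.output_duration_s < SHORT_OUTPUT_THRESHOLD_S
    return True
```

For example, a 0.5-second message tone leaves recognition running, while a request to start music playback interrupts it.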
In the following embodiments of the present application, the humming recognition operation may be performed by a system application or a third-party application of the electronic device. In one possible implementation, the system application or third-party application is dedicated to performing the humming recognition operation provided in the embodiments of the present application. In another possible implementation, the system application or third-party application also performs other services (or functions), and the humming recognition operation is integrated into it as just one of those services (or functions).
It can be understood that "humming recognition" is only the name used in this embodiment. Its meaning has been described in this embodiment, and the name does not limit this embodiment in any way. For example, in some possible implementations, "humming recognition" may also be called "song recognition by listening", "humming retrieval", or a similar name.
In the embodiments of the present application, the electronic device that performs the humming recognition operation may be a smart terminal, a smart home device, or a vehicle-mounted device. An exemplary smart terminal 100 provided in the following embodiments of the present application is introduced first.
FIG. 1A shows a schematic structural diagram of the smart terminal 100.
The smart terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an acceleration sensor 180C, a distance sensor 180D, a proximity light sensor 180E, a fingerprint sensor 180F, a touch sensor 180G, an ambient light sensor 180H, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the smart terminal 100. In other embodiments of the present application, the smart terminal 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a modem processor, an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and the like. The different processing units may be independent devices or may be integrated in one or more processors. In some embodiments, the smart terminal 100 may also include one or more processors 110.
The controller may be the nerve center and command center of the smart terminal 100. The controller can generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory can hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the smart terminal 100.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera module 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the smart terminal 100.
The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may pass audio signals to the wireless communication module 160 through the I2S interface, so that the identified audio file can be played through a Bluetooth headset.
The PCM interface can also be used for audio communication, in which an analog signal is sampled, quantized, and encoded. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also pass audio signals to the wireless communication module 160 through the PCM interface, so that the identified audio file can be played through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
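The three PCM steps named above (sampling, quantization, encoding) can be illustrated with a short sketch. It assumes 16-bit signed little-endian samples and an 8 kHz default sample rate, neither of which is specified in the text, and the function name `pcm_encode` is hypothetical.

```python
import math
import struct


def pcm_encode(signal, duration_s, sample_rate_hz=8000):
    """Sample signal(t), quantize to 16 bits, and pack as little-endian bytes.

    signal is a function of time in seconds returning a value in [-1.0, 1.0].
    """
    n = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n):
        value = signal(i / sample_rate_hz)    # sampling at discrete instants
        value = max(-1.0, min(1.0, value))    # clip to the representable range
        codes.append(round(value * 32767))    # quantization to integer levels
    return struct.pack(f"<{n}h", *codes)      # encoding as 16-bit words


def tone(t):
    """A 440 Hz sine wave used as the example analog signal."""
    return math.sin(2 * math.pi * 440 * t)


frame = pcm_encode(tone, duration_s=0.01)
```

Each sample occupies two bytes, so 10 ms of audio at 8 kHz yields 80 samples in 160 bytes.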
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial and parallel form. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may pass audio signals to the wireless communication module 160 through the UART interface, so that the identified audio file can be played through a Bluetooth headset.
The MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera module 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 and the camera module 193 communicate through the CSI interface to implement the camera function of the smart terminal 100 and thereby obtain the user's mouth shape information. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the smart terminal 100.
The GPIO interface can be configured through software. The GPIO interface can be configured to carry a control signal or a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 to the camera module 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 is an interface that complies with the USB standard specification. Specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 can be used to connect a charger to charge the smart terminal 100, and can also be used to transfer data between the smart terminal 100 and peripheral devices. It can also be used to connect a headset and play audio through the headset. The interface can further be used to connect other smart terminals, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present application are merely illustrative and do not constitute a structural limitation on the smart terminal 100. In other embodiments, the smart terminal 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.
The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the smart terminal 100. While charging the battery 142, the charging management module 140 can also supply power to the smart terminal through the power management module 141.
The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera module 193, the wireless communication module 160, and the like. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the smart terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the smart terminal 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions, including 2G/3G/4G/5G, applied to the smart terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 and at least some of the modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate a received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the smart terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves radiated through the antenna 2. Exemplarily, the wireless communication module 160 may include a Bluetooth module, a Wi-Fi module, and the like. In one possible implementation, the smart terminal can determine its own location through the wireless communication module 160.
In some embodiments, the antenna 1 of the smart terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the smart terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The smart terminal 100 can implement the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the smart terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The smart terminal 100 can implement the camera function through the camera module 193, the ISP, the video codec, the GPU, the display screen 194, the application processor (AP), the neural-network processing unit (NPU), and the like.
The camera module 193 can be used to collect color image data of a subject. The ISP can be used to process the color image data collected by the camera module 193. For example, when a picture is taken, the shutter opens and light is transmitted through the lens to the photosensitive element of the camera, where the optical signal is converted into an electrical signal. The photosensitive element passes the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image. The ISP can also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera module 193.
In some embodiments, the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
In some embodiments, the smart terminal 100 may include 1 or N camera modules 193, where N is a positive integer greater than 1. Specifically, the smart terminal 100 may include one front camera module 193 and one rear camera module 193. The front camera module 193 is typically used to collect color image data of the photographer facing the display screen 194, and the rear camera module 193 can be used to collect color image data of the subject (such as a person or a landscape) that the photographer is facing.
The digital signal processor is used to process digital signals. In addition to digital image signals, it can process other digital signals. For example, when the smart terminal 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform or similar processing on the energy of the frequency point.
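As a rough illustration of Fourier-based frequency point analysis, the sketch below computes the energy of individual discrete Fourier transform (DFT) bins and picks the strongest one. The text does not specify how the digital signal processor performs this step, so the direct DFT used here and the names `bin_energy` and `strongest_bin` are assumptions made for illustration only.

```python
import cmath
import math


def bin_energy(samples, k):
    """Energy |X[k]|^2 of DFT bin k for a real-valued sample sequence."""
    n = len(samples)
    x_k = sum(
        s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
    )
    return abs(x_k) ** 2


def strongest_bin(samples):
    """Index of the non-negative-frequency bin with the highest energy."""
    n = len(samples)
    return max(range(n // 2 + 1), key=lambda k: bin_energy(samples, k))


# Example input: a cosine that completes exactly 3 cycles over 32 samples.
samples = [math.cos(2 * math.pi * 3 * i / 32) for i in range(32)]
peak = strongest_bin(samples)
```

A production implementation would normally use a fast Fourier transform rather than evaluating each bin directly; the direct form is used here only to keep the example self-contained.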
The video codec is used to compress or decompress digital video. The smart terminal 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, for example moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the way signals are transferred between neurons in the human brain, it processes input information quickly and can also learn continuously. The NPU enables intelligent-cognition applications on the smart terminal 100, such as image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the smart terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving audio files, photos, videos, and other data on the external memory card.
The internal memory 121 may be used to store one or more computer programs, which include instructions. By running these instructions stored in the internal memory 121, the processor 110 causes the smart terminal 100 to execute the smart-terminal photographing preview method provided in some embodiments of the present application, as well as various functional applications and data processing. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and may also store one or more application programs (such as a gallery or contacts). The data storage area can store data created during use of the smart terminal 100 (such as photos and contacts). In addition, the internal memory 121 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The smart terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The audio output module 170A, also called a "speaker" or "loudspeaker", is used to convert audio electrical signals into sound signals. The smart terminal 100 can play music or a hands-free call through the speaker 170A.
The audio output module 170B, also called a "receiver" or "earpiece", is used to convert audio electrical signals into sound signals. When the smart terminal 100 answers a call or plays a voice message, the user can listen by holding the receiver 170B close to the ear.
The audio input module 170C, also called a "microphone" or "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The smart terminal 100 may be provided with at least one microphone 170C. In other embodiments, the smart terminal 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In still other embodiments, the smart terminal 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify the sound source, and implement directional recording. In the embodiments of the present application, when the "humming recognition" function is enabled on the smart terminal 100, the microphone 170C can collect sound signals near the smart terminal 100.
In some embodiments, the CPU, the digital processor, or the audio processor in the processor 110 may process the sound collected by the microphone 170C. In one embodiment, when the processor 110 determines that the sound collected within a preset time is a human voice, the processor 110 extracts voiceprint information from the sound. If the voiceprint information of the sound is consistent with pre-stored voiceprint information, the first audio file containing the sound is then sent to the music recognition server through the mobile communication module 150 or the wireless communication module 160.
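The gating described above (human voice, then voiceprint match, then upload) can be sketched as follows. This is a minimal illustration only: representing a voiceprint as a numeric vector, using cosine similarity with a threshold of 0.8 as the "consistent" test, and the names `maybe_upload` and `send` are all assumptions, not details given in the application.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def maybe_upload(
    is_human_voice: bool,
    voiceprint: Sequence[float],
    enrolled_voiceprint: Sequence[float],
    send: Callable[[bytes], None],
    first_audio_file: bytes,
    threshold: float = 0.8,  # hypothetical "consistent" threshold
) -> bool:
    """Send the first audio file only if the sound is a matching human voice."""
    if not is_human_voice:
        return False  # not a human voice: nothing is sent
    if cosine_similarity(voiceprint, enrolled_voiceprint) < threshold:
        return False  # voiceprint does not match the pre-stored one
    send(first_audio_file)  # stands in for module 150 or 160
    return True
```

In this sketch `send` stands in for transmission through the mobile communication module 150 or the wireless communication module 160; how a human voice is detected and how the voiceprint is extracted are left outside the function.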
In some embodiments, the processor 110 includes a user portrait module. The user portrait module can collect user information about the user of the smart terminal, which may include the user's attributes (age, gender, occupation, etc.), living habits, user behavior, and other information. In one possible implementation, the smart terminal abstracts this user information into user tags and sends them to the server for storage. In another possible implementation, the smart terminal sends the user information to the server, and the server analyzes it, forms user tags, and stores them. The user tags correspond to the user account (also called the user ID) of the user of the smart terminal. In the embodiments of the present application, tags can be abstracted from the user's habits or preferences in playing audio files, for example rock, folk, or pop, and favorite singers can also be recorded as tags, for example Li Zongsheng, Liang Jingru, or Eason Chan. In one possible implementation, the tag of the second audio file identified from the audio file is included in the user tags of the first user, where the first user may be the user of the smart terminal or the user corresponding to the user account logged in on the smart terminal.
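As a rough sketch of how user tags might be formed from play history and then used to keep only candidate audio files whose tag is included in the user tags: the record layout (`tags`, `tag`), the `min_plays` cutoff, and the function names are hypothetical and are not specified in the application.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def build_user_tags(play_history: Iterable[Dict], min_plays: int = 2) -> Set[str]:
    """Abstract user tags from played tracks: keep tags seen at least min_plays times."""
    counts = Counter(tag for track in play_history for tag in track["tags"])
    return {tag for tag, n in counts.items() if n >= min_plays}


def filter_candidates(candidates: Iterable[Dict], user_tags: Set[str]) -> List[Dict]:
    """Keep candidate audio files whose tag is included in the user's tags."""
    return [c for c in candidates if c["tag"] in user_tags]
```

For example, a history with two rock tracks and one pop track yields the tag set {"rock"} under the default cutoff, so a pop candidate would be filtered out.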
The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the smart terminal 100 determines the intensity of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the smart terminal 100 detects the intensity of the touch operation using the pressure sensor 180A. The smart terminal 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
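The threshold rule in the short message example can be written as a small mapping. The function name and the instruction strings below are illustrative placeholders; only the comparison against the first pressure threshold comes from the description.

```python
def instruction_for_touch(intensity: float, first_pressure_threshold: float) -> str:
    """Map touch intensity on the short message icon to an operation instruction."""
    if intensity < first_pressure_threshold:
        return "view_short_message"  # lighter press: view the short message
    return "create_short_message"    # at or above the threshold: create a new one
```

Note that an intensity exactly equal to the threshold falls into the "create" branch, matching the "greater than or equal to" wording.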
The gyroscope sensor 180B may be used to determine the motion posture of the smart terminal 100. In some embodiments, the angular velocity of the smart terminal 100 around three axes (namely the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle by which the smart terminal 100 shakes, the distance that the lens module needs to compensate is calculated from that angle, and the lens counteracts the shake of the smart terminal 100 through a reverse movement, thereby achieving stabilization. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios. In some embodiments, the smart terminal 100 may determine its own direction of movement through the gyroscope sensor 180B to improve the accuracy with which it determines its own position.
The acceleration sensor 180C can detect the magnitude of the acceleration of the smart terminal 100 in various directions (generally along three axes). When the smart terminal 100 is stationary, it can detect the magnitude and direction of gravity. It can also be used to recognize the posture of the smart terminal 100, for applications such as switching between landscape and portrait orientation and pedometers. In some possible implementations, the user interfaces illustrated in the following embodiments may switch between landscape and portrait orientation as the posture of the smart terminal changes.
The distance sensor 180D is used to measure distance. The smart terminal 100 can measure distance by infrared or laser. In some embodiments, in a shooting scene, the smart terminal 100 may use the distance sensor 180D to measure distance in order to focus quickly and improve the accuracy of the acquired mouth-shape information.
The proximity light sensor 180E may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The smart terminal 100 emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the smart terminal 100 can determine that there is an object nearby. When insufficient reflected light is detected, the smart terminal 100 can determine that there is no object nearby. The smart terminal 100 may use the proximity light sensor 180E to detect that the user is holding the smart terminal 100 close to the ear during a call, so that it can turn off the screen automatically to save power. The proximity light sensor 180E can also be used for automatic unlocking and screen locking in holster mode and pocket mode.
The ambient light sensor 180F is used to sense the brightness of the ambient light. The smart terminal 100 can adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180F can also be used to adjust the white balance automatically when taking pictures. The ambient light sensor 180F can also cooperate with the proximity light sensor 180E to detect whether the smart terminal 100 is in a pocket, to prevent accidental touches. In one possible implementation, when the smart terminal detects through the ambient light sensor 180F that the ambient light brightness has remained below a preset value for longer than a preset time, the smart terminal stops collecting sound from the external environment through the audio input module.
The fingerprint sensor 180G is used to collect fingerprints. The smart terminal 100 can use the characteristics of the collected fingerprint to implement fingerprint unlocking, releasing the locked state of the smart terminal 100.
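The ambient-light rule for stopping sound collection can be sketched as a small state holder that tracks how long the brightness has stayed below the preset value. The class name, the units (lux, seconds), and the caller-supplied timestamps are assumptions made for illustration.

```python
from typing import Optional


class DarknessGate:
    """Decide whether sound collection should continue, based on ambient light."""

    def __init__(self, preset_lux: float, preset_seconds: float) -> None:
        self.preset_lux = preset_lux          # hypothetical brightness preset value
        self.preset_seconds = preset_seconds  # hypothetical preset time
        self._dark_since: Optional[float] = None

    def update(self, lux: float, now: float) -> bool:
        """Feed one brightness reading; return True if collection should continue."""
        if lux >= self.preset_lux:
            self._dark_since = None  # bright enough: reset the dark timer
            return True
        if self._dark_since is None:
            self._dark_since = now   # darkness starts at this reading
        # Stop only once the dark duration is strictly greater than the preset time.
        return (now - self._dark_since) <= self.preset_seconds
```

A caller would feed each sensor reading to `update` and stop the audio input module the first time it returns False; a later bright reading resets the timer.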
The touch sensor 180H may also be called a touch panel or a touch-sensitive surface. The touch sensor 180H may be provided on the display screen 194, and the touch sensor 180H and the display screen 194 together form a touchscreen, also called a "touch screen". The touch sensor 180H is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180H may also be provided on the surface of the smart terminal 100 at a position different from that of the display screen 194.
The buttons 190 include a power button, volume buttons, and so on. A button 190 may be a mechanical button or a touch button. The smart terminal 100 may receive button input and generate key signal input related to user settings and function control of the smart terminal 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming-call vibration prompts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing or audio playback) can correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and changes in battery level, and can also be used to indicate messages, missed calls, notifications, and so on.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the smart terminal 100 by inserting it into or removing it from the SIM card interface 195. The smart terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and these cards may be of the same type or of different types. The SIM card interface 195 can also be compatible with different types of SIM cards and with external memory cards. The smart terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the smart terminal 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the smart terminal 100 and cannot be separated from it.
The smart terminal 100 shown by way of example in FIG. 1A may display, through the display screen 194, the user interfaces described in the following embodiments. The smart terminal 100 may detect touch operations in each user interface through the touch sensor 180H, for example a tap operation (such as a touch operation on an icon or a double-tap operation), an upward or downward swipe, or a circle-drawing gesture. In some embodiments, the smart terminal 100 may detect, through the gyroscope sensor 180B, the acceleration sensor 180C, and the like, a motion gesture performed by the user while holding the smart terminal 100, for example shaking the smart terminal. In some embodiments, the smart terminal 100 may detect non-touch gesture operations through the camera module 193 (such as a 3D camera or a depth camera).
The software system of the smart terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present application take an Android system with a layered architecture as an example to illustrate the software structure of the smart terminal 100.
FIG. 1B is a block diagram of the software structure of a smart terminal 100 provided by an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 1B, the application packages may include applications such as camera, gallery, calendar, phone, maps, navigation, WLAN, Bluetooth, music, video, and short messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 1B, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls that display text and controls that display pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
The phone manager is used to provide the communication functions of the smart terminal 100, for example management of the call status (including connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay without user interaction, for example to announce that a download has completed or to give a message reminder. A notification may also appear in the status bar at the top of the system as a chart or scrolling text, such as a notification from an application running in the background, or appear on the screen as a dialog window. Examples include prompting text in the status bar, playing a prompt sound, vibrating the smart terminal, and flashing the indicator light.
The Android runtime includes the core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the Android core libraries.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example a surface manager, media libraries, a three-dimensional graphics processing library (for example OpenGL ES), and a 2D graphics engine (for example SGL).
The surface manager is used to manage the display subsystem and provides composition of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support multiple audio and video encoding formats, for example MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, composition, and layer processing.
The 2D graphics engine is the drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The software system shown in FIG. 1B involves applications that use the sharing capability (such as the gallery and the file manager), an instant sharing module that provides the sharing capability, a print service and a print spooler that provide the printing capability, an application framework layer that provides the print framework, the WLAN service, and the Bluetooth service, and a kernel and lower layers that provide WLAN and Bluetooth capabilities and basic communication protocols.
The following describes, by way of example, the workflow of the software and hardware of the smart terminal 100 in a scenario in which the humming recognition permission is set.
When the touch sensor 180H receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap, where the control corresponding to the tap is the switch control of the humming recognition function: the humming recognition application calls an interface of the application framework layer to start the humming recognition application, then starts the microphone driver by calling the kernel layer, and collects sound from the external environment through the microphone 170C.
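The workflow above can be modeled in a few lines to make the hand-offs between layers explicit. This is a toy simulation, not Android code: the classes `KernelLayer` and `FrameworkLayer`, the control name `humming_recognition_switch`, and the rectangle hit test are all hypothetical stand-ins for the layers and control named in the description.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class RawInputEvent:
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


class KernelLayer:
    """Turns touch interrupts into stored raw input events; owns the drivers."""

    def __init__(self) -> None:
        self.events: Deque[RawInputEvent] = deque()
        self.microphone_on = False

    def on_touch_interrupt(self, x: int, y: int, timestamp_ms: int) -> None:
        self.events.append(RawInputEvent(x, y, timestamp_ms))

    def start_microphone_driver(self) -> None:
        self.microphone_on = True


class FrameworkLayer:
    """Fetches raw events from the kernel and identifies the touched control."""

    def __init__(self, kernel: KernelLayer, controls: List[Control]) -> None:
        self.kernel = kernel
        self.controls = controls

    def next_control(self) -> Optional[Control]:
        event = self.kernel.events.popleft()
        for control in self.controls:
            if control.contains(event.x, event.y):
                return control
        return None


def handle_touch(kernel: KernelLayer, framework: FrameworkLayer) -> bool:
    """Start sound collection if the humming recognition switch was tapped."""
    control = framework.next_control()
    if control is not None and control.name == "humming_recognition_switch":
        kernel.start_microphone_driver()
        return True
    return False
```

The sketch keeps the same order as the text: interrupt, raw event stored in the kernel layer, control lookup in the framework layer, then the microphone driver started through the kernel layer.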
FIG. 1C shows, by way of example, a schematic structural diagram of a smart home device 110 provided by an embodiment of the present application.
For example, the smart home device may be a device such as a smart speaker or a smart TV. As shown in FIG. 1C, the smart home device 110 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, an RJ45 communication processing module 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components may be connected via a bus. Where:
The processor 102 can be used to read and execute computer-readable instructions. In a specific implementation, the processor 102 may mainly include a controller, an arithmetic unit, and registers. The controller is mainly responsible for decoding instructions and issuing control signals for the operations corresponding to the instructions. The arithmetic unit is mainly responsible for performing fixed-point or floating-point arithmetic operations, shift operations, and logical operations, and can also perform address calculation and translation. The registers are mainly responsible for holding the register operands and intermediate results temporarily stored during instruction execution. In a specific implementation, the hardware architecture of the processor 102 may be an application specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, an NP architecture, or the like.
In some embodiments, the processor 102 may be used to parse signals received by the wireless communication processing module 104, for example a request sent by the smart terminal 100 to modify setting information, or a recognized audio file and indication information indicating the starting playback position sent by the music recognition server. The processor 102 may be used to perform corresponding processing according to the parsing result, such as modifying the setting information of the smart home device 110 according to the request, or playing the recognized audio file from the indicated playback position.
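A minimal sketch of this parse-then-act behavior is given below. The dictionary-based message format, the `type` values, and the field names (`settings`, `audio_file`, `start_position_s`) are invented for illustration; the application does not define a message format.

```python
from typing import Any, Dict


def handle_message(device_state: Dict[str, Any], message: Dict[str, Any]) -> Dict[str, Any]:
    """Apply one parsed message to the smart home device state."""
    kind = message["type"]
    if kind == "modify_settings":
        # Request from the smart terminal to modify setting information.
        device_state["settings"].update(message["settings"])
    elif kind == "recognized_audio":
        # Result from the music recognition server: file plus starting position.
        device_state["now_playing"] = (
            message["audio_file"],
            message["start_position_s"],
        )
    else:
        raise ValueError(f"unsupported message type: {kind}")
    return device_state
```

Here `now_playing` simply records what would be handed to the audio output module 109; actual playback is outside the sketch.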
In some embodiments, the processor 102 may also be used to process sound from the external environment collected by the smart home device 110. For example, the processor 102 may extract voiceprint information from the sound; if the processor 102 determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the first audio file containing the sound is sent to the music recognition server through the wireless communication processing module 104.
In some embodiments, the processor 102 may also be configured to generate signals to be sent out by the wireless communication processing module 104, such as a signal sent to the smart terminal 100 to feed back the recognition status (for example, recognition succeeded or recognition failed).
The memory 103 is coupled to the processor 102 and is configured to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 103 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, for example an embedded operating system such as DuerOS or AliGenie. The memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, the music recognition server), or additional devices.
The wireless communication processing module 104 may include one or more of a Bluetooth (BT) communication processing module 104A and a WLAN communication processing module 104B.
In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may listen for signals transmitted by other devices (such as the smart terminal 100), for example playback requests and requests to change setting information, and may send response signals, such as request responses, so that the other devices (such as the smart terminal 100) can discover the smart home device 110. The smart home device 110 can then establish a wireless communication connection with the other devices and communicate with them through one or more of the Bluetooth and WLAN wireless communication technologies.
In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may also transmit signals, for example broadcast Bluetooth signals or beacon signals, so that other devices (such as the smart terminal 100) can discover the smart home device 110. The smart home device 110 can then establish a wireless communication connection with the other devices (such as the smart terminal 100) and communicate with them through one or more of the Bluetooth and WLAN wireless communication technologies.
The wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown). The cellular mobile communication processing module may communicate with other devices (such as servers) through cellular mobile communication technology.
The power switch 105 may be used to control the supply of power from the power source to the smart home device 110.
The RJ45 communication processing module 106 may be used to process data received or sent through the RJ45 interface. The RJ45 interface is mainly used to connect to a modem.
The USB interface 107 may be used to communicate with other devices (for example, a computer or a notebook computer) through a data cable.
The audio input module 108 may be used to collect sound in the external environment and convert the sound into an electrical signal. In a possible implementation, the smart home device 110 may receive a voice instruction input by the user through the audio input module 108 and, in response to the voice instruction, perform the operation corresponding to the voice instruction.
The audio output module 109 is used to convert an audio electrical signal into a sound signal, and the smart home device 110 may play the sound signal through the audio output module 109.
In a possible implementation, the smart home device 110 may further include a display screen 110 (not shown), which may be used to display images, videos, and the like. The display screen 110 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the smart home device 110 may include 1 or N display screens 110, where N is a positive integer greater than 1.
It can be understood that the structure illustrated in FIG. 1C does not constitute a specific limitation on the smart home device 110. In other embodiments of this application, the smart home device 110 may include more or fewer components than shown, may combine or split certain components, or may have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
FIG. 1D exemplarily shows a schematic structural diagram of the in-vehicle device 120 provided in this application.
For example, the in-vehicle device may be an in-vehicle speaker, an in-vehicle computer, or a similar device. As shown in FIG. 1D, the in-vehicle device 120 may include a processor 102, a memory 103, a wireless communication processing module 104, a power switch 105, a display screen 106, a USB interface module 107, an audio input module 108, and an audio output module 109. These components may be connected through a bus, where:
The processor 102 may be configured to read and execute computer-readable instructions. In a specific implementation, the processor 102 may mainly include a controller, an arithmetic unit, and registers. The controller is mainly responsible for decoding instructions and issuing control signals for the operations corresponding to the instructions. The arithmetic unit is mainly responsible for performing fixed-point or floating-point arithmetic operations, shift operations, logical operations, and the like, and may also perform address operations and conversions. The registers are mainly responsible for storing register operands, intermediate operation results, and the like that are temporarily held during instruction execution. In a specific implementation, the hardware architecture of the processor 102 may be an application-specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, an NP architecture, or the like.
In some embodiments, the processor 102 may be configured to parse signals received by the wireless communication processing module 104, for example, a request sent by the smart terminal 100 to modify setting information, or a recognized audio file sent by the music recognition server together with indication information indicating the starting playback position. The processor 102 may be configured to perform corresponding processing operations according to the parsing result, such as modifying the setting information of the in-vehicle device 120 according to the request, or playing the recognized audio file from the playback position.
In some embodiments, the processor 102 may also be configured to process sound in the external environment collected by the in-vehicle device 120. For example, the processor 102 may extract voiceprint information from the sound; if the processor 102 determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, a first audio file containing the sound is sent to the music recognition server through the wireless communication processing module 104.
In some embodiments, the processor 102 may also be configured to generate signals to be sent out by the wireless communication processing module 104, such as a signal sent to the smart terminal 100 to feed back the recognition status (for example, recognition succeeded or recognition failed).
The memory 103 is coupled to the processor 102 and is configured to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 103 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 103 may store an operating system, for example an embedded operating system such as uCLinux, GENIVI, or ecos. The memory 103 may also store a communication program, which may be used to communicate with the smart terminal 100, one or more servers (for example, the music recognition server), or additional devices.
The wireless communication processing module 104 may include one or more of a Bluetooth (BT) communication processing module 104A and a WLAN communication processing module 104B.
In some embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may listen for signals transmitted by other devices (such as the smart terminal 100), for example playback requests and requests to change setting information, and may send response signals, such as request responses, so that the other devices (such as the smart terminal 100) can discover the in-vehicle device 120. The in-vehicle device 120 can then establish a wireless communication connection with the other devices and communicate with them through one or more of the Bluetooth and WLAN wireless communication technologies.
In other embodiments, one or more of the Bluetooth (BT) communication processing module and the WLAN communication processing module may also transmit signals, for example broadcast Bluetooth signals or beacon signals, so that other devices (such as the smart terminal 100) can discover the in-vehicle device 120. The in-vehicle device 120 can then establish a wireless communication connection with the other devices (such as the smart terminal 100) and communicate with them through one or more of the Bluetooth and WLAN wireless communication technologies.
The wireless communication processing module 104 may also include a cellular mobile communication processing module (not shown). The cellular mobile communication processing module may communicate with other devices (such as servers) through cellular mobile communication technology.
The power switch 105 may be used to control the supply of power from the power source to the in-vehicle device 120.
The display screen 110 may be used to display images, videos, and the like. The display screen 110 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the in-vehicle device 120 may include 1 or N display screens 110, where N is a positive integer greater than 1.
The USB interface 107 may be used to communicate through a data cable with other devices, such as a display, the smart terminal 100, or an external audio playback device.
The audio input module 108 may be used to collect sound in the external environment and convert the sound into an electrical signal. In a possible implementation, the in-vehicle device 120 may receive a voice instruction input by the user through the audio input module 108 and, in response to the voice instruction, perform the operation corresponding to the voice instruction.
The audio output module 109 is used to convert an audio electrical signal into a sound signal, and the in-vehicle device 120 may play the sound signal through the audio output module 109.
In some embodiments, the in-vehicle device 120 may also include a serial interface such as an RS-232 interface. The serial interface may be connected to other devices, such as speakers or other external audio playback devices, so that the external audio playback devices cooperate in playing the recognized audio file.
It can be understood that the structure illustrated in FIG. 1D does not constitute a specific limitation on the in-vehicle device 120. In other embodiments of this application, the in-vehicle device 120 may include more or fewer components than shown, may combine or split certain components, or may have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The following describes an exemplary user interface on the smart terminal 100 for displaying the application menu.
FIG. 2 exemplarily shows a user interface 21 of the smart terminal 100 for displaying the application menu. As shown in FIG. 2, the user interface 21 may include a status bar 201, a tray 217 with icons of commonly used applications, a calendar widget 213, a weather widget 215, and other application icons, where:
The status bar 201 may include one or more signal strength indicators 203 for mobile communication signals (also called cellular signals), one or more signal strength indicators 205 for wireless fidelity (Wi-Fi) signals, a battery status indicator 209, and a time indicator 211.
The calendar widget 213 may be used to indicate the current time, such as the date, the day of the week, and the hour and minute.
The weather widget 215 may be used to indicate the weather type, such as cloudy turning sunny or light rain, and may also be used to indicate information such as the temperature.
The tray 217 with icons of commonly used applications may display a phone icon 219, a contacts icon 221, a messages icon 223, and a camera icon 225.
The other application icons may be, for example, a WeChat icon 227, a QQ icon 229, a Twitter icon 231, a Facebook icon 233, a mailbox icon 235, a cloud sharing icon 237, a memo icon 239, an Alipay icon 221, a gallery icon 225, and a settings icon 227. The user interface 21 may also include a page indicator 229. The other application icons may be distributed across other pages, and the page indicator 229 may be used to indicate the number of pages and which page the user is currently viewing. For example, if the page indicator 229 displays 3 small dots, of which the second is black and the other two are white, the phone currently has 3 pages and the user is viewing the second page. In addition, the user can swipe left or right on the current page to browse the application icons on other pages. In some embodiments, the user interface 21 exemplarily shown in FIG. 2 may be one of the user interfaces of the home screen.
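The page indicator behaviour in the example above can be illustrated with a short sketch. The function name and the use of text glyphs for the dots are assumptions made purely for illustration; an actual interface would draw the dots graphically.

```python
def render_page_indicator(page_count: int, current_page: int) -> str:
    """Return one dot per page: a filled dot for the current page, hollow otherwise.

    Pages are numbered from 1, matching the "second page" wording in the text.
    """
    if page_count < 1 or not 1 <= current_page <= page_count:
        raise ValueError("current_page must be between 1 and page_count")
    return "".join(
        "●" if page == current_page else "○"
        for page in range(1, page_count + 1)
    )
```

With 3 pages and the user on page 2, `render_page_indicator(3, 2)` yields a hollow dot, a filled dot, and a hollow dot, which corresponds to the white, black, white arrangement described above.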
In some other embodiments, the smart terminal 100 may also include a home screen key. The home screen key may be a physical key or a virtual key. The home screen key may be used to receive an instruction from the user and, in response to that instruction, return the currently displayed UI to the home screen, so that the user can view the home screen at any time. The instruction may be an operation in which the user presses the home screen key once, an operation in which the user presses the home screen key twice in quick succession, or an operation in which the user presses and holds the home screen key for a predetermined time. In some other embodiments of this application, a fingerprint recognizer may also be integrated into the home screen key, so that a fingerprint is collected and recognized when the home screen key is pressed.
It can be understood that FIG. 2 only exemplarily shows a user interface on the smart terminal 100 and should not be construed as limiting the embodiments of this application.
Next, some humming recognition embodiments provided in this application are described.
In the embodiments of this application, the smart terminal 100 may play the recognized audio file following the user's humming progress, and may display the recognition result on the display screen 194. The recognition result may be displayed while the smart terminal 100 is in use, or while the smart terminal 100 is in the locked state; embodiments for these two application scenarios are described further below. It should be noted that, in the embodiments of this application, the smart terminal 100 being in use means that the smart terminal 100 is being used by the user and its display screen 194 is always on; the display screen 194 may display user interfaces such as the desktop, an application interface, the pull-down notification bar, or the minus-one screen. The smart terminal 100 being in the locked state means that its screen is locked; in most cases, after being locked, the smart terminal 100 must receive a password entered by the user or verify another unlocking method (for example, fingerprint unlocking or face unlocking) before it can be unlocked. Typically, the user can turn off the screen of the smart terminal 100 and put it into the locked state by pressing the power button of the smart terminal 100 or tapping a "lock screen" virtual control. In addition, the lock screen interface is the user interface displayed by the smart terminal 100 after it enters the locked state and before it is unlocked. While the smart terminal 100 is in the locked state, it may display the lock screen interface or may be in the screen-off (also called black-screen) state.
First, an embodiment in which the recognition result is displayed while the smart terminal 100 is in use is described.
FIG. 3A exemplarily shows a user interface 31 in which the recognition result is displayed within the interface of an application in use on the smart terminal 100. It should be noted that the application performing the humming recognition operation and the application in use may be the same application or different applications, which is not limited in the embodiments of this application. In addition, this application places no restriction on the application in use, which may be WeChat, QQ, Weibo, a mailbox, or another application; FIG. 3A takes a chat interface in WeChat as an example. As shown in FIG. 3A, the user interface 31 may include a display area 318, an input area 319, and a notification window 315, where:
The display area 318 may be used to display chat content, which may include text or voice exchanges between the user of the smart terminal 100 and the user of another social account.
The input area 319 may be used to enter chat content, and may include a first control 319A, a second control 319B, a third control 319C, and a fourth control 319D. The first control 319A is used to receive a user operation; in response, the smart terminal 100 displays a voice input button, which the user can typically press and hold to record voice information. It should be noted that when the voice input button receives a user operation, the smart terminal 100 needs to collect the voice information input by the user, so the audio input module is occupied by the voice input service of the social application and the smart terminal 100 does not perform the humming recognition operation provided in the embodiments of this application. The second control 319B is used to receive a user operation; in response, the smart terminal 100 displays a keyboard or handwriting pad, through which it can typically receive text entered by the user. The third control 319C is used to receive a user operation; in response, the smart terminal 100 displays multiple emoticons or animated images for the user to choose from. The fourth control 319D is used to receive a user operation; in response, the smart terminal 100 displays selection boxes for multiple input types, such as pictures, shooting, documents, red envelopes, and video calls, for the user to choose from. Similarly, when the "shooting" or "video call" selection box receives a user operation, the smart terminal 100 needs to collect the audio and video information input by the user, so the audio input module and/or the audio output module is occupied by the audio and video input service of the social application and the smart terminal 100 does not perform the humming recognition operation provided in the embodiments of this application.
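The rule that humming recognition is skipped while another service holds the audio modules can be sketched as a simple guard. The `AudioState` type, its field names, and the `recognition_enabled` flag are hypothetical and introduced only for this illustration; the description does not define how occupancy is tracked.

```python
from dataclasses import dataclass


@dataclass
class AudioState:
    """Hypothetical snapshot of which audio modules another service is using."""
    input_in_use: bool = False   # e.g. recording a voice message
    output_in_use: bool = False  # e.g. an ongoing video call


def should_run_humming_recognition(
    state: AudioState, recognition_enabled: bool = True
) -> bool:
    """Run recognition only when it is enabled and no audio module is occupied."""
    if not recognition_enabled:
        return False
    return not (state.input_in_use or state.output_in_use)
```

For instance, while the user holds the voice input button, `AudioState(input_in_use=True)` makes the guard return `False`, so no humming recognition is attempted until the audio input module is released.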
The notification window 315 is used to display the recognition result for the music segment hummed by the user, and may include a humming recognition icon 316, a first display area 314, a play control 310, and a control 312.
The humming recognition icon 316 is used to indicate the source of the notification window 315, so that the user can quickly see that the notification window 315 is a recognition result output by the humming recognition service (also called a function or application). It should be noted that the humming recognition icon 316 is only an example icon; in a specific implementation, the humming recognition icon may use another pattern, such as a musical note or an icon of another style, which is not limited in the embodiments of this application.
The first display area 314 may be used to display identification information of the recognized audio file, giving the user more information about the recognized audio file. The identification information of the audio file may be its song title, lyrics, artist name, album name, album cover picture, artist poster, and so on. As shown in the figure, the first display area 314 contains the song title "Across the Ocean to See You" (《漂洋过海来看你》). The first display area 314 may also include operation instruction information (for example, the text "tap to stop playback" contained in the first display area 314), which reminds the user of the available operation and makes the interface more convenient to use. Optionally, the first display area 314 may also contain artist information or lyrics of the audio file currently being played. In another possible case, the smart terminal 100 may display the lyrics of the audio file currently being played in a floating window, that is, a movable window displayed floating over the display interface of the smart terminal 100.
The play control 310 may be used to receive a user operation; in response, the smart terminal 100 pauses or resumes playback of the audio file. Specifically, after the smart terminal 100 recognizes the audio file corresponding to the music segment hummed by the user, it plays the audio file following the user's singing progress, and the play control 310 is displayed in a first state, which indicates that the audio file is being played. Optionally, while the recognized audio file is being played, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiments of this application. When the play control 310 is displayed in the first state and receives a user operation, the smart terminal 100 pauses playback of the audio file and displays the play control 310 in a second state, which indicates that playback of the audio file is paused. It can be understood that when the play control 310 is displayed in the second state and receives a user operation, the smart terminal 100 resumes playback of the audio file and displays the play control 310 in the first state.
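The two display states of the play control amount to a two-state toggle, which can be sketched as below. The `PlayState` and `PlayControl` names are hypothetical and chosen for illustration; the sketch models only the state change, not actual audio playback.

```python
from enum import Enum


class PlayState(Enum):
    PLAYING = "first state"   # the audio file is being played
    PAUSED = "second state"   # playback of the audio file is paused


class PlayControl:
    """Hypothetical model of the play control's state; names are illustrative."""

    def __init__(self) -> None:
        # Playback begins once recognition succeeds, so start in the first state.
        self.state = PlayState.PLAYING

    def on_user_operation(self) -> PlayState:
        """Toggle between playing and paused, returning the new state."""
        if self.state is PlayState.PLAYING:
            self.state = PlayState.PAUSED
        else:
            self.state = PlayState.PLAYING
        return self.state
```

Each user operation flips the state, so two consecutive operations return the control to the first state and playback resumes.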
The control 312 may be used to receive a user operation; in response, the smart terminal 100 pauses playback of the audio file, re-acquires the user's sound signal, and performs humming recognition on the re-acquired sound signal. In one possible case, while the sound signal is being re-acquired, the smart terminal 100 may display prompt information (for example, "Recognizing..."), which indicates that the smart terminal 100 is re-acquiring the sound signal for humming recognition. In another possible case, in response to the user's operation on the control 312, the smart terminal 100 pauses playback of the audio file and jumps to a user interface 35 for displaying humming recognition. The user interface 35 is described in detail later and is not elaborated here.
In a possible implementation, the notification window 315 disappears after being displayed for a preset time, which may be 4 seconds, 5 seconds, or another value. Alternatively, when the notification window 315 receives an upward swipe from the user, the smart terminal 100 stops displaying the notification window 315 in the user interface 31 in response to that operation.
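The two dismissal conditions for the notification window (timeout or upward swipe) can be combined in a small predicate. The function and parameter names are assumptions for illustration, and the 5-second default is simply one of the example values mentioned above.

```python
def notification_visible(
    shown_at: float,
    now: float,
    swiped_up: bool,
    preset_seconds: float = 5.0,
) -> bool:
    """Return True while the notification window should still be shown.

    shown_at and now are timestamps in seconds on the same clock.
    """
    if swiped_up:
        # An upward swipe dismisses the window immediately.
        return False
    # Otherwise the window disappears once the preset time has elapsed.
    return (now - shown_at) < preset_seconds
```

A caller could evaluate this predicate on each UI refresh, passing timestamps from a monotonic clock so that wall-clock adjustments do not affect the timeout.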
In a possible implementation, the notification window 315 may also be displayed in a pull-down notification bar; for this display manner, refer to FIG. 3B. As shown in FIG. 3B, when a downward swipe gesture on the status bar 201 is detected, the smart terminal 100 may, in response to the gesture, display a pull-down notification bar 318 on the user interface 21. The pull-down notification bar 318 includes the notification window 315 and a control window 313, where:
For the notification window 315, refer to the description of FIG. 3A; details are not repeated here.
The control window 313 can display multiple switch controls, for example the switch control 317 labeled "Humming recognition", as well as switch controls for other functions (such as Wi-Fi, Bluetooth, and flashlight). The control window 313 is described in detail later, together with the setting interfaces for humming recognition, and is not elaborated here.
In one possible case, while the humming recognition function is turned on, the humming recognition icon 311 is displayed in the status bar 201. It can be understood that the status bar 201 may be included in multiple display interfaces of the smart terminal 100. In this way, the user can tell from any of these display interfaces whether the humming recognition function is on.
Next, embodiments in which the recognition result is displayed while the smart terminal 100 is in the locked state are described.
FIG. 3C exemplarily shows the user interface 32 displayed when the smart terminal 100 is in the locked state; the user interface 32 may also be referred to as a lock screen interface. As shown in FIG. 3C, the user interface 32 includes the status bar 201, a calendar widget 213, and a lock screen wallpaper 523, where:
For the status bar 201, refer to the description of FIG. 2; details are not repeated here. In particular, the status bar 201 here includes the humming recognition icon 311 and a lock icon 323. The humming recognition icon 311 indicates that the humming recognition function is on, and the lock icon 323 indicates that the smart terminal 100 is in the locked state.
For the calendar widget 213, refer to the description of FIG. 2; details are not repeated here. Optionally, the user interface 32 may also include a weather widget 215.
The lock screen wallpaper 523 may be a picture set by the user, a picture preset by the smart terminal 100, or a picture downloaded by the smart terminal 100 from the network.
FIG. 3D exemplarily shows yet another user interface 32 that displays the recognition result.
As shown in FIG. 3D, with the smart terminal 100 in the locked state, when the smart terminal 100 recognizes the audio file of the music clip hummed by the user, it displays a notification window 324 on top of the user interface 32. The notification window 324 may include: the humming recognition icon 316, a second display area 322, the play control 310, the control 312, and a volume control 328.
For the humming recognition icon 316, the play control 310, and the control 312, refer to the related description of FIG. 3A; details are not repeated here.
The second display area 322 serves the same purpose as the first display area 314 in FIG. 3A: both can display identification information of the recognized audio file. The difference is that the second display area 322 includes not only the name of the audio file, 《漂洋过海来看你》 ("Crossing the Ocean to See You"), but also its singer, 李宗盛 (Li Zongsheng), and the lyric information of the audio file currently being played, "陌生的城市啊,熟悉的角落里……" ("A strange city, in a familiar corner..."), in which the bolded part, "不管将会面对" ("No matter what will be faced"), is the lyric the user is currently singing. It can be understood that the lyric information changes with the playback progress of the audio file, so that the lyrics stay synchronized with playback.
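One common way to keep displayed lyrics synchronized with playback is to store a start time for each lyric line and look up the line that covers the current playback position. The sketch below assumes such timestamped lyrics are available; the application does not specify the lyric format, and the function name `current_lyric_index` is hypothetical.

```python
from bisect import bisect_right


def current_lyric_index(line_start_times, position_s):
    """Return the index of the lyric line active at position_s.

    line_start_times: ascending start times (seconds) of each lyric line.
    Returns -1 if playback has not yet reached the first line.
    """
    # bisect_right counts how many lines have started at or before
    # position_s; the last of those is the line to highlight.
    return bisect_right(line_start_times, position_s) - 1
```

The display code would then highlight (for example, bold) the line at the returned index each time the playback position is updated.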
In a possible implementation, the tag of the second audio file is included in the user tags of the first user; for the meaning of the first user, refer to the description above. Different smart terminals may produce different recognition results for a hummed clip of the same song. For example, when user 1 sings 《漂洋过海来看你》, the audio file recognized by user 1's smart terminal may be the version sung by 李宗盛 (Li Zongsheng); when user 2 sings the same song, the audio file recognized by user 2's smart terminal may be the version sung by 梁静茹 (Liang Jingru). It can be understood that the recognition results differ because user 1 and user 2 have different user tags.
The volume control 328 can be used to adjust the playback volume of the audio file. It can receive a user operation, in response to which the smart terminal 100 adjusts the playback volume. Optionally, when the received user operation is a swipe to the left, the smart terminal 100 lowers the playback volume; when it is a swipe to the right, the smart terminal 100 raises the playback volume. In a possible implementation, the ratio of the distance between the volume control 328 and the left endpoint of the line segment on which it sits to the length of that line segment corresponds to the ratio of the current volume to the system's maximum playback volume.
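The correspondence between the position of the volume control 328 and the volume can be illustrated as follows. The text only states that the two ratios correspond; this sketch assumes the simplest case, in which they are equal (a linear mapping). The function name and parameters are hypothetical.

```python
def volume_from_slider(knob_x, left_x, right_x, max_volume):
    """Map the knob position on its line segment to a volume value.

    Assumes a linear correspondence:
        (knob_x - left_x) / (right_x - left_x) == volume / max_volume
    """
    length = right_x - left_x
    if length <= 0:
        raise ValueError("the line segment must have positive length")
    # Clamp so that a knob dragged past either endpoint stays in range.
    ratio = min(max((knob_x - left_x) / length, 0.0), 1.0)
    return ratio * max_volume
```

With this mapping, a knob one quarter of the way along the segment yields one quarter of the maximum volume; a non-linear (for example, perceptual) mapping would also satisfy the "corresponding relationship" described above.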
In a possible implementation, during the period from the moment the smart terminal 100 starts playing the audio file to a preset moment (for example, the 5th or 6th second), the smart terminal 100 gradually increases the playback volume from low to high. For example, the volume may rise gradually from the minimum volume to the volume set by the user, or from 30% to 100% of the volume set by the user; other ways of increasing the volume are also possible and are not limited in the embodiments of this application. It should be noted that the volume set by the user is the volume indicated by the volume control 328; optionally, it is the result of the user's most recent volume adjustment.
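The gradual volume increase can be sketched as a ramp over the preset period. The sketch assumes a linear ramp, which the application does not require, and uses the 30% starting point from the example above as a default; all names are hypothetical.

```python
def fade_in_volume(elapsed_s, ramp_s, target_volume, start_fraction=0.3):
    """Volume to use elapsed_s seconds after playback starts.

    Rises linearly from start_fraction * target_volume at time 0 to
    target_volume at ramp_s, then stays at target_volume.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target_volume
    progress = max(elapsed_s, 0.0) / ramp_s
    fraction = start_fraction + (1.0 - start_fraction) * progress
    return target_volume * fraction
```

Setting `start_fraction=0.0` gives the other example in the text, a ramp from the minimum volume up to the volume set by the user.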
In a possible implementation, the notification window 324 disappears after the recognized audio file finishes playing.
In a possible implementation, the smart terminal 100 may also display the content of the notification window 324 as a separate user interface on top of the user interface 32, such as the user interface 33 shown in FIG. 3E. In FIG. 3E, the user interface 33 contains the content of the notification window 324, for example, the humming recognition icon 316, the second display area 322, the play control 310, the control 312, and the volume control 328. Optionally, the user interface 32 may also display a background picture; for example, the background picture may be a poster of the singer or the cover of the album that contains the recognized audio file. Optionally, when the user interface 33 receives a user operation (for example, a swipe to the right), the smart terminal 100 displays the user interface 32 beneath the user interface 33 (that is, the lock screen interface).
FIG. 3F exemplarily shows yet another user interface 34 that displays the recognition result.
In one embodiment, the smart terminal 100 displays the user interface 34 if, in the user interface 31 or the user interface 21, a user operation on the notification window 315 is detected (for example, a tap, a long press, or a press), or if, in the user interface 32, a user operation on the notification window 324 is detected (for example, a tap, a long press, or a press). Optionally, before jumping from the user interface 32 to the user interface 34, the smart terminal 100 receives an unlock operation from the user (for example, fingerprint unlock, password unlock, or face unlock); if the unlock succeeds, the smart terminal 100 performs the jump from the user interface 32 to the user interface 34.
The user interface 34 includes: the humming recognition icon 316, the second display area 322, the play control 310, the control 312, the volume control 328, a control 330, a control 332, and a control 334, where:
For the humming recognition icon 316, the second display area 322, the play control 310, the control 312, and the volume control 328, refer to the descriptions above; details are not repeated here.
The control 330 can be used to add the recognized audio file to the user's favorites. The control 330 can receive a user operation, in response to which the smart terminal 100 adds the identifier of the recognized audio file to a preset favorites folder (which may also be called a "Favorite music" folder; this application does not limit the name), making it easier for the user to find or play the recognized audio file next time.
The control 332 can be used to download the recognized audio file. The control 332 can receive a user operation, in response to which the smart terminal 100 downloads the audio resource of the recognized audio file from the network. Optionally, in response to the user operation, the smart terminal 100 displays a selection box containing sound quality options such as "Standard quality", "High quality", and "Lossless quality". The selection box receives the user's selection of one option, and in response to that selection, the smart terminal 100 downloads the audio resource at the corresponding sound quality.
The control 334 can be used to share the recognized audio file. The control 334 can receive a user operation, in response to which the smart terminal 100 displays a sharing box containing multiple sharing targets, for example, QQ, WeChat, Weibo, and Twitter. The sharing box receives the user's selection of one sharing target, and in response to that selection, the smart terminal 100 sends the identifier or the audio resource of the audio file to the selected sharing target.
FIG. 3G exemplarily shows a user interface 35 used for humming recognition.
In one embodiment, if a user operation on the control 312 (for example, a tap) is detected in the user interface 21, the user interface 31, the user interface 32, the user interface 33, or the user interface 34, the smart terminal 100 displays the user interface 35 used for humming recognition.
The user interface 35 includes the humming recognition icon 316, an indicator 350, a control 352, and a control 354, where:
For the humming recognition icon 316, refer to the description above; details are not repeated here.
The indicator 350 can indicate time information for the music clip the user has hummed so far; this time information changes as the user keeps humming and stays synchronized with the duration of the humming. The indicator 350 can also show operation prompt information for recording the sound signal (for example, the text "Hum a few more lines for more accurate recognition" contained in the indicator 350), reminding the user how to operate so as to improve the accuracy of humming recognition. The operation prompt information may also have other content. For example, when the user's voice is detected to be quiet, prompt information such as "Sing louder (or move closer to the device) for more accurate recognition" may be displayed.
The control 352 can receive a user operation (for example, a long press), in response to which the smart terminal 100 collects the sound signal input by the user through the microphone 170C. When it detects that the user's finger has left the display screen 194, the smart terminal 100 performs humming recognition based on the collected sound signal. Optionally, when the smart terminal 100 receives the recognized audio file from the music recognition server, it may display the user interface 34 used to display the recognition result.
The above describes some user interfaces of the smart terminal 100 for displaying recognition results and for performing humming recognition. In the embodiments of this application, before the smart terminal 100 can perform the humming recognition function, the user can turn the function on or off through the setting interfaces of the smart terminal 100. Some setting interfaces for humming recognition are described below.
FIG. 4A exemplarily shows a user interface 41 for setting the humming recognition function.
Similar to the way the pull-down notification bar 318 is displayed as described above, when a downward swipe gesture on the status bar 201 is detected, the smart terminal 100 may, in response to the gesture, display a pull-down notification bar 401 on the user interface 41. The pull-down notification bar 401 includes the control window 313, where:
The control window 313 can display multiple switch controls, for example the switch control 317 labeled "Humming recognition", as well as switch controls for other functions (such as Wi-Fi, Bluetooth, and flashlight). The switch control 317 has two display states: the first display state (which may also be called the "ON" state) indicates that the humming recognition function is on, and the second display state (which may also be called the "OFF" state) indicates that the humming recognition function is off. When the switch control 317 is in the second display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313 (such as a touch on the switch control 317), the smart terminal 100 may, in response, turn on humming recognition and change the switch control 317 to the first display state. When the switch control 317 is in the first display state and the smart terminal 100 detects an operation on the switch control 317 in the control window 313, the smart terminal 100 may, in response, turn off humming recognition and change the switch control 317 to the second display state. In this way, the user can conveniently turn the humming recognition function on or off.
In one possible case, while the humming recognition function is turned on, the humming recognition icon 311 is displayed in the status bar 201. It can be understood that the status bar 201 may be included in multiple display interfaces of the smart terminal 100. In this way, the user can tell from any of these display interfaces whether the humming recognition function is on.
FIG. 4B exemplarily shows another user interface 42 for setting the humming recognition function.
As shown in FIG. 4B, the user interface 42 includes a display area 410, which displays multiple configurable options, such as "Airplane mode", "Wi-Fi", and "Bluetooth". The display area 410 also includes multiple switch controls and multiple jump controls. The roles of the two kinds of controls are described using the switch control 412 and the jump control 416 as examples, where:
The switch control 412 can receive a user operation (for example, a tap or a swipe), in response to which the smart terminal 100 changes the on/off state of the function, service, or application corresponding to the switch control 412 (here, the humming recognition function). For example, if the switch control 412 is displayed as "ON" before the user operation is received, the humming recognition function is currently on. If the switch control 412 then receives a user operation, the smart terminal 100, in response, changes the display state of the switch control 412 to "OFF" and turns off the humming recognition function.
The jump control 416 can receive a user operation, in response to which the smart terminal 100 jumps to the setting interface of the function, service, or application corresponding to the jump control 416 (here, Do Not Disturb mode). It should be noted that this setting interface may include multiple setting options for the Do Not Disturb function, for example, turning Do Not Disturb mode on or off, setting the time at which Do Not Disturb mode is enabled, and configuring automatic replies in Do Not Disturb mode.
FIGS. 5A-5C exemplarily show further user interfaces for setting the humming recognition function.
As shown in FIG. 5A, the user interface 51 includes a display area 522, which is similar to the display area 410 included in the user interface 42. The display area 522 displays multiple configurable options, such as "Airplane mode", "Wi-Fi", and "Bluetooth".
Unlike in the display area 410, the control corresponding to the humming recognition function here is a jump control 520, which can be used to jump to the "Humming recognition" setting interface. As exemplarily shown in FIGS. 5A-5B, the jump control 520 receives a user operation (for example, a tap), and in response, the smart terminal 100 jumps from the user interface 51 to the "Humming recognition" setting interface (that is, the user interface 52).
As shown in FIG. 5B, the user interface 52 includes a return key 530, a switch control 532, text information 534, a switch control 536, a control 538, a control 540, a control 552, multiple jump controls (for example, a jump control 554), and a switch control 556, where:
The return key 530 can receive a user operation, in response to which the smart terminal 100 returns to the interface preceding the current page, that is, the user interface 51 shown in FIG. 5A. Those skilled in the art will appreciate that the preceding interface of any given interface is determined when the application is designed.
For the function of the switch control 532, refer to the function of the switch control 412 in FIG. 4B; details are not repeated here.
The text information 534 can be used to explain the permissions that the smart terminal 100 obtains once the humming recognition function is turned on, so that the user can decide, based on this explanation, whether to grant the smart terminal 100 permission for humming recognition. The wording of the text information 534 can be changed as needed and is not limited here.
The switch control 536 can receive a user operation, in response to which the smart terminal 100 performs the operation of setting the time period during which humming recognition is enabled. For example, if the switch control 536 is displayed as "OFF" before the user operation is received, no enabled time period has been set for humming recognition, and humming recognition can run at all times. Optionally, in this case, the smart terminal 100 does not display the control 538 and the control 540. After receiving the user's operation on the switch control 536, the smart terminal 100, in response, changes the display state of the switch control 536 to "ON" and displays the control 538 and the control 540. The control 538 receives the start time for humming recognition entered by the user; in response, the smart terminal 100 performs the humming recognition operation provided in the embodiments of this application after that start time. The control 540 receives the end time for humming recognition entered by the user; in response, the smart terminal 100 no longer performs the humming recognition operation provided in the embodiments of this application after that end time. It should be noted that while the smart terminal 100 is not performing the humming recognition operation provided in the embodiments of this application, the user can still actively trigger humming recognition in the manner known from the prior art.
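The enabled time period set through the controls 538 and 540 can be checked as sketched below. The function name is hypothetical, and the handling of a period that spans midnight (for example, 22:00 to 06:00) is an assumption; the application does not discuss that case.

```python
from datetime import time


def recognition_active(now, start, end):
    """Return True if automatic humming recognition may run at `now`.

    start / end: times entered via controls 538 / 540, or None when
    switch control 536 is OFF (no period set, always active).
    """
    if start is None or end is None:
        return True
    if start <= end:
        # Ordinary same-day window, e.g. 08:00 to 22:00.
        return start <= now < end
    # Assumed behavior: a window with start > end wraps past midnight.
    return now >= start or now < end
```

For example, with a period from 08:00 to 22:00, `recognition_active(time(9, 0), time(8, 0), time(22, 0))` evaluates to `True`, while 23:00 falls outside the period. Recognition that the user triggers manually would bypass this check.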
The control 552 can be used to add voiceprint information that can be used to enable humming recognition.
The jump control 554 can receive a user operation, in response to which the smart terminal 100 jumps from the user interface 52 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include functions such as naming and deleting.
The switch control 556 can receive a user operation, in response to which the smart terminal 100 sets the availability of the humming recognition function. For example, if the switch control 556 is displayed as "OFF" before the user operation is received, humming recognition is unavailable while the smart terminal 100 is locked; that is, the smart terminal 100 does not perform humming recognition while locked. After receiving the user's operation on the switch control 556, the smart terminal 100, in response, changes the display state of the switch control 556 to "ON" and adjusts the availability of the humming recognition function so that humming recognition also runs while the smart terminal 100 is locked.
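The effect of the switch control 556 on whether recognition runs can be expressed as a simple predicate. This is a minimal sketch; the function and parameter names are hypothetical, and other conditions described in this application (such as the enabled time period) are left out.

```python
def may_run_humming_recognition(feature_enabled, device_locked, allow_when_locked):
    """Decide whether automatic humming recognition may run.

    feature_enabled:   the humming recognition function is turned on
    device_locked:     the smart terminal is currently locked
    allow_when_locked: switch control 556 is ON
    """
    if not feature_enabled:
        return False
    # Unlocked: always allowed. Locked: only if switch 556 is ON.
    return (not device_locked) or allow_when_locked
```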
The switch control 557 can receive a user operation, in response to which the smart terminal 100 sets the availability of the humming recognition function. When the switch control 557 is on, the smart terminal 100 can obtain its own location while the humming recognition function is running. By determining whether its location is a preset place, the smart terminal 100 decides whether to stop collecting sound from the external environment through the audio input module, or whether to play the recognized audio file from the starting playback position. This determination is described later and is not elaborated here.
As exemplarily shown in FIGS. 5B-5C, the display screen 194 receives a user operation (for example, an upward swipe), and in response, the smart terminal 100 displays the "Humming recognition" settings located below the switch control 556. As shown in FIG. 5C, the user interface 52 also includes settings for the access permissions of the humming recognition function. Specifically, the user interface 52 also includes a jump control 558 and multiple switch controls (for example, a switch control 560), where:
The jump control 558 can be used to set the type of wireless data that the humming recognition function is allowed to access, for example, "Off", "WLAN", or "WLAN and cellular mobile data".
The switch control 560 can be used to set a system function (here, the location service) that the humming recognition function is allowed to access. For example, if the switch control 560 is displayed as "OFF" before the user operation is received, the humming recognition function cannot obtain the location information of the smart terminal 100 while running. After receiving the user's operation on the switch control 560, the smart terminal 100, in response, changes the display state of the switch control 560 to "ON" and allows the humming recognition function to obtain the location information of the smart terminal 100. Similarly, the other system functions that humming recognition needs to access can be set in the same way.
FIGS. 5D-5F exemplarily show some user interfaces for setting the access permissions of the humming recognition function.
As exemplarily shown in FIGS. 5D-5E, in response to the user's operation on the jump control 524, the smart terminal 100 jumps from the user interface 51 to the user interface 53. The user interface 53 displays multiple system functions, for example, Bluetooth, location service, microphone, and gallery. Each system function corresponds to one jump control (for example, the "Microphone" system function corresponds to the jump control 562).
As exemplarily shown in FIGS. 5E-5F, in response to the user's operation on the jump control 562, the smart terminal 100 jumps from the user interface 53 to the user interface 54. The user interface 54 is used to display the applications that request access to the microphone. The user can control each application's permission to access the microphone through the switch control corresponding to that application. For example, if the display state of the switch control 572 is "OFF" before the user's operation is received, the humming recognition function cannot access the microphone. After receiving the user's operation on the switch control 572, the smart terminal 100, in response to that operation, changes the display state of the switch control 572 to "ON" and allows the humming recognition function to access the microphone. Similarly, access by other applications to system functions can be set in the same manner.
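The per-application permission switch described above can be sketched as a small state holder. This is only an illustration under stated assumptions, not the implementation of this application: the class `PermissionSwitches`, its method names, and the string labels for applications and system functions are all hypothetical.

```python
class PermissionSwitches:
    """Tracks which (application, system function) pairs are allowed.

    Hypothetical helper: a pair that was never switched on is treated as "OFF".
    """

    def __init__(self):
        self._allowed = set()

    def is_on(self, app, function):
        """Return True if `app` is currently allowed to access `function`."""
        return (app, function) in self._allowed

    def toggle(self, app, function):
        """Flip the switch for one pair and return the new display state."""
        key = (app, function)
        if key in self._allowed:
            self._allowed.discard(key)
            return "OFF"
        self._allowed.add(key)
        return "ON"


switches = PermissionSwitches()
# The user operates the switch that starts in the "OFF" state.
state_after_tap = switches.toggle("humming_recognition", "microphone")
```

Keeping the display state and the granted permission in one place means the "ON"/"OFF" label can never disagree with what the function is actually allowed to do.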
FIG. 5G exemplarily shows a user interface 55 for entering voiceprint information.
In one embodiment, in response to the user's operation on the control 552 in the user interface 52, the smart terminal 100 jumps from the user interface 52 to the user interface 55 to enter the voiceprint information that the user wants to add. The user interface 55 includes an indicator 570, text information 572, and a control 574, where:
The indicator 570 can be used to provide prompt information that instructs the user to enter voiceprint information.
The text information 572 is the text that the user needs to read aloud. Optionally, the smart terminal may display different text information several times for the user to read aloud. In this way, more of the user's sound signals can be recorded, which improves the accuracy of the voiceprint information.
In one possible case, the smart terminal may also instruct the user to sing several music clips in order to enter the voiceprint information. In this case, the content of the indicator 570 may be "Please press and hold the button and sing the following song clip to enter voiceprint information", and correspondingly the text information 572 is a passage of lyrics.
The control 574 can be used to receive a user's operation (for example, a long-press operation). In response to that operation, the smart terminal 100 collects the sound signal input by the user through the microphone 170C. When it detects that the user's finger has left the display screen 194, the smart terminal 100 stores the sound signal collected during this period, extracts the voiceprint information from the collected sound signal, and then stores the extracted voiceprint information.
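The press-and-hold enrollment flow described for the control 574 can be sketched as follows. This is a minimal sketch under stated assumptions: the class `HoldToRecord` and its callback names are hypothetical, and `toy_extractor` is a stand-in (mean absolute amplitude) rather than a real voiceprint model, which this passage does not specify.

```python
class HoldToRecord:
    """Collects audio frames between press and release (hypothetical sketch)."""

    def __init__(self, extract_voiceprint):
        self._extract = extract_voiceprint  # injected feature extractor
        self._frames = []
        self._recording = False
        self.stored_signal = None
        self.stored_voiceprint = None

    def on_press(self):
        """Long press begins: start collecting a fresh sound signal."""
        self._frames = []
        self._recording = True

    def on_audio_frame(self, frame):
        """Append microphone samples; frames outside a press are ignored."""
        if self._recording:
            self._frames.extend(frame)

    def on_release(self):
        """Finger leaves the screen: store the signal, then its voiceprint."""
        self._recording = False
        self.stored_signal = list(self._frames)
        self.stored_voiceprint = self._extract(self.stored_signal)
        return self.stored_voiceprint


def toy_extractor(samples):
    """Stand-in for a real voiceprint model: mean absolute amplitude."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


recorder = HoldToRecord(toy_extractor)
recorder.on_audio_frame([9, 9])   # ignored: the control is not pressed yet
recorder.on_press()
recorder.on_audio_frame([1, -3])
recorder.on_audio_frame([2])
voiceprint = recorder.on_release()
```

Injecting the extractor keeps the capture logic independent of whichever voiceprint algorithm a device actually uses.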
The humming recognition operation provided in the embodiments of this application can also be applied to smart home devices (for example, smart speakers, televisions, and so on) and vehicle-mounted devices (for example, vehicle-mounted speakers); such a smart home device or vehicle-mounted device can perform the humming recognition operation provided in the embodiments of this application. In one possible case, the smart home device or vehicle-mounted device is not equipped with a display screen (for example, a smart speaker or a car audio system), and the user can configure the humming recognition function of that device through the smart terminal 100.
FIGS. 6A-6B exemplarily show further user interfaces for setting the humming recognition function. Optionally, these user interfaces may be interfaces in a smart-home application.
As shown in FIG. 6A, the user interface 61 includes a display area 60, and the display area 60 includes indication information 600, reminder information 602, a selection box 610, a selection box 614, a control 608, and a display area 606, where:
The indication information 600 can be used to indicate the home information set by the user, and may be text information such as "Annie's Home" or "Jack's Home".
The reminder information 602 can be used to alert the user to abnormal situations that need attention. The smart terminal 100 may generate corresponding reminder information according to the status of each smart home device. For example, if the security door has not been closed for a long time, the smart terminal 100 may display the reminder information 602. Alternatively, if the remaining amount of the air purifier's filter element is less than a preset value, the smart terminal 100 may display the reminder "The filter element of the air purifier needs to be replaced", and so on.
The selection box 610 can display multiple selectable home states for the user to choose from, for example "go home", "leave home", "sleep", "read", and "more". For each home state, each home device may have a preset enabled state. For example, if the user selects the "go home" option, the smart terminal 100, in response to the selection operation, turns on the chandelier and the air conditioner in the living room. Optionally, the user can set the enabled state of each home device in each home state, and can also define additional home states.
The selection box 604 can display multiple home spaces for the user to choose from, for example "all", "living room", "master bedroom", "second bedroom", and so on. The selection box 604 can receive a user's operation (for example, a tap operation, a slide operation, and so on). In response to that operation, the smart terminal 100 displays, in the display box 606, the smart home devices contained in the selected home space. For example, if the selection box 604B receives the user's tap operation, the smart terminal 100 displays, in the display box 606, the smart home devices contained in the "living room".
The control 608 can be used to receive a user's operation. In response to that operation, the smart terminal 100 displays an interface for adding a smart home device, through which the user can enter the information of a new smart home device.
The display area 606 can be used to display information about one or more smart home devices, which may include basic information such as a picture, a name, and the on/off state. The display area 606 can also be used to receive a user's operation; in response to that operation, the smart terminal 100 displays the setting interface of the smart home device corresponding to the operation.
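The home states offered by the selection box 610, each with preset enabled states for the home devices, can be sketched as a lookup table. This is an illustration under stated assumptions: only the "go home" example (living-room chandelier and air conditioner turned on) comes from the text; the function names, device labels, and the "sleep" customization used afterwards are hypothetical.

```python
# Preset enabled states per home state; only "go home" is taken from the text.
SCENE_PRESETS = {
    "go home": {"living_room_chandelier": "on", "air_conditioner": "on"},
}


def commands_for_scene(scene, presets=SCENE_PRESETS):
    """Return the sorted (device, state) commands for a selected home state."""
    return sorted(presets.get(scene, {}).items())


def customize(presets, scene, device, state):
    """Return a new presets table with one device state set for `scene`.

    The input table is left unchanged, so defaults can always be restored.
    """
    updated = {name: dict(devices) for name, devices in presets.items()}
    updated.setdefault(scene, {})[device] = state
    return updated


go_home_commands = commands_for_scene("go home")
```

A user-defined home state is then just a new key in the table, for example `customize(SCENE_PRESETS, "sleep", "bedroom_light", "off")`.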
As exemplarily shown in FIGS. 6A-6B, the display area 606 receives the user's tap operation, and in response the smart terminal 100 jumps from the user interface 61 to the user interface 62. The user interface 62 includes a return key 620, a switch control 622, a volume control 626, a switch control 628, a control 630, a switch control 620, a switch control 634, a control 636, a control 638, a control 640, and a jump control 642, where:
The return key 620 can be used to receive a user's operation. In response to that operation, the smart terminal 100 returns to the page preceding the current page (that is, the user interface 61).
The switch control 622 can be used to receive a user's operation. In response to that operation, the smart terminal 100 controls whether the smart speaker is on or off. The electronic device may control the smart speaker by sending a control instruction that instructs the smart speaker to perform the corresponding operation.
The volume control 626 can be used to adjust the volume at which the audio file is played. The volume control 626 can receive a user's operation, and in response the smart terminal 100 controls the smart speaker to adjust the playback volume. Optionally, when the received user operation is a leftward slide, the smart terminal 100 controls the smart speaker to lower the playback volume; when the received user operation is a rightward slide, the smart terminal 100 controls the smart speaker to raise the playback volume. In one possible implementation, the ratio of the distance between the volume control 626 and the left endpoint of its line segment to the length of that line segment corresponds to the ratio of the current volume to the maximum playback volume of the smart speaker.
The switch control 628 can be used to receive an operation input by the user. In response to that operation, the smart terminal 100 controls the smart speaker to turn the sound-effect optimization function on or off.
The control 630 can be used to receive a time entered by the user. In response to that operation, the smart terminal 100 controls the smart speaker to set its shut-off time to the time entered by the user.
The switch control 620 can be used to receive a user's operation (for example, a tap operation, a slide operation, and so on). In response to that operation, the smart terminal 100 controls the smart speaker to change the on/off state of the humming recognition function. For example, if the display state of the switch control 412 is "ON" before the user's operation is received, the humming recognition function of the smart speaker is currently on. If the switch control 412 then receives a user's operation, the smart terminal 100, in response to that operation, changes the display state of the switch control 412 to "OFF" and controls the smart speaker to turn off the humming recognition function.
The switch control 634 can be used to receive a user's operation. In response to that operation, the smart terminal 100 controls the smart speaker to set the time period during which humming recognition is enabled. For example, if the display state of the switch control 634 is "OFF" before the user's operation is received, no enabled period has been set for the smart speaker's humming recognition operation, and the humming recognition operation may run at all times. Optionally, in this case, the smart terminal 100 does not display the control 636 and the control 638. After receiving the user's operation on the switch control 634, the electronic device, in response to that operation, changes the display state of the switch control 634 to "ON" and displays the control 636 and the control 638. The control 636 is used to receive the start time of the humming recognition operation entered by the user; in response, the smart terminal 100 controls the smart speaker to set the start time of humming recognition to the entered start time. The control 638 is used to receive the end time of the humming recognition operation entered by the user; in response, the smart terminal 100 controls the smart speaker to set the end time of humming recognition to the entered end time.
In one possible implementation, the smart speaker may be unable to set the enabled period of the humming recognition function by itself. In this case, in response to the user's operation on the control 636, the smart terminal 100 sends an instruction to enable the humming recognition function to the smart speaker at the start time, so that the smart speaker turns on the humming recognition function; in response to the user's operation on the control 638, the smart terminal 100 sends an instruction to stop the humming recognition function to the smart speaker at the end time, so that the smart speaker stops the humming recognition function.
The control 640 can be used to add voiceprint information that can be used to enable humming recognition.
The jump control 554 can be used to receive a user's operation. In response to that operation, the smart terminal 100 jumps from the user interface 62 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deletion functions, and so on.
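The terminal-side scheduling described for the controls 636 and 638, in which the smart terminal 100 sends the enable and stop instructions at the configured times because the smart speaker cannot schedule itself, might be sketched as follows. The function names and command strings are hypothetical, and treating a period whose end time is earlier than its start time as crossing midnight is an assumption that the text does not state.

```python
from datetime import time


def in_active_window(now, start, end):
    """True if `now` lies in [start, end); supports periods crossing midnight."""
    if start <= end:
        return start <= now < end
    # Assumed behavior: e.g. 22:00 to 06:00 wraps past midnight.
    return now >= start or now < end


def command_to_send(was_active, now, start, end):
    """Return the instruction the terminal should send at this tick, if any.

    `was_active` is whether humming recognition was enabled at the last tick.
    """
    active = in_active_window(now, start, end)
    if active and not was_active:
        return "ENABLE_HUMMING_RECOGNITION"
    if was_active and not active:
        return "STOP_HUMMING_RECOGNITION"
    return None  # no state change, nothing to send


at_start = command_to_send(False, time(8, 0), time(8, 0), time(22, 0))
at_end = command_to_send(True, time(22, 0), time(8, 0), time(22, 0))
```

Sending an instruction only on a state change keeps the terminal from repeating the same command on every tick.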
In one possible case, the voiceprint information used for matching in the smart speaker is the voiceprint information stored in the smart terminal 100. Optionally, after the smart terminal 100 receives the sound signal input by the user, it extracts the voiceprint information from that sound signal and sends it to the smart speaker capable of humming recognition for storage. In this way, the smart speaker with the humming recognition function can use the voiceprint information stored in the electronic device to match sound signals. For the user interface for entering voiceprint information in this case, refer to FIG. 5G.
In another possible case, the voiceprint information used for matching in the smart speaker is extracted from a user sound signal newly recorded by the smart speaker itself. FIG. 6C exemplarily shows another user interface 63 for entering voiceprint information. In response to the user's operation on the control 640, the smart terminal 100 jumps from the user interface 62 to the user interface 63. The user interface 63 includes:
The indication information 650 can be used to provide prompt information that instructs the user to enter voiceprint information.
The text information 652 is the text that the user needs to read aloud. Optionally, the electronic device may display different text information several times for the user to read aloud. In this way, more of the user's sound signals can be recorded, which improves the accuracy of the voiceprint information.
In one possible case, the electronic device may also instruct the user to sing several music clips in order to enter the voiceprint information. In this case, the content of the indicator 650 may be "Please move close to the smart speaker, press and hold the play button, and sing the following song clip to enter voiceprint information", and correspondingly the text information 652 is a passage of lyrics. Note that the play button here is the play button of the smart speaker, which may be a physical button or a virtual button.
Note that this is not limited to smart speakers: the humming recognition function of other smart home devices (whether or not they are equipped with a display screen) can be set in the manner described above. Similarly, the function settings of a vehicle-mounted device can also be made on the smart terminal 100, again in the manner described above.
The above describes some user interfaces on the smart terminal 100 for setting the humming recognition function of a smart home device. In one possible implementation, the smart home device or vehicle-mounted device is provided with a display screen and can set its own humming recognition function. The setting interface for humming recognition on a vehicle-mounted device is described below.
FIGS. 7A-7B exemplarily show user interfaces on the vehicle-mounted device for setting the humming recognition function.
FIG. 7A exemplarily shows a user interface 71 on the vehicle-mounted device for displaying the application menu. The user interface 71 may also be called the main menu. As shown in FIG. 7A, the user interface 71 may include a calendar widget 700, a status bar 702, a display area 708, and a control 706, where:
The calendar widget 700 can be used to indicate the current time, for example the date, the day of the week, and the hour and minute.
The status bar 201 may include a Bluetooth indicator 704, one or more signal strength indicators 705 for a wireless fidelity (Wi-Fi) signal, and a time indicator 703.
The display area 708 can be used to display multiple application icons, for example, a navigation icon 708A, a phone icon 708B, a music icon 708C, a video icon 708D, a gallery icon 708E, a radio icon 708F, a driving recorder icon 708G, and a settings icon 708H.
The control 706 can be used to receive a user's operation. In response to that operation, the vehicle-mounted device jumps from the current interface back to the user interface 71 (that is, the main menu interface).
As shown in FIGS. 7A-7B, when a user's operation on the settings icon 708H is received, the vehicle-mounted device, in response to that operation, jumps from the user interface 71 to the user interface 72. The user interface 72 displays the settings menu and includes multiple setting options, for example, "system settings 720", "user settings 722", "sound effect settings 724", "network settings 726", "time settings 728", and so on. The display area 716 shows the setting content corresponding to the selected setting option. Optionally, "system settings 720" may be the setting option selected by default, in which case the display area 716 shows the setting content corresponding to the system settings. Optionally, if a setting option receives a user's operation, the display area 716, in response to that operation, shows the setting content corresponding to that setting option.
In one possible implementation, the display area 716 can receive a user's operation (for example, an upward or downward slide), and in response the display area 716 can show more setting content. As shown in FIG. 7B, the content shown in the display area 716 is the setting content for humming recognition.
The display area 716 may include a switch control 710, a control 712, and a control 714.
The switch control 710 can be used to turn the humming recognition function on or off.
The control 712 can be used to add voiceprint information that can be used to enable humming recognition. Optionally, the control 712 can receive a user's operation, and in response the vehicle-mounted device jumps to a user interface for entering voiceprint information, for example the exemplarily shown user interface 73. The user interface 73 is described in more detail later and is not elaborated here.
The jump control 714 can be used to receive a user's operation. In response to that operation, the vehicle-mounted device jumps from the user interface 72 to the setting interface of voiceprint 1. The setting interface of voiceprint 1 may include naming and deletion functions, and so on.
FIG. 7C exemplarily shows a user interface 73 for entering voiceprint information.
In one embodiment, in response to the user's operation on the control 712 in the user interface 72, the vehicle-mounted device jumps from the user interface 72 to the user interface 73 to enter the voiceprint information that the user wants to add. The user interface 73 includes indication information 730 and text information 572, where:
The indication information 730 can be used to provide prompt information that instructs the user to enter voiceprint information. Note that the play button mentioned in the prompt is the play button of the speaker; in one possible case, it is a physical button located near the display screen of the vehicle-mounted device.
The text information 732 is the text that the user needs to read aloud. Optionally, the vehicle-mounted device may display different text information several times for the user to read aloud. In this way, more of the user's sound signals can be recorded, which improves the accuracy of the voiceprint information.
In one possible case, the vehicle-mounted device may also instruct the user to sing several music clips in order to enter the voiceprint information. In this case, the content of the indicator 730 may be "Please move close to the speaker, press and hold the play button, and sing the following song clip to enter voiceprint information", and correspondingly the text information 732 is a passage of lyrics.
After recording the sound signal input by the user, the vehicle-mounted device can extract the voiceprint information from the collected sound signal and store that voiceprint information.
The above describes the setting interface for humming recognition on the vehicle-mounted device. Note that this setting interface is not limited to the user interfaces described above; reference may also be made to the user interface 51, the user interface 52, the user interface 53, and the user interface 54 of the smart terminal described earlier. The user interface on the vehicle-mounted device that displays the humming recognition result is described next.
FIG. 8A exemplarily shows a user interface 81 on a vehicle-mounted device for displaying the recognition result. As shown in FIG. 8A, when the vehicle-mounted device identifies an audio file from the music clip hummed by the user, it plays the audio file in step with the progress of the user's humming and displays a notification window 842 on its current interface to show the recognition result for the hummed music clip. The notification window 842 may include a humming recognition icon 840, a third display area 841, a playback control 843, and a control 844.
The humming recognition icon 840 indicates the source of the notification window 842, so that the user can quickly see that the notification window 842 is a recognition result output by the humming recognition service (also called a function or an application). Note that the humming recognition icon 840 is only an example; in a specific implementation, the humming recognition icon may use another pattern, for example a musical note or an icon of another style, which is not limited in the embodiments of this application.
The third display area 841 can be used to display the identification information of the recognized audio file. For example, the third display area 841 contains the song title "漂洋过海来看你" ("Across the Ocean to See You"). The third display area 841 may also include operation guidance, for example the text "Tap to stop playback", which reminds the user of the available operation and makes the interface more convenient to use. Optionally, the third display area 841 may also contain information about the singer or the lyrics of the audio file currently being played. In another possible case, the vehicle-mounted device may also display the lyrics of the audio file currently being played in a floating window, that is, a movable window that floats over the display interface of the vehicle-mounted device.
The playback control 843 can be used to receive a user's operation. In response to that operation, the vehicle-mounted device pauses or resumes playback of the audio file. Specifically, after the vehicle-mounted device identifies the audio file corresponding to the music clip hummed by the user, it plays the audio file in step with the user's singing progress, and the playback control 843 is displayed in a first state. While the playback control 843 is displayed in the first state, if it receives a user's operation, the vehicle-mounted device pauses playback of the audio file and displays the playback control 843 in a second state. While the playback control 843 is displayed in the second state, if it receives a user's operation, the vehicle-mounted device resumes playback of the audio file and displays the playback control 843 in the first state.
The control 844 can be used to receive a user's operation. In response to that operation, the vehicle-mounted device pauses playback of the audio file, re-acquires the user's sound signal, and performs humming recognition on the re-acquired sound signal.
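The two display states of the playback control 843 described above amount to a simple toggle, which can be sketched as follows. The class `PlayControl`, its attribute names, and the state labels "first" and "second" are hypothetical; the text only says that the first state is shown during playback and the second while paused.

```python
class PlayControl:
    """Two-state playback toggle (hypothetical sketch of the control 843)."""

    FIRST_STATE = "first"    # shown while the audio file is playing
    SECOND_STATE = "second"  # shown while playback is paused

    def __init__(self):
        # Playback starts automatically once the audio file is identified.
        self.display_state = self.FIRST_STATE
        self.playing = True

    def on_user_operation(self):
        """Pause if playing, resume if paused; return the new display state."""
        if self.display_state == self.FIRST_STATE:
            self.playing = False
            self.display_state = self.SECOND_STATE
        else:
            self.playing = True
            self.display_state = self.FIRST_STATE
        return self.display_state


control = PlayControl()
state_after_first_tap = control.on_user_operation()
```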
In one possible implementation, the vehicle-mounted device does not display the control 844 and may instead display the prompt "Say 're-recognize' to perform humming recognition again". In this case, if the vehicle-mounted device detects the voice input "re-recognize" from the user, it pauses playback of the audio file, performs humming recognition on the clip that the user hums next, and then displays the notification window again according to the new recognition result. This approach requires no manual operation, which makes it convenient for the user to issue a re-recognition instruction while driving.
In one possible implementation, the notification window 842 disappears after being displayed for a preset time, which may be a value such as 4 seconds or 5 seconds. Alternatively, when the notification window 842 receives the user's upward slide operation, the vehicle-mounted device, in response to that operation, no longer displays the notification window 842 in the user interface 81. Alternatively, the notification window may disappear after the current song finishes playing.
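The three alternative ways of dismissing the notification window 842 can be sketched as a single policy check. The function name, the policy labels, and the parameter names are hypothetical; the 5-second default is one of the example values given in the text.

```python
def should_dismiss(policy, elapsed_s=0.0, swiped_up=False,
                   song_finished=False, timeout_s=5.0):
    """Decide whether the notification window should be hidden.

    `policy` selects one of the three alternatives described in the text:
    "timeout", "swipe_up", or "song_end".
    """
    if policy == "timeout":
        return elapsed_s >= timeout_s
    if policy == "swipe_up":
        return swiped_up
    if policy == "song_end":
        return song_finished
    raise ValueError(f"unknown policy: {policy}")


hide_after_six_seconds = should_dismiss("timeout", elapsed_s=6.0)
```

Modeling the alternatives as separate policies mirrors the text, which presents them as options rather than conditions that must all hold.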
FIG. 8B exemplarily shows another user interface 82 on the in-vehicle device for displaying a recognition result. As shown in FIG. 8B, when the in-vehicle device identifies an audio file from the music segment hummed by the user, it plays the audio file following the progress of the user's humming and displays the user interface 82 on its current interface to show the recognition result for the hummed music segment. The user interface 82 may include: a humming recognition icon 840, a third display area 841, a playback control 843, a control 844, a volume control 851, a control 853, and a control 854.
For the humming recognition icon 840, the third display area 841, the playback control 843, and the control 844, refer to the description of FIG. 8A above; details are not repeated here.
The volume control 851 may be used to adjust the playback volume of the audio file. The volume control 851 may receive a user operation, and in response to the operation the in-vehicle device adjusts the volume at which the audio file is played. Optionally, when the received user operation is a swipe to the left, the in-vehicle device lowers the playback volume; when the received user operation is a swipe to the right, the in-vehicle device raises the playback volume. In a possible implementation, the ratio of the distance from the volume control 851 to the left endpoint of the line segment on which it lies to the length of that line segment corresponds to the ratio of the current volume to the maximum playback volume of the system.
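The correspondence between slider position and volume can be illustrated with a minimal sketch. It assumes the simplest possible correspondence, in which the two ratios are equal (a linear mapping); the function name `volume_from_slider` and its parameters are hypothetical and are not taken from this application.

```python
def volume_from_slider(position, left_end, segment_length, max_volume):
    """Map the slider position to a playback volume (assumed linear mapping).

    position       -- coordinate of the volume control on the line segment
    left_end       -- coordinate of the left endpoint of the line segment
    segment_length -- length of the line segment (must be positive)
    max_volume     -- maximum playback volume of the system
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    # Ratio of the distance from the left endpoint to the segment length.
    ratio = (position - left_end) / segment_length
    # Clamp so that positions outside the segment map to the nearest end.
    ratio = min(1.0, max(0.0, ratio))
    # Assumption: current_volume / max_volume equals the position ratio.
    return ratio * max_volume
```

Under this assumption, a swipe to the right increases `position` and therefore the volume, and a swipe to the left decreases it, which matches the behavior described for the volume control 851.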
The control 853 may be used to add the identified audio file to favorites. The control 853 may receive a user operation; in response to the user operation, the in-vehicle device adds the identifier of the identified audio file to a preset favorites folder (or a folder named "Favorite music"; this application imposes no limitation), so that the user can find or play the identified audio file next time.
The control 854 may be used to download the identified audio file. The control 854 may receive a user operation; in response to the user operation, the in-vehicle device downloads the audio resource of the identified audio file from the network. Optionally, in response to the user operation, the in-vehicle device displays a selection box containing sound quality options such as "Standard quality", "High quality", and "Lossless quality". The selection box is used to receive the user's selection of one option; in response to that selection, the in-vehicle device downloads the audio resource with the sound quality corresponding to the selected option.
The control 855 may be used to share the identified audio file. The control 855 may receive a user operation; in response to the user operation, the in-vehicle device displays a sharing box containing multiple sharing targets, for example, one or more terminal devices connected to the in-vehicle device via Bluetooth. The sharing box is used to receive the user's selection of one sharing target; in response to that selection, the in-vehicle device sends the identifier or the audio resource of the audio file to the selected sharing target.
In a possible implementation, the notification window 842 may be used to receive a user operation; in response to the user operation, the in-vehicle device displays the user interface 82.
In a possible implementation, the user interface 82 may receive a swipe operation from the user; in response to the swipe operation, the in-vehicle device displays the user interface that was most recently displayed before the user interface 82.
It should be noted that this is not limited to in-vehicle devices: other smart home devices (smart home devices equipped with a display screen) may also configure the humming recognition function in the manner described for FIG. 7A to FIG. 7C and display the recognition result in the manner described for FIG. 8A and FIG. 8B. In addition, for the user interfaces that display the recognition result on the in-vehicle device and the smart home device, reference may also be made to the user interface 21, the user interface 31, the user interface 32, the user interface 33, and the user interface 34 of the smart terminal described above. However, because the devices do not all have the same functions, the interface elements included in the user interface may be adjusted accordingly.
Based on the UI embodiments described above, the following embodiment describes the system architecture for performing the humming recognition method provided in this application. The system architecture includes an electronic device and a music recognition server, where:
The electronic device may be the smart terminal 100 exemplarily shown in FIG. 1A, specifically a portable electronic device such as a mobile phone or tablet computer, or a wearable device such as a smart watch or smart band; the electronic device may also be the smart home device 110 exemplarily shown in FIG. 1C or the in-vehicle device 120 exemplarily shown in FIG. 1D. Specifically, the electronic device may have an audio input module and an audio output module. The electronic device may collect sound in the external environment through the audio input module and send the sound signal to the music recognition server for humming recognition; the electronic device then receives the identified audio file and the playback position from the music recognition server, and plays the identified audio file from that playback position through the speaker module. In a possible implementation, the electronic device may further include a camera module used to capture the user's mouth shape information; the electronic device may send the captured mouth shape information to the music recognition server so that the music recognition server can perform humming recognition by combining the mouth shape information with the sound signal.
The music recognition server may perform feature extraction on the received sound signal, use the extracted features (for example, a fundamental frequency sequence) to perform retrieval, and match, from a pre-stored audio resource library (also called a feature database), the audio information most similar to the segment hummed by the user. Optionally, the music recognition server may be a single server or may be composed of multiple servers. Optionally, the audio resource library may be stored in the music recognition server, or in another device (for example, a database server) that has a connection with the music recognition server.
Refer to FIG. 9, which is a flowchart of a humming recognition method provided by an embodiment of this application. The humming recognition method provided by this embodiment includes, but is not limited to, the following steps.
S901. The electronic device collects sound in the external environment through the audio input module.
Optionally, before collecting sound in the external environment through the audio input module, the electronic device needs to determine whether its audio input module and/or audio output module is occupied. If the audio input module and/or audio output module is occupied, for example by audio/video playback, a phone call, or voice navigation, the electronic device does not collect sound in the external environment through the audio input module for the humming recognition operation. It should be noted that this does not mean that the electronic device performs no sound collection through the audio input module in this case; it means that the purpose of collecting sound is not humming recognition. For example, during a call, an electronic device (for example, a mobile phone) needs to collect sound in the external environment through the audio input module in order to obtain the voice information input by the user and to obtain ambient sound for noise reduction.
If the audio input module and/or audio output module of the electronic device is not occupied, the electronic device collects sound in the external environment through the audio input module. Optionally, after the audio input module and/or audio output module is released, for example when audio/video playback ends, the call is hung up, or voice navigation ends, the electronic device may collect sound in the external environment through the audio input module. In other words, the humming recognition operation provided in this embodiment has a lower priority than the other operations in the electronic device that need to occupy audio resources.
S902. If the electronic device determines that the voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file to the music recognition server, where the first audio file contains the sound.
Specifically, before sending the first audio file to the music recognition server, the electronic device matches the voiceprint information of the sound against the pre-stored voiceprint information. If the match succeeds, that is, the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the first audio file to the music recognition server for humming recognition; if the match fails, that is, the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information, the electronic device continues to collect sound in the external environment through the audio input module. It should be noted that the pre-stored voiceprint information is voiceprint information previously extracted from a sound signal input by the user and stored. Specifically, the electronic device may receive the sound input by the user through a user interface such as the user interface 55, the user interface 63, or the user interface 73 in the foregoing embodiments; the electronic device then performs voiceprint extraction on the collected sound and stores the extracted voiceprint information.
In addition, "consistent" does not mean that the voiceprint information of the sound is identical to the pre-stored voiceprint information; when the similarity between the two is not less than a preset value (for example, 90% or 95%), the voiceprint information of the sound can be determined to be consistent with the pre-stored voiceprint information. Specifically, the electronic device may perform the matching as follows: the electronic device extracts voiceprint information from the sound signal and calculates the similarity between the extracted voiceprint information and the pre-stored voiceprint information. If the similarity is greater than or equal to the preset value, the electronic device determines that the voiceprint information of the sound is consistent with the pre-stored voiceprint information; if the similarity is less than the preset value, the electronic device determines that the voiceprint information of the sound is inconsistent with the pre-stored voiceprint information.
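The threshold comparison described above can be sketched as follows. This application does not specify how voiceprint information is represented or how similarity is computed; the sketch assumes, purely for illustration, that a voiceprint is a fixed-length feature vector and that similarity is cosine similarity. The names `cosine_similarity` and `voiceprint_matches` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)


def voiceprint_matches(extracted, stored, threshold=0.90):
    """Return True when similarity is greater than or equal to the preset value.

    threshold=0.90 mirrors the 90% example in the text; 0.95 is the other
    example value mentioned there.
    """
    return cosine_similarity(extracted, stored) >= threshold
```

When `voiceprint_matches` returns False, the device in this sketch would simply keep collecting sound rather than sending the first audio file to the server.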
S903. The music recognition server searches the audio resource library for a second audio file according to the first audio file, and determines a start playback position of the second audio file.
In an embodiment, the music recognition server may find the second audio file in the audio resource library according to the first audio file as follows: it performs feature extraction on the first audio file, uses the extracted features (for example, a fundamental frequency sequence) to perform retrieval, and selects from the pre-stored audio resource library (also called a feature database) the second audio file that is most similar to the first audio file. That is, the similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio resource library other than the second audio file.
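A minimal sketch of selecting the most similar entry from a feature database is given below. This application only states that extracted features such as a fundamental frequency sequence are used for retrieval; the use of dynamic time warping (DTW) as the distance measure, the representation of each sequence as a list of pitch values, and the names `dtw_distance` and `find_most_similar` are all assumptions made for illustration. The sketch does not handle humming in a different key or tempo normalization.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two pitch sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances
                                    cost[i][j - 1],      # b advances
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def find_most_similar(query, library):
    """Return the key of the library entry with the smallest DTW distance.

    library -- hypothetical feature database: {audio_id: pitch_sequence}
    """
    if not library:
        raise ValueError("library is empty")
    return min(library, key=lambda audio_id: dtw_distance(query, library[audio_id]))
```

Choosing the entry with the smallest distance is equivalent to choosing the entry whose similarity to the query is higher than that of every other entry, which is the property stated for the second audio file relative to any third audio file.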
Optionally, the music recognition server may use automatic speech recognition (ASR) technology to convert the first audio file into text information, thereby determining the lyric information corresponding to the first audio file. Further, the music recognition server may determine the progress of the user's humming according to the recognized text information, and thereby determine the start playback position of the second audio file. The start playback position of the second audio file corresponds to the end position of the first audio file; therefore, by playing the second audio file from the start playback position, the electronic device achieves the effect of playing audio that follows the user's humming progress.
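One way to picture how a start playback position could be derived from the recognized text is sketched below. It assumes, hypothetically, that the server stores for each song a list of lyric words together with the time at which each word ends, and that the ASR output is already split into words. The function name `find_start_position` and this data layout are illustrative and are not taken from this application.

```python
def find_start_position(recognized_words, timed_lyrics):
    """Return the time (seconds) at which the recognized words end in the song.

    recognized_words -- list of words produced by ASR for the first audio file
    timed_lyrics     -- hypothetical list of (word, end_time_seconds) pairs
    Returns None when the recognized words do not occur in the lyrics.
    """
    n = len(recognized_words)
    if n == 0:
        return None
    words = [word for word, _ in timed_lyrics]
    # Slide a window over the lyrics and look for an exact word-for-word match.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == recognized_words:
            # The start playback position corresponds to the end of the match.
            return timed_lyrics[i + n - 1][1]
    return None
```

The sketch uses the first exact match only; a practical system would likely tolerate recognition errors and repeated phrases, which this application does not detail.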
S904. The music recognition server sends the second audio file and first indication information to the electronic device, where the first indication information indicates the start playback position of the second audio file.
S905. After receiving the second audio file and the first indication information sent by the music recognition server, the electronic device plays the second audio file from the start playback position through the audio output module.
The specific implementation of each step in the above method is further described below.
In an embodiment, before collecting sound in the external environment through the audio input module, the electronic device needs to determine whether its humming recognition function is enabled. The electronic device may receive the user's settings for the humming recognition function through a user interface such as the user interface 41, the user interface 42, the user interface 51, the user interface 52, the user interface 62, or the user interface 72 in the foregoing embodiments. If the electronic device determines that its humming recognition function is enabled, it performs the step of collecting sound in the external environment through the audio input module; if the electronic device determines that its humming recognition function is not enabled, it does not perform that step.
In yet another possible implementation, when the electronic device is detected to be in a locked state, the electronic device stops collecting sound in the external environment through the audio input module. It can be understood that after the electronic device is detected to be unlocked, it may collect sound in the external environment through the audio input module. For this implementation, refer to the description of the switch control 556 in the user interface 52 in the foregoing embodiments; the switch control 556 may be used to set the conditions under which the humming recognition function is available. In this way, collection of ambient sound is stopped while the electronic device is locked, which reduces power consumption and saves the power of the electronic device.
In yet another possible implementation, when the electronic device is detected to be at a preset location, the electronic device stops collecting sound in the external environment through the audio input module. It can be understood that if the electronic device is detected to be no longer at the preset location, it may collect sound in the external environment through the audio input module. The preset location may be a location set by the user (for example, the location of the user's company), or a location pre-stored in the electronic device (for example, a school, a hospital, or a cinema). The electronic device may determine its own location through the global positioning system (GPS), Bluetooth (BT), or a wireless local area network (WLAN). For this implementation, refer to the description of the switch control 557 in the user interface 52 in the foregoing embodiments. Specifically, when the "Environment Do Not Disturb" switch control (switch control 557) is on, the electronic device detects in real time (or at a preset interval) whether it is at a preset location; if it detects that it is at a preset location, it stops collecting sound in the external environment through the audio input module. It should be noted that the preset location is a location where playing audio files is inappropriate. In this way, playing the second audio file in an unsuitable place can be avoided, and the power of the electronic device can be saved.
In yet another possible implementation, when the electronic device detects that the ambient light brightness has remained below a preset value for longer than a preset time, it stops collecting sound in the external environment through the audio input module. It can be understood that when the electronic device detects that the ambient light brightness has remained greater than or equal to the preset value for longer than the preset time, it may collect sound in the external environment through the audio input module. Optionally, the electronic device may sense the ambient light brightness through an ambient light sensor. It should be noted that ambient light brightness remaining below the preset value for longer than the preset time may indicate that the electronic device is in the user's pocket or that it is currently night; in such cases the electronic device is not suited to playing audio files. In this way, playing the second audio file in unsuitable circumstances can be avoided, and the power of the electronic device can be saved.
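The duration-based condition on ambient light can be sketched as a small gate that changes state only after the brightness has stayed on one side of the preset value for longer than the preset time. The class name `AmbientLightGate`, the use of lux readings, and the caller-supplied timestamps are assumptions for illustration; this application does not specify them.

```python
class AmbientLightGate:
    """Decide whether sound collection is allowed, based on ambient light.

    lux_threshold -- preset brightness value (hypothetical unit: lux)
    hold_s        -- preset time in seconds a condition must persist
    """

    def __init__(self, lux_threshold, hold_s):
        self.lux_threshold = lux_threshold
        self.hold_s = hold_s
        self.capture_allowed = True
        self._is_dark = None   # last observed condition (dark or bright)
        self._since = None     # time at which that condition began

    def update(self, lux, now_s):
        """Feed one sensor reading; return whether capture is allowed."""
        dark = lux < self.lux_threshold
        if dark != self._is_dark:
            # Condition changed: restart the duration measurement.
            self._is_dark = dark
            self._since = now_s
        if now_s - self._since > self.hold_s:
            # Dark for longer than the preset time: stop collecting.
            # Bright for longer than the preset time: collecting may resume.
            self.capture_allowed = not dark
        return self.capture_allowed
```

Because the state flips only after the hold time elapses, a brief shadow or a brief flash of light does not toggle sound collection in this sketch.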
In yet another possible implementation, the electronic device stops collecting sound in the external environment through the audio input module during a first time period. The first time period may be a preset time period (for example, 11 p.m. to 9 a.m.), or a time period determined according to time information input by the user. The latter case may correspond to the description of the switch control 536 in the user interface 52 in the foregoing embodiments: the user may input a start time and an end time for the humming recognition function, and the first time period is the period from the end time to the start time.
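Because a period such as 11 p.m. to 9 a.m. crosses midnight, a simple range comparison is not enough. The following sketch shows one way to test whether the current time falls within the first time period; the function name `in_first_time_period` and the half-open interval convention are assumptions for illustration.

```python
from datetime import time


def in_first_time_period(now, period_start, period_end):
    """Return True if `now` lies in [period_start, period_end).

    Handles periods that wrap past midnight, such as 23:00 to 09:00.
    When the user enters a start and end time for humming recognition,
    the first time period runs from that end time to that start time.
    """
    if period_start <= period_end:
        # Period within a single day, e.g. 12:00 to 14:00.
        return period_start <= now < period_end
    # Period wraps past midnight, e.g. 23:00 to 09:00.
    return now >= period_start or now < period_end
```

For example, with the preset period from 23:00 to 09:00, sound collection would be stopped at 23:30 and at 08:59, and allowed again from 09:00.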
In an embodiment, in step S902, before determining whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information, the electronic device may determine whether the sound is a human voice. If the electronic device determines that the sound is a human voice, it then determines whether the voiceprint information of the collected sound is consistent with the pre-stored voiceprint information; if the electronic device determines that the sound is not a human voice, it continues to collect sound in the external environment through the audio input module. The electronic device may determine whether the sound is a human voice as follows: the electronic device calculates the frequency of the sound; if the frequency is within a preset frequency range, the electronic device determines that the sound is a human voice; if the frequency is not within the preset frequency range, the electronic device determines that the sound is not a human voice. The preset frequency range may be set as required. For example, because the reference range of male voices is 64 Hz to 523 Hz and that of female voices is 160 Hz to 1200 Hz, the preset frequency range may be 64 Hz to 1200 Hz.
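The frequency-range test can be sketched as follows. This application does not say how the frequency of the sound is calculated; the sketch assumes a crude zero-crossing estimate, which works for clean tones but is far less robust than the pitch trackers a practical system would likely use. The names `estimate_frequency` and `is_human_voice` are hypothetical; the range constants come from the example in the text.

```python
import math

MALE_RANGE_HZ = (64.0, 523.0)     # reference range of male voices
FEMALE_RANGE_HZ = (160.0, 1200.0)  # reference range of female voices
# Preset range covering both: 64 Hz to 1200 Hz.
VOICE_RANGE_HZ = (min(MALE_RANGE_HZ[0], FEMALE_RANGE_HZ[0]),
                  max(MALE_RANGE_HZ[1], FEMALE_RANGE_HZ[1]))


def estimate_frequency(samples, sample_rate):
    """Estimate frequency by counting upward zero crossings (assumed method)."""
    if not samples or sample_rate <= 0:
        return 0.0
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if prev < 0 <= cur)
    duration_s = len(samples) / sample_rate
    return crossings / duration_s


def is_human_voice(samples, sample_rate, voice_range=VOICE_RANGE_HZ):
    """Return True when the estimated frequency lies in the preset range."""
    low, high = voice_range
    return low <= estimate_frequency(samples, sample_rate) <= high


def make_tone(freq_hz, sample_rate=8000, seconds=1.0):
    """Generate a sine tone, used only to exercise the sketch."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + 0.1)
            for i in range(count)]
```

In this sketch a 220 Hz tone falls inside the preset range and would proceed to voiceprint matching, while a 3000 Hz tone or a 20 Hz rumble would be rejected and sound collection would continue.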
In an embodiment, the electronic device may also capture the user's mouth shape information through a camera. For example, the electronic device may receive the user's setting of the camera access permission through a user interface such as the user interface 52 or the user interface 53 in the foregoing embodiments. Optionally, the on/off state of the camera of the electronic device may be kept consistent with the on/off state of the humming recognition function. Optionally, before capturing the user's mouth shape information through the camera, the electronic device determines whether the sound is a human voice. If the sound is determined to be a human voice, the electronic device may capture the user's mouth shape information through the camera. For the manner of determining whether the sound is a human voice, refer to the description above; details are not repeated here. This approach reduces the power consumption of the electronic device and saves its power.
In this case, the music recognition server may also receive the mouth shape information sent by the electronic device. The music recognition server can determine text information from the mouth shape information, and determine the final recognition result by combining the text information determined from the mouth shape with the first audio file. That is, the similarity between the text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between the text information corresponding to the third audio file and the text information corresponding to the mouth shape information. In this way, the accuracy of identifying the second audio file can be further improved.
In an embodiment, before the music recognition server finds the second audio file in the audio resource library according to the first audio file, the music recognition server determines whether the first audio file is a music segment. Optionally, the music recognition server may make this determination based on the text information corresponding to the first audio file and the multiple interval times between consecutive words in the audio file. It should be noted that the music recognition server pre-stores the text information (which can be understood as lyrics) corresponding to each audio file, as well as the multiple interval times between consecutive words in each audio file. If the similarity between the text information corresponding to the first audio file and the text information corresponding to one or more pre-stored audio files is not less than a preset value, and the similarity between the interval times between consecutive words in the first audio file and the interval times between consecutive words in the one or more audio files is not less than a preset value, the first audio file is determined to be a music segment. Specifically, when the music recognition server determines that the sound signal is a music segment, it finds the second audio file in the audio resource library according to the first audio file. Optionally, the second audio file is included in the one or more audio files. When the music recognition server determines that the sound signal is not a music segment, it returns to the electronic device the result that the sound signal is not a music segment.
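The two-condition test for a music segment can be sketched as below. This application does not define either similarity measure; the sketch assumes character-level similarity from Python's `difflib.SequenceMatcher` for the text and a mean relative difference for the interval times, and it assumes each stored reference is already the lyric excerpt that corresponds to the hummed segment. The names `text_similarity`, `interval_similarity`, and `is_music_segment`, and the 0.8 thresholds, are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Similarity of two lyric strings in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def interval_similarity(a, b):
    """Similarity of two lists of inter-word interval times in [0, 1].

    Uses one minus the mean relative difference over the common prefix;
    this formula is an illustrative assumption.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    total = sum(abs(x - y) / max(x, y, 1e-9) for x, y in zip(a[:n], b[:n]))
    return 1.0 - total / n


def is_music_segment(text, intervals, references,
                     text_threshold=0.8, interval_threshold=0.8):
    """Return True if any reference passes both similarity thresholds.

    references -- hypothetical list of (lyric_text, interval_times) pairs
    """
    return any(
        text_similarity(text, ref_text) >= text_threshold
        and interval_similarity(intervals, ref_intervals) >= interval_threshold
        for ref_text, ref_intervals in references
    )
```

Requiring both conditions means that reading the lyrics aloud with speech-like timing would not be treated as a music segment in this sketch, even though the text matches.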
In a possible implementation, the tag of the second audio file is included in the user tags of a first user. The first user is the user logged in on the electronic device or the user using the electronic device, and the music recognition server pre-stores the user tags of the first user. In this way, the second audio file better matches the user's preferences, which improves the user experience.
In an embodiment, before the electronic device plays the second audio file from the starting playback position through the audio output module, it needs to determine whether its location matches a preset location. Specifically, if the electronic device determines that its location does not match the preset location, the electronic device plays the second audio file from the starting playback position through the audio output module. Optionally, if the electronic device determines that its location matches the preset location, the electronic device may display only the humming recognition result without playing the audio file. For the user interface that displays the humming recognition result, refer to user interface 21, user interface 31, user interface 32, user interface 33, user interface 34, user interface 81, and user interface 82 described in the foregoing embodiments; details are not repeated here. For the meaning of the preset location and how it is determined, refer to the foregoing description; details are not repeated here. For this possible implementation, refer to the description of switch control 557 in user interface 52 in the foregoing embodiments. Specifically, when the "Environment Do Not Disturb" switch control (switch control 557) is on, the electronic device needs to determine, before playing the audio file, that its location is not a preset location. In this way, the second audio file is not played in an unsuitable place, and the power of the electronic device is saved.
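The location check described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Place`, `distance_m`, and `should_play`, the 100 m radius, and the use of a great-circle distance to decide whether a location "matches" a preset location are all hypothetical and are not specified by this application. The sketch also assumes that no check is made when the switch is off.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Place:
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def distance_m(a: Place, b: Place) -> float:
    """Great-circle distance in metres (haversine formula)."""
    earth_radius_m = 6_371_000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(min(1.0, sqrt(h)))


def should_play(device: Place, presets: list[Place], dnd_enabled: bool,
                radius_m: float = 100.0) -> bool:
    """Return True if playback may start, False if only the result is shown."""
    if not dnd_enabled:
        # Assumption: with the switch off, no location check is made.
        return True
    # Playback is allowed only if the device is outside every preset location.
    return all(distance_m(device, p) > radius_m for p in presets)
```

When `should_play` returns `False`, the device would display the recognition result without starting playback.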
In yet another possible implementation, before the electronic device plays the second audio file, the ambient volume of the environment in which the electronic device is located is determined, and the electronic device determines the volume at which to play the second audio file according to the ambient volume. Specifically, the higher the ambient volume, the higher the volume at which the electronic device plays the second audio file; the lower the ambient volume, the lower the playback volume.
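One way to realize this monotonic relationship is a clamped linear mapping, sketched below. The function name `playback_volume`, the decibel thresholds, and the volume range are illustrative assumptions; this application only requires that a louder environment yields a louder playback volume.

```python
def playback_volume(ambient_db: float,
                    quiet_db: float = 30.0, loud_db: float = 80.0,
                    min_vol: float = 0.2, max_vol: float = 1.0) -> float:
    """Map ambient loudness to a playback volume in [min_vol, max_vol].

    All thresholds are hypothetical defaults chosen for illustration.
    """
    # Normalise the ambient level to [0, 1], clamping outside the range.
    t = (ambient_db - quiet_db) / (loud_db - quiet_db)
    t = max(0.0, min(1.0, t))
    # Linear interpolation keeps the mapping monotonically non-decreasing.
    return min_vol + t * (max_vol - min_vol)
```

Any other non-decreasing function of the ambient volume would satisfy the same description.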
In a possible implementation, after the electronic device plays the second audio file from the starting playback position through the audio output module, the method further includes: the electronic device displays identification information of the second audio file and a playback control, where the display state of the playback control is a first state, and the first state indicates that the second audio file is being played; if the electronic device detects a first user operation acting on the playback control in the first state, then in response to the first user operation, the electronic device pauses playback of the second audio file and sets the display state of the playback control to a second state, where the second state indicates that playback of the second audio file is paused. Optionally, if the electronic device detects a second user operation acting on the playback control in the second state, then in response to the second user operation, the electronic device resumes playing the second audio file and sets the display state of the playback control to the first state. In addition, for the user interface on which the electronic device displays the identification information of the second audio file and the playback control, refer to user interface 21, user interface 31, user interface 32, user interface 33, user interface 34, user interface 81, and user interface 82 described in the foregoing embodiments; details are not repeated here.
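The two display states and the transitions between them amount to a small state machine. The sketch below is illustrative only: the class and member names are hypothetical, and the actual pausing and resuming of audio is reduced to a boolean flag.

```python
from enum import Enum


class ControlState(Enum):
    FIRST = "playing"   # the second audio file is being played
    SECOND = "paused"   # playback of the second audio file is paused


class PlaybackControl:
    """Hypothetical model of the playback control and its display state."""

    def __init__(self) -> None:
        # The control is shown in the first state once playback has started.
        self.state = ControlState.FIRST
        self.is_playing = True

    def on_user_operation(self) -> ControlState:
        """Handle a user operation on the control and return the new state."""
        if self.state is ControlState.FIRST:
            self.is_playing = False          # first user operation: pause
            self.state = ControlState.SECOND
        else:
            self.is_playing = True           # second user operation: resume
            self.state = ControlState.FIRST
        return self.state
```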
In a possible implementation, during the time period from the moment the electronic device starts playing the second audio file to a preset moment (for example, the 5th second or the 6th second), the electronic device gradually increases the volume at which the second audio file is played from low to high. For example, the volume gradually increases from the minimum volume to the volume set by the user, or from 30% of the volume set by the user to 100% of the volume set by the user. Other ways of increasing the volume are also possible, and the embodiments of this application impose no limitation.
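The 30%-to-100% example can be sketched as a linear ramp. The function name `fade_in_volume` and the choice of a linear curve are assumptions made for illustration; this application does not prescribe the shape of the ramp.

```python
def fade_in_volume(elapsed_s: float, target_volume: float,
                   ramp_s: float = 5.0, start_fraction: float = 0.3) -> float:
    """Volume to use `elapsed_s` seconds after playback starts.

    Ramps linearly from start_fraction * target_volume up to target_volume
    over ramp_s seconds, then stays at target_volume.
    """
    if ramp_s <= 0 or elapsed_s >= ramp_s:
        return target_volume
    progress = max(0.0, elapsed_s) / ramp_s
    fraction = start_fraction + (1.0 - start_fraction) * progress
    return target_volume * fraction
```

Setting `start_fraction=0.0` would roughly correspond to the other example in the text, ramping up from the minimum volume.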
In a possible implementation, after the electronic device plays the second audio file from the starting playback position through the audio output module, the electronic device may further detect whether the second audio file is stored in a pre-stored music folder. If so, the electronic device may play the other audio files in the music folder after it finishes playing the second audio file.
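A minimal sketch of this behavior is shown below. The function name `build_queue` and the representation of the music folder as a list of file names are hypothetical, and the order in which the other files are played is an assumption, since the text does not specify it.

```python
def build_queue(second_file: str, folder_files: list[str]) -> list[str]:
    """Return the playback queue, starting with the recognized file."""
    if second_file not in folder_files:
        # Not in the music folder: play only the recognized file.
        return [second_file]
    # In the folder: continue with the other files, in folder order (assumed).
    others = [f for f in folder_files if f != second_file]
    return [second_file] + others
```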
It can be understood that, for the specific implementation of each step of the method shown in FIG. 9, reference may be made to the embodiments described in FIG. 1A to FIG. 8B; details are not repeated here.
In an embodiment, the humming recognition method provided in this application can also be applied to an open platform. Specifically, the open platform obtains a first audio file, where the first audio file includes sound in the external environment. If the open platform determines that the voiceprint information of the first audio file is consistent with pre-stored voiceprint information, the open platform searches an audio resource library for a second audio file according to the first audio file and determines the starting playback position of the second audio file. The similarity between the features of the second audio file and the features of the first audio file is higher than the similarity between the features of a third audio file and the features of the sound, where the third audio file is any audio file in the audio resource library other than the second audio file, and the starting playback position of the second audio file corresponds to the end position of the first audio file. The open platform then plays the second audio file from the starting playback position, or the open platform controls another application on the electronic device to play the second audio file from the starting playback position.
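The selection rule above says that the second audio file is the library entry most similar to the first audio file. A minimal sketch follows, assuming that features are plain numeric vectors compared by cosine similarity; the feature representation, the similarity measure, and all names here are assumptions, not taken from this application. The helper `start_position_s` shows one possible reading of "corresponds to the end position of the first audio file": playback resumes at the point in the song where the humming stopped.

```python
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_second_audio(query: list[float], library: dict[str, list[float]]) -> str:
    """Return the id of the library entry most similar to the query.

    Raises ValueError if the library is empty.
    """
    return max(library, key=lambda audio_id: cosine_similarity(query, library[audio_id]))


def start_position_s(match_offset_s: float, hum_duration_s: float) -> float:
    """Assumed mapping: where the hummed excerpt ends inside the matched song."""
    return match_offset_s + hum_duration_s
```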
The open platform is a platform that provides open application programming interfaces (APIs) or functions. That is, the open platform may have the capabilities of an application that exposes APIs, or the capabilities of functions. Optionally, the open platform may call APIs (or functions) to implement the method performed by the electronic device and the music recognition server in FIG. 9. For example, the open platform may be a voice assistant platform. It may include only the voice assistant on the electronic device side, may include the platform on both the electronic device side and the server side that is directly associated with the voice assistant, or may be only the platform on the server side that is associated with the voice assistant; the embodiments of the present invention impose no specific limitation. The open platform may obtain the first audio file through the audio input module of the device on which the open platform runs, or the open platform may receive the first audio file sent by an electronic device that is connected to it. Optionally, the electronic device may actively send the first audio file to the open platform, or the open platform may actively obtain the first audio file from the electronic device. The open platform then calls an API (or function) with a voiceprint recognition capability to determine whether the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information. If it is, the open platform calls an API (or function) with a humming recognition capability to search the audio resource library for the second audio file according to the first audio file. The open platform then plays the second audio file from the starting playback position through the audio output module of the device on which it runs, or the open platform controls another application on the electronic device to play the second audio file from the starting playback position. Optionally, the open platform may send the second audio file and first indication information to the electronic device, where the first indication information includes the starting playback position and instructs the electronic device to play the second audio file from the starting playback position.
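The sequence of API (or function) calls described above can be sketched as follows. The three callables stand in for the voiceprint recognition API, the humming recognition API, and the playback path; their names and signatures are hypothetical and do not correspond to any real platform API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Match:
    audio_id: str             # identifies the second audio file
    start_position_s: float   # starting playback position, in seconds


def handle_first_audio(
    first_audio: bytes,
    voiceprint_matches: Callable[[bytes], bool],
    recognize_humming: Callable[[bytes], Optional[Match]],
    play_from: Callable[[str, float], None],
) -> Optional[Match]:
    """Run the open-platform flow; return the match that was played, if any."""
    # Step 1: proceed only if the voiceprint matches the pre-stored one.
    if not voiceprint_matches(first_audio):
        return None
    # Step 2: look up the second audio file and its starting position.
    match = recognize_humming(first_audio)
    if match is None:
        return None
    # Step 3: play locally, or hand off to another application or device.
    play_from(match.audio_id, match.start_position_s)
    return match
```

Passing the playback path in as a callable lets the same flow cover both variants in the text: local playback by the open platform and delegated playback by another application or by the electronic device.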
It should be noted that, for the manner in which the open platform performs the humming recognition provided in the embodiments of this application, reference may be made to the specific implementation of each step of the method shown in FIG. 9; details are not repeated here.
The implementations of this application may be combined arbitrarily to achieve different technical effects.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless means (for example, infrared, wireless, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).
In summary, the foregoing descriptions are merely embodiments of the technical solutions of this application and are not intended to limit the protection scope of this application. Any modification, equivalent replacement, or improvement made based on the disclosure of this application shall fall within the protection scope of this application.

Claims (32)

  1. A humming recognition method, characterized by comprising:
    an electronic device collects sound in the external environment through an audio input module;
    if the electronic device determines that voiceprint information of the sound is consistent with pre-stored voiceprint information, the electronic device sends a first audio file to a music recognition server, wherein the first audio file contains the sound, and the music recognition server is configured to find a second audio file from an audio resource library according to the first audio file and to determine a starting playback position of the second audio file; wherein the similarity between features of the second audio file and features of the first audio file is higher than the similarity between features of a third audio file and features of the sound, the third audio file is an audio file in the audio resource library other than the second audio file, and the starting playback position of the second audio file corresponds to the end position of the first audio file;
    the electronic device receives the second audio file and first indication information sent by the music recognition server, wherein the first indication information indicates the starting playback position of the second audio file;
    the electronic device plays the second audio file from the starting playback position through an audio output module.
  2. The method according to claim 1, characterized in that the method further comprises:
    the electronic device obtains mouth shape information of a user through a camera;
    if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, the electronic device sends the mouth shape information to the music recognition server;
    wherein the music recognition server is further configured to convert the mouth shape information into text information, and the finding a second audio file from an audio resource library according to the first audio file comprises:
    finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
  3. The method according to claim 2, characterized in that the electronic device obtaining mouth shape information of a user through a camera comprises:
    if the electronic device determines that the sound is a human voice, obtaining the mouth shape information of the user through the camera.
  4. The method according to any one of claims 1-3, characterized in that the electronic device collecting sound in the external environment through an audio input module comprises:
    if the electronic device determines that the audio input module and/or the audio output module is not occupied, the electronic device collects sound in the external environment through the audio input module.
  5. The method according to any one of claims 1-4, characterized in that a tag of the second audio file is included in user tags of a first user.
  6. The method according to any one of claims 1-5, characterized in that, after the electronic device plays the second audio file from the starting playback position through the audio output module, the method further comprises:
    the electronic device displays identification information of the second audio file and a playback control;
    wherein the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;
    if the electronic device detects a first user operation acting on the playback control in the first state, then in response to the first user operation, the electronic device pauses playback of the second audio file and sets the display state of the playback control to a second state, wherein the second state indicates that playback of the second audio file is paused.
  7. The method according to any one of claims 1-6, characterized in that the method further comprises:
    when it is detected that the electronic device is in a locked state, the electronic device stops collecting sound in the external environment through the audio input module.
  8. The method according to any one of claims 1-7, characterized in that the method further comprises:
    when it is detected that the electronic device is at a preset location, the electronic device stops collecting sound in the external environment through the audio input module.
  9. The method according to any one of claims 1-7, characterized in that the electronic device playing the second audio file from the starting playback position through an audio output module comprises:
    if the electronic device determines that the location of the electronic device does not match a preset location, the electronic device plays the second audio file from the starting playback position through the audio output module.
  10. The method according to any one of claims 1-9, characterized in that the method further comprises:
    the electronic device stops collecting sound in the external environment through the audio input module during a first time period.
  11. An electronic device, characterized by comprising an audio input module, an audio output module, a processor, and a memory, wherein:
    the memory is configured to store program instructions;
    the processor is configured to perform the following operations according to the program instructions:
    collecting sound in the external environment through the audio input module;
    if it is determined that voiceprint information of the sound is consistent with pre-stored voiceprint information, sending a first audio file to a music recognition server, wherein the first audio file contains the sound, and the music recognition server is configured to find a second audio file from an audio resource library according to the first audio file and to determine a starting playback position of the second audio file; wherein the similarity between features of the second audio file and features of the first audio file is higher than the similarity between features of a third audio file and features of the sound, the third audio file is an audio file in the audio resource library other than the second audio file, and the starting playback position of the second audio file corresponds to the end position of the first audio file;
    receiving the second audio file and first indication information sent by the music recognition server, wherein the first indication information indicates the starting playback position of the second audio file;
    playing the second audio file from the starting playback position through the audio output module.
  12. The electronic device according to claim 11, characterized in that the electronic device further comprises a camera, and the processor is further configured to perform the following operations according to the program instructions:
    obtaining mouth shape information of a user through the camera;
    if the voiceprint information of the sound is consistent with the pre-stored voiceprint information, sending the mouth shape information to the music recognition server;
    wherein the music recognition server is further configured to convert the mouth shape information into text information;
    the music recognition server is further specifically configured to find the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
  13. The electronic device according to claim 12, characterized in that the processor is specifically configured to perform the following operation according to the program instructions:
    if it is determined that the sound is a human voice, obtaining the mouth shape information of the user through the camera.
  14. The electronic device according to any one of claims 11-13, characterized in that the processor is specifically configured to perform the following operation according to the program instructions:
    if it is determined that the audio input module and/or the audio output module is not occupied, collecting sound in the external environment through the audio input module.
  15. The electronic device according to any one of claims 11-14, characterized in that a tag of the second audio file is included in user tags of a first user.
  16. The electronic device according to any one of claims 11-15, characterized in that the electronic device further comprises a display screen, and the processor is further configured to perform the following operations according to the program instructions:
    displaying identification information of the second audio file and a playback control on the display screen;
    wherein the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;
    if a first user operation acting on the playback control in the first state is detected, then in response to the first user operation, pausing playback of the second audio file and setting the display state of the playback control to a second state, wherein the second state indicates that playback of the second audio file is paused.
  17. The electronic device according to any one of claims 11-16, characterized in that the processor is further configured to perform the following operation according to the program instructions:
    when it is detected that the electronic device is in a locked state, stopping collecting sound in the external environment through the audio input module.
  18. The electronic device according to any one of claims 11-17, characterized in that the processor is further configured to perform the following operation according to the program instructions:
    when it is detected that the electronic device is at a preset location, stopping collecting sound in the external environment through the audio input module.
  19. The electronic device according to any one of claims 11-17, characterized in that the processor is specifically configured to perform the following operation according to the program instructions:
    if it is determined that the location of the electronic device does not match a preset location, playing the second audio file from the starting playback position through the audio output module.
  20. The electronic device according to any one of claims 11-19, characterized in that the processor is further configured to perform the following operation according to the program instructions:
    stopping collecting sound in the external environment through the audio input module during a first time period.
  21. A humming recognition method, characterized by comprising:
    an open platform obtains a first audio file, wherein the first audio file includes sound in the external environment;
    if the open platform determines that voiceprint information of the first audio file is consistent with pre-stored voiceprint information, the open platform searches an audio resource library for a second audio file according to the first audio file and determines a starting playback position of the second audio file; wherein the similarity between features of the second audio file and features of the first audio file is higher than the similarity between features of a third audio file and features of the sound, the third audio file is an audio file in the audio resource library other than the second audio file, and the starting playback position of the second audio file corresponds to the end position of the first audio file;
    the open platform plays the second audio file from the starting playback position, or the open platform controls another application of an electronic device to play the second audio file from the starting playback position.
  22. The method according to claim 21, characterized in that the method further comprises:
    the open platform obtains mouth shape information of a user through the electronic device;
    if the open platform determines that the voiceprint information of the first audio file is consistent with the pre-stored voiceprint information, the open platform converts the mouth shape information into text information;
    the finding a second audio file from an audio resource library according to the first audio file comprises:
    finding the second audio file from the audio resource library according to the first audio file and the text information corresponding to the mouth shape information, wherein the similarity between text information corresponding to the second audio file and the text information corresponding to the mouth shape information is higher than the similarity between text information corresponding to the third audio file and the text information corresponding to the mouth shape information.
  23. The method according to claim 22, characterized in that the open platform obtaining mouth shape information of a user through the electronic device comprises:
    if the open platform determines that the sound included in the first audio file is a human voice, obtaining the mouth shape information of the user through the electronic device.
  24. The method according to any one of claims 21-23, characterized in that the open platform obtaining a first audio file comprises:
    if the audio input module and/or the audio output module is not occupied by another application, the open platform obtains the first audio file.
  25. The method according to any one of claims 21-24, characterized in that a tag of the second audio file is included in user tags of a first user.
  26. The method according to any one of claims 21-25, characterized in that, after the open platform plays the second audio file from the starting playback position, the method further comprises:
    the open platform displays, through the electronic device, identification information of the second audio file and a playback control;
    wherein the display state of the playback control is a first state, and the first state indicates that the second audio file is being played;
    if the open platform detects a first user operation acting on the playback control in the first state, then in response to the first user operation, the open platform pauses playback of the second audio file, or controls another application of the electronic device to pause playback of the second audio file, and sets the display state of the playback control to a second state, wherein the second state indicates that playback of the second audio file is paused.
  27. 根据权利要求21-26任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 21-26, wherein the method further comprises:
    当检测到所述电子设备处于锁定状态时,所述开放平台停止获取第一音频文件。When detecting that the electronic device is in a locked state, the open platform stops acquiring the first audio file.
  28. 根据权利要求21-27任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 21-27, wherein the method further comprises:
    当检测到所述电子设备处于预设地点时,所述开放平台停止获取第一音频文件。When it is detected that the electronic device is at a preset location, the open platform stops acquiring the first audio file.
  29. The method according to any one of claims 21-27, wherein the open platform playing the second audio file from the starting playback position, or the development platform controlling other applications of the electronic device to play the second audio file from the starting playback position, comprises:
    if the open platform determines that the location of the electronic device is inconsistent with a preset location, the open platform plays the second audio file from the starting playback position, or the development platform controls other applications of the electronic device to play the second audio file from the starting playback position.
  30. The method according to any one of claims 21-29, wherein the method further comprises:
    the open platform stops acquiring the first audio file during a first time period.
  31. A computer program product containing instructions, wherein when the computer program product runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1-10 and 21-30.
  32. A computer-readable storage medium comprising instructions, wherein when the instructions are executed on an electronic device, the electronic device is caused to perform the method according to any one of claims 1-10 and 21-30.
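
The conditions in claims 26-30 can be illustrated with a minimal sketch. It is not the applicant's implementation. All names here (`ControlState`, `DeviceContext`, `Policy`, `should_acquire_audio`, `should_play`, `on_control_tapped`) are hypothetical, the "first time period" is assumed to be a daily clock window, and combining the conditions of claims 27, 28, and 30 into a single check is an assumption made only for illustration.

```python
from dataclasses import dataclass
from datetime import time
from enum import Enum


class ControlState(Enum):
    PLAYING = "first state"   # the second audio file is being played
    PAUSED = "second state"   # playback of the second audio file is paused


@dataclass
class DeviceContext:
    locked: bool      # whether the electronic device is in a locked state
    location: str     # current location label (hypothetical representation)
    now: time         # current clock time


@dataclass
class Policy:
    preset_location: str  # location where acquisition and playback are suppressed
    period_start: time    # start of the assumed "first time period"
    period_end: time      # end of the assumed "first time period"


def in_period(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end), allowing wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_acquire_audio(ctx: DeviceContext, policy: Policy) -> bool:
    """Gate acquisition of the first audio file (claims 27, 28, 30)."""
    if ctx.locked:                                   # claim 27
        return False
    if ctx.location == policy.preset_location:       # claim 28
        return False
    if in_period(ctx.now, policy.period_start, policy.period_end):  # claim 30
        return False
    return True


def should_play(ctx: DeviceContext, policy: Policy) -> bool:
    """Play the second audio file only away from the preset location (claim 29)."""
    return ctx.location != policy.preset_location


def on_control_tapped(state: ControlState) -> ControlState:
    """Claim 26: a user operation on the control in the first state pauses playback."""
    if state is ControlState.PLAYING:
        return ControlState.PAUSED
    return state  # other transitions are outside claim 26 and left unchanged here
```

For example, with `Policy("meeting room", time(22, 0), time(7, 0))`, a device that is unlocked, at "home", at noon passes `should_acquire_audio`, while the same device at 23:30 does not.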
PCT/CN2020/092802 2019-05-31 2020-05-28 Humming recognition method and related device WO2020239001A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910472410.9 2019-05-31
CN201910472410.9A CN112015943A (en) 2019-05-31 2019-05-31 Humming recognition method and related equipment

Publications (1)

Publication Number Publication Date
WO2020239001A1 (en) 2020-12-03

Family

ID=73506279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092802 WO2020239001A1 (en) 2019-05-31 2020-05-28 Humming recognition method and related device

Country Status (2)

Country Link
CN (1) CN112015943A (en)
WO (1) WO2020239001A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076444A (en) * 2021-03-31 2021-07-06 维沃移动通信有限公司 Song identification method and device, electronic equipment and storage medium
CN115712368A (en) * 2021-08-20 2023-02-24 华为技术有限公司 Volume display method, electronic device and storage medium
CN115602154B (en) * 2022-12-15 2023-08-11 杭州网易云音乐科技有限公司 Audio identification method, device, storage medium and computing equipment
CN116679900B (en) * 2022-12-23 2024-04-09 荣耀终端有限公司 Audio service processing method, firmware loading method and related devices

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678308A (en) * 2012-09-03 2014-03-26 许丰 Intelligent navigation player
CN104092654A (en) * 2014-01-22 2014-10-08 腾讯科技(深圳)有限公司 Media playing method, client and system
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497401A (en) * 2011-11-30 2012-06-13 上海博泰悦臻电子设备制造有限公司 Music media information acquiring method and system of vehicle-mounted music system
CN107799117A (en) * 2017-10-18 2018-03-13 倬韵科技(深圳)有限公司 Key message is identified to control the method, apparatus of audio output and audio frequency apparatus
CN108877790A (en) * 2018-05-21 2018-11-23 江西午诺科技有限公司 Speaker control method, device, readable storage medium storing program for executing and mobile terminal
CN109412910A (en) * 2018-11-20 2019-03-01 三星电子(中国)研发中心 The method and apparatus for controlling smart home device

Also Published As

Publication number Publication date
CN112015943A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
CN110597512B (en) Method for displaying user interface and electronic equipment
WO2020177622A1 (en) Method for displaying ui assembly and electronic device
WO2021139768A1 (en) Interaction method for cross-device task processing, and electronic device and storage medium
WO2020238356A1 (en) Interface display method and apparatus, terminal, and storage medium
WO2020239001A1 (en) Humming recognition method and related device
CN114461111B (en) Function starting method and electronic equipment
WO2021000839A1 (en) Screen splitting method and electronic device
WO2021000804A1 (en) Display method and apparatus in locked state
CN110119296B (en) Method for switching parent page and child page and related device
CN109819306B (en) Media file clipping method, electronic device and server
WO2021249087A1 (en) Card sharing method, electronic device, and communication system
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
CN111970401B (en) Call content processing method, electronic equipment and storage medium
WO2021175272A1 (en) Method for displaying application information and related device
WO2022184173A1 (en) Card widget display method, graphical user interface, and related apparatus
CN112068907A (en) Interface display method and electronic equipment
WO2022127130A1 (en) Method for adding operation sequence, electronic device, and system
CN112740148A (en) Method for inputting information into input box and electronic equipment
WO2023138305A9 (en) Card display method, electronic device, and computer readable storage medium
CN113742460A (en) Method and device for generating virtual role
WO2022089276A1 (en) Collection processing method and related apparatus
WO2022052962A1 (en) Application module startup method and electronic device
CN114518965A (en) Cut and pasted content processing method and device
WO2023160455A1 (en) Object deletion method and electronic device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20812565; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20812565; Country of ref document: EP; Kind code of ref document: A1)