CN113066501A - Method and device for starting terminal by voice, medium and electronic equipment

Info

Publication number: CN113066501A
Application number: CN202110275697.3A
Authority: CN (China)
Legal status: Pending
Inventors: 陈喆, 胡宁宁
Assignee (original and current): Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by: Guangdong Oppo Mobile Telecommunications Corp Ltd
Other languages: Chinese (zh)
Prior art keywords: chip, voice, audio, target, terminal

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Abstract

The disclosure provides a method, an apparatus, a medium, and an electronic device for starting a terminal by voice, and relates to the technical field of audio signal processing. The method includes: when it is determined that a first chip of the terminal is currently in a sleep state, receiving and segmenting a voice signal to obtain a plurality of audio segments; when the plurality of audio segments contain a first preset text, sending a target audio segment containing the first preset text to the first chip and triggering the first chip to exit the sleep state; loading parameters related to voice playback through the first chip so that the terminal plays a second preset text, while performing user identity authentication on the target audio segment based on the first chip; and displaying a target graphical user interface when the user identity authentication passes. The disclosure can effectively reduce the voice wake-up response time and effectively mitigate the problem of user voiceprint drift.

Description

Method and device for starting terminal by voice, medium and electronic equipment
Technical Field
The present disclosure relates to the field of audio signal processing technologies, and in particular, to a method for starting a terminal by using voice, an apparatus for starting a terminal by using voice, a computer-readable storage medium, and an electronic device.
Background
At present, terminals can be woken up by voice, which provides a convenient user experience.
However, when the terminal screen is off, its central processing unit (CPU) is in a sleep state. In the voice-start schemes provided in the related art, the terminal CPU is first woken up so that the terminal can start the relevant voice assistant process, which takes a long time.
Disclosure of Invention
The present disclosure provides a method for starting a terminal by voice, an apparatus for starting a terminal by voice, a computer-readable storage medium, and an electronic device, so as to reduce, at least to some extent, the time consumed by the process of starting a terminal by voice.
According to an aspect of the present disclosure, there is provided a method for starting a terminal by voice, including: when it is determined that a first chip of the terminal is currently in a sleep state, receiving and segmenting a voice signal to obtain a plurality of audio segments; when it is determined that the plurality of audio segments contain a first preset text, sending a target audio segment containing the first preset text to the first chip and triggering the first chip to exit the sleep state; loading parameters related to voice playback through the first chip so that the terminal plays a second preset text, while performing user identity authentication on the target audio segment based on the first chip; and displaying a target graphical user interface when the user identity authentication passes.
According to an aspect of the present disclosure, an apparatus for starting a terminal by voice may include: an obtaining module, a sending module, an association processing module, and a display module.
The obtaining module is configured to: when it is determined that the first chip of the terminal is currently in a sleep state, receive and segment a voice signal to obtain a plurality of audio segments. The sending module is configured to: when it is determined that the plurality of audio segments contain a first preset text, send a target audio segment containing the first preset text to the first chip and trigger the first chip to exit the sleep state. The association processing module is configured to: load parameters related to voice playback through the first chip so that the terminal plays a second preset text, while performing user identity authentication on the target audio segment based on the first chip. The display module is configured to: display a target graphical user interface when the user identity authentication passes.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for starting a terminal by voice described in any of the above.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and
a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method for starting a terminal by voice described in any of the above via execution of the executable instructions.
In the method, the apparatus, the computer-readable storage medium, and the electronic device for starting a terminal by voice provided by some embodiments of the present disclosure, when it is determined that the first chip of the terminal is currently in a sleep state, a voice signal is received and segmented to obtain a plurality of audio segments. Then, when it is determined that the plurality of audio segments contain a first preset text (e.g., "Xiao A"), a target audio segment containing the first preset text is sent to the first chip, and the first chip is triggered to exit the sleep state.
Since the playing of the second preset text (e.g., "I'm here") has little dependence on whether the user identity can be authenticated, that is, playing the second preset text before the user identity is authenticated does not affect the security of the user's information, the parameters related to voice playback are loaded so that the terminal plays the second preset text while the first chip performs user identity authentication on the target audio segment. This effectively reduces the time consumed to start the terminal.
In contrast, in the related art, user identity authentication is performed on the target audio segment first, and only then are the parameters related to voice playback loaded so that the terminal plays the second preset text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 schematically shows a system architecture diagram of a method or apparatus for voice-enabled terminals to which an embodiment of the present disclosure may be applied.
FIG. 2 schematically shows a block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Fig. 3 schematically shows a flowchart of a method for voice-starting a terminal in an embodiment of the present disclosure.
Fig. 4 is a schematic diagram illustrating a scenario of a method for voice-enabling a terminal according to an exemplary embodiment of the disclosure.
Fig. 5 is a schematic diagram illustrating a scenario of a method for voice-enabling a terminal in an exemplary embodiment of the related art.
Fig. 6 schematically shows a flowchart of a first text verification method in an exemplary embodiment of the present disclosure.
Fig. 7 schematically shows a flowchart of a voiceprint verification method in an exemplary embodiment of the disclosure.
Fig. 8 is a block diagram schematically illustrating an apparatus for voice-enabling a terminal in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation. In addition, all of the following terms "first" and "second" are used for distinguishing purposes only and should not be construed as limiting the present disclosure.
Fig. 1 schematically shows a system architecture diagram of a method or apparatus for voice-enabled terminals to which an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminals 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminals 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few. The terminals 101, 102, 103 may be various terminal devices having a display screen. It should be understood that the numbers of terminals, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, and servers, as required by an implementation. For example, server 105 may be a server cluster composed of multiple servers.
Based on the above system architecture, embodiments of the method for starting a terminal by voice according to the present technical solution are described in detail below.
A usage scenario of the method for starting a terminal by voice provided by this technical solution is a user waking up the terminal by voice, where only a user who has completed identity authentication at the terminal can wake it up successfully: when that user utters the preset text and the terminal recognizes the voice successfully, the terminal is started. Illustratively, the terminal 101 may be any of various types of devices capable of playing video or browsing images, such as a mobile phone, a tablet, a desktop computer, an in-vehicle device, or a wearable device.
FIG. 2 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, cause the processor to implement the method of voice-activated terminal of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, keys 294, and a Subscriber Identity Module (SIM) card interface 295. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural Network Processor (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors. Additionally, a memory may be provided in processor 210 for storing instructions and data.
The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 230 may be used to connect a charger to charge the electronic device 200, and may also be used to transmit data between the electronic device 200 and a peripheral device. It can also be used to connect earphones and play audio through them, or to connect other electronic devices.
The charge management module 240 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. The power management module 241 is used for connecting the battery 242, the charging management module 240 and the processor 210. The power management module 241 receives the input of the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the electronic device 200.
The Wireless Communication module 260 may provide a solution for Wireless Communication applied to the electronic device 200, including Wireless Local Area Networks (WLANs) (e.g., Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like.
The electronic device 200 implements a display function through the GPU, the display screen 290, the application processor, and the like. The GPU is a microprocessor for image processing, connected to the display screen 290 and the application processor, and is used to perform the mathematical and geometric calculations needed for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
The electronic device 200 may implement a shooting function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1. If the electronic device 200 includes N cameras, one of the N cameras is the main camera and the others may be secondary cameras, such as a telephoto camera.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include a program storage area and a data storage area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 200.
The electronic device 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the headphone interface 274, the application processor, and the like. Such as music playing, recording, etc.
Audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. Audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
The speaker 271 is used to convert an audio electrical signal into a sound signal. The electronic device 200 can play music or take a hands-free call through the speaker 271. The receiver 272, also called the "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 200 receives a call or a voice message, the voice can be heard by placing the receiver 272 close to the ear. The microphone 273, also known as the "mic", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can input a voice signal to the microphone 273 by speaking close to it. The electronic device 200 may be provided with at least one microphone 273. The earphone interface 274 is used to connect wired earphones.
For sensors included with the electronic device 200, a depth sensor is used to obtain depth information of the scene. The pressure sensor is used for sensing a pressure signal and converting the pressure signal into an electric signal. The gyro sensor may be used to determine the motion pose of the electronic device 200. The air pressure sensor is used for measuring air pressure. The magnetic sensor includes a hall sensor. The electronic device 200 may detect the opening and closing of the flip holster using a magnetic sensor. The acceleration sensor may detect the magnitude of acceleration of the electronic device 200 in various directions (typically three axes). The distance sensor is used for measuring distance. The proximity light sensor may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The fingerprint sensor is used for collecting fingerprints. The temperature sensor is used for detecting temperature. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 290. The ambient light sensor is used for sensing the ambient light brightness. The bone conduction sensor may acquire a vibration signal.
The keys 294 include a power key, volume keys, and the like, and may be mechanical keys or touch keys. The motor 293 may generate vibration prompts, and may be used both for incoming-call vibration prompts and for touch vibration feedback. The indicator 292 may be an indicator light, which may be used to indicate the charging state or a change in battery level, or to indicate a message, a missed call, a notification, and the like. The SIM card interface 295 is used to connect a SIM card. The electronic device 200 interacts with the network through the SIM card to implement functions such as calls and data communication.
The present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
In the embodiments of the disclosure, a method for starting a terminal by voice is provided first. Fig. 3 schematically shows a flowchart of this method. Specifically, the method for starting a terminal by voice in the embodiment shown in fig. 3 includes:
Step S310, when it is determined that the first chip of the terminal is currently in a sleep state, receiving and segmenting a voice signal to obtain a plurality of audio segments;
Step S320, when it is determined that the plurality of audio segments contain a first preset text, sending a target audio segment containing the first preset text to the first chip and triggering the first chip to exit the sleep state;
Step S330, loading parameters related to voice playback through the first chip so that the terminal plays a second preset text, while performing user identity authentication on the target audio segment based on the first chip; and
Step S340, displaying a target graphical user interface when the user identity authentication passes.
The technical solution provided by the disclosure is applicable to the situation where the first chip of the terminal is currently in a sleep state (for example, the CPU is sleeping) and the user starts the terminal by voice, so that the terminal displays a graphical user interface (UI) in the started state. A high-level sketch of these steps is given below.
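Illustratively, the overall flow of steps S310 to S340 can be sketched in Python as follows; the chip objects and their helper methods are hypothetical illustrations introduced here for clarity, not part of the disclosure:

    def voice_start_terminal(dsp, cpu, mic_stream):
        # Step S310: while the first chip (CPU) sleeps, the low-power second
        # chip (DSP) receives the voice signal and segments it.
        if not cpu.is_sleeping():
            return
        segments = dsp.receive_and_segment(mic_stream)
        # First text verification on the DSP: look for the first preset text.
        target = dsp.find_segment_with_preset_text(segments)
        if target is None:
            return  # keep the CPU asleep; avoid false wake-ups
        # Step S320: wake the CPU and hand over the target audio segment.
        cpu.wake()
        dsp.send(target, to=cpu)
        # Step S330: on the CPU, load playback parameters and authenticate
        # the user in parallel.
        if cpu.authenticate_and_respond(target):
            # Step S340: show the target graphical user interface.
            cpu.show_target_gui()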
In the embodiment provided in fig. 3, the playing of the second preset text (e.g., "I'm here") has little dependence on whether the user identity can be authenticated; that is, playing the second preset text before the user identity is authenticated does not affect the security of the user's information. Therefore, while user identity authentication is performed on the target audio segment based on the first chip, the parameters related to voice playback are loaded so that the terminal plays the second preset text, and the target graphical user interface is displayed once the user identity authentication passes. In this way, the time consumed to start the terminal is effectively reduced.
In an exemplary embodiment, fig. 4 schematically shows a scenario diagram of the method for starting a terminal by voice in an exemplary embodiment of the present disclosure, and fig. 5 schematically shows a scenario diagram of a method for starting a terminal by voice in an exemplary embodiment of the related art.
As mentioned above, the playing of the second preset text (e.g., "I'm here") has little dependence on whether the user identity can be authenticated. Therefore, in the technical solution shown in fig. 4, before the user identity is authenticated, user identity authentication is performed on the target audio segment based on the first chip while, at the same time, the parameters related to voice playback are loaded so that the terminal plays the second preset text. In the related art shown in fig. 5, by contrast, user identity authentication is performed on the target audio segment first, and only then are the parameters related to voice playback loaded so that the terminal plays the second preset text. The two steps in the related-art scheme are executed one after the other, whereas this technical solution executes them simultaneously, which effectively reduces the time consumed to start the terminal by voice.
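Illustratively, this parallelism can be sketched with two threads; the helper functions below are hypothetical placeholders for the playback-parameter loading and voiceprint authentication described in this disclosure:

    import threading

    def load_tts_parameters():
        # Hypothetical: start the voice assistant service and load the
        # parameters related to voice playback and the audio of the second
        # preset text.
        ...

    def authenticate_voiceprint(target_segment):
        # Hypothetical: perform user identity authentication on the target
        # audio segment that contains the first preset text.
        ...

    def on_secondary_wakeup(target_segment):
        # Run playback preparation and identity authentication at the same
        # time, instead of one after the other as in the related art.
        t_play = threading.Thread(target=load_tts_parameters)
        t_auth = threading.Thread(target=authenticate_voiceprint,
                                  args=(target_segment,))
        t_play.start()
        t_auth.start()
        t_play.join()
        t_auth.join()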
Specific implementations of the various steps involved in the embodiment shown in FIG. 3 are described below:
In an exemplary embodiment, referring to fig. 4, the terminal captures sounds within a certain distance of the terminal through a microphone (MIC). Further, in order to reduce the power consumption of starting the terminal by voice, in this embodiment the voice signal transmitted by the MIC is received by a second chip whose power consumption is lower than that of the first chip. The first chip is a CPU, and the second chip is a digital signal processor (DSP). Specifically, the DSP performs text verification on the received voice signal (hereinafter referred to as the "first text verification", since the CPU later performs text verification again) so as to wake up the first chip when the first text verification passes.
For example, fig. 6 schematically illustrates a flowchart of a first text verification method in an exemplary embodiment of the disclosure. The embodiment shown in this figure comprises the following steps:
step S610, receiving and segmenting the voice signal through the second chip to obtain a plurality of audio segments.
In the exemplary embodiment, since the voice signal is generally continuous speech, it is segmented in this embodiment. The second chip performs endpoint detection on the voice signal using a preset endpoint detection algorithm and then divides the voice signal into a plurality of audio segments according to the endpoint detection result.
It should be noted that this embodiment does not limit which endpoint detection algorithm is used to perform endpoint detection on the first audio signal; one of ordinary skill in the art can choose according to actual needs. For example, a voice activity detection (VAD) algorithm is used to perform endpoint detection on the first audio signal, as in the sketch below.
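Illustratively, a minimal energy-based endpoint detector follows; it stands in for whatever VAD algorithm is actually chosen, and the frame length and energy threshold are illustrative assumptions:

    import numpy as np

    def segment_speech(signal, sample_rate, frame_ms=30, threshold=0.02):
        # Split the signal into fixed-length frames and compute RMS energy.
        frame_len = int(sample_rate * frame_ms / 1000)
        n_frames = len(signal) // frame_len
        frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
        energy = np.sqrt((frames ** 2).mean(axis=1))
        active = energy > threshold  # speech / non-speech decision per frame

        # Collect contiguous runs of speech frames as audio segments.
        segments, start = [], None
        for i, is_speech in enumerate(active):
            if is_speech and start is None:
                start = i * frame_len
            elif not is_speech and start is not None:
                segments.append(signal[start:i * frame_len])
                start = None
        if start is not None:
            segments.append(signal[start:])
        return segments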
Step S620, extracting mel-frequency cepstrum coefficients of the audio segments through the second chip.
In an exemplary embodiment, the second chip high-pass filters the audio segment. Further, the filtered audio segment is windowed, e.g., with a Hamming window, to smooth the edges of the audio segment. Then, the second chip extracts the mel-frequency cepstrum coefficients based on the following formula:
$F_{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
where $F_{mel}(f)$ denotes the mel frequency used in extracting the mel-frequency cepstrum coefficients, and $f$ denotes a frequency point after the Fourier transform.
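Illustratively, the mapping above is a one-line computation; the rest of the MFCC pipeline (pre-emphasis, windowing, FFT, mel filter bank, logarithm, and DCT) can be delegated to an existing library, as noted in the comments:

    import numpy as np

    def hz_to_mel(f_hz):
        # The mel-scale mapping from the formula above:
        # Fmel(f) = 2595 * log10(1 + f / 700).
        return 2595.0 * np.log10(1.0 + f_hz / 700.0)

    # Full MFCC extraction can reuse an existing implementation, e.g.:
    #   import librosa
    #   mfcc = librosa.feature.mfcc(y=audio_segment, sr=16000, n_mfcc=13)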
Step S630, matching, by the second chip, the mel-frequency cepstrum coefficients of each audio segment against a Gaussian mixture universal background model associated with the first preset text. Step S640, if a matching audio segment exists, determining that the plurality of audio segments contain the first preset text.
In an exemplary embodiment, the first preset text is a text preset for starting the terminal by voice. For example, when the sound is set to contain "Xiao A", the terminal's voice assistant may be activated to play a second preset text in response, such as "I'm here".
In this embodiment, a pre-trained Gaussian mixture universal background model associated with the first preset text is used. For example, the pre-training process may extract mel-frequency cepstrum coefficients from voice signals of the first preset text and then train the Gaussian mixture universal background model associated with the first preset text on those coefficients.
Illustratively, the second chip extracts the mel-frequency cepstrum coefficients of each audio segment and inputs them into the pre-trained Gaussian mixture universal background model, which evaluates the coefficients of the current audio segment and outputs the matching probability between the current audio segment and the first preset text. When the matching probability meets a preset requirement, the current audio segment is considered to match the first preset text.
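Illustratively, a minimal version of this matching step is sketched below with scikit-learn, where the universal background model is reduced to a single Gaussian mixture trained offline on MFCCs of the first preset text; the component count and acceptance threshold are illustrative assumptions:

    from sklearn.mixture import GaussianMixture

    def train_keyword_gmm(keyword_mfccs, n_components=16):
        # keyword_mfccs: (n_frames, n_mfcc) MFCC frames extracted from
        # recordings of the first preset text (e.g., "Xiao A").
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag")
        gmm.fit(keyword_mfccs)
        return gmm

    def matches_keyword(gmm, segment_mfccs, threshold=-45.0):
        # score() returns the average per-frame log-likelihood; a higher
        # value means the segment is more consistent with the keyword model.
        return gmm.score(segment_mfccs) > threshold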
With the embodiment of fig. 6, the second chip (such as the DSP described above) may perform the first text verification on the received sound, thereby verifying whether the sound contains the first preset text. Specifically, referring to fig. 4 or 5, when the DSP verifies that the sound contains the first preset text (i.e., the first text verification succeeds), a "primary wake-up" of the terminal is achieved, which may also be called "screen-off wake-up". Illustratively, the terminal screen, initially black, is lit, and the initial desktop can be displayed. Specifically, in order to improve the security of the terminal, a terminal with a power-on lock can still be operated only after a password is entered, even though the screen is lit.
Referring to fig. 4, considering that the "primary wake-up" is implemented on the low-power DSP, the CPU performs a "secondary wake-up" after the primary wake-up in order to further improve the text recognition accuracy and the security of the terminal. Specifically, when it is determined that the plurality of audio segments contain the first preset text, the DSP sends an event signal, triggers the first chip to exit the sleep state, and sends the target audio segment containing the first preset text to the first chip. In this embodiment, a second text verification is performed on the target audio segment; the verification process may adopt the embodiment corresponding to steps S620 to S640 in fig. 6, which is not repeated here. It should be noted that when the first text verification determines that the plurality of audio segments do not contain the first preset text, it is concluded that the current sound is not intended to wake up the terminal, and the first chip is kept in the sleep state, thereby avoiding false wake-ups and improving wake-up accuracy.
In an exemplary embodiment, after the CPU performs the second text verification, it is determined again whether the sound contains the first preset text. Referring to fig. 4 or 5, when the CPU verifies again that the sound contains the first preset text (i.e., the second text verification succeeds), the "secondary wake-up" of the terminal is achieved.
Illustratively, referring to fig. 4, the following two operations are then performed simultaneously based on the CPU:
On the one hand, the parameters related to voice playback are loaded so that the terminal can play the second preset text.
In an exemplary embodiment, the voice assistant service is started, the voice-assistant-related parameters are loaded, and the audio containing the second preset text that needs to be played, such as "I'm here", is loaded. A response to the user's voice is thus made before the voiceprint verification; compared with the related art, this shortens the response time perceived by the user and improves the user experience.
On the other hand, user identity authentication is performed on the target audio segment, that is, voiceprint verification is performed based on the CPU.
In an exemplary embodiment, fig. 7 schematically shows a flowchart of a voiceprint verification method in an exemplary embodiment of the disclosure. Voiceprint recognition (VPR) is also commonly referred to as speaker recognition (SR). It generally falls into two categories: speaker identification (SI) and speaker verification (SV). Specifically, SI determines which of several enrolled users spoke a given voice segment, thereby identifying the user to be recognized by their voice. SV confirms whether a given voice segment was spoken by a designated enrolled user. Both the identification problem addressed by SI and the confirmation problem addressed by SV require first extracting the voiceprint features of the speaker (the user to be recognized).
In related voiceprint-based identification schemes, voiceprint features are extracted with short-time spectral feature methods, and the tester is then identified according to the extracted features. Extracting voiceprint features with short-time spectral methods limits the dimensionality of the voiceprint features, so the differences between the voiceprints of different people may be insufficient, which affects the accuracy and stability of identification.
To address these technical problems of related voiceprint-based identification schemes, this technical solution extracts voiceprint features with a machine learning model. For an example, refer to the embodiment shown in fig. 7.
Step S710, obtaining the voiceprint features of the target audio segment.
In an exemplary embodiment, the voiceprint features of the target audio segment can be obtained with a pre-trained voiceprint feature extraction model, which may be a neural network model.
Step S720, inputting the voiceprint features into an identity confirmation model to obtain the prediction probability that the target audio segment belongs to the target user.
In an exemplary embodiment, the prediction probability that the target audio segment belongs to the target user may be obtained with a pre-trained identity confirmation model, which may be a neural network model.
It should be noted that the identity confirmation model is updated periodically according to the user's recent voice data. Illustratively, training is performed once a week with the last week's voice data to update the identity confirmation model, so that identification remains accurate even if the user's voice changes. The model update process is described in the following embodiments.
Step S730, determining whether the prediction probability is greater than a first preset value. If the prediction probability is greater than the first preset value, indicating that the identity authentication accuracy meets the preset requirement, step S740 is executed: the user passes the identity authentication, and the target graphical user interface is displayed. If the prediction probability is not greater than the first preset value, indicating that the identity authentication accuracy cannot meet the preset requirement, processing of the voice signal ends.
The first preset value is set according to actual needs and is not limited here. The target graphical user interface is the UI displayed after the terminal is unlocked. Starting the terminal by voice in this way saves the user from unlocking the terminal manually and makes unlocking more convenient.
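Illustratively, steps S710 to S740 can be sketched as follows, assuming a PyTorch embedding network as the voiceprint feature extraction model and a small classifier as the identity confirmation model; both network shapes and the first preset value are illustrative assumptions:

    import torch
    import torch.nn as nn

    class IdentityConfirmation(nn.Module):
        # A toy identity confirmation model: maps a voiceprint embedding to
        # the probability that the segment belongs to the target user.
        def __init__(self, embedding_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(embedding_dim, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid(),
            )

        def forward(self, voiceprint):
            return self.net(voiceprint)

    def authenticate(extractor, confirmer, segment_features, first_preset=0.9):
        with torch.no_grad():
            voiceprint = extractor(segment_features)  # step S710
            prob = confirmer(voiceprint).item()       # step S720
        return prob > first_preset                    # steps S730/S740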
Through the embodiments shown in fig. 6 and 7, the received sound undergoes the first text verification on the second chip (such as the DSP described above) and then the second text verification on the first chip (such as the CPU), which effectively improves the accuracy of verifying whether the sound contains the first preset text. Further, referring to fig. 4 or fig. 5, voiceprint verification is performed by the CPU to authenticate the user identity. When the identity authentication passes, the voiceprint wake-up of the terminal is achieved. With continued reference to fig. 4, when the authentication passes, the CPU may launch the voice assistant Activity to pop up the UI interaction interface, thereby completing the start of the terminal.
With this scheme, the time consumed to wake up the terminal by voice can be reduced by 700 ms compared with the related art (i.e., the start-up time of the voice assistant is effectively reduced).
It should be noted that the voice of the terminal user may change over time (i.e., voiceprint drift), which reduces the voiceprint wake-up rate. To address the low voiceprint verification accuracy caused by voiceprint drift, this technical solution continuously enriches the training database of the identity confirmation model with the user's recent voice signals, so that the continuously updated identity confirmation model can determine the user's identity more accurately from the voiceprint features.
In an exemplary embodiment, the prediction probability that the audio segment belongs to the target user is used as the criterion for whether the segment can serve as a training sample for updating the identity confirmation model. For example, if the prediction probability is greater than the first preset value, indicating that the authentication accuracy meets the preset requirement, the target audio segment is saved to a target folder (e.g., an "updateTrainAudio" folder) so that the identity confirmation model can be updated with the voiceprint features contained in the target folder. Specifically, at every preset interval (e.g., one week or one month), the N audio segments with the highest prediction probabilities are obtained from the target folder, where N is a positive integer, and the identity confirmation model is updated with the voiceprint features of these N audio segments.
For example, after the N audio segments used to update the identity confirmation model are determined, the target folder is emptied to ensure that it stores only the user's recent (e.g., last week's or last month's) voice information, so that the identity confirmation model is trained with recent voice and the user's voiceprint drift problem is effectively addressed.
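Illustratively, this periodic update can be sketched as follows; the folder name, the value of N, and the retraining routine are illustrative assumptions:

    import os
    import shutil

    def periodic_update(target_folder, scored_clips,
                        retrain_identity_model, n=20):
        # scored_clips: list of (path, prediction_probability) pairs for the
        # audio segments saved in the target folder.
        best = sorted(scored_clips, key=lambda c: c[1], reverse=True)[:n]
        retrain_identity_model([path for path, _ in best])

        # Empty the folder so that only recent voice data accumulates for
        # the next update cycle, which is how the scheme tracks voiceprint
        # drift.
        shutil.rmtree(target_folder)
        os.makedirs(target_folder)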
In an exemplary embodiment, with continued reference to fig. 7, when it is determined in step S730 that the prediction probability is greater than the first preset value, step S750 is further executed: obtaining the signal-to-noise ratio of the target audio segment; followed by step S760: determining whether the signal-to-noise ratio is below a second preset value.
When the signal-to-noise ratio is not below the second preset value, the proportion of noise in the current audio segment is considered too high for the segment to serve as a training sample for updating the identity confirmation model, so further processing of the segment ends.
When the signal-to-noise ratio is determined to be below the second preset value, the current audio segment is considered relatively clean. In this embodiment, a signal-to-noise ratio below the second preset value is a necessary condition for a segment to become a training sample for updating the identity confirmation model; that is, this embodiment verifies further only those target audio segments whose signal-to-noise ratio is below the second preset value, so as to ensure a high confidence level for the training samples used to update the identity confirmation model.
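Illustratively, the signal-to-noise ratio check can be sketched as below, assuming the noise power is estimated from the non-speech frames around the segment; the comparison keeps the direction stated in this embodiment (segments whose ratio is below the second preset value proceed):

    import numpy as np

    def snr_db(speech, noise):
        # Ratio of signal power to noise power, in decibels.
        p_signal = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12  # avoid division by zero
        return 10.0 * np.log10(p_signal / p_noise)

    def eligible_for_training(speech, noise, second_preset):
        # Per steps S750/S760 as stated, only segments whose ratio is below
        # the second preset value proceed to the credit verification step.
        return snr_db(speech, noise) < second_preset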
For example, when it is determined that the signal-to-noise ratio is below the second preset value, step S770 is executed: obtaining the credit verification information corresponding to the voice signal and verifying the voice signal according to the credit verification information.
In an exemplary embodiment, the credit verification information may be a verification condition preset by the user: for example, the WIFI network information at the time the voice signal was collected matches a target WIFI name, the geographical location information (e.g., GPS) at the time matches the GPS information of the user's residence, or the Bluetooth device information at the time matches a target Bluetooth name. Therefore, one or more of the GPS information, WIFI network information, and Bluetooth device information captured when the voice signal was collected can serve as the credit verification information.
In step S780, it is determined whether the verification passes.
In an exemplary embodiment, if the credit verification information is the GPS information captured when the voice signal was collected, it is determined whether that GPS information matches the GPS information of the user's residence. If so, the verification passes; otherwise, it fails. When the verification fails, that is, the credibility of the current target audio segment does not meet the preset requirement, further processing of the target audio segment is abandoned and the related processing ends.
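Illustratively, the credit verification step can be sketched as follows; the dict-based context accessors and the rule that a single matching condition suffices are illustrative assumptions (the disclosure only says one or more items may serve as the credit verification information):

    def credit_verified(context, trusted):
        # context: information captured when the voice signal was collected.
        # trusted: conditions preset by the user (home GPS area, WIFI name,
        # Bluetooth device name); both are plain dicts here for illustration.
        checks = []
        for key in ("wifi", "gps", "bluetooth"):
            if key in trusted:
                checks.append(context.get(key) == trusted[key])
        # Pass if at least one configured condition matches; requiring all
        # configured conditions instead would be a stricter variant.
        return any(checks) if checks else False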
When the verification passes, step S790 is executed: saving the target audio segment to the target folder (e.g., the "updateTrainAudio" folder) so that the identity confirmation model can be updated with the voiceprint features contained in the target folder.
Specifically, the audio segments in the target folder are traversed, and the best audio for updating the model is screened out according to the prediction probability that the target audio segment belongs to the target user and the signal-to-noise ratio. In this embodiment, the training samples for updating the identity confirmation model are determined by combining the prediction probability and the signal-to-noise ratio, which effectively ensures the quality of the samples and improves the prediction accuracy of the model.
Illustratively, referring to fig. 7, the updated identity confirmation model is used in step S720 to receive the voiceprint features and output the probability that the target audio segment belongs to the target user. The updated model effectively addresses the voiceprint drift problem, and the prediction probability it produces is more accurate.
For example, as described above, after the M audio segments used to update the identity confirmation model are determined, the target folder is emptied to ensure that it stores only the user's recent voice information, so that the identity confirmation model is trained with recent voice and the user's voiceprint drift problem is effectively addressed.
According to the technical solution for starting a terminal by voice, on the one hand, the parameters related to voice playback are loaded while user identity authentication is performed on the target audio segment, so that the terminal plays the preset text; this effectively reduces the time consumed to start the terminal and the voice wake-up response time. On the other hand, the user's identity confirmation model is updated periodically in combination with the factors affecting the user's voiceprint, which effectively solves the user's voiceprint drift problem and improves the accuracy of predicting the user's identity.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Fig. 8 schematically shows a block diagram of an apparatus for starting a terminal by voice according to an exemplary embodiment of the present disclosure. Referring to fig. 8, the apparatus 800 for starting a terminal by voice according to an exemplary embodiment of the present disclosure may include the following modules: an obtaining module 801, a sending module 802, an association processing module 803, and a display module 804.
The obtaining module 801 is configured to: when it is determined that the first chip of the terminal is currently in a sleep state, receive and segment a voice signal to obtain a plurality of audio segments. The sending module 802 is configured to: when it is determined that the plurality of audio segments contain a first preset text, send a target audio segment containing the first preset text to the first chip and trigger the first chip to exit the sleep state. The association processing module 803 is configured to: load parameters related to voice playback through the first chip so that the terminal plays a second preset text, while performing user identity authentication on the target audio segment based on the first chip. The display module 804 is configured to: display a target graphical user interface when the user identity authentication passes.
In an exemplary embodiment of the present disclosure, based on the above scheme, the obtaining module 801 is specifically configured to: receive and segment a voice signal through a second chip to obtain a plurality of audio segments, where the power consumption of the second chip is lower than that of the first chip.
The sending module 802 is specifically configured to: determine, through the second chip, that the plurality of audio segments contain a first preset text, and send a target audio segment containing the first preset text to the first chip through the second chip.
In an exemplary embodiment of the present disclosure, based on the above scheme, the sending module 802 is further specifically configured to: extract the mel-frequency cepstrum coefficients of each audio segment through the second chip; match, through the second chip, the mel-frequency cepstrum coefficients of each audio segment against the Gaussian mixture universal background model associated with the first preset text; and, when a matching audio segment exists, determine that the plurality of audio segments contain the first preset text.
In an exemplary embodiment of the present disclosure, based on the above scheme, the apparatus further includes: text determination module 805.
The text determination module 805 is configured to: before the parameters related to voice playback are loaded through the first chip, determine through the first chip that the plurality of audio segments contain the first preset text.
In an exemplary embodiment of the present disclosure, based on the above scheme, the sending module 802 is further configured to: keep the first chip in a sleep state when the plurality of audio segments do not contain the first preset text.
In an exemplary embodiment of the present disclosure, based on the above scheme, the association processing module 803 is specifically configured to: obtain the voiceprint features of the target audio segment; input the voiceprint features into an identity confirmation model to obtain the prediction probability that the target audio segment belongs to the target user; and, when it is determined that the prediction probability is greater than the first preset value, pass the user identity authentication.
In an exemplary embodiment of the present disclosure, based on the above scheme, the apparatus further includes: a verification module 806 and a model update module 807.
Wherein the verification module 806 is configured to: under the condition that the prediction probability is larger than a first preset value, acquiring credit verification information corresponding to the voice signal, and verifying the voice signal according to the credit verification information; the model update module 807 described above is configured to: and under the condition that the verification is passed, saving the target audio clip to a target folder so as to update the identity confirmation model through the voiceprint features contained in the target folder.
In an exemplary embodiment of the disclosure, based on the above scheme, the verification module 806 is specifically configured to: and acquiring one or more of network information, geographical position information and Bluetooth equipment information corresponding to the collected voice signals to obtain the credit verification information.
In an exemplary embodiment of the disclosure, based on the above scheme, the model updating module 807 is specifically configured to: every preset time interval, acquiring N audio clips with high prediction probability in the target folder, wherein N is a positive integer; and updating the identity confirmation model through the voiceprint characteristics of the N audio clips with higher prediction probability, and emptying the target folder.
In an exemplary embodiment of the present disclosure, based on the above scheme, the apparatus further includes: a signal-to-noise ratio determination module 808.
Wherein the signal-to-noise ratio determination module 808 is configured to: before the voice signal is verified according to the credit verification information, acquire a signal-to-noise ratio of the target audio segment under the condition that the prediction probability is greater than the first preset value; and verify the voice signal according to the credit verification information under the condition that the signal-to-noise ratio is lower than a second preset value.
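The gate could look like the following energy-based estimate; the frame size, the tenth-percentile noise floor, and the 15 dB second preset value are assumptions rather than details from the disclosure.

```python
import numpy as np

def estimate_snr_db(segment: np.ndarray, frame: int = 400) -> float:
    """Rough SNR estimate: the quietest frames approximate the noise floor."""
    if len(segment) < frame:
        return 0.0
    frames = segment[:len(segment) // frame * frame].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    noise = np.percentile(energy, 10) + 1e-12
    return 10.0 * np.log10(energy.mean() / noise)

def needs_credit_check(segment: np.ndarray,
                       second_preset_db: float = 15.0) -> bool:
    """Fall back to credit verification only for low-SNR segments."""
    return estimate_snr_db(segment) < second_preset_db
```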
In an exemplary embodiment of the disclosure, based on the above scheme, the model update module 807 is further specifically configured to: acquire, at every preset time interval, M audio segments with the highest prediction probabilities and the lowest signal-to-noise ratios in the target folder, wherein M is a positive integer; update the identity confirmation model through the voiceprint features of the M audio segments; and empty the target folder.
It should be noted that, since each functional module of the apparatus for starting a terminal by voice according to the embodiment of the present disclosure corresponds to a step in the above embodiment of the method for starting a terminal by voice, details are not repeated here.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to enable a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (14)

1. A method for starting a terminal by voice, comprising:
under the condition that it is determined that a first chip of the terminal is currently in a sleep state, receiving and segmenting a voice signal to obtain a plurality of audio segments;
under the condition that the plurality of audio segments are determined to contain a first preset text, sending a target audio segment containing the first preset text to the first chip, and triggering the first chip to exit from the sleep state;
loading parameters related to voice playing through the first chip to enable the terminal to play a second preset text, and performing user identity authentication according to the target audio segment based on the first chip;
and displaying a target graphical user interface under the condition that the user identity authentication is passed.
2. The method of claim 1, wherein receiving and segmenting a voice signal to obtain a plurality of audio segments comprises:
receiving and segmenting a voice signal through a second chip to obtain a plurality of audio segments, wherein the power consumption of the second chip is lower than that of the first chip;
and wherein, under the condition that the plurality of audio segments are determined to contain the first preset text, sending the target audio segment containing the first preset text to the first chip comprises:
determining, through the second chip, that the plurality of audio segments contain the first preset text, and sending the target audio segment containing the first preset text to the first chip through the second chip.
3. The method of claim 2, wherein determining, through the second chip, that the plurality of audio segments contain the first preset text comprises:
extracting a Mel frequency cepstrum coefficient of each audio segment through the second chip;
matching the Mel frequency cepstrum coefficient of each audio segment against a Gaussian mixture universal background model associated with the first preset text through the second chip;
and in the case that a matched audio segment exists, determining that the plurality of audio segments contain the first preset text.
4. The method according to any one of claims 1 to 3, wherein before loading the parameters related to voice playing through the first chip, the method further comprises:
determining, through the first chip, that the plurality of audio segments contain the first preset text.
5. The method of claim 4, further comprising:
maintaining the first chip in the sleep state if it is determined that the plurality of audio segments do not contain the first preset text.
6. The method according to any one of claims 1 to 3, wherein performing user identity authentication according to the target audio segment based on the first chip comprises:
acquiring voiceprint features of the target audio segment;
inputting the voiceprint features into an identity confirmation model to obtain a prediction probability that the target audio segment belongs to a target user;
and determining that the user identity authentication is passed if the prediction probability is greater than a first preset value.
7. The method of claim 6, further comprising:
under the condition that the prediction probability is greater than the first preset value, acquiring credit verification information corresponding to the voice signal, and verifying the voice signal according to the credit verification information;
and under the condition that the verification is passed, saving the target audio segment to a target folder, so as to update the identity confirmation model through the voiceprint features contained in the target folder.
8. The method of claim 7, wherein acquiring the credit verification information corresponding to the voice signal comprises:
acquiring one or more of network information, geographical position information and Bluetooth device information corresponding to the collection of the voice signal, so as to obtain the credit verification information.
9. The method of claim 7, wherein updating the identity confirmation model through the voiceprint features contained in the target folder comprises:
acquiring, at every preset time interval, N audio segments with the highest prediction probabilities in the target folder, wherein N is a positive integer;
and updating the identity confirmation model through the voiceprint features of the N audio segments, and emptying the target folder.
10. The method of claim 7, wherein before verifying the voice signal according to the credit verification information, the method further comprises:
acquiring a signal-to-noise ratio of the target audio segment under the condition that the prediction probability is greater than the first preset value; and verifying the voice signal according to the credit verification information under the condition that the signal-to-noise ratio is lower than a second preset value.
11. The method of claim 10, wherein updating the identity confirmation model through the voiceprint features contained in the target folder comprises:
acquiring, at every preset time interval, M audio segments with the highest prediction probabilities and the lowest signal-to-noise ratios in the target folder, wherein M is a positive integer;
and updating the identity confirmation model through the voiceprint features of the M audio segments, and emptying the target folder.
12. An apparatus for starting a terminal by voice, comprising:
an acquisition module configured to: under the condition that it is determined that a first chip of the terminal is currently in a sleep state, receive and segment a voice signal to obtain a plurality of audio segments;
a sending module configured to: under the condition that the plurality of audio segments are determined to contain a first preset text, send a target audio segment containing the first preset text to the first chip, and trigger the first chip to exit from the sleep state;
an association processing module configured to: load parameters related to voice playing through the first chip to enable the terminal to play a second preset text, and perform user identity authentication according to the target audio segment based on the first chip;
and a display module configured to: display a target graphical user interface under the condition that the user identity authentication is passed.
13. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for starting a terminal by voice according to any one of claims 1 to 11.
14. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method for starting a terminal by voice according to any one of claims 1 to 11 via execution of the executable instructions.
CN202110275697.3A 2021-03-15 2021-03-15 Method and device for starting terminal by voice, medium and electronic equipment Pending CN113066501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110275697.3A CN113066501A (en) 2021-03-15 2021-03-15 Method and device for starting terminal by voice, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113066501A true CN113066501A (en) 2021-07-02

Family

ID=76560730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110275697.3A Pending CN113066501A (en) 2021-03-15 2021-03-15 Method and device for starting terminal by voice, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113066501A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150301796A1 (en) * 2014-04-17 2015-10-22 Qualcomm Incorporated Speaker verification
CN108305633A (en) * 2018-01-16 2018-07-20 平安科技(深圳)有限公司 Speech verification method, apparatus, computer equipment and computer readable storage medium
CN109979438A (en) * 2019-04-04 2019-07-05 Oppo广东移动通信有限公司 Voice awakening method and electronic equipment
CN111862965A (en) * 2019-04-28 2020-10-30 阿里巴巴集团控股有限公司 Awakening processing method and device, intelligent sound box and electronic equipment
CN112331193A (en) * 2019-07-17 2021-02-05 华为技术有限公司 Voice interaction method and related device
CN112328994A (en) * 2020-11-17 2021-02-05 携程计算机技术(上海)有限公司 Voiceprint data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination