CN113129904A - Voiceprint determination method, apparatus, system, device and storage medium - Google Patents

Publication number
CN113129904A
Authority
CN
China
Prior art keywords
voiceprint
preset
recognition signal
voice recognition
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110341365.0A
Other languages
Chinese (zh)
Other versions
CN113129904B (en)
Inventor
孙洪菠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110341365.0A (CN113129904B)
Priority to CN202210877631.6A (CN115394304A)
Publication of CN113129904A
Application granted
Publication of CN113129904B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present disclosure provides a voiceprint determination method, apparatus, system, device and storage medium, and relates to the field of computer technologies, in particular to speech recognition and deep learning. The voiceprint determination method includes: adding a preset identifier to a voice recognition signal to obtain a voice recognition signal containing the preset identifier, wherein the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; and sending the voice recognition signal containing the preset identifier to a server, so that the server performs voiceprint determination based on a voiceprint model and the voice recognition signal containing the preset identifier, wherein the voiceprint model is established based on the preset content. The method can reduce the amount of data on the transmission link.

Description

Voiceprint determination method, apparatus, system, device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to speech recognition and deep learning, and more specifically to a voiceprint determination method, apparatus, system, device, and storage medium.
Background
Speech recognition refers to the conversion of speech into text. Unlike speech recognition, voiceprint determination aims to identify the identity of a speaker.
In the related art, a dedicated voice signal is generally used for voiceprint determination. If the voice signal used for voice recognition is called a voice recognition signal and the voice signal used for voiceprint determination is called a voiceprint determination signal, the client needs to send both the voice recognition signal and the voiceprint determination signal to the server; the server then performs voiceprint determination based on the voiceprint determination signal and voice recognition based on the voice recognition signal.
Disclosure of Invention
The disclosure provides a voiceprint determination method, apparatus, system, device and storage medium.
According to an aspect of the present disclosure, there is provided a voiceprint determination method, including: adding a preset identifier to a voice recognition signal to obtain a voice recognition signal containing the preset identifier, wherein the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; and sending the voice recognition signal containing the preset identifier to a server, so that the server performs voiceprint determination based on a voiceprint model and the voice recognition signal containing the preset identifier, wherein the voiceprint model is established based on the preset content.
According to another aspect of the present disclosure, there is provided a voiceprint determination method, including: receiving a voice recognition signal, wherein the voice recognition signal includes a preset identifier and a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; determining the voiceprint determination portion in the voice recognition signal based on the preset identifier; and performing voiceprint determination on the voiceprint determination portion by using a voiceprint model, wherein the voiceprint model is established based on the preset content.
According to another aspect of the present disclosure, there is provided a voiceprint determination apparatus, including: an adding module configured to add a preset identifier to a voice recognition signal to obtain a voice recognition signal containing the preset identifier, wherein the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; and a sending module configured to send the voice recognition signal containing the preset identifier to a server, so that the server performs voiceprint determination based on a voiceprint model and the voice recognition signal containing the preset identifier, wherein the voiceprint model is established based on the preset content.
According to another aspect of the present disclosure, there is provided a voiceprint determination apparatus, including: a receiving module configured to receive a voice recognition signal, wherein the voice recognition signal includes a preset identifier and a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; a determining module configured to determine the voiceprint determination portion in the voice recognition signal based on the preset identifier; and a judging module configured to perform voiceprint determination on the voiceprint determination portion by using a voiceprint model, wherein the voiceprint model is established based on the preset content.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical scheme of the disclosure, the data volume of the transmission link can be reduced.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to an eighth embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an electronic device for implementing any one of the voiceprint determination methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. The embodiment provides a voiceprint determination method, including:
101. Adding a preset identifier to a voice recognition signal to obtain a voice recognition signal containing the preset identifier, wherein the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content.
102. Sending the voice recognition signal containing the preset identifier to a server, so that the server performs voiceprint determination based on a voiceprint model and the voice recognition signal containing the preset identifier, wherein the voiceprint model is established based on the preset content.
The execution subject of this embodiment is a client deployed on user equipment. The specific form of the user equipment is not limited; for example, it may be a smart home terminal or any of various mobile devices, including mobile phones, tablet computers, handheld computing devices, PDAs (personal digital assistants), portable media players, devices using headphones or earphones (e.g., Bluetooth-compatible devices), phone tablets (i.e., combined smartphone/tablet devices), wearable computers, and the like. The smart home terminal is, for example, a smart speaker.
The form of the client is not limited, and may be provided by an APP (application), a web page, or a program. The APP may be explicitly installed on the interface of the user device, or the APP may be called by the user through specific hardware and/or software buttons, which is not limited by this disclosure.
The voice recognition signal refers to a voice signal containing a wake-up word and content to be recognized. For example, the voice recognition signal is "small degree, tomorrow weather", where "small degree" is the wake-up word and "tomorrow weather" is the content to be recognized.
A voiceprint is the spectrum of sound waves carrying speech information. The voiceprint characteristics of any two people differ, and each person's voiceprint is relatively stable. Voiceprint determination is of two types: text-dependent and text-independent. Text-dependent voiceprint determination requires the user to speak preset content so that a voiceprint model can be established accurately for each person, and the user must also speak the preset content when voiceprint determination is performed. Text-independent voiceprint determination does not require the user to speak preset content.
The embodiments of the present disclosure take text-dependent voiceprint determination as an example, in which the associated text, that is, the preset content, is the wake-up word, for example "small degree".
Unlike the related art, which additionally transmits a voiceprint determination signal, the embodiments of the present disclosure do not transmit a separate voiceprint determination signal; voiceprint determination is instead performed based on the voice recognition signal. Specifically, the server may determine the voiceprint determination portion in the voice recognition signal based on the preset identifier, where the voiceprint determination portion is, for example, the portion containing the wake-up word, and then perform voiceprint determination based on that portion.
The voiceprint determination portion is a segment of the voice recognition signal, and its two end points may be referred to as the voiceprint determination start point and the voiceprint determination end point.
In some embodiments, the preset identifier may be added based on an end point of the voiceprint determination portion. For example, the preset identifier is added after the voiceprint determination end point in the voice recognition signal. Specifically, the client may start sending the voice recognition signal from the voiceprint determination start point and add the preset identifier after the voiceprint determination end point; after receiving the voice recognition signal, the server takes the part of the voice recognition signal before the preset identifier as the voiceprint determination portion.
As another example, the preset identifier may be added before the voiceprint determination start point. In this case, the client may start sending the voice recognition signal from a time point at or before the voiceprint determination start point, and after receiving the preset identifier, the server may take a part of a preconfigured length following the preset identifier as the voiceprint determination portion. The preconfigured length may be equal to the second preset duration described below.
In another alternative, the client starts sending the voice recognition signal from the voiceprint determination start point, and the server is preconfigured with a length, namely the second preset duration; after receiving the voice recognition signal, the server takes a part of the preconfigured length from the beginning of the received voice recognition signal as the voiceprint determination portion. In this case, the position at which the preset identifier is added is not limited, that is, it need not be added based on an end point. The preset identifier then serves as an indication of voiceprint determination: after receiving the voice recognition signal, the server determines from the preset identifier that voiceprint determination is required, and then determines the voiceprint determination portion in the manner described above.
By adding the preset identifier based on an end point, the server can determine the voiceprint determination portion in a timely manner.
Furthermore, adding the preset identifier based on the voiceprint determination end point requires no advance configuration on the server side and is easier to implement.
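The two alternatives in which the server relies on a preconfigured length can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `select_by_length`, the byte-offset representation of the identifier position, and the error handling are all assumptions.

```python
from typing import Optional


def select_by_length(received: bytes, length: int,
                     marker_offset: Optional[int] = None) -> bytes:
    """Return `length` bytes of the received signal as the voiceprint portion.

    marker_offset is the (hypothetical) byte position just after a preset
    identifier placed before the start point. None models the other
    alternative, where the client starts sending at the start point itself
    and the identifier only indicates that voiceprint determination is needed.
    """
    begin = 0 if marker_offset is None else marker_offset
    portion = received[begin:begin + length]
    if len(portion) < length:
        # Assumption: refuse to judge on a portion shorter than configured.
        raise ValueError("not enough audio received for the preconfigured length")
    return portion
```

In both cases the server has to know the length in advance, which is the configuration burden that adding the identifier after the end point avoids.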
In some embodiments, determining the voiceprint determination start point and the voiceprint determination end point may include: determining a wake-up time point in the voice recognition signal; going back a first preset duration from the wake-up time point to obtain the voiceprint determination start point; and going forward a second preset duration from the voiceprint determination start point to obtain the voiceprint determination end point.
The wake-up time point is the time point at which the user equipment determines that it has been woken up successfully, for example, the time point at which the user equipment recognizes at least a part of the wake-up word. Taking the wake-up word "small degree" as an example, the user equipment may generally determine that wake-up has succeeded once "small degree" is recognized, so the end time point of "small degree" may be used as the wake-up time point.
Generally, the first preset duration is greater than or equal to the duration of the wake-up word. For example, if the wake-up word is "small degree" and each word generally lasts 500 ms, the first preset duration is, for example, 2000 ms.
The second preset duration is generally longer than the first preset duration and is, for example, 2560 ms.
Selecting the durations in this way ensures that the speech between the voiceprint determination start point and the voiceprint determination end point includes the complete wake-up word, so that the server can conveniently perform voiceprint determination based on the wake-up word.
Taking a first preset duration of 2000 ms and a second preset duration of 2560 ms as an example, the relationship among the wake-up time point, the voiceprint determination start point and the voiceprint determination end point is shown in Fig. 2.
In this embodiment, determining the voiceprint determination start point and the voiceprint determination end point in the above manner ensures that the portion between them includes the complete preset content, which ensures the accuracy of voiceprint determination.
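The endpoint arithmetic can be illustrated with a short sketch that uses the example durations from the text. The function name, the millisecond timestamps measured from the start of the buffered signal, the clamping at zero and the sanity check are assumptions made for illustration.

```python
FIRST_PRESET_MS = 2000   # backtrack from the wake-up time point (example value from the text)
SECOND_PRESET_MS = 2560  # length of the voiceprint portion (example value from the text)


def voiceprint_endpoints(wake_ms: int,
                         first_ms: int = FIRST_PRESET_MS,
                         second_ms: int = SECOND_PRESET_MS) -> tuple:
    """Return (start_ms, end_ms) of the voiceprint determination portion."""
    if second_ms < first_ms:
        # Assumption: a shorter second duration would place the end point
        # before the wake-up time point and could cut off the wake-up word.
        raise ValueError("second preset duration should not be shorter than the first")
    # Assumption: clamp to 0 when less audio than first_ms precedes the wake-up.
    start_ms = max(0, wake_ms - first_ms)
    end_ms = start_ms + second_ms
    return start_ms, end_ms
```

With a wake-up time point at 5000 ms, the portion runs from 3000 ms to 5560 ms, so it ends 560 ms after the wake-up time point.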
In the related art, the client sends a voiceprint determination signal to the server in addition to the voice recognition signal, and the two signals may carry different class identifiers to distinguish them. In other words, the voice signals sent by the client to the server fall into two classes, voice recognition signals and voiceprint determination signals. This increases the load on the transmission link and may also interfere with the voice recognition signal, degrading the voice recognition result.
In the embodiments of the present disclosure, no separate voiceprint determination signal needs to be transmitted; voiceprint determination is performed using the voice recognition signal, that is, the voice recognition signal is used for both voice recognition and voiceprint determination.
The client can add a preset identifier to the voice recognition signal, and the server performs voiceprint determination on the voice recognition signal containing the preset identifier. Specifically, the client may add the preset identifier after the voiceprint determination end point of the voice recognition signal and start sending the voice recognition signal to the server from the voiceprint determination start point; the server takes the part of the voice recognition signal before the preset identifier as the voiceprint determination signal and performs voiceprint determination on it based on a pre-established voiceprint model. For example, the user may utter speech containing "small degree" several times in advance, and the server establishes the voiceprint model based on these voice signals containing the wake-up word. The voiceprint model can be established using existing techniques, which are not detailed here.
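The disclosure leaves model construction to existing techniques. Purely as an illustration of one common generic approach, and not the method of this disclosure, the sketch below averages fixed-length feature vectors from several enrollment utterances and compares a new vector against that average by cosine similarity. The feature vectors, the function names and the 0.8 threshold are all hypothetical.

```python
import math


def enroll(embeddings):
    """Average several equal-length feature vectors into one voiceprint model."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_same_speaker(model, embedding, threshold=0.8):
    """Hypothetical decision rule: accept when similarity reaches the threshold."""
    return cosine(model, embedding) >= threshold
```

In this sketch, the vectors would come from whatever feature extractor the server uses on the wake-up word portion, which is outside the scope of the example.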
To enable the server to perform voiceprint determination based on the voice recognition signal, a preset identifier may be added to the voice recognition signal; accordingly, after receiving the voice recognition signal containing the preset identifier, the server may perform voiceprint determination based on the part of the voice recognition signal before the preset identifier.
When sending the voice recognition signal to the server, the client may send it as a sequence of voice packets. That is, starting from the voiceprint determination start point, the client may divide the voice recognition signal into voice packets, add a preset empty packet after the voiceprint determination end point, and send the voice packets and the preset empty packet to the server. For example, if each voice packet is 160 ms, 2560 ms corresponds to 16 voice packets; the client sends 16 voice packets starting from the voiceprint determination start point and then sends an empty packet after the 16th voice packet. After receiving the empty packet, the server performs voiceprint determination based on the voice recognition signal before the empty packet, that is, based on the preceding 16 voice packets. The voice recognition signal before the empty packet contains the wake-up word.
Further, the class identifier of the empty packet serving as the preset identifier may differ from the class identifier of the voice packets carrying the normal voice recognition signal, so that the empty packet can be distinguished from the normal voice recognition signal.
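The packet scheme above can be sketched on the client side as follows. The `Packet` type, the class identifier strings and the 32 bytes per millisecond figure (16 kHz, 16-bit mono PCM) are assumptions for illustration; only the 160 ms packet length and the empty packet after the 16th voice packet come from the text.

```python
from dataclasses import dataclass

PACKET_MS = 160                  # example packet length from the text
KIND_VOICE = "voice"             # hypothetical class identifier for voice packets
KIND_VOICEPRINT_END = "vp_end"   # hypothetical class identifier for the empty packet


@dataclass(frozen=True)
class Packet:
    kind: str
    payload: bytes = b""


def build_packets(signal: bytes, start_ms: int, end_ms: int,
                  bytes_per_ms: int = 32, packet_ms: int = PACKET_MS) -> list:
    """Split the signal from start_ms onward into voice packets and insert
    one empty packet right after the voiceprint determination end point."""
    packet_bytes = packet_ms * bytes_per_ms
    voice = [Packet(KIND_VOICE, signal[offset:offset + packet_bytes])
             for offset in range(start_ms * bytes_per_ms, len(signal), packet_bytes)]
    # Number of voice packets that cover the portion (ceiling division).
    marker_index = -(-(end_ms - start_ms) // packet_ms)
    if len(voice) < marker_index:
        raise ValueError("signal ends before the voiceprint determination end point")
    return voice[:marker_index] + [Packet(KIND_VOICEPRINT_END)] + voice[marker_index:]
```

Voice packets after the empty packet carry the rest of the voice recognition signal, so the same stream serves both voiceprint determination and voice recognition.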
In this embodiment, the voiceprint determination start point and the voiceprint determination end point are determined in the voice recognition signal, and the preset identifier is added to the voice recognition signal based on the voiceprint determination end point, so that the server performs voiceprint determination on the voice recognition signal containing the preset identifier. Voiceprint determination can thus be completed without additionally sending a dedicated voiceprint determination voice signal, which reduces the amount of data on the transmission link.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure. This embodiment provides a voiceprint determination method executed by a server. As shown in Fig. 3, the method includes:
301. Receiving a voice recognition signal, wherein the voice recognition signal includes a preset identifier and a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content.
302. Determining the voiceprint determination portion in the voice recognition signal based on the preset identifier.
303. Performing voiceprint determination on the voiceprint determination portion by using a voiceprint model, wherein the voiceprint model is established based on the preset content.
In some embodiments, determining the voiceprint determination portion in the voice recognition signal based on the preset identifier includes: determining the part of the voice recognition signal before the preset identifier as the voiceprint determination portion.
In this embodiment, the voiceprint determination portion is determined in the voice recognition signal and voiceprint determination is performed based on it, so voiceprint determination can be completed without additionally transmitting a dedicated voiceprint determination voice signal, which reduces the amount of data on the transmission link. Further, because the preset identifier is added after the voiceprint determination end point, the part before the preset identifier can be taken as the voiceprint determination portion, so the voiceprint determination portion can be determined efficiently.
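On the server side, steps 301 and 302 can be sketched as a single pass over the received packets. This is an illustrative sketch only: packets are modeled as `(kind, payload)` tuples, and the identifier string `"vp_end"` is a hypothetical stand-in for the preset identifier.

```python
MARKER = "vp_end"  # hypothetical class identifier of the preset identifier


def extract_voiceprint_portion(packets):
    """Return (voiceprint_portion, full_signal) from (kind, payload) tuples.

    The voiceprint portion is everything received before the preset
    identifier; it is None when no identifier was seen. The full signal
    remains available for voice recognition.
    """
    before_marker = []
    everything = []
    seen = False
    for kind, payload in packets:
        if kind == MARKER:
            seen = True
            continue
        everything.append(payload)
        if not seen:
            before_marker.append(payload)
    portion = b"".join(before_marker) if seen else None
    return portion, b"".join(everything)
```

A real server would likely start voiceprint determination as soon as the identifier arrives rather than after the whole stream; the batch form here just keeps the example short.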
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure. This embodiment provides a voiceprint determination method, which, in combination with the structure shown in fig. 5, includes:
401. The client acquires a registration voice signal containing the preset content.
The registration voice signal is a voice signal used for registering a voiceprint. The preset content is the fixed content used in text-dependent voiceprint determination, for example, the wake-up word.
The client may provide the user with a voiceprint registration interface that includes a voiceprint registration button; after the user clicks the button, the client may collect the voice signal uttered by the user. Because text-dependent voiceprint determination is taken as the example, the client may prompt the user to utter a voice signal containing the wake-up word before collecting the voice signal. For example, if the client is installed on a smart speaker, the smart speaker may prompt the user with content such as 'please say "small degree"', either as text on a screen or as speech. Accordingly, the "small degree" uttered by the user in response to the prompt may be used as the registration voice signal.
402. The client sends the registration voice signal to the server, so that the server establishes a voiceprint model based on the registration voice signal.
As shown in Fig. 5, the client may include a voice acquisition module, a voice software development kit (SDK), and an interaction module. The voice SDK may include a cache module, a recognition module and a wake-up module. In Fig. 5, the server is a cloud server by way of example; it can be understood that, if voice recognition is performed offline, the server may also be located on the user equipment.
The voice acquisition module is used to collect voice signals uttered by the user and is, for example, a microphone array.
After the voice acquisition module collects a voice signal, the voice signal may be stored in the cache module (buffer). During voiceprint registration, the recognition module may obtain the registration voice signal containing the wake-up word from the cache module and send it to the cloud through the communication module.
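One plausible shape for the cache module is a bounded buffer of recent audio frames, since going back a preset duration from the wake-up time point requires that recent audio still be available. The class below is a sketch under that assumption; the class name, the 10 ms frame length and the capacity are hypothetical and not taken from the disclosure.

```python
from collections import deque


class FrameBuffer:
    """Keeps only the most recent frames, up to capacity_ms of audio."""

    def __init__(self, capacity_ms: int, frame_ms: int = 10):
        self.frame_ms = frame_ms
        # deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity_ms // frame_ms)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def last(self, duration_ms: int) -> bytes:
        """Return up to the most recent duration_ms of audio as one byte string."""
        count = min(len(self._frames), duration_ms // self.frame_ms)
        if count == 0:
            return b""
        return b"".join(list(self._frames)[-count:])
```

For the example durations used earlier, the capacity would need to be at least the first preset duration (2000 ms) so that the start of the wake-up word is still buffered when wake-up is detected.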
Further, before the voice signal (the registration voice signal and/or the voice recognition signal) is stored in the cache module, it may be processed, so that the processed voice signal is stored. Processing the voice signal may include general signal processing such as noise reduction and enhancement, and/or beam processing of the voice signal. It can be understood that the registration voice signal and the voice recognition signal later used for voiceprint determination need to be processed in the same manner. For example, if the voice recognition signal referred to below has undergone beam processing, beam processing is also performed on the registration voice signal, so that voiceprint registration is based on the beam-processed registration voice signal.
In this embodiment, the client sends a registration voice signal containing the preset content to the server, so that voiceprint registration can be completed for subsequent voiceprint determination.
403. The client collects the voice recognition signal.
The voice recognition signal is the signal on which voice recognition is to be performed, and generally includes a wake-up word and content to be recognized; for example, the voice recognition signal is "Xiaodu, tomorrow's weather". It can be understood that, in a real-time voice recognition scenario, the voice recognition signal is processed as a stream: processing does not wait until the user has finished saying "Xiaodu, tomorrow's weather", but starts as soon as voice is detected, for example at "Xiaodu". In addition, in a voice wake-up scenario, "Xiaodu" and "tomorrow's weather" need not be spoken continuously; that is, the user may first say "Xiaodu", and then say "tomorrow's weather" after the client or the cloud has recognized the wake-up word and performed the wake-up operation.
Specifically, after the voice acquisition module acquires the voice recognition signal, the voice recognition signal is stored in the cache module, and the recognition module obtains the voice recognition signal from the cache module.
404. The client determines a wake-up time point in the speech recognition signal.
The client may determine a wake-up identifier, where the wake-up identifier includes a voice watermark value, and determine the tail point of the voice frame in which the voice watermark corresponding to the voice watermark value is located as the wake-up time point.
Specifically, after the recognition module obtains the voice recognition signal from the cache module, the recognition module may send the voice recognition signal to the wake-up module. The wake-up module is configured to detect whether the voice signal includes a wake-up word, and may be implemented by using various related technologies, for example, dividing the voice signal into multiple frames, extracting a voice feature of each frame, and determining whether the frame includes the wake-up word according to the voice feature and a wake-up acoustic model. After detecting that the voice signal contains the wake-up word, the wake-up module can feed back a wake-up identifier to the recognition module; after receiving the wake-up identifier, the recognition module determines the time point corresponding to the wake-up identifier as the wake-up time point.
The wake-up identifier is, for example, a voice watermark value.
The voice acquisition module can add voice watermarks to the acquired voice signal and send the watermarked voice signal to the wake-up module and the recognition module. When adding a voice watermark, the voice acquisition module may also allocate a voice watermark value to each voice watermark; the voice watermark values are, for example, counted in sequence from 0, that is, they may be 0, 1, 2, and so on. The voice acquisition module may add the voice watermark to the voice signal by using various related technologies, and the manner of adding the voice watermark is not limited in this embodiment.
The wake-up module may work on voice frames when detecting the wake-up word. That is, the voice signal is divided into voice frames, for example one frame every 32 ms, and each voice frame is checked for the wake-up word. When the wake-up word is detected, the voice watermark on the voice frame containing the wake-up word can be parsed based on a pre-configured protocol to obtain the corresponding voice watermark value, and the voice watermark value is then sent to the recognition module as the wake-up identifier.
After receiving the voice watermark value as the wake-up identifier, the recognition module may determine a tail point of a voice frame where the voice watermark corresponding to the voice watermark value is located as a wake-up time point.
In this embodiment, the wake-up time point is determined based on the voice watermark value, so that the accuracy of the wake-up time point can be ensured.
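As a minimal sketch of the mapping from a voice watermark value to a wake-up time point, the following Python fragment assumes (this is not stated in the embodiment) that exactly one voice watermark is attached to each 32 ms voice frame and that the voice watermark values are counted from 0. The function name `wake_time_ms` is hypothetical.

```python
FRAME_MS = 32  # frame length taken from the example in the description


def wake_time_ms(watermark_value: int, frame_ms: int = FRAME_MS) -> int:
    """Return the tail point, in ms from the start of the stream, of the
    voice frame that carries the given voice watermark value.

    Assumption: one watermark per frame, values counted from 0, so the
    frame with value k covers [k * frame_ms, (k + 1) * frame_ms).
    """
    if watermark_value < 0:
        raise ValueError("voice watermark value must be non-negative")
    return (watermark_value + 1) * frame_ms
```

Under these assumptions, the recognition module would only need the integer value fed back by the wake-up module to locate the wake-up time point in its own copy of the stream.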
405. The client traces back a first preset duration from the wake-up time point to determine the voiceprint determination starting point, and extends a second preset duration later from the voiceprint determination starting point to determine the voiceprint determination tail point.
The relationship between the wake-up time, the start of voiceprint determination, and the end of voiceprint determination can be seen in fig. 2, and will not be described in detail here.
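The arithmetic of step 405 can be sketched as follows. The names `VoiceprintWindow` and `voiceprint_window` are hypothetical, and clamping the starting point to 0 when the wake-up word occurs very early in the stream is an assumption added here, not something the embodiment specifies.

```python
from typing import NamedTuple


class VoiceprintWindow(NamedTuple):
    start_ms: int  # voiceprint determination starting point
    tail_ms: int   # voiceprint determination tail point


def voiceprint_window(wake_ms: int,
                      first_preset_ms: int,
                      second_preset_ms: int) -> VoiceprintWindow:
    """Trace back first_preset_ms from the wake-up time point to get the
    starting point, then extend second_preset_ms later to get the tail point."""
    # Assumption: never start before the beginning of the stream.
    start = max(0, wake_ms - first_preset_ms)
    tail = start + second_preset_ms
    return VoiceprintWindow(start, tail)
```

For instance, with a wake-up time point of 3200 ms, an illustrative first preset duration of 2000 ms, and a second preset duration of 2560 ms, the window would run from 1200 ms to 3760 ms.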
406. The client divides the voice recognition signal into voice packets starting from the voiceprint determination starting point and sends the voice packets to the cloud, and adds a preset empty packet after the voiceprint determination tail point and sends the preset empty packet to the cloud.
The voice packet corresponding to the voice recognition signal comprises a first type identifier, the preset empty packet comprises a second type identifier, and the first type identifier is different from the second type identifier.
As shown in fig. 6, the client may send voice packets one by one from the voiceprint determination starting point. The duration of each voice packet may be 160 ms; since the duration from the voiceprint determination starting point to the voiceprint determination tail point is 2560 ms, this span may be divided into 16 voice packets, and an empty packet may be added after the 16th voice packet. Both the empty packet and the voice packets can contain a type identifier (Type), and the two type identifiers can differ. As shown in fig. 6, the type identifier of a voice packet is 0x01, and the type identifier of the empty packet is 0x08. It can be understood that, when a voice signal (the registration voice signal and/or the voice recognition signal) is transmitted, certain processing, such as compression, may also be performed on the voice signal. As shown in fig. 6, taking opus compression as an example, the corresponding voice packets (package, pkg) are obtained after opus compression, where each voice packet may include a length, a type, and a value, and is sent to the cloud. In addition, the voice data in the buffer can be read from the Advanced Linux Sound Architecture (ALSA) of the terminal, 32 ms of voice data at a time.
In this embodiment, by using different type identifiers for the voice packets and the empty packet, the voice packets and the empty packet can be better distinguished, which ensures the accuracy of the voiceprint determination signal and, in turn, the accuracy of the voiceprint determination.
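The packetization of step 406 can be sketched as below. The type values 0x01 and 0x08, the 160 ms packet duration, and the 2560 ms window come from the example above; the field widths (a 4-byte big-endian length followed by a 1-byte type), the helper names, and the omission of opus compression are assumptions made only to keep the sketch self-contained.

```python
import struct
from typing import Iterator

TYPE_VOICE = 0x01   # type identifier of a voice packet (from fig. 6)
TYPE_EMPTY = 0x08   # type identifier of the preset empty packet (from fig. 6)
PACKET_MS = 160     # duration of one voice packet


def encode_packet(ptype: int, value: bytes = b"") -> bytes:
    """Encode one packet as length, type, value.
    Assumed layout: 4-byte big-endian length, 1-byte type, then the value."""
    return struct.pack(">IB", len(value), ptype) + value


def packetize(pcm: bytes, bytes_per_ms: int, window_ms: int,
              packet_ms: int = PACKET_MS) -> Iterator[bytes]:
    """Yield packets for audio that begins at the voiceprint determination
    starting point, inserting one empty packet after the packet that ends
    at the voiceprint determination tail point."""
    chunk = bytes_per_ms * packet_ms
    packets_in_window = window_ms // packet_ms
    sent = 0
    for offset in range(0, len(pcm), chunk):
        yield encode_packet(TYPE_VOICE, pcm[offset:offset + chunk])
        sent += 1
        if sent == packets_in_window:
            # Marks the end of the voiceprint determination part.
            yield encode_packet(TYPE_EMPTY)
```

With 16 kHz, 16-bit mono audio (32 bytes per millisecond, an illustrative choice) and a 2560 ms window, the first 16 packets carry the voiceprint determination part, the 17th packet on the wire is the empty packet, and later packets carry the content to be recognized.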
407. The cloud takes the voice recognition signal before the preset empty packet as the voiceprint determination part, and performs voiceprint determination on the voiceprint determination part by using the voiceprint model.
That is, a voice signal between the start point of voiceprint determination and the end point of voiceprint determination is used as a voiceprint determination portion, and voiceprint determination is performed on the voiceprint determination signal using a voiceprint model.
It can be understood that the cloud may also perform speech recognition and corresponding processing on the speech recognition signal, for example, recognize "tomorrow weather", then obtain tomorrow weather, and feed back to the user through the client. Specifically, the 17 th packet of fig. 6 may be the first voice packet corresponding to the content to be recognized, and so on.
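On the cloud side, locating the voiceprint determination part amounts to collecting the voice packets received before the preset empty packet. The following sketch assumes the packets have already been decoded into `(type, value)` pairs; the function name `split_at_empty_packet` and the choice to ignore unknown packet types are hypothetical.

```python
from typing import Iterable, List, Tuple

TYPE_VOICE = 0x01   # type identifier of a voice packet (from fig. 6)
TYPE_EMPTY = 0x08   # type identifier of the preset empty packet (from fig. 6)


def split_at_empty_packet(
        packets: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, bytes]:
    """Return (voiceprint_part, remainder): the voice data received before
    the preset empty packet, and the voice data received after it."""
    voiceprint: List[bytes] = []
    remainder: List[bytes] = []
    seen_empty = False
    for ptype, value in packets:
        if ptype == TYPE_EMPTY:
            seen_empty = True
            continue
        if ptype != TYPE_VOICE:
            continue  # assumption: packets of unknown type are skipped
        (remainder if seen_empty else voiceprint).append(value)
    return b"".join(voiceprint), b"".join(remainder)
```

The first returned value would be passed to the voiceprint model, while voice recognition can still run on the voice data as a whole, since no separate voiceprint signal is transmitted.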
In this embodiment, the voice recognition signal is sent to the cloud end from the voiceprint determination starting point, and the preset null packet is added after the voiceprint determination end point, so as to instruct the cloud end to perform voiceprint determination on the voice recognition signal before the preset null packet, and the voice signal specially used for voiceprint determination can be not required to be additionally transmitted, so that the transmission data volume of the transmission link between the client end and the cloud end can be reduced, the burden of the transmission link is reduced, the interference of the additionally transmitted voice signal on the voice recognition signal is avoided, and the accuracy of voice recognition is ensured.
Fig. 7 is a schematic diagram according to a seventh embodiment of the present disclosure, which provides a voiceprint determination apparatus. As shown in fig. 7, the voiceprint determination apparatus 700 can include an adding module 701 and a sending module 702.
The adding module 701 is configured to add a preset identifier to a voice recognition signal to obtain the voice recognition signal containing the preset identifier, where the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content. The sending module 702 is configured to send the voice recognition signal containing the preset identifier to the server, so that the server performs voiceprint determination based on the voiceprint model and the voice recognition signal containing the preset identifier, where the voiceprint model is established based on the preset content.
In some embodiments, the adding module 701 is specifically configured to: determining an end point of a voiceprint determination portion in the speech recognition signal; and adding a preset identifier in the voice recognition signal based on the endpoint.
In some embodiments, the end point includes a voiceprint determination tail point, and the adding module 701 is further specifically configured to: add the preset identifier after the voiceprint determination tail point in the voice recognition signal.
In some embodiments, the preset content is a wake-up word, and the adding module 701 is further specifically configured to: determine a wake-up time point in the voice recognition signal; trace back a first preset duration from the wake-up time point, and determine the resulting point as the voiceprint determination starting point; and extend a second preset duration later from the voiceprint determination starting point, and determine the resulting point as the voiceprint determination tail point.
In some embodiments, the adding module 701 is further specifically configured to: determining a wake-up identifier, wherein the wake-up identifier comprises: a voice watermark value; and determining the tail point of the voice frame where the voice watermark corresponding to the voice watermark value is positioned as a wake-up time point.
In some embodiments, the end points include a voiceprint determination starting point and a voiceprint determination tail point, the preset identifier is a preset empty packet, and the preset empty packet is added after the voiceprint determination tail point, where the sending module 702 is specifically configured to: divide the voice recognition signal into voice packets, and, starting from the voiceprint determination starting point, send the voice packets and the preset empty packet to the server.
In some embodiments, the voice packet corresponding to the voice recognition signal includes a first type identifier, the preset null packet includes a second type identifier, and the first type identifier is different from the second type identifier.
In some embodiments, the apparatus further comprises: the registration module is used for acquiring a registration voice signal containing the preset content; and sending the registration voice signal to the server side so that the server side establishes the voiceprint model based on the registration voice signal.
Fig. 8 is a schematic diagram according to an eighth embodiment of the present disclosure, which provides a voiceprint determination apparatus. As shown in fig. 8, the voiceprint determination apparatus 800 can include a receiving module 801, a determining module 802, and a judging module 803.
The receiving module 801 is configured to receive a voice recognition signal, where the voice recognition signal includes a preset identifier, the voice recognition signal includes a voiceprint determination portion, the preset identifier is used to determine the voiceprint determination portion in the voice recognition signal, and the voiceprint determination portion includes preset content; the determining module 802 is configured to determine the voiceprint determination portion in the voice recognition signal based on the preset identifier; the judging module 803 is configured to perform voiceprint determination on the voiceprint determination portion by using a voiceprint model, where the voiceprint model is established based on the preset content.
In some embodiments, the voiceprint determination portion is located between a voiceprint determination starting point and a voiceprint determination tail point, the voice recognition signal is sent from the voiceprint determination starting point, the preset identifier is added after the voiceprint determination tail point, and the determining module 802 is specifically configured to: determine the voice recognition signal before the preset identifier as the voiceprint determination portion.
In the embodiment of the disclosure, the voiceprint determination starting point and the voiceprint determination tail point are determined in the voice recognition signal, and the preset identifier is added to the voice recognition signal based on the voiceprint determination tail point, so that the server can perform voiceprint determination on the voice recognition signal containing the preset identifier. Voiceprint determination can thus be completed without additionally sending a dedicated voiceprint determination voice signal, which reduces the amount of data on the transmission link.
It is to be understood that in the disclosed embodiments, the same or similar elements in different embodiments may be referenced.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are used for distinction only, and do not indicate the degree of importance, the order of timing, and the like.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 901 performs the respective methods and processes described above, such as the voiceprint determination method. For example, in some embodiments, the voiceprint determination method can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into RAM 903 and executed by computing unit 901, one or more steps of the voiceprint determination method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the voiceprint determination method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A voiceprint determination method comprising:
adding a preset identifier in a voice recognition signal to obtain the voice recognition signal containing the preset identifier, wherein the voice recognition signal comprises a voiceprint judging part, the preset identifier is used for determining the voiceprint judging part in the voice recognition signal, and the voiceprint judging part comprises preset content;
and sending the voice recognition signal containing the preset identification to a server so that the server performs voiceprint judgment based on the voiceprint model and the voice recognition signal containing the preset identification, wherein the voiceprint model is established based on the preset content.
2. The method of claim 1, wherein the adding of the preset identification to the speech recognition signal comprises:
determining an end point of a voiceprint determination portion in the speech recognition signal;
and adding a preset identifier in the voice recognition signal based on the endpoint.
3. The method of claim 2, wherein the end point comprises a voiceprint determination tail point, and the adding a preset identifier in the voice recognition signal based on the end point comprises:
and adding a preset identifier after the voiceprint determination tail point in the voice recognition signal.
4. The method of claim 3, wherein the predetermined content is a wake-up word, and the determining an end point of a voiceprint determination portion in the speech recognition signal comprises:
determining a wake-up time point in the speech recognition signal;
tracing back a first preset duration from the wake-up time point, and determining the resulting point as a voiceprint determination starting point;
and extending a second preset duration later from the voiceprint determination starting point, and determining the resulting point as the voiceprint determination tail point.
5. The method of claim 4, wherein the determining a wake-up time point in a speech recognition signal comprises:
determining a wake-up identifier, wherein the wake-up identifier comprises: a voice watermark value;
and determining the tail point of the voice frame where the voice watermark corresponding to the voice watermark value is positioned as a wake-up time point.
6. The method according to claim 2, wherein the end point includes a voiceprint determination start point and a voiceprint determination end point, the preset identifier is a preset null packet, and the preset null packet is added after the voiceprint determination end point, and the sending the voice recognition signal including the preset identifier to the server includes:
and dividing the voice recognition signal into voice packets, and starting from the voiceprint judgment starting point, sending the voice packets and the preset empty packets to a server.
7. The method according to claim 6, wherein the voice packet corresponding to the voice recognition signal includes a first type identifier, and the predetermined null packet includes a second type identifier, and the first type identifier and the second type identifier are different.
8. The method of any of claims 1-7, further comprising:
collecting a registration voice signal containing the preset content;
and sending the registration voice signal to the server so that the server establishes the voiceprint model based on the registration voice signal.
9. A voiceprint determination method comprising:
receiving a voice recognition signal, wherein the voice recognition signal comprises a preset identifier, the voice recognition signal comprises a voiceprint judging part, the preset identifier is used for determining the voiceprint judging part in the voice recognition signal, and the voiceprint judging part comprises preset content;
determining the voiceprint determination part in the voice recognition signal based on the preset identification;
and carrying out voiceprint judgment on the voiceprint judgment part by adopting a voiceprint model, wherein the voiceprint model is established based on the preset content.
10. The method according to claim 9, wherein the voiceprint determination section is located between a voiceprint determination start point and a voiceprint determination end point, the voice recognition signal is transmitted from the voiceprint determination start point, the preset flag is added after the voiceprint determination end point, and the determining the voiceprint determination section in the voice recognition signal based on the preset flag comprises:
and determining the voice recognition signal before the preset identification as the voiceprint judgment part.
11. A voiceprint determination apparatus comprising:
an adding module, configured to add a preset identifier in a voice recognition signal to obtain the voice recognition signal containing the preset identifier, wherein the voice recognition signal comprises a voiceprint judging part, the preset identifier is used for determining the voiceprint judging part in the voice recognition signal, and the voiceprint judging part comprises preset content;
and a sending module, configured to send the voice recognition signal containing the preset identifier to the server so that the server performs voiceprint judgment based on the voiceprint model and the voice recognition signal containing the preset identifier, wherein the voiceprint model is established based on the preset content.
12. The apparatus according to claim 11, wherein the adding module is specifically configured to:
determining an end point of a voiceprint determination portion in the speech recognition signal;
and adding a preset identifier in the voice recognition signal based on the endpoint.
13. The apparatus of claim 12, wherein the end point comprises a voiceprint determination tail point, and the adding module is further specifically configured to:
and adding a preset identifier after the voiceprint determination tail point in the voice recognition signal.
14. The apparatus according to claim 13, wherein the preset content is a wakeup word, and the adding module is further specifically configured to:
determining a wake-up time point in the speech recognition signal;
tracing back a first preset duration from the wake-up time point, and determining the resulting point as a voiceprint determination starting point;
and extending a second preset duration later from the voiceprint determination starting point, and determining the resulting point as the voiceprint determination tail point.
15. The apparatus of claim 14, wherein the adding module is further specifically configured to:
determining a wake-up identifier, wherein the wake-up identifier comprises: a voice watermark value;
and determining the tail point of the voice frame where the voice watermark corresponding to the voice watermark value is positioned as a wake-up time point.
16. The apparatus according to claim 12, wherein the end points include a voiceprint determination start point and a voiceprint determination end point, the preset identifier is a preset null packet, and the preset null packet is added after the voiceprint determination end point, and the sending module is specifically configured to:
and dividing the voice recognition signal into voice packets, and starting from the voiceprint judgment starting point, sending the voice packets and the preset empty packets to a server.
17. The apparatus according to claim 16, wherein a voice packet corresponding to the voice recognition signal includes a first type identifier, and the predetermined null packet includes a second type identifier, and the first type identifier and the second type identifier are different.
18. The apparatus of any of claims 11-17, further comprising:
the registration module is used for acquiring a registration voice signal containing the preset content; and sending the registration voice signal to the server side so that the server side establishes the voiceprint model based on the registration voice signal.
19. A voiceprint determination apparatus comprising:
a receiving module configured to receive a voice recognition signal, wherein the voice recognition signal includes a preset identifier and a voiceprint determination part, the preset identifier is used to determine the voiceprint determination part in the voice recognition signal, and the voiceprint determination part includes preset content;
a determining module configured to determine the voiceprint determination part in the voice recognition signal based on the preset identifier; and
a judging module configured to perform voiceprint determination on the voiceprint determination part using a voiceprint model, wherein the voiceprint model is established based on the preset content.
20. The apparatus according to claim 19, wherein the voiceprint determination part is located between a voiceprint determination start point and a voiceprint determination end point, the voice recognition signal is transmitted starting from the voiceprint determination start point, the preset identifier is added after the voiceprint determination end point, and the determining module is specifically configured to:
determine the part of the voice recognition signal received before the preset identifier as the voiceprint determination part.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
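The flow recited in claims 14, 16, 17 and 20 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Packet`, `TYPE_VOICE`, `TYPE_NULL`, `voiceprint_endpoints`, `packetize` and `voiceprint_part` are hypothetical, positions and durations are expressed as byte offsets rather than times, and the packet size is an arbitrary assumption. The client derives the two endpoints from the wake-up point, sends voice packets starting at the start point, and places one null packet with a different type identifier right after the end point; the server treats everything received before that null packet as the voiceprint determination part.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical type identifiers; claim 17 only requires that they differ.
TYPE_VOICE = 1
TYPE_NULL = 2


@dataclass
class Packet:
    type_id: int
    payload: bytes


def voiceprint_endpoints(wake_idx: int, back: int, forward: int) -> Tuple[int, int]:
    """Claim 14: the start point is the wake-up point moved back by the first
    preset duration; the end point is the start point plus the second one."""
    start = max(0, wake_idx - back)  # never trace back before the signal begins
    return start, start + forward


def packetize(signal: bytes, start: int, end: int, size: int) -> List[Packet]:
    """Claim 16: send from the start point and add one null packet right
    after the voiceprint determination end point."""
    end = min(end, len(signal))
    # Voice packets covering [start, end); the last one is cut at the end point.
    packets = [Packet(TYPE_VOICE, signal[i:min(i + size, end)])
               for i in range(start, end, size)]
    # The preset identifier: an empty packet with a distinct type identifier.
    packets.append(Packet(TYPE_NULL, b""))
    # The rest of the signal (for example the spoken command) follows.
    packets += [Packet(TYPE_VOICE, signal[i:i + size])
                for i in range(end, len(signal), size)]
    return packets


def voiceprint_part(packets: List[Packet]) -> bytes:
    """Claim 20 (server side): everything received before the null packet."""
    part = bytearray()
    for p in packets:
        if p.type_id == TYPE_NULL:
            break
        part += p.payload
    return bytes(part)


# Usage with made-up numbers: wake-up point at offset 40, trace back 30,
# window length 50, packets of 16 bytes.
signal = bytes(range(100))
start, end = voiceprint_endpoints(40, 30, 50)
packets = packetize(signal, start, end, 16)
assert voiceprint_part(packets) == signal[start:end]
```

Because the boundary is marked in-band by the null packet, the server in this sketch needs no separate message carrying the endpoint positions; it only has to distinguish the two type identifiers.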
CN202110341365.0A 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium Active CN113129904B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110341365.0A CN113129904B (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium
CN202210877631.6A CN115394304A (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110341365.0A CN113129904B (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210877631.6A Division CN115394304A (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN113129904A (en) 2021-07-16
CN113129904B (en) 2022-08-23

Family

ID=76774575

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110341365.0A Active CN113129904B (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium
CN202210877631.6A Pending CN115394304A (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210877631.6A Pending CN115394304A (en) 2021-03-30 2021-03-30 Voiceprint determination method, apparatus, system, device and storage medium

Country Status (1)

Country Link
CN (2) CN113129904B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115394304A (en) * 2021-03-30 2022-11-25 北京百度网讯科技有限公司 Voiceprint determination method, apparatus, system, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050182627A1 (en) * 2004-01-14 2005-08-18 Izuru Tanaka Audio signal processing apparatus and audio signal processing method
CN104821934A (en) * 2015-03-20 2015-08-05 百度在线网络技术(北京)有限公司 Artificial intelligence based voice print login method and device
US20180007060A1 (en) * 2016-06-30 2018-01-04 Amazon Technologies, Inc. Multi-Factor Authentication to Access Services
CN107578770A (en) * 2017-08-31 2018-01-12 百度在线网络技术(北京)有限公司 Networking telephone audio recognition method, device, computer equipment and storage medium
US20190005961A1 (en) * 2017-06-28 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for processing voice message, terminal and storage medium
US20190066671A1 (en) * 2017-08-22 2019-02-28 Baidu Online Network Technology (Beijing) Co., Ltd. Far-field speech awaking method, device and terminal device
CN111583934A (en) * 2020-04-30 2020-08-25 联想(北京)有限公司 Data processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107689225B (en) * 2017-09-29 2019-11-19 福建实达电脑设备有限公司 A method of automatically generating minutes
CN111739558B (en) * 2019-03-21 2023-03-28 杭州海康威视数字技术股份有限公司 Monitoring system, method, device, server and storage medium
CN110875043B (en) * 2019-11-11 2022-06-17 广州国音智能科技有限公司 Voiceprint recognition method and device, mobile terminal and computer readable storage medium
CN113129904B (en) * 2021-03-30 2022-08-23 北京百度网讯科技有限公司 Voiceprint determination method, apparatus, system, device and storage medium

Also Published As

Publication number Publication date
CN115394304A (en) 2022-11-25
CN113129904B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN106448663B (en) Voice awakening method and voice interaction device
CN110268469B (en) Server side hotword
CN107919130B (en) Cloud-based voice processing method and device
CN108665895B (en) Method, device and system for processing information
WO2017031846A1 (en) Noise elimination and voice recognition method, apparatus and device, and non-volatile computer storage medium
US9530400B2 (en) System and method for compressed domain language identification
WO2018068636A1 (en) Method and device for detecting audio signal
WO2014064324A1 (en) Multi-device speech recognition
CN103514882A (en) Voice identification method and system
CN108877779B (en) Method and device for detecting voice tail point
US20200312305A1 (en) Performing speaker change detection and speaker recognition on a trigger phrase
CN111816216A (en) Voice activity detection method and device
US20150325252A1 (en) Method and device for eliminating noise, and mobile terminal
CN113129904B (en) Voiceprint determination method, apparatus, system, device and storage medium
CN110992953A (en) Voice data processing method, device, system and storage medium
CN113658586A (en) Training method of voice recognition model, voice interaction method and device
CN115831125A (en) Speech recognition method, device, equipment, storage medium and product
CN109376224A (en) Corpus filter method and device
CN113838477A (en) Packet loss recovery method and device for audio data packet, electronic equipment and storage medium
CN113808585A (en) Earphone awakening method, device, equipment and storage medium
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN112542157A (en) Voice processing method and device, electronic equipment and computer readable storage medium
CN113539300A (en) Voice detection method and device based on noise suppression, storage medium and terminal
CN112509567A (en) Method, device, equipment, storage medium and program product for processing voice data
CN113096651A (en) Voice signal processing method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant