CN113421570A - Intelligent earphone identity authentication method and device - Google Patents


Info

Publication number
CN113421570A
Authority
CN
China
Prior art keywords
voice
text
voice password
code
processing result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110688361.XA
Other languages
Chinese (zh)
Inventor
张光强
何小兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisyou Technology Shenzhen Co ltd
Original Assignee
Unisyou Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisyou Technology Shenzhen Co ltd filed Critical Unisyou Technology Shenzhen Co ltd
Priority to CN202110688361.XA priority Critical patent/CN113421570A/en
Publication of CN113421570A publication Critical patent/CN113421570A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07C - TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C 9/00 - Individual registration on entry or exit
    • G07C 9/00174 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C 9/00563 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys, using personal physical data of the operator, e.g. fingerprints, retinal images, voice patterns
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 - Training
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00 - Details of transducers, loudspeakers or microphones
    • H04R 1/10 - Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R 1/1041 - Mechanical or electronic switches, or control elements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 2201/00 - Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R 2201/10 - Details of earpieces, attachments therefor, earphones or monophonic headphones covered by H04R1/10 but not provided for in any of its subgroups

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides an intelligent earphone identity authentication method and device, relating to the technical field of identity authentication. The method comprises the following steps: acquiring a voice password; processing the voice password to obtain a voice password processing result; inputting the voice password processing result into a trained acoustic model to obtain a plurality of pinyin identification results; inputting the plurality of pinyin identification results into a trained language model to obtain a voice text; comparing the voice text with a character library to obtain a voice comparison text; and, if the voice comparison text is consistent with a pre-stored text, sending an instruction to unlock the intelligent earphone. The intelligent earphone identity authentication device comprises an acquisition module, a voice password processing module, a pinyin identification module, a voice text recognition module, a character comparison module and a text comparison module. For an intelligent earphone, user identity authentication can thus be performed, with unlocking decided by the authentication result, ensuring the use safety of the intelligent earphone.

Description

Intelligent earphone identity authentication method and device
Technical Field
The invention relates to the technical field of identity authentication, in particular to an intelligent earphone identity authentication method and device.
Background
With the development of intelligent wearable devices, intelligent earphones are becoming increasingly common. Because an earphone carries privacy-sensitive functions such as calls and voice chat, identity authentication on the earphone itself has attracted much attention in the industry as a way to protect users' personal information. An intelligent earphone is operated purely through voice interaction: voice passwords can be issued directly through the earphone to place calls, broadcast short messages, take notes, manage schedules and so on. Existing identity authentication technology is mostly implemented on a mobile phone terminal; for example, screen-lock schemes such as numeric passwords, pattern passwords and fingerprint passwords must be operated on a screen and therefore target hardware terminals that have one. For an intelligent terminal without a screen, user identity authentication cannot be performed, and the use safety of the terminal cannot be guaranteed.
Disclosure of Invention
The invention aims to provide an intelligent earphone identity authentication method and device to solve the problem in the prior art that, for an intelligent terminal without a screen, user identity authentication cannot be performed and the use safety of the terminal cannot be ensured.
The embodiment of the invention is realized by the following steps:
In a first aspect, an embodiment of the present application provides an intelligent earphone identity authentication method, which comprises the following steps: acquiring a voice password; processing the voice password to obtain a voice password processing result; inputting the voice password processing result into a trained acoustic model to obtain a plurality of pinyin identification results; inputting the plurality of pinyin identification results into a trained language model to obtain a voice text; comparing the voice text with a preset character library to obtain a voice comparison text; and comparing the voice comparison text with a pre-stored text and, if the two are consistent, sending an instruction to unlock the intelligent earphone.
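The six first-aspect steps can be sketched end to end as a toy pipeline. Everything below is an illustrative stand-in: the pinyin lookup table, function names and instruction strings are invented for demonstration and are not APIs defined by this application.

```python
# Toy stand-ins for the six steps; a real system would use audio codecs
# and trained acoustic/language models instead of lookup tables.
PINYIN_TABLE = {"chuang": "床", "qian": "前", "ming": "明", "yue": "月", "guang": "光"}

def process_password(audio):
    # Step 2 stand-in: treat the input as already-decoded pinyin frames
    return audio

def acoustic_model(frames):
    # Step 3 stand-in: each frame yields one pinyin identification result
    return list(frames)

def language_model(pinyins):
    # Step 4 stand-in: context-free pinyin-to-character lookup
    return "".join(PINYIN_TABLE.get(p, "?") for p in pinyins)

def library_lookup(text, char_library):
    # Step 5 stand-in: keep the text only if every character is in the library
    return text if all(c in char_library for c in text) else ""

def authenticate(audio, char_library, stored_text):
    """Steps 1-6 end to end: return 'unlock' or 'reacquire'."""
    processed = process_password(audio)          # voice password processing result
    pinyins = acoustic_model(processed)          # pinyin identification results
    voice_text = language_model(pinyins)         # voice text
    comparison_text = library_lookup(voice_text, char_library)
    return "unlock" if comparison_text == stored_text else "reacquire"
```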
In some embodiments of the present invention, after the step of comparing the voice comparison text with the pre-stored text, the method further comprises: if the voice comparison text is inconsistent with the pre-stored text, sending an instruction to re-acquire the voice password.
In some embodiments of the present invention, the step of processing the voice password to obtain a voice password processing result comprises: sampling and encoding the voice password to obtain a first code; converting the first code to obtain a second code; and decoding the second code to obtain the voice password processing result, which is transmitted to a cloud server.
In some embodiments of the present invention, before the step of inputting the voice password processing result into the trained acoustic model, the method further comprises: establishing an initial acoustic model; acquiring a plurality of voice data to establish a voice database; and training the initial acoustic model with the voice database to obtain the trained acoustic model.
In some embodiments of the present invention, before the step of inputting the plurality of pinyin identification results into the trained language model, the method further comprises: establishing an initial language model; acquiring a plurality of characters to build a character database; and training the initial language model with the character database to obtain the trained language model.
In a second aspect, an embodiment of the present application provides an intelligent earphone identity authentication device, which comprises: an acquisition module for acquiring a voice password; a voice password processing module for processing the voice password to obtain a voice password processing result; a pinyin identification module for inputting the voice password processing result into a trained acoustic model to obtain a plurality of pinyin identification results; a voice text recognition module for inputting the plurality of pinyin identification results into a trained language model to obtain a voice text; a character comparison module for comparing the voice text with a preset character library to obtain a voice comparison text; and a text comparison module for comparing the voice comparison text with a pre-stored text and, if the two are consistent, sending an instruction to unlock the intelligent earphone.
In some embodiments of the invention, the text comparison module includes a re-acquisition unit for sending an instruction to re-acquire the voice password if the voice comparison text is inconsistent with the pre-stored text.
In some embodiments of the present invention, the voice password processing module includes: a first encoding unit for sampling and encoding the voice password to obtain a first code; a second encoding unit for converting the first code to obtain a second code; and a decoding unit for decoding the second code to obtain the voice password processing result and transmitting it to the cloud server.
In some embodiments of the present invention, the intelligent earphone identity authentication device further includes: an initial-acoustic-model establishing module for establishing an initial acoustic model; a voice database establishing module for acquiring a plurality of voice data to establish a voice database; and an acoustic model training module for training the initial acoustic model with the voice database to obtain a trained acoustic model.
In some embodiments of the present invention, the intelligent earphone identity authentication device further includes: an initial-language-model establishing module for establishing an initial language model; a character database establishing module for acquiring a plurality of characters to establish a character database; and a language model training module for training the initial language model with the character database to obtain a trained language model.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory for storing one or more programs and a processor. When the one or more programs are executed by the processor, the method according to any one of the first aspect above is implemented.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method according to any one of the first aspect described above.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
The invention provides an intelligent earphone identity authentication method and device comprising the following steps: acquiring a voice password; processing the voice password to obtain a voice password processing result; inputting the voice password processing result into a trained acoustic model to obtain a plurality of pinyin identification results; inputting the plurality of pinyin identification results into a trained language model to obtain a voice text; comparing the voice text with a preset character library to obtain a voice comparison text; and comparing the voice comparison text with a pre-stored text and, if the two are consistent, sending an instruction to unlock the intelligent earphone. The invention samples and encodes the acquired voice password with PCM coding, converts the PCM code into an SBC or AAC code for transmission to improve data transmission efficiency, and then decodes the SBC or AAC code into a coding format supported by the cloud server, i.e. the voice password processing result. After the voice password processing result is input into the trained acoustic model, it is compared with the voice data in the model's voice database to obtain the pinyins it contains, yielding a plurality of pinyin identification results.
After the pinyin identification results are input into the trained language model, the language model identifies each result according to the character database and the semantics of its neighbouring characters in the voice password processing result, obtaining the character corresponding to each pinyin identification result; these characters together form the voice text. This improves the accuracy of the voice text and ensures its consistency with the user's speech. The voice text is then queried and compared once more against the character library to obtain a voice comparison text that is more accurate than the voice text, further ensuring consistency with the user's speech. Finally, the voice comparison text is compared with the pre-stored text; when the two match, verification passes and an instruction to unlock the intelligent earphone is sent, so that the earphone is unlocked. Even for an intelligent terminal without a screen, such as an intelligent earphone, user identity authentication can thus still be performed, with unlocking decided by the authentication result, ensuring the use safety of the intelligent earphone.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting the scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of an intelligent headset identity authentication method according to an embodiment of the present invention;
fig. 2 is a block diagram of an intelligent earphone identity authentication apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural block diagram of an electronic device according to an embodiment of the present invention.
Icon: 100 - intelligent earphone identity authentication device; 110 - acquisition module; 120 - voice password processing module; 130 - pinyin identification module; 140 - voice text recognition module; 150 - character comparison module; 160 - text comparison module; 101 - memory; 102 - processor; 103 - communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", "third", etc. are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. The terms "comprises", "comprising" and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it.
In the description of the present application, it should be noted that orientation or positional terms such as "upper", "lower", "inner" and "outer" are based on the orientations shown in the drawings, or on the orientation in which the product of the application is usually placed when in use. They are used only for convenience and simplicity of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present application.
In the description of the present application, it should also be noted that, unless otherwise explicitly stated or limited, the terms "disposed" and "connected" should be interpreted broadly: a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meaning of these terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments and features of the embodiments described below can be combined with each other without conflict.
Examples
Referring to fig. 1, fig. 1 is a flowchart illustrating an intelligent headset identity authentication method according to an embodiment of the present disclosure. An intelligent earphone identity authentication method comprises the following steps:
s110: acquiring a voice password;
Specifically, when the user speaks, the intelligent earphone records the user's speech, thereby receiving the user's voice and obtaining the voice password.
S120: processing the voice password to obtain a voice password processing result;
Specifically, the acquired voice password may be sampled and encoded with PCM coding at a sampling rate of 16 kHz. The Bluetooth chip of the intelligent earphone can convert the PCM code into an SBC code or an AAC code. SBC stands for sub-band codec; its input is PCM data and its output is a binary stream. The basic principle of SBC coding is to split the signal into several frequency sub-bands, encode each sub-band, pack the encoded sub-band data into frames, and output them as a binary stream; SBC coding is suited to the Android system. AAC coding is an audio compression algorithm with a high compression ratio, suited to the Apple system. SBC or AAC data occupies far less memory than PCM data, so converting PCM into SBC or AAC for transmission can effectively improve data transmission efficiency. The SBC or AAC code is transmitted over Bluetooth to the Bluetooth device of the charging box, whose chip decodes it back into PCM code; this PCM code is the voice password processing result.
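A back-of-envelope calculation illustrates why the transcoding step pays off. The 16 kHz sampling rate comes from the text above; the 16-bit mono sample format and the 64 kbit/s compressed rate are illustrative assumptions, not figures from this application.

```python
# Rough data-rate arithmetic behind converting PCM to SBC/AAC for Bluetooth.
SAMPLE_RATE_HZ = 16_000   # PCM sampling rate stated in the text
BYTES_PER_SAMPLE = 2      # 16-bit samples (assumption)
CHANNELS = 1              # single microphone stream (assumption)

pcm_bytes_per_sec = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
pcm_kbps = pcm_bytes_per_sec * 8 / 1000       # 256.0 kbit/s of raw PCM

compressed_kbps = 64                          # assumed SBC/AAC bitrate
ratio = pcm_kbps / compressed_kbps            # 4x less data over the link
```

Under these assumptions the compressed stream carries a quarter of the raw PCM data, which is the efficiency gain the text attributes to the conversion.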
As another implementation of this embodiment, after the SBC or AAC code is transmitted over Bluetooth to the Bluetooth device of the charging box, the charging box chip may decode it and re-encode it into another format supported by the cloud server, for example OPUS, in which case the OPUS code is the above-mentioned voice password processing result.
S130: inputting the voice password processing result into a trained acoustic model to obtain a plurality of pinyin identification results;
Specifically, after the voice password processing result is input into the trained acoustic model, it is compared with the voice data in the model's voice database to obtain the pinyins it contains, thereby producing a plurality of pinyin identification results.
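The S130 comparison can be sketched as matching each frame of the processed password against labelled entries in a voice database and taking the nearest entry's pinyin as the identification result. The two-dimensional feature vectors and the Euclidean distance metric are illustrative choices; the application does not specify either.

```python
import math

# Invented voice database: pinyin label -> toy feature vector
VOICE_DATABASE = {
    "chuang": [0.9, 0.1],
    "qian":   [0.2, 0.8],
    "ming":   [0.5, 0.5],
}

def nearest_pinyin(frame):
    # Pick the database entry closest to the frame's features
    return min(VOICE_DATABASE, key=lambda p: math.dist(frame, VOICE_DATABASE[p]))

def recognize(frames):
    # One pinyin identification result per frame
    return [nearest_pinyin(f) for f in frames]
```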
It should be noted that the voice data may include Chinese voice data and English voice data.
S140: inputting a plurality of pinyin identification results to the trained language model to obtain a voice text;
Specifically, after the plurality of pinyin identification results are input into the trained language model, the language model identifies each result according to the character database and the semantics of the characters surrounding it in the voice password processing result, obtaining the character corresponding to each pinyin identification result; these characters together form the voice text. Processing the voice password into a plurality of pinyin identification results and then recognizing those results improves the accuracy of the resulting voice text and ensures its consistency with the user's speech.
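The S140 idea is that one pinyin can map to several characters, and the neighbouring-character semantics pick the right one. A minimal sketch uses bigram scores as the "context"; the candidate table and scores below are invented for illustration only.

```python
# Invented homophone candidates and bigram context scores
CANDIDATES = {"ming": ["明", "名"], "yue": ["月", "越"]}
BIGRAM_SCORE = {("明", "月"): 5, ("明", "越"): 1, ("名", "月"): 1, ("名", "越"): 2}

def decode(pinyins):
    text = ""
    for py in pinyins:
        cands = CANDIDATES[py]
        if not text:
            text += cands[0]                     # no left context yet
        else:
            prev = text[-1]                      # score each candidate by context
            text += max(cands, key=lambda c: BIGRAM_SCORE.get((prev, c), 0))
    return text
```

With these scores, "ming yue" decodes to 明月 ("bright moon") rather than 名越, because the left-hand character 明 favours 月.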
S150: comparing the voice text with a preset text library to obtain a voice comparison text;
Specifically, the preset character library contains a plurality of text entries. The voice text is compared with these entries, which serve to check whether the semantics of the voice text are correct; that is, the voice text is queried and compared once more against the character library to obtain the voice comparison text.
S160: and comparing the voice contrast text with the pre-stored text, and if the voice contrast text is consistent with the pre-stored text, sending an instruction for unlocking the intelligent earphone.
Specifically, the pre-stored text is a text entered in advance into the cloud server by the earphone's owner. The voice comparison text is matched against the pre-stored text word by word; when the two match, verification passes and an instruction to unlock the intelligent earphone is sent, so that the earphone is unlocked.
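The word-by-word match of S160 can be sketched as a character-by-character comparison. The instruction strings are invented placeholders for the unlock and re-acquire instructions described in the text.

```python
def compare(voice_comparison_text, stored_text):
    """Character-by-character match; return a placeholder instruction string."""
    if len(voice_comparison_text) != len(stored_text):
        return "REACQUIRE_PASSWORD"          # lengths differ: cannot match
    for a, b in zip(voice_comparison_text, stored_text):
        if a != b:
            return "REACQUIRE_PASSWORD"      # first mismatching character fails
    return "UNLOCK_HEADSET"                  # every character matched
```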
In the implementation process, the acquired voice password is sampled and encoded with PCM coding at a sampling rate of 16 kHz. The PCM code is then converted into an SBC or AAC code for transmission to improve data transmission efficiency, and the SBC or AAC code is subsequently decoded into a format supported by the cloud server, such as PCM or OPUS, i.e. the voice password processing result. After the voice password processing result is input into the trained acoustic model, it is compared with the voice data in the model's voice database to obtain the pinyins it contains, yielding a plurality of pinyin identification results. After these results are input into the trained language model, the language model identifies each one according to the character database and the semantics of its neighbouring characters in the voice password processing result, obtaining the corresponding character for each; these characters form the voice text, which improves the accuracy of the voice text and ensures its consistency with the user's speech. The voice text is then queried and compared once more against the character library to obtain a voice comparison text more accurate than the voice text, further ensuring consistency with the user's speech.
Finally, the voice comparison text is compared with the pre-stored text; when the two match, verification passes and an instruction to unlock the intelligent earphone is sent, so that the earphone is unlocked. Even for an intelligent terminal without a screen, such as an intelligent earphone, user identity authentication can still be performed, with unlocking decided by the authentication result, ensuring the use safety of the intelligent earphone.
Because the resulting voice text takes into account both each pinyin identification result and the semantics of its neighbouring characters in the voice password processing result, converting the voice password processing result into the voice text in this way can, to a certain extent, avoid errors introduced by the user's dialect.
In some embodiments of this embodiment, the user may set the pre-stored text for different time periods to different text, so that the owner's password differs by time period, making the intelligent earphone safer and more reliable. For example, the user may set the pre-stored text for the period 15:00-17:00 to "bright moonlight before my bed". When the user needs to use the intelligent earphone at 15:30, the user says "bright moonlight before my bed" to the earphone, which records this voice password. The voice password is processed to obtain a voice password processing result, which is input into the trained acoustic model to obtain the pinyin identification results "chuang", "qian", "ming", "yue" and "guang". These are input into the trained language model to obtain the voice text "bright moonlight before my bed", which is compared with the preset character library to obtain the voice comparison text "bright moonlight before my bed". The voice comparison text is now consistent with the pre-stored text, so an instruction to unlock the intelligent earphone is sent and the earphone is successfully unlocked.
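The time-windowed pre-stored texts above can be sketched as a lookup keyed on the current time. The 15:00-17:00 window and its password (床前明月光, the example phrase) come from the text; the list layout and helper function are invented for illustration.

```python
from datetime import time

# (window start, window end, pre-stored text for that window)
PRESTORED = [
    (time(15, 0), time(17, 0), "床前明月光"),
]

def stored_text_for(now):
    """Return the pre-stored text whose window contains `now`, else None."""
    for start, end, text in PRESTORED:
        if start <= now < end:
            return text
    return None
```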
In some embodiments of this embodiment, after the step of comparing the voice comparison text with the pre-stored text, the method further includes: if the voice comparison text is inconsistent with the pre-stored text, sending an instruction to re-acquire the voice password. Specifically, the voice comparison text is matched against the pre-stored text word by word; when they do not match, verification fails, the cloud server sends a re-acquisition instruction to the charging box, the charging box passes the instruction on to the intelligent earphone, and the earphone, on receiving it, plays the prompt "re-enter the unlocking password" so that the user speaks the password again for re-verification.
In some embodiments, the step of processing the voice password to obtain a voice password processing result includes: sampling and encoding the voice password to obtain a first code; converting the first code to obtain a second code; and decoding the second code to obtain the voice password processing result, which is transmitted to the cloud server. Specifically, the voice password is sampled and encoded as PCM, i.e., the first code; the PCM sampling rate may be 16 kHz. The Bluetooth chip of the smart headset may convert the PCM code into SBC or AAC, i.e., the second code. SBC (sub-band codec) takes PCM as input and outputs a binary stream: the signal is split into several frequency sub-bands, each sub-band is encoded, and the encoded sub-band data is packed and output frame by frame; SBC is typical on Android systems. AAC is an audio compression algorithm with a high compression ratio, typical on Apple systems. SBC- or AAC-encoded data is much smaller than the PCM data, so converting PCM to SBC or AAC before transmission effectively improves data transmission efficiency. The SBC or AAC data is sent over Bluetooth to the charging box, whose chip decodes it and converts it into an encoding format supported by the cloud server, such as PCM or OPUS, namely the voice password processing result. The chip of the charging box then transmits the voice password processing result to the cloud server through the communication port.
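A rough trace of the three stages (first code, second code, processing result) is sketched below. The function names, dictionary shapes, and placeholder transcodes are assumptions; real SBC/AAC encoding is performed by the headset's Bluetooth chip and the decoding by the charging-box chip, neither of which is modeled here.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz PCM sampling rate from the description
BITS_PER_SAMPLE = 16      # assumed sample width (not stated in the patent)

def pcm_bitrate_bps(sample_rate: int = SAMPLE_RATE_HZ,
                    bits: int = BITS_PER_SAMPLE) -> int:
    """Raw mono PCM bit rate before Bluetooth compression."""
    return sample_rate * bits

def pipeline(voice_samples: bytes, codec: str = "SBC") -> dict:
    """Trace formats: PCM (first code) -> SBC/AAC (second code) -> server format."""
    first_code = {"format": "PCM", "rate": SAMPLE_RATE_HZ, "data": voice_samples}
    second_code = {"format": codec, "data": voice_samples}   # placeholder transcode
    result = {"format": "OPUS", "data": voice_samples}       # server-supported format
    return {"first": first_code, "second": second_code, "result": result}
```

At 16 kHz and 16 bits, raw mono PCM runs at 256 kbit/s, which illustrates why transcoding to SBC or AAC before the Bluetooth hop saves bandwidth.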
In some embodiments, before the step of inputting the voice password processing result to the trained acoustic model, the method further includes: establishing an initial acoustic model; acquiring a plurality of voice data to build a voice database; and training the initial acoustic model with the voice database to obtain the trained acoustic model. After the voice password processing result is input to the trained acoustic model, it is compared with the voice data in the voice database to extract the pinyins it contains, yielding the plurality of pinyin recognition results.
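A toy stand-in for the trained acoustic model is sketched below as a nearest-match lookup against a tiny "voice database". Real acoustic models are statistical or neural; the feature vectors, database contents, and function name here are purely illustrative assumptions.

```python
# Hypothetical voice database mapping each pinyin to reference features.
VOICE_DATABASE = {
    "chuang": [0.9, 0.1],
    "qian":   [0.2, 0.8],
    "ming":   [0.5, 0.5],
}

def recognize_pinyin(frame_features: list) -> str:
    """Return the pinyin whose stored features are closest to the input frame."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(VOICE_DATABASE, key=lambda p: dist(VOICE_DATABASE[p], frame_features))
```

Applying this per audio frame would produce the sequence of pinyin recognition results the description feeds to the language model.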
In some embodiments, before the step of inputting the multiple pinyin recognition results to the trained language model, the method further includes: establishing an initial language model; acquiring a plurality of characters to build a character database; and training the initial language model with the character database to obtain the trained language model. After the pinyin recognition results are input to the trained language model, the model recognizes each result using the character database and the semantics of its surrounding characters, i.e., its context within the voice password processing result, obtaining the character corresponding to each pinyin recognition result and thus the voice text.
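The context-dependent pinyin-to-character step can be sketched with a toy lookup that consults the previous character when choosing among homophones. Real language models score whole sequences; the tables and names below are illustrative assumptions only.

```python
# Hypothetical character table: per pinyin, a default character plus
# context-specific choices keyed by the preceding character.
CHAR_TABLE = {
    "ming":  {"default": "明"},
    "yue":   {"default": "月", "after_明": "月"},
    "guang": {"default": "光"},
}

def pinyin_to_text(pinyins: list) -> str:
    """Convert a pinyin sequence to characters, using the previous character as context."""
    out = []
    for p in pinyins:
        options = CHAR_TABLE.get(p, {"default": "?"})
        key = f"after_{out[-1]}" if out else "default"
        out.append(options.get(key, options["default"]))
    return "".join(out)
```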
In a second aspect, an embodiment of the present application provides a smart headset identity authentication apparatus 100, which includes: an acquisition module 110, configured to acquire the voice password; a voice password processing module 120, configured to process the voice password to obtain a voice password processing result; a pinyin recognition module 130, configured to input the voice password processing result to the trained acoustic model to obtain pinyin recognition results; a voice text recognition module 140, configured to input the pinyin recognition results to the trained language model to obtain a voice text; a character comparison module 150, configured to compare the voice text with a preset character library to obtain a voice comparison text; and a text comparison module 160, configured to compare the voice comparison text with the pre-stored text and, if they are consistent, send an instruction to unlock the smart headset. In operation, the acquisition module 110 acquires the voice password. The voice password processing module 120 then processes it: the acquired voice password is first sampled and encoded as PCM at a 16 kHz sampling rate; the PCM code is converted into SBC or AAC for transmission to improve data transmission efficiency; and the SBC or AAC code is finally decoded into an encoding format supported by the cloud server (for example, PCM or OPUS), i.e., the voice password processing result.
After the pinyin recognition module 130 inputs the voice password processing result to the trained acoustic model, the result is compared with the voice data in the voice database within the acoustic model to extract its pinyins, yielding the pinyin recognition results. After the voice text recognition module 140 inputs these results to the trained language model, the model recognizes each result using the character database and its surrounding-character context within the voice password processing result; the characters so obtained form the voice text, which improves the accuracy of the voice text and keeps it consistent with the user's speech. The character comparison module 150 then queries the voice text against the character library to obtain a voice comparison text that is more accurate than the voice text, further ensuring consistency with the user's speech. Finally, the text comparison module 160 compares the voice comparison text with the pre-stored text; when they match, verification passes and an instruction to unlock the smart headset is sent. Even for a screenless smart terminal such as a smart headset, user identity authentication is thus achieved, and whether to unlock the headset is decided from the authentication outcome, ensuring the headset's security in use.
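The wiring of the six modules can be summarized in a structural sketch. Each method stands in for one numbered module; all method bodies are trivial placeholders (the real modules perform transcoding, acoustic/language-model inference, and library lookup), and every name below is an assumption.

```python
class SmartHeadsetAuthenticator:
    """Skeleton of apparatus 100: modules 110-160 wired in order."""

    def __init__(self, prestored_text: str):
        self.prestored_text = prestored_text

    def process_password(self, voice: bytes) -> bytes:    # module 120
        return voice                                      # placeholder transcode

    def to_pinyin(self, processed: bytes) -> list:        # module 130
        return processed.decode().split()                 # placeholder acoustic model

    def to_text(self, pinyins: list) -> str:              # module 140
        return " ".join(pinyins)                          # placeholder language model

    def lookup_text(self, text: str) -> str:              # module 150
        return text                                       # placeholder library lookup

    def authenticate(self, voice: bytes) -> str:          # module 160
        text = self.lookup_text(
            self.to_text(self.to_pinyin(self.process_password(voice))))
        return "UNLOCK" if text == self.prestored_text else "REACQUIRE_PASSWORD"
```

A caller (standing in for module 110) would pass the captured audio straight into `authenticate`.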
In some embodiments, the text comparison module 160 includes a re-acquisition unit, configured to send an instruction to re-acquire the voice password if the voice comparison text is inconsistent with the pre-stored text. Specifically, the re-acquisition unit matches the voice comparison text against the pre-stored text word by word. When they do not match, verification fails and the cloud server sends the re-acquire instruction to the charging box, which forwards it to the smart headset; on receiving it, the headset plays a "re-enter the unlock password" voice prompt so the user can speak the password again for re-verification.
In some embodiments, the voice password processing module 120 includes: a first encoding unit, configured to sample and encode the voice password to obtain a first code; a second encoding unit, configured to convert the first code to obtain a second code; and a decoding unit, configured to decode the second code to obtain the voice password processing result and transmit it to the cloud server. Specifically, the first encoding unit samples and encodes the voice password as PCM (the first code); the second encoding unit converts the PCM code into SBC or AAC (the second code); and the decoding unit decodes the SBC or AAC code into an encoding format supported by the cloud server, such as PCM or OPUS, namely the voice password processing result. The chip of the charging box then transmits the voice password processing result to the cloud server through the communication port.
In some embodiments, the smart headset identity authentication apparatus 100 further includes: an initial acoustic model establishing module, configured to establish an initial acoustic model; a voice database establishing module, configured to acquire a plurality of voice data to build a voice database; and an acoustic model training module, configured to train the initial acoustic model with the voice database to obtain the trained acoustic model.
In some embodiments, the smart headset identity authentication apparatus 100 further includes: an initial language model establishing module, configured to establish an initial language model; a character database establishing module, configured to acquire a plurality of characters to build a character database; and a language model training module, configured to train the initial language model with the character database to obtain the trained language model.
Referring to fig. 3, fig. 3 is a schematic structural block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device comprises a memory 101, a processor 102 and a communication interface 103, wherein the memory 101, the processor 102 and the communication interface 103 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the intelligent earphone identity authentication apparatus 100 provided in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 101, so as to perform various functional applications and data processing. The communication interface 103 may be used for communicating signaling or data with other node devices.
The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, in essence or in the part contributing over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
To sum up, the smart headset identity authentication method and apparatus provided by the embodiments of the present application include the following steps: acquiring a voice password; processing the voice password to obtain a voice password processing result; inputting the voice password processing result to a trained acoustic model to obtain a plurality of pinyin recognition results; inputting the pinyin recognition results to a trained language model to obtain a voice text; comparing the voice text with a preset character library to obtain a voice comparison text; and comparing the voice comparison text with the pre-stored text and, if they are consistent, sending an instruction to unlock the smart headset. The acquired voice password is sampled and encoded as PCM, converted into SBC or AAC for transmission to improve data transmission efficiency, and then decoded into an encoding format supported by the cloud server, namely the voice password processing result. After the voice password processing result is input to the trained acoustic model, it is compared with the voice data in the voice database within the acoustic model to extract its pinyins, yielding the pinyin recognition results.
After the pinyin recognition results are input to the trained language model, the model recognizes each result using the character database and its surrounding-character context within the voice password processing result; the characters so obtained form the voice text, improving its accuracy and ensuring consistency with the user's speech. The voice text is then queried against the character library to obtain a voice comparison text that is more accurate still, further ensuring consistency with the user's speech. Finally, the voice comparison text is compared with the pre-stored text; when they match, verification passes and an instruction to unlock the smart headset is sent, unlocking the headset. Even for a screenless smart terminal such as a smart headset, user identity authentication is thus achieved, and whether to unlock the headset is decided from the authentication outcome, ensuring the headset's security in use.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. An intelligent earphone identity authentication method is characterized by comprising the following steps:
acquiring a voice password;
processing the voice password to obtain a voice password processing result;
inputting the voice password processing result to a trained acoustic model to obtain a plurality of pinyin identification results;
inputting a plurality of pinyin identification results to a trained language model to obtain a voice text;
comparing the voice text with a preset text library to obtain a voice comparison text;
and comparing the voice comparison text with a pre-stored text, and if the voice comparison text is consistent with the pre-stored text, sending an instruction for unlocking the intelligent earphone.
2. The smart headset identity authentication method of claim 1, wherein after the step of comparing the voice comparison text with the pre-stored text, the method further comprises:
and if the voice comparison text is inconsistent with the pre-stored text, sending an instruction for re-acquiring the voice password.
3. The method for authenticating the identity of an intelligent earphone according to claim 1, wherein the step of processing the voice password to obtain a voice password processing result comprises:
sampling and coding the voice password to obtain a first code;
converting the first code to obtain a second code;
and decoding the second code to obtain a voice password processing result, and transmitting the voice password processing result to a cloud server.
4. The method of claim 1, wherein before the step of inputting the result of the voice password processing to the trained acoustic model, the method further comprises:
establishing an acoustic initial model;
acquiring a plurality of voice data to establish a voice database;
and training the acoustic initial model by using the voice database to obtain a trained acoustic model.
5. The method for authenticating identity of a smart headset as recited in claim 1, wherein prior to the step of inputting the plurality of pinyin recognition results to the trained language model, the method further comprises:
establishing a language initial model;
acquiring a plurality of characters to establish a character database;
and training the language initial model by using the character database to obtain a trained language model.
6. An intelligent earphone identity authentication device, comprising:
the acquisition module is used for acquiring a voice password;
the voice password processing module is used for processing the voice password to obtain a voice password processing result;
the pinyin identification module is used for inputting the voice password processing result to a trained acoustic model to obtain a pinyin identification result;
the speech text recognition module is used for inputting the pinyin recognition result to a trained language model to obtain a speech text;
the character comparison module is used for comparing the voice text with a preset character library to obtain a voice comparison text;
and the text comparison module is used for comparing the voice comparison text with a pre-stored text, and if the voice comparison text is consistent with the pre-stored text, sending an instruction for unlocking the intelligent earphone.
7. The intelligent earphone identity authentication device of claim 6, wherein the text comparison module comprises:
and the reacquiring unit is used for sending an instruction of reacquiring the voice password if the voice comparison text is inconsistent with the pre-stored text.
8. The intelligent earphone identity authentication device of claim 6, wherein the voice password processing module comprises:
the first coding unit is used for sampling and coding the voice password to obtain a first code;
the second coding unit is used for converting the first code to obtain a second code;
and the decoding unit is used for decoding the second code to obtain a voice password processing result and transmitting the voice password processing result to the cloud server.
9. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-5.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202110688361.XA 2021-06-21 2021-06-21 Intelligent earphone identity authentication method and device Pending CN113421570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110688361.XA CN113421570A (en) 2021-06-21 2021-06-21 Intelligent earphone identity authentication method and device


Publications (1)

Publication Number Publication Date
CN113421570A (en) 2021-09-21

Family

ID=77789609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110688361.XA Pending CN113421570A (en) 2021-06-21 2021-06-21 Intelligent earphone identity authentication method and device

Country Status (1)

Country Link
CN (1) CN113421570A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN108509119A (en) * 2017-02-28 2018-09-07 三星电子株式会社 Operating method and its electronic equipment of support for the electronic equipment that function executes
CN109493494A (en) * 2018-12-15 2019-03-19 深圳壹账通智能科技有限公司 Method for unlocking, device, equipment and medium based on smart lock
CN208820987U (en) * 2018-09-14 2019-05-03 中科智云科技(珠海)有限公司 The voice dialogue device of function is corrected with wrong word
CN111276135A (en) * 2018-12-03 2020-06-12 华为终端有限公司 Network voice recognition method, network service interaction method and intelligent earphone
CN112511944A (en) * 2020-12-03 2021-03-16 歌尔科技有限公司 Multifunctional earphone charging box


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Jiangde et al., "Internal Mechanism and Applications of Statistical Language Models", Beijing: Science and Technology Literature Press, p. 247

Similar Documents

Publication Publication Date Title
CN106782572B (en) Voice password authentication method and system
US10140992B2 (en) System and method for voice authentication over a computer network
KR102648306B1 (en) Speech recognition error correction method, related devices, and readable storage medium
CN111883140B (en) Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
US12010108B2 (en) Techniques to provide sensitive information over a voice connection
CN107533598B (en) Input method and device of login password of application program and terminal
CN101208739A (en) Speech recognition system for secure information
US10269353B2 (en) System and method for transcription of spoken words using multilingual mismatched crowd unfamiliar with a spoken language
CN111858892A (en) Voice interaction method, device, equipment and medium based on knowledge graph
CN111343162B (en) System secure login method, device, medium and electronic equipment
CN109493494A (en) Method for unlocking, device, equipment and medium based on smart lock
CN103366745A (en) Method for protecting terminal equipment based on speech recognition and terminal equipment
CN106713111B (en) Processing method for adding friends, terminal and server
CN105227557A (en) A kind of account number processing method and device
CN111768789A (en) Electronic equipment and method, device and medium for determining identity of voice sender thereof
KR20180134482A (en) Apparatus for managing address book using voice recognition, vehicle, system and method thereof
CN110728984A (en) Database operation and maintenance method and device based on voice interaction
CN113421570A (en) Intelligent earphone identity authentication method and device
CN104408345A (en) Method and device for checking identity and server
US20230386453A1 (en) Method for detecting an audio adversarial attack with respect to a voice command processed byan automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
US12008091B2 (en) Single input voice authentication
CN109359307B (en) Translation method, device and equipment for automatically identifying languages
CN117768154A (en) Computer network identity verification system
JP2009086207A (en) Minute information generation system, minute information generation method, and minute information generation program
CA3191994A1 (en) Secure communication system with speaker recognition by voice biometrics for user groups such as family groups

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210921