CN110232917A - Voice login method, device, equipment and storage medium based on artificial intelligence - Google Patents

Voice login method, device, equipment and storage medium based on artificial intelligence

Info

Publication number
CN110232917A
CN110232917A (application number CN201910424460.XA)
Authority
CN
China
Prior art keywords
account
text
voice
preset
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910424460.XA
Other languages
Chinese (zh)
Inventor
彭捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910424460.XA
Publication of CN110232917A
Legal status: Pending

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/08 - Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0815 - Network architectures or network communication protocols for network security for authentication of entities providing single-sign-on or federations
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L2025/783 - Detection of presence or absence of voice signals based on threshold decision

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the field of artificial intelligence, and in particular to a voice login method, device, equipment and storage medium based on artificial intelligence. The method comprises: collecting the voice fields corresponding to account texts, and training a preset speech recognition model with multiple voice fields to obtain a trained speech recognition model; receiving a login request, collecting a voice field, and inputting the voice field into the speech recognition model to obtain an account text; judging whether the account text is a Chinese text and, if so, converting the Chinese part of the account text into pinyin to obtain an English text; performing error-correction comparison on the English text to obtain a target unlock account, and logging in with the target unlock account. The present invention abandons conventional keyboard input in favour of dictation, making login simpler and more convenient. Because speech recognition serves as the login credential, the account is not easily impersonated by criminals, and login is more secure and reliable.

Description

Voice login method, device, equipment and storage medium based on artificial intelligence
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a voice login method, device, equipment and storage medium based on artificial intelligence.
Background art
With the development of science and technology, more and more enterprises implement electronic office work through an enterprise intranet system. Normally, before an employee can work through the intranet system, the employee must first log in to his or her own account. The employee account is an important checkpoint that ensures the employee can properly use the enterprise computers and log in to the intranet; because employees belong to different departments, they are also managed under different Active Directory organizational units and hold different department privileges. Only after correctly entering the employee account and logging in can an employee send and receive mail, edit documents online, and perform other work according to the department's privileges.
However, an existing enterprise intranet system generally only provides a channel for entering the employee account by keyboard: the employee account is obtained, the account is unlocked and login is completed. Some enterprise intranet systems additionally require an identity card number to be entered for unlocking. Because an identity card number has many digits, input errors easily occur, which makes login inconvenient for employees.
Summary of the invention
In view of this, it is necessary to provide a voice login method, device, equipment and storage medium based on artificial intelligence, to address the inconvenience of logging in to an enterprise intranet system.
A voice login method based on artificial intelligence comprises:
obtaining preset account texts, collecting the voice fields corresponding to the account texts, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;
receiving a login request, calling preset recording software to collect a voice field, and inputting the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;
judging whether the account text is a Chinese text; if it is a Chinese text, converting the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is already an English text;
performing error-correction comparison on the English text to obtain a target unlock account, and logging in with the target unlock account.
In one possible design, obtaining the preset account texts and collecting the voice fields corresponding to the account texts comprises:
obtaining a recording push table from a system database, the recording push table containing the system accounts of all employees and the corresponding account texts, and reading the account texts one by one;
displaying each read account text to the employee corresponding to the system account through a collection interface, receiving the recording instruction sent by the employee through the collection interface, calling the preset recording software, collecting the voice field recorded by the employee through the recording software, and associating and saving the voice field with the displayed account text;
traversing the recording push table to obtain at least one voice field for each account text.
In one possible design, after obtaining the preset account texts and collecting the voice fields corresponding to the account texts, the method comprises:
converting all collected voice fields into the same audio format;
performing silence detection on the format-converted voice fields and intercepting the effective voice fields.
In one possible design, performing silence detection on the format-converted voice fields and intercepting the effective voice fields comprises:
splitting a voice field into segments of a fixed duration, defining each segment as one frame of speech, and collecting the same number N of sampling points from every frame of speech;
calculating the energy value of every frame of speech, the energy value being calculated as
E = ∑ f_k², k = 1, …, N
where E is the energy value of a frame of speech, f_k is the amplitude (sample value) of the k-th sampling point, and N is the total number of sampling points in a frame of speech;
if the energy values of M consecutive frames are all higher than a preset threshold, defining the first of those frames whose energy exceeds the threshold as the front breakpoint of an audio segment; if the energy value falls below the preset threshold starting from the (M+1)-th frame and stays below it for a preset duration, defining the (M+1)-th frame as the rear breakpoint of the audio segment; the audio intercepted between the front breakpoint and the rear breakpoint is one effective voice field.
In one possible design, inputting multiple voice fields into the preset speech recognition model for training to obtain the trained speech recognition model comprises:
configuring a preset neural network model with an input layer, a hidden layer and an output layer, the input layer and the hidden layer using multiple connections, each connection corresponding to a connection weight, the hidden layer, after a bias is applied, passing its parameters to the output layer through an activation function, and initializing the connection weights and the bias of the neural network model;
defining multiple voice fields as the training set, inputting them into the neural network model for training, and calculating the output layer, the output layer Y_j being calculated as
Y_j = f((∑ X_i·W_ij) + b_j)
where f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer;
decoding the output layer Y_j with a preset decoder to obtain a speech text;
comparing the speech text with the corresponding account text; when the error rate is greater than a training threshold, adjusting the connection weights and the bias and training again, until the error rate is no greater than the training threshold, thereby obtaining the trained speech recognition model.
In one possible design, decoding the output layer Y_j with the preset decoder to obtain the speech text comprises:
adding all account texts to a preset corpus;
parsing the output layer Y_j into multiple candidate sentences through a preset pronunciation dictionary; denoting any sentence as D, a sentence D consisting of n words and being defined as D = (w_1, w_2, …, w_n), the probability P(D) of sentence D is
P(D) = P(w_1)·P(w_2|w_1)·…·P(w_n|w_{n-1})
where w_n is the n-th word in sentence D, C(w_{n-1}) is the number of times the (n-1)-th word appears in the corpus, and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word appear together in the corpus, each conditional probability being estimated as P(w_n|w_{n-1}) = C(w_{n-1}w_n)/C(w_{n-1});
taking the sentence D with the largest probability P(D) as the final speech text.
In one possible design, performing error-correction comparison on the English text to obtain the target unlock account comprises:
obtaining all system accounts in the system database and calculating the character error rate value wer_i between each system account and the English text, the character error rate value wer_i being calculated as
wer_i = (S_i + D + L) / N_i
where S_i is the number of letters substituted when comparing with the i-th system account, D is the number of letters deleted when comparing with the i-th system account, L is the number of letters inserted when comparing with the i-th system account, and N_i is the total number of letters in the i-th system account;
obtaining the minimum value among the character error rate values wer_i and defining the system account corresponding to the minimum value as the target unlock account.
A voice login device based on artificial intelligence comprises:
a training module, configured to obtain preset account texts, collect the voice fields corresponding to the account texts, and input multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;
a speech recognition module, configured to receive a login request, call preset recording software to collect a voice field, and input the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;
a conversion module, configured to judge whether the account text is a Chinese text and, if it is a Chinese text, convert the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is already an English text;
an error-correction comparison module, configured to perform error-correction comparison on the English text to obtain a target unlock account and log in with the target unlock account.
A computer device comprises a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above voice login method based on artificial intelligence.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the above voice login method based on artificial intelligence.
The above voice login method, device, equipment and storage medium based on artificial intelligence obtain preset account texts, collect the voice fields corresponding to the account texts, and train a preset speech recognition model with multiple voice fields to obtain a trained speech recognition model; receive a login request, call preset recording software to collect a voice field, and input the voice field into the speech recognition model for speech recognition to obtain the corresponding account text; judge whether the account text is a Chinese text and, if so, convert its Chinese part into pinyin to obtain an English text; perform error-correction comparison on the English text to obtain a target unlock account, and log in with the target unlock account. The present invention replaces keyboard entry of the system account with speech recognition of a voice field dictated by the enterprise employee: conventional keyboard input is abandoned in favour of dictation, making login simpler and more convenient. Because speech recognition serves as the login credential, the account is not easily impersonated by criminals, and login is more secure and reliable.
Brief description of the drawings
By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to a person of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention.
Fig. 1 is a flowchart of the voice login method based on artificial intelligence in one embodiment of the present invention;
Fig. 2 is a structural diagram of the voice login device based on artificial intelligence in one embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of the present invention means that the stated features, integers, steps, operations, elements and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Fig. 1 is a flowchart of the voice login method based on artificial intelligence in one embodiment of the present invention. As shown in Fig. 1, the voice login method based on artificial intelligence comprises the following steps:
Step S1, training the speech recognition model: obtaining preset account texts, collecting the voice fields corresponding to the account texts, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model.
The system database of an enterprise intranet system generally contains the system accounts and passwords registered in the system. A system account is usually a pinyin string followed by digits, such as ZHANGSAN123, and an account text may be a Chinese text or an English text, so each account text corresponds to two kinds of voice field: a Chinese voice field, such as "Zhang San 123", and an English voice field, such as "ZHANGSAN123". After the two kinds of voice field are collected separately, the speech recognition model is trained with them, and the trained speech recognition model is obtained.
In one embodiment, in step S1, obtaining the preset account texts and collecting the voice fields corresponding to the account texts comprises:
Step S101, reading the account texts: obtaining a recording push table from the system database, the recording push table containing the system accounts of all employees and the corresponding account texts, and reading the account texts one by one.
The recording push table of this step may be the account registration table that stores the system accounts and passwords; all system accounts, the corresponding passwords and the account texts are stored in the recording push table. An account text may be a Chinese text, such as the format "Zhang San 123", or an English text, such as the format "ZHANGSAN123".
Step S102, collecting the voice fields: displaying each read account text to the employee corresponding to the system account through a collection interface, receiving the recording instruction sent by the employee through the collection interface, calling the preset recording software, collecting the voice field recorded by the employee through the recording software, and associating and saving the voice field with the displayed account text.
In this step, the voice fields are collected from the employees already registered in the enterprise intranet system. During collection, a preset collection interface is pushed to the corresponding employee; the collection interface includes a start button and a submit button and also displays the account text to be collected, so the employee knows from the account text exactly what to record. After the employee presses the start button, the recording instruction is received, the preset recording software is called and recording starts; recording stops when the stop instruction of the recording software is triggered. In this step, the employee presses the start button once to trigger the recording start instruction and presses it again to trigger the stop instruction. The employee then presses the submit button to trigger the submit instruction, and the voice field recorded by the recording software is saved together with the displayed account text.
Step S103, traversing the table: traversing the recording push table to obtain at least one voice field for each account text.
Before pushing an account text to an employee, this step may read the login state of the enterprise intranet system and automatically push the collection interface to employees who are online and whose push count is below a preset push threshold; after the push is completed, the push count of the corresponding system account in the recording push table is incremented by one. The recording push table can therefore be traversed, and account texts whose push count has not reached the push threshold continue to be pushed until multiple voice fields are obtained. This design greatly increases the number of training samples.
In this embodiment, the voice fields recorded by each employee are obtained by pushing the collection interface and used as training samples. Because they are the employees' own recordings, they differ little from the voice fields later collected at login, so the speech recognition model trained with these voice fields as training samples is more reliable and accurate.
In one embodiment, in step S1, after obtaining the preset account texts and collecting the voice fields corresponding to the account texts, the method comprises:
Step S104, converting the audio format: converting all collected voice fields into the same audio format.
In this step, the voice fields are further processed after the multiple voice fields corresponding to the account texts have been collected. Because the recording devices used by the employees may differ, the recorded audio formats may also differ. To reduce errors during training, this step converts all collected voice fields into a unified format through preset conversion software, for example a unified MP3, WMA or WAV format.
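As an illustration only, such a conversion step might be sketched as follows; the pydub library stands in for the unnamed preset conversion software, and the single source directory, the 16 kHz mono target and the file extensions are assumptions rather than details fixed by the patent.

    # Sketch: unify the audio format of all collected voice fields (step S104).
    # Assumes the pydub library; the patent does not name the conversion software.
    from pathlib import Path
    from pydub import AudioSegment

    def convert_to_wav(src_dir: str, dst_dir: str) -> None:
        """Convert every collected voice field to a single WAV format."""
        Path(dst_dir).mkdir(parents=True, exist_ok=True)
        for src in Path(src_dir).iterdir():
            if src.suffix.lower() in {".mp3", ".wma", ".wav", ".m4a"}:
                audio = AudioSegment.from_file(str(src))              # decode any supported format
                audio = audio.set_frame_rate(16000).set_channels(1)   # optional: unify sampling as well
                audio.export(str(Path(dst_dir) / (src.stem + ".wav")), format="wav")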
Step S105, silence detection: performing silence detection on the format-converted voice fields and intercepting the effective voice fields.
In this step, silence detection is performed on the voice fields, the silence at the beginning and end is cut off, and the effective voice field part is retained, in order to prevent it from affecting the alignment during model training.
When performing silence detection, this step may proceed as follows:
Step S10501, splitting the voice field: splitting the voice field into segments of a fixed duration, defining each segment as one frame of speech, and collecting the same number N of sampling points from every frame of speech.
The fixed duration in this step may be 20 ms, 30 ms, etc.; the voice field is split according to this fixed duration, dividing it into several frames of speech. Even for the same employee, the volume of the same word may differ during recording, so before splitting the voice field in this step, the voice field may also be normalized: the point of maximum amplitude in each voice field is found, its amplitude is enlarged by a ratio that brings it close to 1, and all other points are stretched by the same ratio.
Step S10502, calculating the energy value: calculating the energy value of every frame of speech, the energy value being calculated as
E = ∑ f_k², k = 1, …, N
where E is the energy value of a frame of speech, f_k is the amplitude (sample value) of the k-th sampling point, and N is the total number of sampling points in a frame of speech.
The energy value of a frame of speech is related both to the sample values it contains and to the number of sampling points it contains. The sample values generally include positive and negative values, and their sign need not be considered when calculating the energy value, so this step defines the energy value of a frame of speech as the sum of the squares of the sample values.
Step S10503, determining the front and rear breakpoints: if the energy values of M consecutive frames are all higher than a preset threshold, defining the first of those frames whose energy exceeds the threshold as the front breakpoint of an audio segment; if the energy value falls below the preset threshold starting from the (M+1)-th frame and stays below it for a preset duration, defining the (M+1)-th frame as the rear breakpoint of the audio segment; the audio intercepted between the front breakpoint and the rear breakpoint is one effective voice field.
If the energy values of a few frames at the start of a voice field are below the preset threshold while the energy values of the following M consecutive frames are all above it, the front breakpoint is defined as the first frame whose energy value just exceeds the preset threshold. If the energy values of M consecutive frames are high, the energy of a subsequent frame becomes smaller and stays small for a preset duration, the point where the energy value decreases is considered the rear breakpoint. The audio intercepted between the front breakpoint and the rear breakpoint is saved as one effective voice field.
The shorter the audio duration corresponding to the M consecutive frames, the higher the breakpoint detection sensitivity. Because what is recorded is the account text used for login, the voice field is relatively simple and a long pause is unlikely, so to improve the sensitivity of silence detection, M can be set to a small value, corresponding to an audio duration of 200 ms to 400 ms.
Ideally the energy value of silence is 0, so the preset threshold of this step would be 0. In the collected voice fields, however, there is usually background sound of some strength; although this background sound is also silence, its energy value is clearly higher than 0, so the preset threshold is usually not set to 0. The preset threshold of this step may be a dynamic threshold: before breakpoint detection of each voice field, the average energy value of the starting part of the voice field is obtained, for example the average energy value E0 of the first 100 ms to 1000 ms or of the first 100 frames, and the preset threshold of this step is obtained by adding a coefficient to E0 or multiplying E0 by a coefficient greater than 1.
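For illustration only, steps S10501 to S10503 (framing, energy calculation and breakpoint detection with a dynamic threshold) might be sketched as follows; the 16 kHz mono input, the 20 ms frame length, M = 10 frames and the factor of 1.5 on the starting-segment energy are illustrative assumptions, not values fixed by the patent.

    # Sketch: silence detection on one voice field, assuming a NumPy array of samples.
    import numpy as np

    def trim_silence(samples: np.ndarray, sr: int = 16000,
                     frame_ms: int = 20, m_frames: int = 10) -> np.ndarray:
        """Return the audio between the front and rear breakpoints."""
        # Normalize so the largest amplitude is close to 1 (note under step S10501).
        samples = samples / (np.abs(samples).max() + 1e-9)
        frame_len = sr * frame_ms // 1000                 # samples per frame
        n_frames = len(samples) // frame_len
        frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
        # Energy of each frame: sum of squared sample values (step S10502).
        energy = (frames ** 2).sum(axis=1)
        # Dynamic threshold: mean energy of the leading frames times a coefficient > 1.
        threshold = energy[:100].mean() * 1.5
        active = energy > threshold
        # Front breakpoint: first frame of the first run of m_frames active frames.
        front = next((i for i in range(n_frames - m_frames)
                      if active[i:i + m_frames].all()), 0)
        # Rear breakpoint: first frame after the front where energy stays low for m_frames.
        rear = next((i for i in range(front + m_frames, n_frames - m_frames)
                     if not active[i:i + m_frames].any()), n_frames)
        return samples[front * frame_len: rear * frame_len]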
In this embodiment, the collected voice fields are processed by format conversion and silence detection, more effective voice fields are obtained for the training set, and the trained speech recognition model is therefore more reliable.
In one embodiment, in step S1, inputting multiple voice fields into the preset speech recognition model for training to obtain the trained speech recognition model comprises:
Step S111, constructing the neural network model: configuring a preset neural network model with an input layer, a hidden layer and an output layer, the input layer and the hidden layer using multiple connections, each connection corresponding to a connection weight; the hidden layer, after a bias is applied, passes its parameters to the output layer through an activation function; the connection weights and the bias of the neural network model are initialized.
A neural network model is a system model that imitates biological neurons. The input layer, hidden layer and output layer of this step use the characteristics of a neural network model: every connection between the input layer and the hidden layer carries a connection strength, that is, a weight value applied to the signal passing through that connection, namely the connection weight. The output function between the hidden layer and the output layer is called the activation function. After training, the neural network model has the abilities of feature extraction, knowledge summarization, and learning and memory. By adjusting the connection weights and the bias, the training of the neural network model is improved and effective connection weights and bias are obtained.
Step S112, calculating the output layer: defining multiple voice fields as the training set, inputting them into the neural network model for training, and calculating the output layer, the output layer Y_j being calculated as
Y_j = f((∑ X_i·W_ij) + b_j)
where f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer.
In this step, the output of the neural network model depends on the activation function, and the final output result is obtained through the above calculation formula.
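A toy numerical illustration of the formula Y_j = f((∑ X_i·W_ij) + b_j) follows; the sigmoid activation, the layer sizes and the random initialization are illustrative assumptions, not the patent's exact configuration.

    # Sketch: one forward pass from the input layer to the hidden/output values.
    import numpy as np

    def sigmoid(z: np.ndarray) -> np.ndarray:
        """An example activation function f(.)."""
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1, 40))     # one input feature vector (e.g. 40 acoustic features)
    W = rng.normal(size=(40, 128))   # connection weights W_ij between input and hidden layer
    b = np.zeros(128)                # hidden-layer bias b_j
    Y = sigmoid(X @ W + b)           # Y_j = f((sum_i X_i * W_ij) + b_j)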
Step S113, obtaining the speech text: decoding the output layer Y_j with a preset decoder to obtain the speech text.
The decoder is one of the cores of a speech recognition system. Its task is, for an input signal, to find the word string that the signal outputs with the greatest probability, according to an acoustic pronunciation dictionary and a language model or corpus.
When decoding with the decoder, the Viterbi algorithm may be used:
All account texts are added to a preset corpus. The output layer Y_j is parsed into multiple candidate sentences through a preset pronunciation dictionary; denoting any sentence as S, a sentence S consisting of n words and being defined as S = (w_1, w_2, …, w_n), the probability P(S) of sentence S is
P(S) = P(w_1)·P(w_2|w_1)·…·P(w_n|w_{n-1})
where w_n is the n-th word in sentence S, C(w_{n-1}) is the number of times the (n-1)-th word appears in the corpus, and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word appear together in the corpus, each conditional probability being estimated as P(w_n|w_{n-1}) = C(w_{n-1}w_n)/C(w_{n-1}). The sentence S with the largest probability P(S) is taken as the final speech text.
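A minimal sketch of this bigram scoring follows; the toy corpus, the candidate sentences and the small probability floor are illustrative assumptions.

    # Sketch: score candidate sentences with a bigram language model built from the corpus
    # of account texts, and keep the most probable one as the speech text.
    from collections import Counter

    corpus = [["zhang", "san", "1", "2", "3"], ["li", "si", "4", "5", "6"]]
    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

    def sentence_probability(words: list[str]) -> float:
        """P(S) = P(w1) * prod_n C(w_{n-1} w_n) / C(w_{n-1})."""
        total = sum(unigrams.values())
        p = unigrams[words[0]] / total if words else 0.0
        for prev, cur in zip(words, words[1:]):
            p *= bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 1e-9
        return p

    # Pick the candidate parse with the highest probability as the speech text.
    candidates = [["zhang", "san", "1", "2", "3"], ["zhang", "shan", "1", "2", "3"]]
    best = max(candidates, key=sentence_probability)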
Step S114, comparing: comparing the speech text with the corresponding account text; when the error rate is greater than the training threshold, adjusting the connection weights and the bias and training again, until the error rate is no greater than the training threshold, thereby obtaining the trained speech recognition model.
When comparing and determining the error rate in this step, the calculation method of the character error rate value wer_i may be used to obtain the error rate value.
In this embodiment, all voice fields are input into the neural network model and trained in the above manner; the resulting neural network model recognizes the voice fields well, and a speech text with a low error rate and high accuracy is finally obtained.
Step S2, speech recognition: receiving a login request, calling the preset recording software to collect a voice field, inputting the voice field into the speech recognition model for speech recognition, and obtaining the account text corresponding to the voice field.
When collecting the voice field, this step proceeds as follows: the employee's login request is received and a login interface is displayed; the login interface has a start button and a submit button; after the recording start instruction is triggered, the preset recording software is called and recording starts, and recording stops after the recording stop instruction is triggered; after the submit instruction is triggered, the recorded voice field is collected. In this step, the employee presses the start button to trigger the recording start instruction, presses the start button again to trigger the recording stop instruction, and presses the submit button to trigger the submit instruction.
The voice field is input into the speech recognition model for speech recognition, and the resulting speech text is the account text corresponding to the voice field.
Step S3, judging and converting: judging whether the account text is a Chinese text; if it is a Chinese text, converting the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is an English text.
Because a system account is expressed in English text form, when the account text is a Chinese text, the Chinese part of the Chinese text needs to be converted into pinyin to form an English text. When judging whether the account text is a Chinese text, this step extracts the first character of the account text and decides according to the byte value of the first character: if it is 2, the text is judged to be a Chinese text. During conversion, the characters in the account text whose byte value is 2 are extracted one by one, and the Chinese part of the account text is converted into pinyin through a preset Chinese-to-pinyin plug-in.
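The following sketch illustrates this judgement-and-conversion step; a Unicode CJK range check stands in for the double-byte check described above, and the pypinyin package stands in for the unnamed Chinese-to-pinyin plug-in. Both substitutions are assumptions for illustration.

    # Sketch: detect Chinese characters and convert them to pinyin (step S3).
    from pypinyin import lazy_pinyin

    def is_chinese_char(ch: str) -> bool:
        """Treat characters in the basic CJK Unified Ideographs block as Chinese."""
        return "\u4e00" <= ch <= "\u9fff"

    def to_english_text(account_text: str) -> str:
        """Convert the Chinese part of an account text (e.g. '张三123') to pinyin."""
        if not any(is_chinese_char(ch) for ch in account_text):
            return account_text.upper()                 # already an English text
        pieces = []
        for ch in account_text:
            pieces.append(lazy_pinyin(ch)[0] if is_chinese_char(ch) else ch)
        return "".join(pieces).upper()                  # e.g. 'ZHANGSAN123'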
Step S4, error-correction comparison: performing error-correction comparison on the English text to obtain a target unlock account, and logging in with the target unlock account.
When the login voice field is collected and recognized, dictation deviations may occur and the recognized account text may therefore deviate from the true account, so error-correction comparison is also performed on the English text to obtain the final target unlock account, and the subsequent system login judgement is carried out with this target unlock account.
In one embodiment, step S4 comprises:
Step S401, calculating the character error rate: obtaining all system accounts in the system database and calculating the character error rate value wer_i between each system account and the English text, the character error rate value wer_i being calculated as
wer_i = (S_i + D + L) / N_i
where S_i is the number of letters substituted when comparing with the i-th system account, D is the number of letters deleted when comparing with the i-th system account, L is the number of letters inserted when comparing with the i-th system account, and N_i is the total number of letters in the i-th system account.
In this step, to make the character sequence of the recognized English text consistent with the character sequence of a system account, certain letters need to be substituted, deleted or inserted. The total number of substituted, deleted and inserted letters, divided by the total number of letters in the system account, is the character error rate value. If the English text is completely identical to a system account, the character error rate value is 0.
Step S402, obtaining the target unlock account: obtaining the minimum value among the character error rate values wer_i and defining the system account corresponding to the minimum value as the target unlock account.
Among all character error rate values, the minimum value is found first, and the system account corresponding to the minimum value is considered the closest target unlock account. To prevent credential-stuffing (database collision) attacks, this step may also preset an error threshold and compare the minimum value with it: if the minimum value is not greater than the error threshold, the system account corresponding to the minimum value is defined as the target unlock account; otherwise, a login error prompt is returned.
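A sketch of steps S401 and S402 follows: the edit distance yields the substitution, deletion and insertion counts behind wer_i = (S_i + D + L) / N_i, and the account with the smallest value below an error threshold is taken as the target unlock account. The dynamic-programming implementation, the example threshold of 0.2 and the sample accounts are illustrative assumptions.

    # Sketch: character error rate between the recognized English text and each system account.
    def char_error_rate(hypothesis: str, reference: str) -> float:
        """Levenshtein distance (substitutions + deletions + insertions) divided by
        the number of characters in the system account."""
        m, n = len(hypothesis), len(reference)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if hypothesis[i - 1] == reference[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n] / max(n, 1)

    def find_target_account(english_text: str, accounts: list[str],
                            error_threshold: float = 0.2) -> str | None:
        """Return the closest system account, or None if nothing is close enough."""
        best = min(accounts, key=lambda acc: char_error_rate(english_text, acc))
        return best if char_error_rate(english_text, best) <= error_threshold else None

    # Example: find_target_account("ZHANGSAN123", ["ZHANGSAN123", "LISI456"]) -> "ZHANGSAN123"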
In this embodiment, by calculating character error rate values, the system account closest to the recognized English text is identified among the numerous system accounts and defined as the target unlock account for the subsequent system login judgement, which improves the recognition precision.
In the voice login method based on artificial intelligence of this embodiment, deep learning is performed on the voice fields recorded by each employee and a good neural network model is trained. When an employee logs in, the voice field entered by the employee is recognized as the account text with high recognition precision, and the system account is further determined by calculating character error rate values. The employee does not need to enter any system account or password by keyboard, so the login method is simpler and more convenient; and because the recorded voice field serves as the login credential, it is not easy for criminals to impersonate the account, making login more secure and reliable.
In one embodiment, a voice login device based on artificial intelligence is proposed, as shown in Fig. 2, comprising the following modules:
a training module, configured to obtain preset account texts, collect the voice fields corresponding to the account texts, and input multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;
a speech recognition module, configured to receive a login request, call preset recording software to collect a voice field, and input the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;
a conversion module, configured to judge whether the account text is a Chinese text and, if it is a Chinese text, convert the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is an English text;
an error-correction comparison module, configured to perform error-correction comparison on the English text to obtain a target unlock account and log in with the target unlock account.
In one embodiment, a computer device is proposed, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to implement the steps of the voice login method based on artificial intelligence of the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is proposed; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the voice login method based on artificial intelligence of the above embodiments. The storage medium may be a non-volatile storage medium.
A person of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium. The storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should all be considered within the scope of this specification.
The above embodiments only express some exemplary implementations of the present invention, and their description is relatively specific and detailed, but they must not be construed as limiting the scope of the present invention. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all belong to the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A voice login method based on artificial intelligence, characterized by comprising:
obtaining preset account texts, collecting the voice fields corresponding to the account texts, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;
receiving a login request, calling preset recording software to collect a voice field, and inputting the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;
judging whether the account text is a Chinese text; if it is a Chinese text, converting the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is an English text;
performing error-correction comparison on the English text to obtain a target unlock account, and logging in with the target unlock account.
2. The voice login method based on artificial intelligence according to claim 1, characterized in that obtaining the preset account texts and collecting the voice fields corresponding to the account texts comprises:
obtaining a recording push table from a system database, the recording push table containing the system accounts of all employees and the corresponding account texts, and reading the account texts one by one;
displaying each read account text to the employee corresponding to the system account through a collection interface, receiving the recording instruction sent by the employee through the collection interface, calling the preset recording software, collecting the voice field recorded by the employee through the recording software, and associating and saving the voice field with the displayed account text;
traversing the recording push table to obtain at least one voice field for each account text.
3. The voice login method based on artificial intelligence according to claim 1, characterized in that after obtaining the preset account texts and collecting the voice fields corresponding to the account texts, the method comprises:
converting all collected voice fields into the same audio format;
performing silence detection on the format-converted voice fields and intercepting the effective voice fields.
4. The voice login method based on artificial intelligence according to claim 3, characterized in that performing silence detection on the format-converted voice fields and intercepting the effective voice fields comprises:
splitting a voice field into segments of a fixed duration, defining each segment as one frame of speech, and collecting the same number N of sampling points from every frame of speech;
calculating the energy value of every frame of speech, the energy value being calculated as
E = ∑ f_k², k = 1, …, N
where E is the energy value of a frame of speech, f_k is the amplitude (sample value) of the k-th sampling point, and N is the total number of sampling points in a frame of speech;
if the energy values of M consecutive frames are all higher than a preset threshold, defining the first of those frames whose energy exceeds the threshold as the front breakpoint of an audio segment; if the energy value falls below the preset threshold starting from the (M+1)-th frame and stays below it for a preset duration, defining the (M+1)-th frame as the rear breakpoint of the audio segment, wherein the audio intercepted between the front breakpoint and the rear breakpoint is one effective voice field.
5. The voice login method based on artificial intelligence according to claim 1, characterized in that inputting multiple voice fields into the preset speech recognition model for training to obtain the trained speech recognition model comprises:
configuring a preset neural network model with an input layer, a hidden layer and an output layer, the input layer and the hidden layer using multiple connections, each connection corresponding to a connection weight, the hidden layer, after a bias is applied, passing its parameters to the output layer through an activation function, and initializing the connection weights and the bias of the neural network model;
defining multiple voice fields as the training set, inputting them into the neural network model for training, and calculating the output layer, the output layer Y_j being calculated as
Y_j = f((∑ X_i·W_ij) + b_j)
where f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer;
decoding the output layer Y_j with a preset decoder to obtain a speech text;
comparing the speech text with the corresponding account text; when the error rate is greater than a training threshold, adjusting the connection weights and the bias and training again, until the error rate is no greater than the training threshold, thereby obtaining the trained speech recognition model.
6. The voice login method based on artificial intelligence according to claim 5, characterized in that decoding the output layer Y_j with the preset decoder to obtain the speech text comprises:
adding all account texts to a preset corpus;
parsing the output layer Y_j into multiple candidate sentences through a preset pronunciation dictionary; denoting any sentence as S, a sentence S consisting of n words and being defined as S = (w_1, w_2, …, w_n), the probability P(S) of sentence S is
P(S) = P(w_1)·P(w_2|w_1)·…·P(w_n|w_{n-1})
where w_n is the n-th word in sentence S, C(w_{n-1}) is the number of times the (n-1)-th word appears in the corpus, and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word appear together in the corpus;
taking the sentence S with the largest probability P(S) as the final speech text.
7. The voice login method based on artificial intelligence according to claim 1, characterized in that performing error-correction comparison on the English text to obtain the target unlock account comprises:
obtaining all system accounts in the system database and calculating the character error rate value wer_i between each system account and the English text, the character error rate value wer_i being calculated as
wer_i = (S_i + D + L) / N_i
where S_i is the number of letters substituted when comparing with the i-th system account, D is the number of letters deleted when comparing with the i-th system account, L is the number of letters inserted when comparing with the i-th system account, and N_i is the total number of letters in the i-th system account;
obtaining the minimum value among the character error rate values wer_i and defining the system account corresponding to the minimum value as the target unlock account.
8. A voice login device based on artificial intelligence, characterized by comprising:
a training module, configured to obtain preset account texts, collect the voice fields corresponding to the account texts, and input multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;
a speech recognition module, configured to receive a login request, call preset recording software to collect a voice field, and input the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;
a conversion module, configured to judge whether the account text is a Chinese text and, if it is a Chinese text, convert the Chinese part of the account text into pinyin to obtain an English text; otherwise, the account text is an English text;
an error-correction comparison module, configured to perform error-correction comparison on the English text to obtain a target unlock account and log in with the target unlock account.
9. A computer device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the voice login method based on artificial intelligence according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the voice login method based on artificial intelligence according to any one of claims 1 to 7.
CN201910424460.XA 2019-05-21 2019-05-21 Voice login method, device, equipment and storage medium based on artificial intelligence Pending CN110232917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910424460.XA CN110232917A (en) 2019-05-21 2019-05-21 Voice login method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910424460.XA CN110232917A (en) 2019-05-21 2019-05-21 Voice login method, device, equipment and storage medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN110232917A true CN110232917A (en) 2019-09-13

Family

ID=67861422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910424460.XA Pending CN110232917A (en) 2019-05-21 2019-05-21 Voice login method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN110232917A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995938A (en) * 2019-12-13 2020-04-10 上海优扬新媒信息技术有限公司 Data processing method and device
CN111508498A (en) * 2020-04-09 2020-08-07 携程计算机技术(上海)有限公司 Conversational speech recognition method, system, electronic device and storage medium
CN112765335A (en) * 2021-01-27 2021-05-07 上海三菱电梯有限公司 Voice calling landing system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178004A1 (en) * 2001-05-23 2002-11-28 Chienchung Chang Method and apparatus for voice recognition
CN102984152A (en) * 2012-11-27 2013-03-20 江苏乐买到网络科技有限公司 Password authentication method based on online shopping
CN103986826A (en) * 2014-05-12 2014-08-13 深圳市威富多媒体有限公司 Mobile terminal encrypting and decrypting method and device based on voice recognition
CN107395352A (en) * 2016-05-16 2017-11-24 腾讯科技(深圳)有限公司 Personal identification method and device based on vocal print
CN107731228A (en) * 2017-09-20 2018-02-23 百度在线网络技术(北京)有限公司 The text conversion method and device of English voice messaging
CN108632137A (en) * 2018-03-26 2018-10-09 平安科技(深圳)有限公司 Answer model training method, intelligent chat method, device, equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178004A1 (en) * 2001-05-23 2002-11-28 Chienchung Chang Method and apparatus for voice recognition
CN102984152A (en) * 2012-11-27 2013-03-20 江苏乐买到网络科技有限公司 Password authentication method based on online shopping
CN103986826A (en) * 2014-05-12 2014-08-13 深圳市威富多媒体有限公司 Mobile terminal encrypting and decrypting method and device based on voice recognition
CN107395352A (en) * 2016-05-16 2017-11-24 腾讯科技(深圳)有限公司 Personal identification method and device based on vocal print
CN107731228A (en) * 2017-09-20 2018-02-23 百度在线网络技术(北京)有限公司 The text conversion method and device of English voice messaging
CN108632137A (en) * 2018-03-26 2018-10-09 平安科技(深圳)有限公司 Answer model training method, intelligent chat method, device, equipment and medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995938A (en) * 2019-12-13 2020-04-10 上海优扬新媒信息技术有限公司 Data processing method and device
CN110995938B (en) * 2019-12-13 2022-04-26 度小满科技(北京)有限公司 Data processing method and device
CN111508498A (en) * 2020-04-09 2020-08-07 携程计算机技术(上海)有限公司 Conversational speech recognition method, system, electronic device and storage medium
CN111508498B (en) * 2020-04-09 2024-01-30 携程计算机技术(上海)有限公司 Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium
CN112765335A (en) * 2021-01-27 2021-05-07 上海三菱电梯有限公司 Voice calling landing system
CN112765335B (en) * 2021-01-27 2024-03-08 上海三菱电梯有限公司 Voice call system

Similar Documents

Publication Publication Date Title
US9672825B2 (en) Speech analytics system and methodology with accurate statistics
US7689418B2 (en) Method and system for non-intrusive speaker verification using behavior models
WO2021073116A1 (en) Method and apparatus for generating legal document, device and storage medium
CN107305541A (en) Speech recognition text segmentation method and device
CN110232917A (en) Voice login method, device, equipment and storage medium based on artificial intelligence
CN101923855A (en) Test-irrelevant voice print identifying system
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
CN113628627B (en) Electric power industry customer service quality inspection system based on structured voice analysis
CN108877769B (en) Method and device for identifying dialect type
US20180308501A1 (en) Multi speaker attribution using personal grammar detection
Schultz et al. The ISL meeting room system
CN111523317B (en) Voice quality inspection method and device, electronic equipment and medium
CN110246509A (en) A kind of stack denoising self-encoding encoder and deep neural network structure for voice lie detection
CN109920447A (en) Recording fraud detection method based on sef-adapting filter Amplitude & Phase feature extraction
JP4143541B2 (en) Method and system for non-intrusive verification of speakers using behavior models
Wildermoth et al. GMM based speaker recognition on readily available databases
KR102407055B1 (en) Apparatus and method for measuring dialogue quality index through natural language processing after speech recognition
US20050043957A1 (en) Selective sampling for sound signal classification
Wray et al. Best practices for crowdsourcing dialectal arabic speech transcription
CN113051923B (en) Data verification method and device, computer equipment and storage medium
CA2621952A1 (en) System for excluding unwanted data from a voice recording
Tumminia et al. Diarization of legal proceedings. Identifying and transcribing judicial speech from recorded court audio
Manikandan et al. Speaker identification using a novel prosody with fuzzy based hierarchical decision tree approach
Navrátil et al. An instantiable speech biometrics module with natural language interface: Implementation in the telephony environment
Markowitz The many roles of speaker classification in speaker verification and identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination