CN110232917A - Voice login method, device, equipment and storage medium based on artificial intelligence - Google Patents
Voice login method, device, equipment and storage medium based on artificial intelligence
- Publication number
- CN110232917A CN110232917A CN201910424460.XA CN201910424460A CN110232917A CN 110232917 A CN110232917 A CN 110232917A CN 201910424460 A CN201910424460 A CN 201910424460A CN 110232917 A CN110232917 A CN 110232917A
- Authority
- CN
- China
- Prior art keywords
- account
- text
- voice
- preset
- fields
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0815—Network architectures or network communication protocols for network security for authentication of entities providing single-sign-on or federations
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Abstract
The present invention relates to the field of artificial intelligence, and in particular to a voice login method, device, equipment and storage medium based on artificial intelligence. The method comprises: collecting the voice fields corresponding to account texts and inputting multiple voice fields into a speech recognition model for training to obtain a trained speech recognition model; receiving a login request, collecting a voice field, and inputting the voice field into the speech recognition model to obtain the account text; judging whether the account text is Chinese text and, if so, converting the Chinese portion of the account text into pinyin to obtain English text; and performing error-correction comparison on the English text to obtain a target unlock account, with which login is performed. The present invention abandons conventional keyboard input in favor of dictation, making login simpler and more convenient. Because speech recognition serves as the login credential, it is difficult for a criminal to impersonate the user, making login more secure and reliable.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a voice login method, device, equipment and storage medium based on artificial intelligence.
Background art
With the development of science and technology, more and more enterprises carry out electronic office work through an enterprise intranet system. Normally, before an employee can work through the intranet system, he or she must first log in to a personal account. The employee account is the important gateway that ensures the employee can properly use company computers and log in to the intranet; because employees belong to different departments, they are managed under different Active Directory organizational units and hold different departmental privileges. Only after correctly entering the employee account can an employee send and receive mail, edit documents online, and perform other work according to departmental privileges.

However, existing enterprise intranet systems generally only provide a keyboard channel for entering the employee account; the employee account is unlocked and login completed by typing it in. Some enterprise intranet systems additionally require entry of the ID card number to unlock and log in; because the ID card number has many digits, input errors occur easily, which makes login inconvenient for employees.
Summary of the invention
In view of this, to address the inconvenience of logging in to an enterprise intranet system, it is necessary to provide a voice login method, device, equipment and storage medium based on artificial intelligence.
An artificial-intelligence-based voice login method, comprising:

Obtaining preset account text, collecting the voice fields corresponding to the account text, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;

Receiving a login request, calling preset recording software to collect a voice field, and inputting the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;

Judging whether the account text is Chinese text; if it is Chinese text, converting the Chinese portion of the account text into pinyin to obtain English text; otherwise, the account text is already English text;

Performing error-correction comparison on the English text to obtain a target unlock account, and logging in with the target unlock account.
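The pinyin conversion step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the character-to-pinyin table is a tiny hypothetical stand-in (a real system would use a full pinyin dictionary), and the function name is illustrative.

```python
# Hypothetical mini-dictionary; a real deployment would use a complete
# Chinese-character-to-pinyin lookup.
PINYIN = {"张": "ZHANG", "三": "SAN"}

def account_text_to_english(text: str) -> str:
    """Convert the Chinese portion of an account text to pinyin,
    leaving ASCII characters (letters, digits) unchanged."""
    parts = []
    for ch in text:
        if ch.isascii():
            parts.append(ch.upper())
        else:
            parts.append(PINYIN.get(ch, ""))  # unknown characters are dropped
    return "".join(parts)

print(account_text_to_english("张三123"))  # -> ZHANGSAN123
```

The English text produced this way can then be compared letter-by-letter against the stored system accounts.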
In one possible design, obtaining the preset account text and collecting the voice fields corresponding to the account text comprises:

Obtaining a recording push table from the system database, the recording push table holding the system account of every employee and the corresponding account text, and reading the account texts one by one;

Displaying each account text read to the employee corresponding to the system account through a collection interface; receiving the recording command sent by the employee through the collection interface; calling preset recording software; collecting through the recording software the voice field recorded by the employee; and associating and saving the voice field with the displayed account text;

Traversing the recording push table to obtain at least one voice field corresponding to each account text.
In one possible design, after obtaining the preset account text and collecting the voice fields corresponding to the account text, the method comprises:

Converting all collected voice fields into the same audio format;

Performing silence detection on the format-converted voice fields and intercepting the effective voice fields.
In one possible design, performing silence detection on the format-converted voice fields and intercepting the effective voice fields comprises:

Splitting the voice field by a fixed duration, defining each split unit as one frame of speech, and collecting the same number N of sampling points from each frame of speech;

Calculating the energy value of each frame of speech, the calculation formula of the energy value being:

E = ∑(k=1..N) f_k²

Wherein E is the energy value of one frame of speech, f_k is the peak value of the k-th sampling point, and N is the total number of sampling points in one frame of speech;

If the energy values of M consecutive frames of speech are all higher than a preset threshold, defining the first frame in those M consecutive frames whose energy value exceeds the preset threshold as the front breakpoint of an audio segment; if, starting from frame M+1, the energy value falls below the preset threshold and remains below it for a preset duration, defining frame M+1 as the rear breakpoint of the audio segment; the audio intercepted between the front breakpoint and the rear breakpoint is one effective voice field.
In one possible design, inputting multiple voice fields into a preset speech recognition model for training to obtain the trained speech recognition model comprises:

Configuring a preset neural network model with an input layer, a hidden layer and an output layer; the input layer and the hidden layer are joined by multiple connections, each connection corresponding to a connection weight; after a bias is applied, the hidden layer passes its parameters to the output layer through an activation function; the connection weights and biases of the neural network model are initialized;

Defining multiple voice fields as a training set, inputting them into the neural network model for training, and calculating the output layer Y_j as follows:

Y_j = f((∑ X_i·W_ij) + b_j)

Wherein f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer;

Decoding the output layer Y_j with a preset decoder to obtain a speech text;

Comparing the speech text with the corresponding account text; when the error rate is greater than a training threshold, adjusting the connection weights and biases and retraining until the error rate is no greater than the training threshold, thereby obtaining the trained speech recognition model.
In one possible design, decoding the output layer Y_j with a preset decoder to obtain the speech text comprises:

Adding all account texts to a preset corpus;

Parsing the output layer Y_j into multiple candidate sentences through a preset pronunciation dictionary; labeling any sentence as D, where sentence D consists of n words and is defined as D = (w_1, w_2, …, w_n); the probability P(D) of sentence D is then:

P(D) = P(w_1)P(w_2|w_1)…P(w_n|w_{n-1})

Wherein w_n is the n-th word in sentence D, and each conditional probability is estimated as P(w_n|w_{n-1}) = C(w_{n-1}w_n)/C(w_{n-1}), where C(w_{n-1}) is the number of times the (n-1)-th word appears in the corpus and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word appear together in the corpus;

Taking the sentence D with the maximum probability P(D) as the final speech text.
In one possible design, performing error-correction comparison on the English text to obtain the target unlock account comprises:

Obtaining all system accounts in the system database, and separately calculating the character error rate value wer_i between each system account and the English text, the calculation formula of the character error rate value wer_i being:

wer_i = (S_i + D_i + L_i) / N_i

Wherein S_i is the number of letters replaced when comparing with the i-th system account, D_i is the number of letters deleted when comparing with the i-th system account, L_i is the number of letters inserted when comparing with the i-th system account, and N_i is the total number of letters in the i-th system account;

Obtaining the minimum of the character error rate values wer_i, and defining the system account corresponding to the minimum as the target unlock account.
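The error-correction comparison above is, in effect, an edit-distance match: substitutions, deletions and insertions against each system account, normalized by the account length. A minimal sketch, with illustrative names and a standard dynamic-programming edit distance standing in for whatever alignment the patent's implementation uses:

```python
def char_error_rate(system_account: str, english_text: str) -> float:
    """wer_i = (S_i + D_i + L_i) / N_i, computed as the Levenshtein
    distance between the recognized text and the i-th system account,
    divided by the account's letter count N_i."""
    a, b = system_account, english_text
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1] / len(a)

def target_unlock_account(accounts, english_text):
    """Pick the system account with the minimum character error rate."""
    return min(accounts, key=lambda acc: char_error_rate(acc, english_text))

accounts = ["ZHANGSAN123", "LISI456"]
print(target_unlock_account(accounts, "ZHANGSAN12E"))  # -> ZHANGSAN123
```

With one misrecognized letter ("E" for "3"), wer is 1/11 for ZHANGSAN123 and far higher for LISI456, so the correct account is still selected.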
An artificial-intelligence-based voice login device, comprising:

A training module, configured to obtain preset account text, collect the voice fields corresponding to the account text, and input multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model;

A speech recognition module, configured to receive a login request, call preset recording software to collect a voice field, and input the voice field into the speech recognition model for speech recognition to obtain the account text corresponding to the voice field;

A conversion module, configured to judge whether the account text is Chinese text and, if it is Chinese text, convert the Chinese portion of the account text into pinyin to obtain English text, the account text otherwise already being English text;

An error-correction comparison module, configured to perform error-correction comparison on the English text to obtain a target unlock account and to log in with the target unlock account.
A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above artificial-intelligence-based voice login method.

A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the above artificial-intelligence-based voice login method.
The above artificial-intelligence-based voice login method, device, equipment and storage medium comprise: obtaining preset account text, collecting the voice fields corresponding to the account text, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model; receiving a login request, calling preset recording software to collect a voice field, inputting the voice field into the speech recognition model for speech recognition, and obtaining the account text corresponding to the voice field; judging whether the account text is Chinese text and, if so, converting the Chinese portion into pinyin to obtain English text, the account text otherwise already being English text; and performing error-correction comparison on the English text to obtain a target unlock account and logging in with the target unlock account. The present invention replaces keyboard entry of the system account with speech recognition of a voice field dictated by the enterprise employee. Abandoning conventional keyboard input in favor of dictation makes login simpler and more convenient, and because speech recognition serves as the login credential, it is difficult for a criminal to impersonate the user, making login more secure and reliable.
Detailed description of the invention
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention.
Fig. 1 is a flowchart of the artificial-intelligence-based voice login method in one embodiment of the invention;

Fig. 2 is a structure diagram of the artificial-intelligence-based voice login device in one embodiment of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.

Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It is to be further understood that the wording "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Fig. 1 is a flowchart of the artificial-intelligence-based voice login method in one embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:

Step S1, training the speech recognition model: obtaining preset account text, collecting the voice fields corresponding to the account text, and inputting multiple voice fields into a preset speech recognition model for training to obtain a trained speech recognition model.

The system database of an enterprise intranet system generally contains the system accounts and passwords registered in the system. A system account is usually Chinese pinyin followed by digits, such as ZHANGSAN123, and account text includes both Chinese text and English text. There are therefore two kinds of voice fields corresponding to an account text: Chinese voice fields, such as "Zhang San 123", and English voice fields, such as "ZHANGSAN123". After the two kinds of voice fields are collected separately, the speech recognition model is trained with them to obtain the trained speech recognition model.
In one embodiment, in step S1, obtaining the preset account text and collecting the voice fields corresponding to the account text comprises:

Step S101, reading the account text: obtaining a recording push table from the system database, the recording push table holding the system accounts and corresponding account texts of all employees, and reading the account texts one by one.

The recording push table of this step can be the account registration table that stores the system accounts and passwords; all system accounts with their corresponding passwords and account texts are stored in the recording push table. An account text can be Chinese text, such as the format "Zhang San 123", or English text, such as the format "ZHANGSAN123".
Step S102, collecting the voice fields: displaying each account text read to the employee corresponding to the system account through a collection interface; receiving the recording command sent by the employee through the collection interface; calling preset recording software; collecting through the recording software the voice field recorded by the employee; and associating and saving the voice field with the displayed account text.

When collecting voice fields, this step collects them from the employees registered in the enterprise intranet system. During collection, a preset collection interface is pushed to the corresponding employee; the collection interface includes a start button and a submit button, and also displays the account text to be collected. The employee learns what needs to be recorded from the account text. After the employee triggers the start button, the recording command is received, the preset recording software is called, and recording begins, continuing until the stop instruction in the recording software is triggered. In this step, the employee presses the start button once to trigger the recording start instruction and presses the start button again to trigger the stop instruction. The employee then presses the submit button to trigger the submit instruction, and the voice field recorded by the recording software is saved together with the displayed account text.
Step S103, traversing the table: traversing the recording push table to obtain at least one voice field corresponding to each account text.

Before pushing account text to an employee, this step can read the login state of the enterprise intranet system and automatically push the collection interface to employees who are online and whose push count is less than a preset push threshold; after the push is completed, the push count of that system account in the recording push table is incremented by one. The recording push table can therefore be traversed, and the account texts whose push counts have not reached the push threshold continue to be pushed until multiple voice fields are obtained. This design can greatly increase the number of training samples.

In this embodiment, the voice fields recorded by each employee are obtained by pushing the collection interface and used as training samples. Because an employee's recordings differ little from the voice fields collected at subsequent logins, the speech recognition model trained with these voice fields as training samples is more reliable and accurate.
In one embodiment, in step S1, after obtaining the preset account text and collecting the voice fields corresponding to the account text, the method comprises:

Step S104, converting the audio format: converting all collected voice fields into the same audio format.

This step further processes the voice fields after the multiple voice fields corresponding to the account texts have been collected. Because the recording equipment used by different employees may differ, the recorded audio formats may also differ. To reduce training error, this step converts all collected voice fields into a unified format through preset conversion software, for example a unified MP3, WMA or WAV format.

Step S105, silence detection: performing silence detection on the format-converted voice fields and intercepting the effective voice fields.

By performing silence detection on the voice fields, this step cuts off the leading and trailing silence and retains the effective voice field portion, in order to avoid affecting the alignment during model training.
This step can perform silence detection in the following way:

Step S10501, dividing the voice field: splitting the voice field by a fixed duration, defining each split unit as one frame of speech, and collecting the same number N of sampling points from each frame of speech.

The fixed duration in this step can be 20 ms, 30 ms, etc.; the voice field is split by this fixed duration into several frames of speech. Because even the same employee may speak the same word at different volumes during recording, the voice field can also be normalized before this step splits it: the point of maximum amplitude in each voice field is taken and its amplitude is scaled to a ratio close to 1, and all other points are then stretched by the same ratio.
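The normalization described above can be sketched as follows; the function name and the target level are illustrative, not prescribed by the patent.

```python
def normalize(samples, target=1.0):
    """Scale the sample of maximum absolute amplitude to `target`
    and stretch every other sample by the same ratio."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # all-silent field: nothing to scale
    ratio = target / peak
    return [s * ratio for s in samples]

print(normalize([0.1, -0.5, 0.25]))  # -> [0.2, -1.0, 0.5]
```

After normalization the peak amplitude is 1.0 regardless of the employee's recording volume, so the per-frame energy values computed next are comparable across recordings.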
Step S10502, calculating the energy value: calculating the energy value of each frame of speech, the calculation formula of the energy value being:

E = ∑(k=1..N) f_k²

Wherein E is the energy value of one frame of speech, f_k is the peak value of the k-th sampling point, and N is the total number of sampling points in one frame of speech.

The energy value of a frame of speech is related both to the magnitude of its sample values and to the number of sampling points it contains. The sample values, i.e. the peaks f_k above, generally include both positive and negative values, and the sign should not matter when calculating the energy value; this step therefore defines the energy value of a frame of speech as the sum of the squares of the sample values.
Step S10503, determining the front and rear breakpoints: if the energy values of M consecutive frames of speech are all higher than the preset threshold, defining the first frame in those M consecutive frames whose energy value exceeds the preset threshold as the front breakpoint of an audio segment; if, starting from frame M+1, the energy value falls below the preset threshold and remains below it for a preset duration, defining frame M+1 as the rear breakpoint of the audio segment; the audio intercepted between the front breakpoint and the rear breakpoint is one effective voice field.

If the energy values of a few frames before a voice segment are lower than the preset threshold while the energy values of the following M consecutive frames are all higher than it, the first frame whose energy value exceeds the preset threshold is defined as the front breakpoint. If the energy values of M consecutive frames are high and the energy value of a subsequent frame becomes small and stays small for a preset duration, the point where the energy value drops is considered the rear breakpoint. The audio intercepted between the front breakpoint and the rear breakpoint is saved as one effective voice field.

The smaller the audio duration corresponding to the M consecutive frames, the higher the breakpoint detection sensitivity. Because what is recorded is the account text for login, the recorded voice field is relatively simple and long pauses are unlikely; to improve the sensitivity of silence detection, M can therefore be set to a small value, corresponding to an audio duration of 200 ms to 400 ms.
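Steps S10501 to S10503 can be sketched together as follows. This is a simplified illustration with made-up parameter values (M, the silence run length, and the threshold are all tunable in practice); the frame data are toy numbers, not real audio.

```python
def frame_energy(frame):
    """E = sum of squared sample values of one frame (step S10502)."""
    return sum(s * s for s in frame)

def detect_breakpoints(frames, threshold, m=3, min_silence_frames=2):
    """Front breakpoint: first frame of a run of m frames above threshold.
    Rear breakpoint: first frame of a sufficiently long run below threshold.
    Returns (front, rear) frame indices, or None if no speech is found."""
    energies = [frame_energy(f) for f in frames]
    front = rear = None
    for i in range(len(energies) - m + 1):
        if all(e > threshold for e in energies[i:i + m]):
            front = i  # first frame of the above-threshold run
            break
    if front is None:
        return None
    for i in range(front + m, len(energies) - min_silence_frames + 1):
        if all(e <= threshold for e in energies[i:i + min_silence_frames]):
            rear = i  # frame where the energy drops and stays low
            break
    if rear is None:
        rear = len(energies)
    return front, rear  # effective voice field = frames[front:rear]

frames = [[0.0, 0.0], [0.9, 0.8], [0.7, 0.9], [0.8, 0.6], [0.0, 0.1], [0.0, 0.0]]
print(detect_breakpoints(frames, threshold=0.5))  # -> (1, 4)
```

In the toy example, the silent first frame is cut, frames 1 to 3 carry the speech energy, and frame 4 begins the trailing silence, so frames[1:4] is kept as the effective voice field.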
Ideally the energy value of silence is 0, so the preset threshold of this step would ideally be 0. In collected voice fields, however, there is often background sound of some strength; this background sound is also silence, yet its energy value is clearly higher than 0, so the preset threshold is usually not set to 0. The preset threshold of this step can be a dynamic threshold: when performing breakpoint detection on a voice field, the average energy value of the starting duration of the voice field is obtained, for example the average energy value E0 of the first 100 ms to 1000 ms of the voice field, or the average energy value E0 of its first 100 frames; the preset threshold of this step is then obtained by adding a coefficient to E0 or multiplying E0 by a coefficient greater than 1.
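The dynamic threshold can be sketched as follows; the lead-in length, multiplicative coefficient and additive margin are illustrative defaults, not values given in the patent.

```python
def dynamic_threshold(frame_energies, lead_frames=100, coeff=1.5, margin=0.0):
    """Average the energy of the first `lead_frames` frames (assumed to be
    background sound) and raise it by a coefficient greater than 1 and/or
    an additive margin."""
    lead = frame_energies[:lead_frames]
    e0 = sum(lead) / len(lead)  # average energy E0 of the starting duration
    return e0 * coeff + margin

energies = [0.2] * 100 + [5.0] * 50  # quiet lead-in, then speech
print(dynamic_threshold(energies))  # prints a value slightly above 0.3
```

Because the threshold adapts to the background level of each recording, the same breakpoint logic works for employees recording in quiet offices and in noisier environments.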
In this embodiment, the collected voice fields are processed by format conversion and silence detection, and the more effective voice fields obtained are used for the training set, making the trained speech recognition model more reliable.
In one embodiment, in step S1, inputting multiple voice fields into a preset speech recognition model for training to obtain the trained speech recognition model comprises:

Step S111, constructing the neural network model: configuring a preset neural network model with an input layer, a hidden layer and an output layer; the input layer and the hidden layer are joined by multiple connections, each corresponding to a connection weight; after a bias is applied, the hidden layer passes its parameters to the output layer through an activation function; the connection weights and biases of the neural network model are initialized.

A neural network model is a system model that imitates biological neurons; the input layer, hidden layer and output layer of this step use the characteristics of such a model. Each connection between the input layer and the hidden layer carries a connection strength, i.e. a weight value applied to the signal passing through that connection, called the connection weight. The output function between the hidden layer and the output layer is called the activation function. After training, the neural network model acquires the abilities of information feature extraction, knowledge summarization, and learning and memory. By adjusting the connection weights and biases, the training of the neural network model is improved and effective connection weights and biases are obtained.
Step S112, calculate the output layer: multiple voice fields are defined as the training set and input into the neural network model for training, and the output layer is calculated. The calculation formula of the output layer Y_j is as follows:
Y_j = f((∑ X_iW_ij) + b_j)
wherein f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer.
In this step, the output of the neural network model is obtained through the above calculation formula; the final output result depends on the activation function.
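The output-layer formula above can be sketched as a plain forward computation. The sigmoid activation and the concrete input/weight shapes are assumptions for illustration; the patent does not fix a particular f(·).

```python
import math

def sigmoid(z):
    # assumed activation function f(.); the patent leaves f(.) unspecified
    return 1.0 / (1.0 + math.exp(-z))

def output_layer(x, w, b):
    """Compute Y_j = f((sum_i x_i * w[i][j]) + b[j]) for each output unit j.

    x: input values (one per voice-field feature)
    w: w[i][j] = connection weight from input i to output unit j
    b: b[j]    = bias of unit j
    """
    return [sigmoid(sum(x[i] * w[i][j] for i in range(len(x))) + b[j])
            for j in range(len(b))]
```

During training, the connection weights `w` and biases `b` would be initialized and then adjusted whenever the error rate exceeds the training threshold, as step S114 describes.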
Step S113, obtain the speech text: the output layer Y_j is decoded by a preset decoder to obtain the speech text.
The decoder is one of the cores of a speech recognition system. Its task is to find, for the input signal, the word string that outputs that signal with maximum probability, according to the acoustic pronunciation dictionary and the language model or corpus.
When decoding with the decoder, the Viterbi algorithm can be used: all account texts are added to a preset corpus; the output layer Y_j is parsed into multiple candidate sentences through a preset pronunciation dictionary. Any sentence is labeled S; sentence S consists of n words and is defined as S = (w_1, w_2, …, w_n). The probability P(S) of sentence S is then:
P(S) = P(w_1)P(w_2|w_1)…P(w_n|w_{n-1})
wherein w_n is the n-th word in sentence S, and each conditional probability is estimated from corpus counts as P(w_n|w_{n-1}) = C(w_{n-1}w_n) / C(w_{n-1}), where C(w_{n-1}) is the number of times the (n-1)-th word occurs in the corpus and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word occur together in the corpus. The sentence S with the maximum probability P(S) is taken as the final speech text.
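The bigram scoring above can be sketched as follows. The whitespace `split()` tokenization and the tiny in-memory corpus are illustrative assumptions; a real decoder would combine these language-model scores with acoustic scores inside a Viterbi search rather than scoring whole candidate sentences.

```python
from collections import Counter

def bigram_probability(sentence, corpus_sentences):
    """P(S) = P(w1) * prod_n P(w_n | w_{n-1}), with each conditional
    probability estimated as C(w_{n-1} w_n) / C(w_{n-1})."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for s in corpus_sentences:
        words = s.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
        total += len(words)
    words = sentence.split()
    if not words or total == 0:
        return 0.0
    p = unigrams[words[0]] / total  # P(w1): relative frequency of the first word
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # word never seen in the corpus: no estimate
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

def best_sentence(candidates, corpus_sentences):
    # the candidate sentence with maximum P(S) becomes the final speech text
    return max(candidates, key=lambda s: bigram_probability(s, corpus_sentences))
```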
Step S114, compare: the speech text is compared with the corresponding account text. When the error rate is greater than the training threshold, the connection weights and biases are adjusted and training is repeated, until the error rate is not greater than the training threshold, yielding the trained speech recognition model.
In this step, when comparing and determining the error rate, the character error rate value wer_i calculation method can be used to obtain the error rate value.
In this embodiment, all voice fields are input into the neural network model and trained in the above manner; the resulting neural network model can better recognize voice fields, finally producing speech text with a low error rate and high accuracy.
Step S2, speech recognition: a login request is received, preset recording software is called to acquire a voice field, and the voice field is input into the speech recognition model for speech recognition, obtaining the account text corresponding to the voice field.
In this step, the voice field is acquired in the following way: the login request of an employee is received and a login interface is displayed. The login interface is provided with a start button and a submit button. After a recording start instruction is triggered, the preset recording software is called and recording begins, continuing until a recording stop instruction is triggered. After a submit instruction is triggered, the recorded voice field is acquired. In this step, the employee presses the start button to trigger the recording start instruction, presses the start button again to trigger the recording stop instruction, and presses the submit button to trigger the submit instruction.
The voice field is input into the speech recognition model for speech recognition; the resulting speech text is the account text corresponding to the voice field.
Step S3, judge and convert: judge whether the account text is Chinese text; if it is Chinese text, convert the Chinese part of the account text into pinyin to obtain English text; otherwise, the account text is already English text.
Since system accounts are expressed in English text form, when the account text is Chinese text, the Chinese part of the Chinese text needs to be converted into pinyin to form English text. In this step, to judge whether the account text is Chinese text, the first character of the account text is extracted and the decision is made according to the byte value of that character: if it is 2, the text is judged to be Chinese text. During conversion, the characters in the account text whose byte value is 2 are extracted one by one, and the Chinese part of the account text is converted into pinyin through a preset Chinese-to-pinyin plug-in.
Step S4, error correction comparison: the English text is subjected to error correction comparison to obtain the target unlock account, and login is performed using the target unlock account.
When acquiring and recognizing the login voice field, dictation may deviate and the account text produced by speech recognition may contain errors; therefore, error correction comparison is also performed on the English text to obtain the final target unlock account, through which the subsequent system login judgment is carried out.
In one embodiment, step S4 comprises:
Step S401, calculate the character error rate: obtain all system accounts in the system database, and separately calculate the character error rate value wer_i between each system account and the English text. The calculation formula of the character error rate value wer_i is as follows:
wer_i = (S_i + D + L) / N_i
wherein S_i indicates the number of letters replaced when calculating against the i-th system account, D indicates the number of letters deleted when calculating against the i-th system account, L indicates the number of letters inserted when calculating against the i-th system account, and N_i indicates the total number of letters in the i-th system account.
In this step, to make the letter sequence of the recognized English text consistent with the letter sequence of a system account, certain letters need to be replaced, deleted or inserted. The total number of replaced, deleted or inserted letters, divided by the total number of letters in the system account, is the character error rate value. If the English text is completely consistent with a certain system account, the character error rate value is 0.
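The character error rate can be computed with a standard Levenshtein edit distance, whose minimum edit count is exactly the S_i + D + L of the formula above. This is a generic sketch of the technique, not the patent's own implementation.

```python
def edit_ops(reference, hypothesis):
    """Levenshtein distance between two letter sequences: the minimum
    total number of substitutions, deletions and insertions."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[n][m]

def wer(system_account, english_text):
    # wer_i = (S_i + D + L) / N_i, where N_i = letters in the system account
    return edit_ops(system_account, english_text) / len(system_account)
```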
Step S402, obtain the target unlock account: obtain the minimum value among the character error rate values wer_i, and define the system account corresponding to the minimum value as the target unlock account.
Among all the character error rate values, the minimum is found first; the system account corresponding to the minimum value is considered the closest match to the English text and becomes the target unlock account. To guard against credential-stuffing attacks, this step may also preset an error threshold and compare the minimum value with it: if the minimum value is not greater than the error threshold, the system account corresponding to this minimum is defined as the target unlock account; otherwise, a login error prompt is returned.
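The threshold-guarded selection of step S402 can be sketched as follows, assuming the per-account error rates have already been computed (for example with a `wer` function over all system accounts). The dict input and the `None` return value standing in for the login error prompt are illustrative choices, not specified by the patent.

```python
def target_unlock_account(wer_by_account, error_threshold):
    """Pick the system account with the smallest character error rate;
    reject the login if even the best match exceeds the threshold,
    which guards against credential-guessing attacks."""
    if not wer_by_account:
        return None
    account = min(wer_by_account, key=wer_by_account.get)
    if wer_by_account[account] > error_threshold:
        return None  # caller returns a login error prompt instead
    return account
```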
In this embodiment, by calculating character error rate values, the English text is matched against numerous system accounts to identify the closest one, which is defined as the target unlock account for the subsequent system login judgment, improving the precision of identification.
In the artificial-intelligence-based voice login method of this embodiment, deep learning is performed on the voice fields recorded by each employee to train a good neural network model. When an employee logs in, the voice field input by the employee is recognized as account text with high accuracy, and the system account is further determined by calculating character error rate values. The employee does not need to type any system account or password on a keyboard, so the login approach is simpler and more convenient; the recorded voice field serves as the login password, which is difficult for criminals to impersonate, making the login more secure and reliable.
In one embodiment, a voice login device based on artificial intelligence is proposed, as shown in Fig. 2, comprising the following modules:
a training module, configured to obtain a preset account text, acquire the voice fields corresponding to the account text, input multiple voice fields into a preset speech recognition model for training, and obtain a trained speech recognition model;
a speech recognition module, configured to receive a login request, call preset recording software to acquire a voice field, input the voice field into the speech recognition model for speech recognition, and obtain the account text corresponding to the voice field;
a conversion module, configured to judge whether the account text is Chinese text and, if it is Chinese text, convert the Chinese part of the account text into pinyin to obtain English text; otherwise, the account text is English text;
an error correction comparison module, configured to perform error correction comparison on the English text, obtain a target unlock account, and log in using the target unlock account.
In one embodiment, a computer device is proposed, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to implement the steps in the artificial-intelligence-based voice login method of the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is proposed; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps in the artificial-intelligence-based voice login method of the above embodiments. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope described in this specification.
The above embodiments express only some exemplary embodiments of the invention, and their description is relatively specific and detailed, but they shall not be construed as limiting the scope of the invention. It should be pointed out that, for a person of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and these all fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A voice login method based on artificial intelligence, comprising:
obtaining a preset account text, acquiring voice fields corresponding to the account text, inputting multiple said voice fields into a preset speech recognition model for training, and obtaining a trained speech recognition model;
receiving a login request, calling preset recording software to acquire a voice field, inputting the voice field into the speech recognition model for speech recognition, and obtaining the account text corresponding to the voice field;
judging whether the account text is Chinese text; if it is Chinese text, converting the Chinese part of the account text into pinyin to obtain English text; otherwise, the account text is English text;
performing error correction comparison on the English text to obtain a target unlock account, and logging in using the target unlock account.
2. The voice login method based on artificial intelligence according to claim 1, wherein obtaining the preset account text and acquiring the voice fields corresponding to the account text comprises:
obtaining a recording push table from a system database, the recording push table containing the system accounts of all employees and the corresponding account texts, and reading the account texts one by one;
displaying each read account text, through an acquisition interface, to the employee corresponding to the system account; receiving, through the acquisition interface, a recording instruction sent by the employee; calling the preset recording software; acquiring, through the recording software, the voice field recorded by the employee; and associating and saving the voice field with the displayed account text;
traversing the recording push table to obtain at least one voice field corresponding to each account text.
3. The voice login method based on artificial intelligence according to claim 1, wherein after obtaining the preset account text and acquiring the voice fields corresponding to the account text, the method comprises:
converting the format of all acquired voice fields into the same audio format;
performing silence detection on the voice fields after the audio format conversion, and intercepting effective voice fields.
4. The voice login method based on artificial intelligence according to claim 3, wherein performing silence detection on the voice fields after the audio format conversion and intercepting effective voice fields comprises:
splitting the voice fields according to a fixed duration, defining each split unit as one frame of voice, and collecting the same number N of sampling points from each frame of voice;
calculating the energy value of each frame of voice, the calculation formula of the energy value being as follows:
E = ∑_{k=1}^{N} f_k²
wherein E is the energy value of one frame of voice, f_k is the peak value of the k-th sampling point, and N is the total number of sampling points of one frame of voice;
if the energy values of M consecutive frames of voice are higher than a preset threshold, defining the first frame whose energy is higher than the preset threshold among the M consecutive frames as the front breakpoint of a segment of audio; if, starting from the (M+1)-th frame, the energy value falls below the preset threshold and remains below it for a preset duration, defining the (M+1)-th frame of voice as the rear breakpoint of the segment of audio; the audio intercepted between the front breakpoint and the rear breakpoint is one segment of effective voice field.
5. The voice login method based on artificial intelligence according to claim 1, wherein inputting multiple said voice fields into the preset speech recognition model for training and obtaining the trained speech recognition model comprises:
configuring a preset neural network model with an input layer, a hidden layer and an output layer, the input layer and the hidden layer being connected in multiple ways, each connection corresponding to one connection weight, the hidden layer, after applying a bias, passing parameters to the output layer through an activation function, and initializing the connection weights and biases of the neural network model;
defining multiple said voice fields as a training set, inputting them into the neural network model for training, and calculating the output layer, the calculation formula of the output layer Y_j being as follows:
Y_j = f((∑ X_iW_ij) + b_j)
wherein f(·) is the activation function, X_i is the i-th input voice field, W_ij is the j-th connection weight of the i-th voice field, and b_j is the bias of the hidden layer;
decoding the output layer Y_j through a preset decoder to obtain a speech text;
comparing the speech text with the corresponding account text; when the error rate is greater than a training threshold, adjusting the connection weights and biases and retraining, until the error rate is not greater than the training threshold, thereby obtaining the trained speech recognition model.
6. The voice login method based on artificial intelligence according to claim 5, wherein decoding the output layer Y_j through the preset decoder to obtain the speech text comprises:
adding all account texts to a preset corpus;
parsing the output layer Y_j into multiple sentences through a preset pronunciation dictionary, labeling any sentence as S, sentence S consisting of n words and being defined as S = (w_1, w_2, …, w_n), the probability P(S) of sentence S then being:
P(S) = P(w_1)P(w_2|w_1)…P(w_n|w_{n-1})
wherein w_n is the n-th word in sentence S, C(w_{n-1}) is the number of times the (n-1)-th word occurs in the corpus, and C(w_{n-1}w_n) is the number of times the (n-1)-th word and the n-th word occur together in the corpus;
taking the sentence S with the maximum probability P(S) as the final speech text.
7. The voice login method based on artificial intelligence according to claim 1, wherein performing error correction comparison on the English text to obtain the target unlock account comprises:
obtaining all system accounts in a system database, and separately calculating the character error rate value wer_i between each system account and the English text, the calculation formula of the character error rate value wer_i being as follows:
wer_i = (S_i + D + L) / N_i
wherein S_i indicates the number of letters replaced when calculating against the i-th system account, D indicates the number of letters deleted when calculating against the i-th system account, L indicates the number of letters inserted when calculating against the i-th system account, and N_i indicates the total number of letters in the i-th system account;
obtaining the minimum value among the character error rate values wer_i, and defining the system account corresponding to the minimum value as the target unlock account.
8. A voice login device based on artificial intelligence, comprising:
a training module, configured to obtain a preset account text, acquire the voice fields corresponding to the account text, input multiple said voice fields into a preset speech recognition model for training, and obtain a trained speech recognition model;
a speech recognition module, configured to receive a login request, call preset recording software to acquire a voice field, input the voice field into the speech recognition model for speech recognition, and obtain the account text corresponding to the voice field;
a conversion module, configured to judge whether the account text is Chinese text and, if it is Chinese text, convert the Chinese part of the account text into pinyin to obtain English text; otherwise, the account text is English text;
an error correction comparison module, configured to perform error correction comparison on the English text, obtain a target unlock account, and log in using the target unlock account.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the artificial-intelligence-based voice login method according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the artificial-intelligence-based voice login method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910424460.XA CN110232917A (en) | 2019-05-21 | 2019-05-21 | Voice login method, device, equipment and storage medium based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910424460.XA CN110232917A (en) | 2019-05-21 | 2019-05-21 | Voice login method, device, equipment and storage medium based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110232917A true CN110232917A (en) | 2019-09-13 |
Family
ID=67861422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910424460.XA Pending CN110232917A (en) | 2019-05-21 | 2019-05-21 | Voice login method, device, equipment and storage medium based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232917A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110995938A (en) * | 2019-12-13 | 2020-04-10 | 上海优扬新媒信息技术有限公司 | Data processing method and device |
CN111508498A (en) * | 2020-04-09 | 2020-08-07 | 携程计算机技术(上海)有限公司 | Conversational speech recognition method, system, electronic device and storage medium |
CN112765335A (en) * | 2021-01-27 | 2021-05-07 | 上海三菱电梯有限公司 | Voice calling landing system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020178004A1 (en) * | 2001-05-23 | 2002-11-28 | Chienchung Chang | Method and apparatus for voice recognition |
CN102984152A (en) * | 2012-11-27 | 2013-03-20 | 江苏乐买到网络科技有限公司 | Password authentication method based on online shopping |
CN103986826A (en) * | 2014-05-12 | 2014-08-13 | 深圳市威富多媒体有限公司 | Mobile terminal encrypting and decrypting method and device based on voice recognition |
CN107395352A (en) * | 2016-05-16 | 2017-11-24 | 腾讯科技(深圳)有限公司 | Personal identification method and device based on vocal print |
CN107731228A (en) * | 2017-09-20 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | The text conversion method and device of English voice messaging |
CN108632137A (en) * | 2018-03-26 | 2018-10-09 | 平安科技(深圳)有限公司 | Answer model training method, intelligent chat method, device, equipment and medium |
2019-05-21: CN application CN201910424460.XA filed; status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020178004A1 (en) * | 2001-05-23 | 2002-11-28 | Chienchung Chang | Method and apparatus for voice recognition |
CN102984152A (en) * | 2012-11-27 | 2013-03-20 | 江苏乐买到网络科技有限公司 | Password authentication method based on online shopping |
CN103986826A (en) * | 2014-05-12 | 2014-08-13 | 深圳市威富多媒体有限公司 | Mobile terminal encrypting and decrypting method and device based on voice recognition |
CN107395352A (en) * | 2016-05-16 | 2017-11-24 | 腾讯科技(深圳)有限公司 | Personal identification method and device based on vocal print |
CN107731228A (en) * | 2017-09-20 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | The text conversion method and device of English voice messaging |
CN108632137A (en) * | 2018-03-26 | 2018-10-09 | 平安科技(深圳)有限公司 | Answer model training method, intelligent chat method, device, equipment and medium |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110995938A (en) * | 2019-12-13 | 2020-04-10 | 上海优扬新媒信息技术有限公司 | Data processing method and device |
CN110995938B (en) * | 2019-12-13 | 2022-04-26 | 度小满科技(北京)有限公司 | Data processing method and device |
CN111508498A (en) * | 2020-04-09 | 2020-08-07 | 携程计算机技术(上海)有限公司 | Conversational speech recognition method, system, electronic device and storage medium |
CN111508498B (en) * | 2020-04-09 | 2024-01-30 | 携程计算机技术(上海)有限公司 | Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium |
CN112765335A (en) * | 2021-01-27 | 2021-05-07 | 上海三菱电梯有限公司 | Voice calling landing system |
CN112765335B (en) * | 2021-01-27 | 2024-03-08 | 上海三菱电梯有限公司 | Voice call system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9672825B2 (en) | Speech analytics system and methodology with accurate statistics | |
US7689418B2 (en) | Method and system for non-intrusive speaker verification using behavior models | |
WO2021073116A1 (en) | Method and apparatus for generating legal document, device and storage medium | |
CN107305541A (en) | Speech recognition text segmentation method and device | |
CN110232917A (en) | Voice login method, device, equipment and storage medium based on artificial intelligence | |
CN101923855A (en) | Test-irrelevant voice print identifying system | |
Levitan et al. | Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. | |
CN113628627B (en) | Electric power industry customer service quality inspection system based on structured voice analysis | |
CN108877769B (en) | Method and device for identifying dialect type | |
US20180308501A1 (en) | Multi speaker attribution using personal grammar detection | |
Schultz et al. | The ISL meeting room system | |
CN111523317B (en) | Voice quality inspection method and device, electronic equipment and medium | |
CN110246509A (en) | A kind of stack denoising self-encoding encoder and deep neural network structure for voice lie detection | |
CN109920447A (en) | Recording fraud detection method based on sef-adapting filter Amplitude & Phase feature extraction | |
JP4143541B2 (en) | Method and system for non-intrusive verification of speakers using behavior models | |
Wildermoth et al. | GMM based speaker recognition on readily available databases | |
KR102407055B1 (en) | Apparatus and method for measuring dialogue quality index through natural language processing after speech recognition | |
US20050043957A1 (en) | Selective sampling for sound signal classification | |
Wray et al. | Best practices for crowdsourcing dialectal arabic speech transcription | |
CN113051923B (en) | Data verification method and device, computer equipment and storage medium | |
CA2621952A1 (en) | System for excluding unwanted data from a voice recording | |
Tumminia et al. | Diarization of legal proceedings. Identifying and transcribing judicial speech from recorded court audio | |
Manikandan et al. | Speaker identification using a novel prosody with fuzzy based hierarchical decision tree approach | |
Navrátil et al. | An instantiable speech biometrics module with natural language interface: Implementation in the telephony environment | |
Markowitz | The many roles of speaker classification in speaker verification and identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||