CN110047477A - Optimization method, device, and system for a weighted finite-state transducer - Google Patents

Optimization method, device, and system for a weighted finite-state transducer Download PDF

Info

Publication number
CN110047477A
CN110047477A CN201910271141.XA CN201910271141A
Authority
CN
China
Prior art keywords
token
finite state
weighted finite
data structure
transducer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910271141.XA
Other languages
Chinese (zh)
Other versions
CN110047477B (en)
Inventor
孙昊
欧阳鹏
尹首一
李秀东
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingwei Intelligent Technology Co Ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co Ltd filed Critical Beijing Qingwei Intelligent Technology Co Ltd
Priority to CN201910271141.XA priority Critical patent/CN110047477B/en
Publication of CN110047477A publication Critical patent/CN110047477A/en
Application granted granted Critical
Publication of CN110047477B publication Critical patent/CN110047477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an optimization method, system, computer device, and computer-readable storage medium for a weighted finite-state transducer, in the technical field of speech recognition. The system includes a data structure optimization module, for optimizing the Token data structure; a transducer construction module, for performing transition chaining on the acoustic output and the language model with the optimized Token data structure to obtain a weighted finite-state transducer; a transducer pruning module, for pruning the weighted finite-state transducer; and an optimal path search module, for searching the optimized Token data structure in the search space corresponding to the weighted finite-state transducer by node traversal to obtain the optimal path. The present invention reduces conversions between different structures, lowers the overhead of memory and computation, reduces the number of accesses to in-memory data, and optimizes the optimal-path search algorithm, achieving the goal of improving algorithm efficiency.

Description

Optimization method, device, and system for a weighted finite-state transducer
Technical field
The present invention relates to the field of speech signal processing, and in particular to speech recognition technology; specifically, it concerns an optimization method, system, computer device, and computer-readable storage medium for a weighted finite-state transducer.
Background art
This section is intended to provide a background or context for the embodiments of the invention set forth in the claims. The description herein is not admitted to be prior art merely by its inclusion in this section.
Speech recognition converts human speech into computer-readable text. The role of speech recognition technology is increasingly prominent in every field: applied in government offices, commercial departments, and family life, it brings great convenience to people's work and lives. With the development of the Internet, digital devices, and multimedia technology, speech recognition has received more and more attention, and speech-related products emerge one after another, such as Apple's Siri, Amazon's Echo, and Google's Home. These speech products were brought to market quickly in recent years and have received favorable public reviews. In the future, speech recognition will be applied to still more domains: healthcare, intelligent vehicles, smart homes, education, and so on.
A speech recognition system generally comprises feature-vector extraction, an acoustic model, a language model, and a decoder. As shown in Fig. 1, O and W are respectively the observed feature vectors of a training sentence and the corresponding word sequence; P(O|W) is the acoustic model probability and P(W) the language model probability, and their product measures how well the acoustic features of the speech match the word sequence W. When P(W)P(O|W) reaches its maximum, the word sequence W* is output as the speech recognition result. The decoder is thus one of the cores of a speech recognition system, and its quality directly affects the final recognition result. In recent years, many decoding strategies and decoding functions have been applied in decoders, for example the HVite decoding tool of HTK (Hidden Markov Model Toolkit), the Sphinx decoder, and the TODE decoder. A common drawback of these decoders is that the linguistic knowledge sources they embody (acoustics, phonetics, the dictionary) are applied inside the decoder in a very rigid form, so that later modification is very cumbersome. In order to find a more flexible decoder architecture, the concept of the "weighted finite-state transducer" (WFST) was proposed; its theory is to simulate language with the grammatical structure and characteristics of the WFST model.
The core idea of WFST is to represent the acoustic output and the language model each with a weighted finite-state transducer, and then to integrate them through composition algorithms into one complete weighted finite-state transducer model, thereby obtaining the search space for the sample features; the optimal path is finally searched out in this search space. WFST therefore mainly comprises three parts: finite-state machine construction, finite-state machine pruning, and optimal path search.
The finite-state machine is a simple and effective mathematical model. Because finite-state machines are efficient in both time and space, they are widely used in speech recognition and natural language processing. The core idea is to convert each mathematical model of a traditional speech recognition system into a finite-state machine model, and then to integrate and optimize the converted models effectively to obtain the search space. The advantage of this operation is a unified representation of the models, which makes integrating different resources easier and greatly simplifies the complexity of a speech recognition system. The weighted finite-state transducer (WFST) is a special form of finite-state machine.
In a finite-state machine, a state is represented by a point, a transition by a directed (arrowed) line segment, and the character on a transition is its input character. The initial state is drawn as a bold circle, and a double circle denotes a final state. When a state is both the initial state and a final state, it is drawn with a double bold circle, as shown in Fig. 2.
The states contained in a finite-state machine are finite; exactly one of them is the initial state, and there are zero or more final states. The machine is entered through the initial state and transitions by reading input characters to reach the next state. If the state reached after completing the last transition is a final state, the output is acceptance; otherwise the output is rejection. A transition is represented by a directed arc. A series of states and transitions in a finite-state machine constitutes a path.
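The accept/reject behavior described above can be sketched as a small finite-state machine. The states, alphabet, and transition table below are invented purely for illustration; only the mechanics (one initial state, final states, accept on ending in a final state, reject on a missing transition) come from the text.

```python
# Minimal finite-state machine: a transition table keyed by
# (state, input character), one initial state, a set of final states.
def make_fsm():
    transitions = {
        (0, "a"): 1,   # from state 0, reading "a" moves to state 1
        (1, "b"): 2,
        (2, "b"): 2,   # self-loop: any number of trailing "b"s
    }
    return transitions, 0, {2}  # table, initial state, final states

def accepts(word):
    transitions, state, finals = make_fsm()
    for ch in word:
        key = (state, ch)
        if key not in transitions:   # no transition defined: reject
            return False
        state = transitions[key]
    return state in finals           # accept only if we end in a final state

print(accepts("abb"))  # True
print(accepts("ba"))   # False
```

The sequence of states and transitions visited for an accepted word is exactly a path in the sense used above.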
To better describe the characteristics of a finite-state machine, weight parameters are set so that different transitions are assigned different weights, producing a weighted finite-state machine. A weighted finite-state machine is shown in Fig. 3: to the left of the colon is the input character, to the right of the colon the output character, and to the right of the slash the weight of the transition. Let a path be P = p1 ... pi ... pn, where pi denotes the i-th transition of the path, i = 1, 2, ..., n, and w[pi] denotes the weight of transition pi; λ is the set of initial-state weights and η the set of final-state weights. The weight of the initial state of path P is λ[p1] and the weight of its final state is η[pn]; the weight accumulated over all transitions of path P is then (in the standard WFST notation, with ⊗ the semiring product):

w[P] = λ[p1] ⊗ w[p1] ⊗ w[p2] ⊗ ... ⊗ w[pn] ⊗ η[pn]

The optimal path of the weighted finite-state machine is the path of best total weight; in the tropical semiring this is the path of minimum accumulated weight:

P* = argmin over all accepting paths P of w[P]
After the weighted finite-state machine is generated, in order to reduce the size of the speech recognition search space and improve recognition efficiency, the next step applies a merge algorithm that prunes and merges paths with similar trajectories and paths of very small weight, generating a simplified finite-state machine. However, because the language model is very large, the process by which the finite-state machine generates the search space is complicated, and the search space itself is also large, so the current WFST algorithm consumes a great deal of time and space resources; optimization work on the WFST algorithm is therefore very necessary.
The data structures used by WFST are StdLattice, Token, Lattice, and CompactLattice. StdLattice is mainly used for storing language-model data; Token stores the weighted finite-state machine generated after combining the acoustic model and the language model; Lattice and CompactLattice are used in the GetBestPath (optimal path search) module. Token is a two-dimensional network topology, Lattice is a table structure converted from the Token structure, and CompactLattice is a table structure obtained by converting Lattice. Fig. 4 shows the Token structure and Fig. 5 the CompactLattice structure.
As shown above, WFST uses four data structures in total to store data, each playing a different role. To guarantee recognition accuracy, WFST needs to save a large number of intermediate states and paths, so the conversion, traversal, and storage of these four data structures consume very large amounts of computation and memory. Second, the GetBestPath (optimal path search) module is a very time-consuming module of WFST: GetBestPath must compute the cost of every possible path, and WFST implements this function with a depth-first search algorithm of very high complexity, making the algorithm very time-consuming. Finally, WFST contains some redundant operations that can be merged and optimized.
Therefore, how to provide a new weighted finite-state transducer that solves the above problems of existing weighted finite-state transducers is an urgent technical problem to be solved in this field.
Summary of the invention
In view of this, embodiments of the present invention provide an optimization method, system, computer device, and computer-readable storage medium for a weighted finite-state transducer. By compressing and merging storage structures, conversions between different structures are reduced and the overhead of memory and computation is greatly lowered; by optimizing the implementation of WFST, the number of accesses to in-memory data is greatly reduced, achieving the goal of improving algorithm efficiency; and by optimizing the optimal-path search algorithm, the number of traversed nodes is reduced severalfold, achieving a substantial efficiency improvement.
An object of the present invention is to provide an optimization method for a weighted finite-state transducer, comprising:
optimizing the Token data structure;
performing transition chaining on the acoustic output and the language model with the optimized Token data structure to obtain a weighted finite-state transducer;
pruning the weighted finite-state transducer; and
searching, with the optimized Token data structure and by node traversal, the search space corresponding to the weighted finite-state transducer to obtain the optimal path.
An object of the present invention is to provide an optimization system for a weighted finite-state transducer, comprising:
a data structure optimization module, for optimizing the Token data structure;
a transducer construction module, for performing transition chaining on the acoustic output and the language model with the optimized Token data structure to obtain a weighted finite-state transducer;
a transducer pruning module, for pruning the weighted finite-state transducer; and
an optimal path search module, for searching, with the optimized Token data structure and by node traversal, the search space corresponding to the weighted finite-state transducer to obtain the optimal path.
An object of the present invention is to provide a computer device, comprising: a processor adapted to implement instructions, and a storage device storing a plurality of instructions adapted to be loaded by the processor to execute the optimization method of a weighted finite-state transducer.
An object of the present invention is to provide a computer-readable storage medium storing a computer program, the computer program being used to execute the optimization method of a weighted finite-state transducer.
The beneficial effects of the present invention are to provide an optimization method, system, computer device, and computer-readable storage medium for a weighted finite-state transducer. By compressing and merging storage structures, conversions between different structures are reduced and the overhead of memory and computation is greatly lowered; by optimizing the implementation of WFST, the number of accesses to in-memory data is greatly reduced, achieving the goal of improving algorithm efficiency; and by optimizing the optimal-path search algorithm, the number of traversed nodes is reduced severalfold, achieving a substantial efficiency improvement and solving the problem that the prior-art weighted finite-state machine (WFST) occupies very large time and space resources.
To make the above and other objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are cited below and described in detail together with the accompanying drawings.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is the architecture diagram of speech recognition;
Fig. 2 is the structure diagram of a finite-state machine;
Fig. 3 is the structure diagram of a weighted finite-state machine;
Fig. 4 is the structure diagram of the prior-art WFST storage structure Token;
Fig. 5 is the structure diagram of the prior-art WFST storage structure CompactLattice;
Fig. 6 is a flowchart of an optimization method for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 7 is a detailed flowchart of step S101 in Fig. 6;
Fig. 8 is a detailed flowchart of step S102 in Fig. 6;
Fig. 9 is a detailed flowchart of step S303 in Fig. 8;
Fig. 10 is a detailed flowchart of step S304 in Fig. 8;
Fig. 11 is a detailed flowchart of step S104 in Fig. 6;
Fig. 12 is a structural schematic diagram of an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 13 is a structural schematic diagram of the data structure optimization module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 14 is a structural schematic diagram of the transducer construction module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 15 is a structural schematic diagram of the first transition chaining module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 16 is a structural schematic diagram of the second transition chaining module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention;
Fig. 17 is a structural schematic diagram of the optimal path search module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments in the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Those skilled in the art will appreciate that embodiments of the present invention can be implemented as a system, an apparatus, a method, or a computer program product. Accordingly, the present disclosure may be embodied in the following forms: complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Fig. 12 is a structural schematic diagram of an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 12, the optimization system of the weighted finite-state transducer includes:
a data structure optimization module 100, for optimizing the Token data structure;
a transducer construction module 200, for performing transition chaining on the acoustic output and the language model with the optimized Token data structure to obtain a weighted finite-state transducer;
a transducer pruning module 300, for pruning the weighted finite-state transducer; and
an optimal path search module 400, for searching, with the optimized Token data structure and by node traversal, the search space corresponding to the weighted finite-state transducer to obtain the optimal path.
That is, in the present invention: first, by optimizing the implementation of WFST, the number of accesses to in-memory data is greatly reduced, achieving the goal of improving algorithm efficiency; second, storage structures are compressed and merged, reducing conversions between different structures and greatly lowering the overhead of memory and computation; finally, the GetBestPath search algorithm is optimized, reducing the number of traversed nodes severalfold and achieving a substantial efficiency improvement.
In one embodiment of the invention, the acoustic output may be obtained by an acoustic output determining module; specifically, the acoustic output determining module inputs speech into an acoustic model to obtain the acoustic output.
The technical solution and innovations of the present invention are discussed in detail below with reference to the specific drawings.
In the prior art, WFST has as many as four data storage structures, and converting between these data structures brings great resource consumption. The present invention integrates and optimizes these excessive data structures by modifying and upgrading the prior-art Token structure so that it takes over the GetBestPath (optimal path search) support functions of Lattice and CompactLattice. The data structures can thus be reduced to two (StdLattice and Token), reducing the number of data conversions, so that the amount of computation is greatly reduced and the peak memory consumption drops by about 50%.
Specifically, Fig. 13 is a structural schematic diagram of the data structure optimization module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 13, the data structure optimization module 100 includes:
a data structure adding module 101, for adding to the Token data structure a Token *last field for storing the address of the previous-frame node; and
a data variable adding module 102, for adding to the Token data structure a tot_cost variable for storing the cost value of each path.
That is, in the prior art the optimal path search module first computes the cost of each path in the forward direction and saves it in the nodes of the last frame, obtains the node of minimum cost, and then backtracks to obtain the optimal path. However, each path in the prior-art Token structure is a singly linked list, so the backtracking function cannot be implemented. The invention therefore adds to the Token structure a Token *last field storing the address of the previous-frame node, so that when a subsequent Token is generated the address of the previous-frame node can be recorded, solving the problem that Token cannot backtrack. In addition, the original Token structure has no function for storing the cost value of each path, so the invention adds to the Token structure a tot_cost variable storing the cost value of each path.
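A minimal sketch of the upgraded Token node follows, rendered in Python from the C-style description in the text: the `last` attribute stands in for the `Token *last` back-pointer and `tot_cost` for the accumulated path cost. Any field or function name beyond those two is invented for illustration.

```python
class Token:
    """One node of the two-dimensional Token topology (illustrative sketch)."""
    def __init__(self, state, arc_cost, last=None):
        self.state = state
        self.last = last                 # back-pointer to the previous-frame token
        prev = last.tot_cost if last else 0.0
        self.tot_cost = prev + arc_cost  # accumulated cost of the path so far

def backtrace(token):
    # The back-pointers enable backtracking, which a forward-only
    # singly linked list cannot provide.
    path = []
    while token is not None:
        path.append(token.state)
        token = token.last
    return path[::-1]

t0 = Token("s0", 0.0)
t1 = Token("s1", 1.5, last=t0)
t2 = Token("s2", 2.0, last=t1)
print(t2.tot_cost)    # 3.5
print(backtrace(t2))  # ['s0', 's1', 's2']
```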
In the prior art, Token is the most important storage structure in WFST. It is a two-dimensional topological structure that can conveniently express direction information between frames and within a frame, so the data obtained after combining the language model with the acoustic model are stored in the Token structure; however, because the data volume is very large, creating Tokens is also rather time-consuming.
In the prior art, Token creation has two parts: one adds the non-epsilon transitions (ProcessEmitting), and the other adds the epsilon transitions (NoneProcessEmitting). These two steps are called separately in WFST, so each node is traversed at least twice when the Token structure is created. The present invention merges the non-epsilon transition step ProcessEmitting with the epsilon transition step NoneProcessEmitting, so that the number of node traversals is halved and the number of memory accesses is greatly reduced. Since accessing in-memory data accounts for a large proportion of the time spent creating Tokens, this optimizes the efficiency of the Token creation part.
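The effect of fusing the two passes can be sketched as follows: instead of visiting every node once for emitting (non-epsilon) arcs and again for non-emitting (epsilon) arcs, a single visit handles both arc kinds. The arc representation and visit counters are invented for illustration only.

```python
def expand_separate(nodes, arcs_of):
    # Prior-art shape: two separate passes over the same nodes.
    visits, out = 0, []
    for n in nodes:                      # pass 1: emitting arcs only
        visits += 1
        out += [a for a in arcs_of(n) if a["emitting"]]
    for n in nodes:                      # pass 2: epsilon arcs only
        visits += 1
        out += [a for a in arcs_of(n) if not a["emitting"]]
    return out, visits

def expand_merged(nodes, arcs_of):
    # Merged shape: one pass handles both arc kinds per node.
    visits, out = 0, []
    for n in nodes:
        visits += 1
        out += list(arcs_of(n))
    return out, visits

arcs = {1: [{"emitting": True}, {"emitting": False}], 2: [{"emitting": True}]}
out_sep, v_sep = expand_separate([1, 2], lambda n: arcs[n])
out_mrg, v_mrg = expand_merged([1, 2], lambda n: arcs[n])
print(v_sep, v_mrg)  # 4 2  (node visits, and thus memory touches, halve)
```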
Specifically, Fig. 14 is a structural schematic diagram of the transducer construction module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 14, the transducer construction module 200 includes:
a first frame obtaining module 201, for obtaining the first frame of the acoustic output as the current frame, the acoustic output being composed of multiple frames;
a one-pass transition obtaining module 202, for obtaining in a single pass, from the language model, the epsilon transitions and non-epsilon transitions corresponding to the current frame;
a first transition chaining module 203, for performing transition chaining according to the epsilon transitions corresponding to the current frame;
a second transition chaining module 204, for performing transition chaining according to the non-epsilon transitions corresponding to the current frame; and
a frame traversal module 205, for traversing the multiple frames of the acoustic output and storing the previous-frame node address in the Token *last field.
Fig. 15 is a structural schematic diagram of the first transition chaining module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 15, the first transition chaining module 203 includes:
a first judgment module 2031, for judging whether the epsilon transition corresponding to the current frame satisfies a first threshold, the first threshold including a fixed threshold and a dynamic threshold;
a first chaining module 2032, for chaining the current frame when the first judgment module judges yes; and
a first discarding module 2033, for discarding the current frame when the first judgment module judges no.
Fig. 16 is a structural schematic diagram of the second transition chaining module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 16, the second transition chaining module 204 includes:
a second judgment module 2041, for judging whether the non-epsilon transition corresponding to the current frame satisfies a second threshold, the second threshold including a fixed threshold and a dynamic threshold;
a second chaining module 2042, for chaining the current frame when the second judgment module judges yes; and
a second discarding module 2043, for discarding the current frame when the second judgment module judges no.
That is, in the present invention, the volume of data memory accesses is halved by merging the non-epsilon transition step ProcessEmitting and the epsilon transition step NoneProcessEmitting. Specifically, when a Token is added, the corresponding epsilon and non-epsilon transitions are obtained from the language model in a single pass, and it is then judged in turn whether each epsilon and non-epsilon transition satisfies the threshold: if it does, it is chained; if not, it is discarded. In the prior art, the chaining thresholds are generated by separate computations; in the present invention, after the epsilon and non-epsilon transitions are merged, the first threshold and the second threshold are computed in a unified way.
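The chain-or-discard decision against a threshold with fixed and dynamic components can be sketched as beam pruning: a candidate transition is kept only if its cost beats both a fixed cutoff and a dynamic cutoff derived from the best cost seen so far. The cutoff values, cost fields, and candidate records below are invented; the patent does not specify how its thresholds are computed.

```python
def prune(candidates, fixed_threshold, beam):
    """Keep a candidate only if it satisfies both threshold components."""
    best = min(c["cost"] for c in candidates)
    dynamic_threshold = best + beam          # dynamic cutoff tracks the current best
    return [c for c in candidates
            if c["cost"] <= fixed_threshold and c["cost"] <= dynamic_threshold]

cands = [{"id": "a", "cost": 1.0},
         {"id": "b", "cost": 3.0},
         {"id": "c", "cost": 9.0}]
kept = prune(cands, fixed_threshold=8.0, beam=4.0)
print([c["id"] for c in kept])   # ['a', 'b']: 'c' fails both cutoffs
```

Because the epsilon and non-epsilon candidates arrive together in the merged pass, one such unified pruning call can serve both, which is the spirit of computing the first and second thresholds in a unified way.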
In the prior art, the most time-consuming module of WFST is the GetBestPath (optimal path search) part. The goal of GetBestPath is to compute the cost of each path and then select the path of smallest cost. Computing the cost of each path requires a path search algorithm, and WFST uses depth-first search; but the complexity of depth-first search is very high, making it very time-consuming.
Fig. 17 is a structural schematic diagram of the optimal path search module in an optimization system for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 17, the optimal path search module 400 includes:
a cost value determining module 401, for arranging the Token linked lists within each frame so that the lists are laid out in order, then traversing the intra-frame and inter-frame Token lists in turn, accumulating the weights along each Token chain according to the pointing relations, storing the resulting cost value of each path into the tot_cost variable, and at the same time recording the pointer to the parent node into Token *last;
a cost value comparison module 402, for taking the Token list of the last frame of the pruned weighted finite-state transducer, adding the finalcost value to the cost of each node, and comparing the cost of each node; and
an optimal path determining module 403, for taking the node of smallest cost and backtracking along the links until the first node, thereby obtaining the optimal path.
That is, with the Token data structure optimized by the present invention, the inter-frame direction of Token is created unidirectionally, so there is no reversed direction between frames; and because the linked lists within each frame are arranged in one direction before the optimal path is sought, node pointers can point in only one direction, so there is no reversed ordering within a frame either. Accumulating costs in sequence along the two dimensions, between frames and within frames, therefore cannot go wrong. The Token structure completely preserves the order between and within frames, so optimized node search can be implemented with Token.
As shown in Fig. 4, Token is a two-dimensional topological structure: the first frame is node 1, the second frame is nodes 2 and 3, and the third frame is nodes 4, 5, and 6; the traversal order is simply the order of nodes 1, 2, 3, 4, 5, 6. It completely preserves the intra-frame and inter-frame information. Since the Token structure contains no reversed inter-frame pointers (because Token is built in the order of the frames), depth-first search is unnecessary when searching paths, and a node traversal of much lower complexity can be used instead, the traversal proceeding within a frame first and then across frames. The complexity of depth-first search is O(n²) and the complexity of the traversal is O(n), so the efficiency is improved by roughly a factor of n; since the number of nodes n is very large, the efficiency gain is substantial.
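Because tokens are created in frame order with back-pointers only into the previous frame, best-path extraction reduces to one linear sweep plus a backtrace, with no depth-first search. The sketch below assumes a Python rendering of the described Token node (a `last` back-pointer and an accumulated `tot_cost`); the frames, costs, and the zero final-cost function are invented.

```python
class Tok:
    def __init__(self, state, cost, last=None):
        self.state = state
        self.last = last
        self.tot_cost = (last.tot_cost if last else 0.0) + cost

def get_best_path(frames, final_cost):
    # Costs were accumulated in one O(n) sweep as tokens were created;
    # selection is an argmin over the last frame, then a backtrace via `last`.
    best = min(frames[-1], key=lambda t: t.tot_cost + final_cost(t))
    path, t = [], best
    while t is not None:
        path.append(t.state)
        t = t.last
    return path[::-1]

f1 = [Tok(1, 0.0)]
f2 = [Tok(2, 2.0, f1[0]), Tok(3, 1.0, f1[0])]
f3 = [Tok(4, 3.0, f2[0]), Tok(5, 0.5, f2[1]), Tok(6, 2.0, f2[1])]
print(get_best_path([f1, f2, f3], final_cost=lambda t: 0.0))  # [1, 3, 5]
```

Node 5 carries the smallest accumulated cost (1.5), and the back-pointers recover its path 1 → 3 → 5 without ever revisiting a node.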
The above is an optimization system for a weighted finite-state transducer provided by the present invention. By compressing and merging storage structures, it reduces conversions between different structures and greatly lowers the overhead of memory and computation; by optimizing the implementation of WFST, it greatly reduces the number of accesses to in-memory data, achieving the goal of improving algorithm efficiency; and by optimizing the optimal-path search algorithm, it reduces the number of traversed nodes severalfold, achieving a substantial efficiency improvement.
In addition, although several unit modules of the system are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the units described above may be embodied in one unit; likewise, the features and functions of one unit described above may be further divided into and embodied by multiple units. The terms "module" and "unit" used above may be software and/or hardware that realize a predetermined function. Although the modules described in the following embodiments are preferably realized in software, realization in hardware, or in a combination of software and hardware, is also possible and conceivable.
Having described the optimization system for a weighted finite-state transducer according to an exemplary embodiment of the present invention, the method of the exemplary embodiment is introduced next with reference to the accompanying drawings. The implementation of the method may refer to the overall implementation above; overlapping details are not repeated.
Fig. 6 is a flow diagram of an optimization method for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 6, the optimization method comprises:
S101: optimize the Token data structure.
S102: perform transition chaining on the acoustic output and the language model with the optimized Token data structure, to obtain a weighted finite-state transducer.
S103: prune the weighted finite-state transducer.
S104: search, in the search space corresponding to the weighted finite-state transducer, with the optimized Token data structure by way of node traversal, to obtain the best path.
That is, in the present invention: first, optimizing the WFST implementation greatly reduces the number of memory accesses, improving algorithm efficiency; second, compressing and merging the storage structures reduces conversions between different structures, greatly lowering memory and computation overhead; finally, optimizing the GetBestPath search algorithm drastically reduces the number of traversed nodes, yielding a substantial efficiency gain.
In one embodiment of the present invention, the acoustic output may be obtained by an acoustic-output determining module; specifically, speech is input to an acoustic model, which produces the acoustic output.
The technical solution and innovations of the present invention are described in detail below with reference to the specific drawings.
In the prior art there are as many as four WFST data storage structures, and converting between them incurs large resource consumption. The present invention consolidates and optimizes these excessive data structures by upgrading the prior-art Token structure so that the GetBestPath (best-path search) function of Lattice and CompactLattice is supported directly. The data structures can thus be reduced to two, StdLattice and Token, which cuts the number of data conversions, greatly reduces computation, and lowers the peak memory consumption by about 50%.
Specifically, Fig. 7 is a flow diagram of optimizing the Token data structure in an optimization method for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 7, optimizing the Token data structure comprises:
S201: add to the Token data structure a Token* last pointer that stores the address of the previous-frame node;
S202: add to the Token data structure a tot_cost variable that stores the cost value of each path.
That is, in the prior art the best-path search module first computes the cost of each path in the forward direction, saves it in the nodes of the last frame, finds the node of minimum cost, and then backtracks to obtain the best path. However, each path in the prior-art Token structure is a singly linked list, so backtracking cannot be realized. The present invention therefore adds to the Token structure a Token* last pointer that stores the previous-frame node address, so that when a Token is subsequently created the address of its previous-frame node is recorded, solving the problem that Tokens cannot be backtracked. In addition, the original Token structure has no field for storing the cost of each path, so the present invention adds to the Token structure a tot_cost variable that stores the cost value of each path.
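The upgraded Token described above can be sketched as follows. This is an illustrative reconstruction, not the patent's actual implementation: the `Token` class shape, its constructor, and the `backtrack` helper are assumptions; only the `last` back pointer and the `tot_cost` field come from the text.

```python
class Token:
    """Token node with the two fields added by the patent's upgrade."""

    def __init__(self, arc_cost, last=None):
        # Back pointer to the previous-frame Token (Token* last); None at frame 0.
        self.last = last
        # Accumulated cost of the path ending at this token (tot_cost).
        self.tot_cost = arc_cost + (last.tot_cost if last is not None else 0.0)


def backtrack(token):
    """Follow the `last` pointers from a final token back to the first node."""
    path = []
    while token is not None:
        path.append(token)
        token = token.last
    path.reverse()
    return path


# Build a tiny three-frame chain and recover it by backtracking.
t0 = Token(1.0)
t1 = Token(0.5, last=t0)
t2 = Token(2.0, last=t1)
assert t2.tot_cost == 3.5
assert backtrack(t2) == [t0, t1, t2]
```

Without the `last` pointer the chain above could only be walked forward, which is exactly the backtracking problem the upgrade solves.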
In the prior art, Token is the most important storage structure in the WFST. It is a two-dimensional topological structure that conveniently expresses both inter-frame and intra-frame pointing information, so the data produced by combining the language model and the acoustic model are stored in Token structures. Because the data volume is very large, however, creating Tokens is also rather time-consuming.
In the prior art, Token creation has two parts: one adds the emitting transitions, ProcessEmitting, and the other adds the non-emitting (epsilon) transitions, NoneProcessEmitting. These two steps are called separately in the WFST, so each node is traversed at least twice when the Token structure is created. The present invention merges ProcessEmitting and NoneProcessEmitting, halving the number of node traversals and greatly reducing the number of memory accesses; since accessing memory data accounts for a large proportion of the time spent creating Tokens, this optimizes the efficiency of the Token-creation part.
Specifically, Fig. 8 is a flow diagram of step S102 in an optimization method for a weighted finite-state transducer provided by an embodiment of the present invention. Referring to Fig. 8, step S102 comprises:
S301: take the first frame of the acoustic output as the current frame, the acoustic output being composed of multiple frames;
S302: obtain, in a single pass over the language model, the non-emitting and emitting transitions corresponding to the current frame;
S303: perform transition chaining according to the non-emitting transitions of the current frame;
S304: perform transition chaining according to the emitting transitions of the current frame;
S305: traverse the multiple frames of the acoustic output, storing the previous-frame node address in the Token* last pointer.
Fig. 9 is a flow diagram of step S303. Referring to Fig. 9, step S303 comprises:
S401: judge whether the non-emitting transition of the current frame satisfies a first threshold, the first threshold comprising a fixed threshold and a dynamic threshold;
S402: if the judgment of step S401 is yes, chain the current frame;
S403: if the judgment of step S401 is no, discard the current frame.
Fig. 10 is a flow diagram of step S304. Referring to Fig. 10, step S304 comprises:
S501: judge whether the emitting transition of the current frame satisfies a second threshold, the second threshold comprising a fixed threshold and a dynamic threshold;
S502: if the judgment of step S501 is yes, chain the current frame;
S503: if the judgment of step S501 is no, discard the current frame.
That is, in the present invention, merging the emitting transitions (ProcessEmitting) with the non-emitting transitions (NoneProcessEmitting) halves the number of data memory accesses. Specifically, when a Token is added, the corresponding non-emitting and emitting transitions are obtained from the language model in a single pass, and each is then checked against its threshold in turn: if the threshold is satisfied the transition is chained, otherwise it is discarded. In the prior art the chaining thresholds are computed separately; in the present invention, once the non-emitting and emitting transitions are merged, the first and second thresholds are computed in a unified manner.
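The merged single-pass expansion with a unified pruning threshold might look like the following sketch. Everything here is a simplifying assumption for illustration (`Arc`, `expand_frame`, the additive beam, and collapsing the within-frame epsilon closure into one loop); the patent text only specifies that both transition kinds are fetched once per frame and checked against a jointly computed fixed-plus-dynamic threshold.

```python
from collections import namedtuple

# `emitting` marks the transition kind; both kinds flow through the same loop
# and the same cutoff, which is the point of the merge.
Arc = namedtuple("Arc", ["cost", "emitting", "next_state"])


def expand_frame(active_costs, arcs, beam=10.0):
    """One pass over both arc kinds, pruned against a shared cutoff.

    The cutoff combines a fixed part (the beam) with a dynamic part
    (the best active cost), computed once for both transition kinds.
    """
    best = min(active_costs.values())       # dynamic part of the threshold
    cutoff = best + beam                    # fixed beam + dynamic best cost
    new_costs = {}
    for state, cost in active_costs.items():
        for arc in arcs.get(state, []):
            new_cost = cost + arc.cost
            if new_cost >= cutoff:          # unified check for both kinds
                continue                    # discard (fails the threshold)
            prev = new_costs.get(arc.next_state)
            if prev is None or new_cost < prev:
                new_costs[arc.next_state] = new_cost  # "add chain"
    return new_costs


arcs = {0: [Arc(1.0, False, 1), Arc(2.0, True, 2)],
        1: [Arc(0.5, True, 2)]}
frame1 = expand_frame({0: 0.0, 1: 0.0}, arcs)
# State 2 is reachable from 0 (cost 2.0) and from 1 (cost 0.5); keep the cheaper.
assert frame1[2] == 0.5
```

Because the cutoff is computed once per frame, each node is touched in a single sweep instead of the two separate ProcessEmitting/NoneProcessEmitting passes of the prior art.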
In the prior art, the most time-consuming module of the WFST is GetBestPath (best-path search). The goal of GetBestPath is to compute the cost of every path and then select the path of minimum cost, which requires a path-search algorithm. The WFST uses depth-first search, but depth-first search has very high complexity and is very time-consuming.
Fig. 11 is a detailed flow chart of step S104 in Fig. 6. Referring to Fig. 11, step S104 comprises:
S601: order the intra-frame Token linked lists so that each list is laid out sequentially within its frame; traverse the Token lists in intra-frame and inter-frame order, accumulate the weights along each Token chain according to the pointing relations to obtain the cost value of each path, store it in the tot_cost variable, and record the pointer to the father node in Token* last;
S602: take the Token linked list of the last frame of the pruned weighted finite-state transducer, add the finalcost value to the cost of each node, and compare the costs of the nodes;
S603: take the node of minimum cost and backtrack along the links to the first node, thereby obtaining the best path.
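Steps S601 to S603 can be rendered as a minimal sketch. The frame layout, the function name, and the dictionary bookkeeping are illustrative assumptions; the linear accumulate-then-backtrack logic follows the text (no depth-first search is needed because tokens are laid out frame by frame).

```python
def get_best_path(frames, final_costs):
    """frames: one dict per frame mapping token_id -> (parent_id, arc_cost)."""
    tot_cost, last = {}, {}
    for tid, (_, cost) in frames[0].items():
        tot_cost[tid] = cost
        last[tid] = None
    for frame in frames[1:]:
        for tid, (parent, cost) in frame.items():
            tot_cost[tid] = tot_cost[parent] + cost   # S601: accumulate in order
            last[tid] = parent                        # record the father node
    # S602: add the final cost to each last-frame token and pick the minimum.
    best = min(frames[-1], key=lambda t: tot_cost[t] + final_costs.get(t, 0.0))
    # S603: backtrack through the `last` pointers to the first node.
    path, t = [], best
    while t is not None:
        path.append(t)
        t = last[t]
    return path[::-1], tot_cost[best] + final_costs.get(best, 0.0)


# Fig. 4-style example: frame 0 holds node 1; frame 1 nodes 2,3; frame 2 nodes 4,5,6.
frames = [{1: (None, 0.0)},
          {2: (1, 1.0), 3: (1, 2.0)},
          {4: (2, 3.0), 5: (2, 0.5), 6: (3, 0.5)}]
path, cost = get_best_path(frames, {4: 0.0, 5: 0.0, 6: 0.0})
assert path == [1, 2, 5] and cost == 1.5
```

Each token is visited exactly once during the accumulation pass, which is what gives the O(n) behaviour discussed below.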
That is, with the Token data structure optimized by the present invention, the inter-frame pointers of the Tokens are created in one direction only, so there are no reverse-order pointers between frames; and because the intra-frame linked lists are ordered before the best path is sought, the node pointers within a frame also point in a single direction, with no reverse-order pointing inside a frame. Accumulating costs in the order of these two dimensions, inter-frame and intra-frame, therefore cannot go wrong. The Token structure completely preserves the inter-frame and intra-frame order, so optimized node search can be realized using Tokens.
As shown in Fig. 4, the Token is a two-dimensional topological structure: the first frame contains node 1, the second frame nodes 2 and 3, and the third frame nodes 4, 5 and 6, so the traversal order is simply the node sequence 1, 2, 3, 4, 5, 6. The structure completely preserves the intra-frame and inter-frame information. Since the Token structure contains no reverse-order pointers between frames (Tokens are built in frame order), depth-first search is unnecessary when searching for paths; a node traversal of much lower complexity can be used instead, proceeding frame by frame and, within each frame, in order. The complexity of depth-first search is O(n²) while that of the traversal is O(n), so the efficiency is improved by a factor of n; since the number of nodes n is very large, the efficiency gain is substantial.
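The gain from replacing path enumeration with a frame-ordered sweep can be seen on a toy token graph. The graph below is invented for the example (a DAG where two paths share node 4); real token graphs are vastly larger, which is why an O(n²)-to-O(n) change matters.

```python
# Toy DAG: node 4 is reachable from both 2 and 3, so its subtree is shared.
children = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [], 6: []}


def path_enumeration_visits(node):
    """Node touches made by enumerating every path below `node` (DFS-style)."""
    if not children[node]:
        return 1
    return 1 + sum(path_enumeration_visits(c) for c in children[node])


# Frame-ordered sweep: each of the 6 nodes is visited exactly once.
sweep_visits = len(children)

assert sweep_visits == 6
assert path_enumeration_visits(1) == 9  # shared node 4's subtree counted twice
```

The duplication grows with every shared prefix, so on large graphs the path-enumeration count explodes while the sweep stays linear in the node count.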
The above is an optimization method for a weighted finite-state transducer provided by the present invention. By compressing and merging the storage structures it reduces conversions between different structures, greatly lowering memory and computation overhead; by optimizing the WFST implementation it greatly reduces the number of memory accesses, improving algorithm efficiency; and by optimizing the best-path search algorithm it drastically reduces the number of traversed nodes, yielding a substantial efficiency gain.
A comparative experiment on a PC yields the following conclusions: GetBestPath takes 11.376 s before optimization and 1.479 s after; the total WFST time is 13.136 s before optimization and 2.95 s after. The optimized WFST's time consumption is thus reduced by 77.54%, and the peak memory consumption is reduced by 47.3%. The output before and after optimization is exactly identical, which proves by example that the present invention has no influence on the decoding recognition rate.
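As a quick arithmetic check (not part of the patent text), the reported percentage is consistent with the quoted timings:

```python
# Quoted timings: total WFST 13.136 s -> 2.95 s; GetBestPath 11.376 s -> 1.479 s.
wfst_before, wfst_after = 13.136, 2.95
reduction = (wfst_before - wfst_after) / wfst_before
assert abs(reduction - 0.7754) < 0.001      # matches the reported 77.54 %

getbest_before, getbest_after = 11.376, 1.479
speedup = getbest_before / getbest_after
assert 7.6 < speedup < 7.8                  # roughly a 7.7x GetBestPath speedup
```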
The present invention also provides a computer device, comprising a processor adapted to execute instructions and a storage device storing a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the optimization method for a weighted finite-state transducer.
The present invention also provides a computer-readable storage medium storing a computer program for executing the optimization method for a weighted finite-state transducer.
An improvement to a technology can be clearly distinguished as a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it himself, without needing a chip manufacturer to design and fabricate a dedicated integrated-circuit chip.
Moreover, instead of making integrated-circuit chips by hand, this programming is nowadays mostly realized with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); at present VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used. Those skilled in the art will also appreciate that a hardware circuit realizing a logical method flow can easily be obtained merely by slightly programming the method flow in logic with one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicone Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory.
It is also known in the art that, besides realizing a controller purely with computer-readable program code, the method steps can be programmed in logic so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can likewise be regarded as structures within the hardware component; indeed, the devices for realizing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units set forth in the above embodiments may be realized by a computer chip or entity, or by a product having a certain function.
For convenience of description, the above devices are described with their functions divided into various units. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art can clearly understand that the present application may be realized by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer system (which may be a personal computer, a server, a network system, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant points, refer to the explanation in the method embodiment.
The present application may be used in numerous general-purpose or special-purpose computing system environments or configurations, such as personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network; in a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
Although the present application has been depicted through embodiments, those of ordinary skill in the art will appreciate that the present application has many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the present application.

Claims (10)

1. An optimization method for a weighted finite-state transducer, characterized in that the method comprises:
optimizing a Token data structure;
performing transition chaining on an acoustic output and a language model with the optimized Token data structure, to obtain a weighted finite-state transducer;
pruning the weighted finite-state transducer;
searching, in a search space corresponding to the weighted finite-state transducer, with the optimized Token data structure by way of node traversal, to obtain a best path.
2. The method according to claim 1, characterized in that optimizing the Token data structure comprises:
adding to the Token data structure a Token* last pointer that stores a previous-frame node address;
adding to the Token data structure a tot_cost variable that stores a cost value of each path.
3. The method according to claim 1, characterized in that the method further comprises:
inputting speech to an acoustic model to obtain the acoustic output.
4. The method according to claim 2, characterized in that performing transition chaining on the acoustic output and the language model with the optimized Token data structure comprises:
taking a first frame of the acoustic output as a current frame, the acoustic output being composed of multiple frames;
obtaining, in a single pass over the language model, non-emitting and emitting transitions corresponding to the current frame;
performing transition chaining according to the non-emitting transitions of the current frame;
performing transition chaining according to the emitting transitions of the current frame;
traversing the multiple frames of the acoustic output, and storing the previous-frame node address in the Token* last pointer.
5. The method according to claim 4, characterized in that performing transition chaining according to the non-emitting transitions of the current frame comprises:
judging whether the non-emitting transition of the current frame satisfies a first threshold, the first threshold comprising a fixed threshold and a dynamic threshold;
if the judgment is yes, chaining the current frame;
if the judgment is no, discarding the current frame.
6. The method according to claim 4, characterized in that performing transition chaining according to the emitting transitions of the current frame comprises:
judging whether the emitting transition of the current frame satisfies a second threshold, the second threshold comprising a fixed threshold and a dynamic threshold;
if the judgment is yes, chaining the current frame;
if the judgment is no, discarding the current frame.
7. The method according to claim 4, characterized in that searching, in the search space corresponding to the weighted finite-state transducer, with the optimized Token data structure by way of node traversal to obtain the best path comprises:
ordering the intra-frame Token linked lists so that each list is laid out sequentially within its frame; traversing the Token lists in intra-frame and inter-frame order; accumulating the weights along each Token chain according to the pointing relations to obtain the cost value of each path; storing it in the tot_cost variable; and recording a pointer to a father node in Token* last;
taking the Token linked list of a last frame of the pruned weighted finite-state transducer, adding a finalcost value to the cost of each node, and comparing the costs of the nodes;
taking a node of minimum cost and backtracking along the links to a first node, thereby obtaining the best path.
8. An optimization system for a weighted finite-state transducer, characterized in that the system comprises:
a data-structure optimization module, for optimizing a Token data structure;
a transducer modeling module, for performing transition chaining on an acoustic output and a language model with the optimized Token data structure, to obtain a weighted finite-state transducer;
a transducer pruning module, for pruning the weighted finite-state transducer;
a best-path search module, for searching, in a search space corresponding to the weighted finite-state transducer, with the optimized Token data structure by way of node traversal, to obtain a best path.
9. A computer device, characterized by comprising a processor adapted to execute instructions and a storage device storing a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the optimization method for a weighted finite-state transducer according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program for executing the optimization method for a weighted finite-state transducer according to any one of claims 1 to 7.
CN201910271141.XA 2019-04-04 2019-04-04 Optimization method, equipment and system of weighted finite state converter Active CN110047477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910271141.XA CN110047477B (en) 2019-04-04 2019-04-04 Optimization method, equipment and system of weighted finite state converter


Publications (2)

Publication Number Publication Date
CN110047477A true CN110047477A (en) 2019-07-23
CN110047477B CN110047477B (en) 2021-04-09

Family

ID=67276250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910271141.XA Active CN110047477B (en) 2019-04-04 2019-04-04 Optimization method, equipment and system of weighted finite state converter

Country Status (1)

Country Link
CN (1) CN110047477B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111933119A (en) * 2020-08-18 2020-11-13 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating voice recognition network
CN111968648A (en) * 2020-08-27 2020-11-20 北京字节跳动网络技术有限公司 Voice recognition method and device, readable medium and electronic equipment
CN112259082A (en) * 2020-11-03 2021-01-22 苏州思必驰信息科技有限公司 Real-time voice recognition method and system
CN112989136A (en) * 2021-04-19 2021-06-18 河南科技大学 Simplification method and system of finite state automatic machine

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968989A (en) * 2012-12-10 2013-03-13 中国科学院自动化研究所 Improvement method of Ngram model for voice recognition
US10008200B2 (en) * 2013-12-24 2018-06-26 Kabushiki Kaisha Toshiba Decoder for searching a path according to a signal sequence, decoding method, and computer program product
CN108694939A (en) * 2018-05-23 2018-10-23 广州视源电子科技股份有限公司 voice search optimization method, device and system
CN108962271A (en) * 2018-06-29 2018-12-07 广州视源电子科技股份有限公司 Multi-weighted finite state transducer merging method, device, equipment and storage medium
WO2018232591A1 (en) * 2017-06-20 2018-12-27 Microsoft Technology Licensing, Llc. Sequence recognition processing


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhehuai Chen, Justin Luitjens, Hainan Xu: "A GPU-based WFST Decoder with Exact Lattice Generation", Interspeech *
Ding Jiawei, Liu Jia, Zhang Weiqiang: "Inactive-node detection and memory optimization in the lattice-generation algorithm of a WFST decoder", Journal of University of Chinese Academy of Sciences *
Yao Yu, Ryad Chellali: "End-to-end Chinese speech recognition system based on bidirectional LSTM-CTC and weighted finite-state transducers", Journal of Computer Applications *




Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventors after change: Sun Hao, OuYang Peng, Li Xiudong, Wang Bo
Inventors before change: Sun Hao, OuYang Peng, Yin Shouyi, Li Xiudong, Wang Bo