CN103065633B - Speech recognition decoding efficiency optimization method - Google Patents


Info

Publication number
CN103065633B
Authority
CN
China
Prior art keywords
path
node
arc
score
scores
Legal status
Active
Application number
CN201210580290.2A
Other languages
Chinese (zh)
Other versions
CN103065633A (en
Inventor
鹿晓亮
赵志伟
陈旭
尚丽
吴晓如
于振华
潘青华
Current Assignee
Iflytek Medical Technology Co ltd
Original Assignee
iFlytek Co Ltd
Priority date
2012-12-27
Filing date
2012-12-27
Publication date
2015-01-14
Application filed by iFlytek Co Ltd
Priority to CN201210580290.2A
Publication of CN103065633A
Application granted
Publication of CN103065633B
Legal status: Active


Abstract

The invention relates to a method for optimizing the decoding efficiency of speech recognition, implemented by the following steps: for every three frames of speech feature vectors, Viterbi dynamic programming is performed within each arc, and each arc outputs at most three scores with their corresponding paths, one for each of three consecutive frames; following the Viterbi algorithm, the three scores and paths are passed to the arc's successor node, where they compete; the winners are retained on the node and are expanded onto the node's outgoing arcs when the next three frames arrive; after the last frame of speech feature vectors, the winning path delivered to the final node of the decoding network is the optimal path; and backtracking the optimal path yields the corresponding word sequence, i.e., the recognition result. By adopting this efficiency-optimized frame half-synchronous method, the invention reduces the amount of memory access during recognition and improves the efficiency of the whole system.

Description

Speech recognition decoding efficiency optimization method
Technical field
The present invention relates to a method for optimizing speech recognition decoding efficiency in a continuous speech recognition system, intended to increase the number of concurrent channels and the recognition speed of cloud-based speech recognition systems.
Background art
With the spread of voice input on smart terminals such as mobile phones, users rely on voice input in a growing number of scenarios. Most of these applications are cloud-based: the smart terminal records and compresses the audio and sends the data to a recognition server in the cloud, which performs recognition and returns the result to the terminal. For a cloud-based speech recognition system, raising the number of concurrent channels and the recognition speed of a single recognition server lets the same number of servers serve more users simultaneously, which saves substantial hardware cost across the whole cloud computing platform. However, to improve recognition accuracy, large-scale language models and acoustic models are often trained, and the decoding networks built from these models typically need tens of gigabytes of memory once loaded. Recognition must query this memory frequently, and, particularly with many concurrent channels, memory read bandwidth becomes the bottleneck that limits system efficiency (number of concurrent channels and recognition speed).
A current continuous speech recognition system, shown in Fig. 1, comprises the following parts: endpoint detection, feature extraction, decoding, and result output. Among these modules, decoding accounts for the largest share of computation (more than 80%) and performs the most frequent memory reads. It is therefore the module with the greatest influence on overall system efficiency (number of concurrent channels and recognition speed) and the core module most in need of efficiency optimization.
Current decoding schemes use frame-synchronous Viterbi decoding. The system first uses the acoustic model to expand the semantic network of the language model into a search network at the model-state level, as shown schematically in Fig. 2. In this state-based search network, all acoustic model states are arranged repeatedly in time order, so that the column of states at each time point corresponds to one frame of speech feature vectors. During the search, the accumulated path probability of each column of state nodes is computed with respect to the input speech frame. When the last speech frame has been searched, the state node with the highest accumulated probability is the optimal node; backtracking the states from this node yields the optimal decoded state sequence and hence the corresponding word sequence.
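The frame-synchronous search described above can be sketched as a textbook log-domain Viterbi pass with backtracking. This is a minimal illustration under stated assumptions, not the decoder of the patent: the function name `viterbi` and its list-based inputs are hypothetical, and the trellis is a plain state lattice rather than the expanded language-model network of Fig. 2.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Frame-synchronous Viterbi over a state trellis, in the log domain.

    log_init[j]: score of starting in state j; log_trans[i][j]: transition i -> j;
    log_emit[t][j]: log-likelihood of frame t under state j (hypothetical layout).
    Returns the best accumulated score and the backtracked state sequence.
    """
    n = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):  # one trellis column per frame
        prev, score, ptr = score, [], []
        for j in range(n):
            # best predecessor of state j at frame t
            i = max(range(n), key=lambda k: prev[k] + log_trans[k][j])
            ptr.append(i)
            score.append(prev[i] + log_trans[i][j] + log_emit[t][j])
        backptrs.append(ptr)
    state = max(range(n), key=lambda j: score[j])  # best node in the last column
    best, path = score[state], [state]
    for ptr in reversed(backptrs):  # backtrack from the last frame to the first
        state = ptr[state]
        path.append(state)
    return best, path[::-1]
```

For a two-state left-to-right model whose first frame favors state 0 and whose later frames favor state 1, the backtracked sequence is `[0, 1, 1]`.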
An actual decoding network is shown in Fig. 3, where each red dot represents a node of the decoding network and each rectangle represents an arc. Each arc contains 3 states, which correspond to the states in Fig. 2. The algorithm proceeds as follows: (1) for each frame of speech feature vectors, dynamic programming is first performed within each arc, and each arc outputs at most one score and its corresponding path; (2) following the Viterbi algorithm, this score and path are passed to the arc's successor node, where they compete, and the winner is retained; (3) the winner retained on the node is expanded onto the node's outgoing arcs when the next frame arrives; (4) after the last frame of speech feature vectors, the winning path delivered to the last node of the decoding network (Final) is the optimal path; (5) backtracking the optimal path yields the corresponding word sequence, i.e., the recognition result.
With this existing technique, whenever a frame's feature vector arrives, every node in the decoding network must access all of its outgoing arcs and pass the winning score and corresponding Viterbi path from the node's competition onto the successor arcs. In a continuous speech recognition system, especially a cloud-based one, the decoding network can occupy tens of gigabytes of memory, and accessing a node's outgoing arcs means accessing all the memory associated with those arcs. With many concurrent channels (that is, many users sharing the same recognition server at once), hundreds of thousands or even millions of nodes access memory at different locations simultaneously, and memory access on this scale is a challenge for the memory bandwidth of servers in mainstream configurations. Insufficient memory bandwidth causes stalls during memory access, which slows the recognition speed of the whole system.
Summary of the invention
Problem solved by the invention: to overcome the shortcomings of the prior art by providing a speech recognition decoding efficiency optimization method that reduces the number of memory accesses when decoding with a large-memory decoding network, avoids the bottleneck of insufficient memory bandwidth, and thereby optimizes the recognition efficiency of a continuous speech recognition system.
Technical solution of the invention: a speech recognition decoding efficiency optimization method whose main difference from the traditional frame-synchronous decoding algorithm is that Viterbi is not run once per frame of speech feature vectors but once every three frames. This is called the frame half-synchronous decoding algorithm, and it proceeds as follows:
(1) For every three frames of speech feature vectors, Viterbi dynamic programming is first performed within each arc; each arc outputs at most three scores and corresponding paths, one for each of three consecutive frames;
(2) Following the Viterbi algorithm, these three scores and their paths are passed to the arc's successor node, where they compete (each competes with the scores and paths of the same frame);
(3) The winners retained on the node are expanded onto the node's outgoing arcs when the next three frames arrive;
(4) After the last frame of speech feature vectors, the winning path delivered to the last node of the decoding network (Final) is the optimal path;
(5) Backtracking the optimal path yields the corresponding word sequence, i.e., the recognition result.
In step (2), the competition proceeds as follows. Each node has one or more arcs connected to it. At a given time t, one or more arcs pass paths to the node, and each path carries a score that measures its likelihood. Each arc can pass three paths to the node, corresponding to frames t-2, t-1 and t. The paths for the same frame passed by all arcs compete by score: the highest-scoring path is retained and the rest are deleted.
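The per-node competition can be written as a small helper. The candidate layout `(frame, score, path)` and the name `compete` are hypothetical; the sketch only shows that paths compete with paths of the same frame and that, for each frame, the highest score wins.

```python
from typing import Dict, List, Tuple

# (frame index, path score, path as a word list): a hypothetical representation
Candidate = Tuple[int, float, List[str]]


def compete(incoming: List[List[Candidate]]) -> Dict[int, Tuple[float, List[str]]]:
    """Keep, for each frame, the highest-scoring path arriving at one node.

    incoming[k] holds the (up to three) candidates delivered by the k-th in-arc
    for frames t-2, t-1 and t. Paths compete only with paths of the same frame.
    """
    winners: Dict[int, Tuple[float, List[str]]] = {}
    for arc_candidates in incoming:
        for frame, score, path in arc_candidates:
            if frame not in winners or score > winners[frame][0]:
                winners[frame] = (score, path)  # the losing path is discarded
    return winners
```

With t = 9, one arc delivering paths for frames 7, 8 and 9 and a second arc delivering paths only for frames 8 and 9, each frame keeps whichever of its candidates scores higher, independently of the other frames.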
Advantages of the invention over the prior art: the invention adopts a frame half-synchronous method in speech recognition decoding. For large-memory decoding networks it effectively reduces the number of memory accesses, so that when memory access bandwidth is limited it can significantly improve decoding efficiency, increase the number of concurrent channels and the recognition speed, save hardware cost for cloud-based speech recognition, and improve the user experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of a continuous speech recognition system;
Fig. 2 is a schematic diagram in which each arc contains 3 states;
Fig. 3 is a simple actual decoding network;
Fig. 4 is the implementation flow chart of the invention.
Embodiment
The invention adopts a frame half-synchronous method that optimizes efficiency for large-memory speech recognition (especially cloud-based speech recognition), reducing the amount of memory access during recognition and thereby improving the efficiency of the whole system.
Compared with the traditional frame-synchronous algorithm, the main difference of the frame half-synchronous algorithm is that it runs the Viterbi dynamic programming algorithm once every three frames. Its flow, shown in Fig. 4, is as follows:
1. First, perform dynamic programming for frame t+1; each state is updated as follows:
$$q_{t+1}(2) = \max\left[q_t(1) + a_{12},\ q_t(2) + a_{22}\right] + b_2(o_{t+1})$$
$$q_{t+1}(3) = \max\left[q_t(2) + a_{23},\ q_t(3) + a_{33}\right] + b_3(o_{t+1})$$
Then perform dynamic programming for frame t+2; each state is updated as follows:
$$q_{t+2}(2) = \max\left[q_{t+1}(1) + a_{12},\ q_{t+1}(2) + a_{22}\right] + b_2(o_{t+2})$$
$$q_{t+2}(3) = \max\left[q_{t+1}(2) + a_{23},\ q_{t+1}(3) + a_{33}\right] + b_3(o_{t+2})$$
Then perform dynamic programming for frame t+3; each state is updated as follows:
$$q_{t+3}(2) = \max\left[q_{t+2}(1) + a_{12},\ q_{t+2}(2) + a_{22}\right] + b_2(o_{t+3})$$
$$q_{t+3}(3) = \max\left[q_{t+2}(2) + a_{23},\ q_{t+2}(3) + a_{33}\right] + b_3(o_{t+3})$$
where q_t(i) is the score of the i-th state at frame t; b_j(o_t) is the likelihood score of the speech feature vector o_t of frame t under the j-th state; and a_{ij} is the transition probability from the i-th state to the j-th state. Because the terms are added, all three quantities are log-domain scores.
2. Following the Viterbi algorithm, these three scores and paths are passed to the arc's successor node, where they compete (each competes with the scores and paths of the same frame);
The competition proceeds as follows. Each node has one or more arcs connected to it. At a given time t, one or more arcs pass paths to the node, and each path carries a score that measures its likelihood. Each arc can pass three paths to the node, corresponding to frames t-2, t-1 and t. The paths for the same frame passed by all arcs compete by score: the highest-scoring path is retained and the rest are deleted.
3. The winners retained on the node are expanded onto the node's outgoing arcs when the next three frames arrive;
4. After the last frame of speech feature vectors, the winning path delivered to the last node of the decoding network (Final) is the optimal path;
5. Backtracking the optimal path yields the corresponding word sequence, i.e., the recognition result.
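The three-frame in-arc update of step 1 can be illustrated for a single arc. This is a sketch assuming log-domain scores and a 3-state left-to-right arc; the equations above give updates only for states 2 and 3, so the update for state 1 (entry from the arc's source node, or its self-loop) is an assumption, as are the names `advance_arc_three_frames` and `entry`.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def advance_arc_three_frames(q: Sequence[float], a: Sequence[Sequence[float]],
                             b: Sequence[Sequence[float]],
                             entry: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Advance one 3-state left-to-right arc over frames t+1, t+2 and t+3.

    q[i]: score of state i+1 at frame t (indices 0..2 stand for states 1..3).
    a[i][j]: log transition score from state i+1 to state j+1.
    b[k][j]: log-likelihood b_{j+1}(o_{t+1+k}) of frame t+1+k in state j+1.
    entry[k]: assumed score entering state 1 from the source node at frame t+k.
    Returns the state scores after frame t+3 and the arc's three outputs
    (the state-3 score at each frame), which go to the successor node.
    """
    scores = list(q)
    outputs: List[float] = []
    for k in range(3):  # the arc's data is used for three frames in one pass
        s1, s2, s3 = scores  # scores of the previous frame
        scores = [
            max(entry[k], s1 + a[0][0]) + b[k][0],   # state 1 (assumed update)
            max(s1 + a[0][1], s2 + a[1][1]) + b[k][1],  # q(2) equation
            max(s2 + a[1][2], s3 + a[2][2]) + b[k][2],  # q(3) equation
        ]
        outputs.append(scores[2])
    return scores, outputs
```

Starting from a path in state 1 with all transition scores set to -1 and all likelihood scores set to 0, state 3 is unreachable at frame t+1 and the three outputs are `[-inf, -2.0, -3.0]`.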
As the flow above shows, compared with the traditional frame-synchronous decoding algorithm, the frame half-synchronous algorithm propagates each node of the decoding network forward only once every three frames; that is, the memory associated with all of a node's outgoing arcs is accessed only once every three frames. Memory access is thus reduced to one third of the original amount, memory stalls caused by the memory bandwidth bottleneck are greatly reduced, and the efficiency of the whole recognition system can be substantially increased.
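To make the one-third estimate concrete, the following sketch decodes a toy network both frame-synchronously (`block=1`) and frame half-synchronously (`block=3`) and counts how often arc data is fetched. Several details are assumptions, not taken from the patent: the network is acyclic with every arc's source node numbered below its destination, arcs are processed in that topological order within each block (so the scores a node receives at frames t+1 and t+2 can be consumed in the same block), transition log-probabilities are fixed at log 0.5, and each path carries its word history directly instead of backpointers. The name `decode` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf
A_SELF = A_FWD = math.log(0.5)  # hypothetical self-loop / forward log-probabilities

Token = Tuple[float, Tuple[str, ...]]  # (score, word history)


def decode(num_nodes: int, arcs: Sequence[Tuple[int, int, str]],
           emis: Sequence[Sequence[Sequence[float]]], num_frames: int,
           block: int = 3) -> Tuple[float, List[str], int]:
    """Viterbi decoding over a word network whose arcs are 3-state left-to-right HMMs.

    Assumes start node 0, final node num_nodes - 1 and src < dst on every arc.
    emis[a][s][t]: log-likelihood of frame t (1-based) in state s of arc a.
    block=1 is frame-synchronous; block=3 is frame half-synchronous.
    Returns (best score, word sequence, number of arc fetches).
    """
    assert all(src < dst for src, dst, _ in arcs), "sketch requires a topologically ordered DAG"
    order = sorted(range(len(arcs)), key=lambda i: arcs[i][0])  # arcs by source node
    node: List[List[Token]] = [[(NEG_INF, ())] * (num_frames + 1) for _ in range(num_nodes)]
    node[0][0] = (0.0, ())
    states: List[List[Token]] = [[(NEG_INF, ())] * 3 for _ in arcs]
    fetches = 0
    for t0 in range(0, num_frames, block):
        frames = range(t0 + 1, min(t0 + block, num_frames) + 1)
        for i in order:
            src, dst, label = arcs[i]
            if all(s[0] == NEG_INF for s in states[i]) and all(
                    node[src][t - 1][0] == NEG_INF for t in frames):
                continue  # inactive arc: its memory is not touched in this block
            fetches += 1  # the arc's data is fetched once per block, not once per frame
            for t in frames:
                old, new = states[i], []
                for s in range(3):
                    stay = (old[s][0] + A_SELF, old[s][1])
                    came = node[src][t - 1] if s == 0 else (old[s - 1][0] + A_FWD, old[s - 1][1])
                    best = came if came[0] > stay[0] else stay
                    new.append((best[0] + emis[i][s][t], best[1]))
                states[i] = new
                exit_score = new[2][0] + A_FWD
                if exit_score > node[dst][t][0]:  # competition with paths of the same frame
                    node[dst][t] = (exit_score, new[2][1] + (label,))
    score, words = node[num_nodes - 1][num_frames]
    return score, list(words), fetches
```

Under these assumptions both settings return the same best score and word sequence, because every per-frame update sees the same inputs; only the number of arc fetches changes, dropping to roughly one third once the arcs are active.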
Parts of the invention not described in detail belong to techniques well known in the art.
The above describes only some embodiments of the invention, but the scope of protection of the invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention.

Claims (2)

1. A speech recognition decoding efficiency optimization method, characterized in that it comprises the following steps:
(1) for every three frames of speech feature vectors, Viterbi dynamic programming is first performed within each arc; each arc outputs at most three scores and corresponding paths, one for each of three consecutive frames;
(2) following the Viterbi algorithm, these three scores and their paths are passed to the arc's successor node, where they compete, producing three new optimal paths that are passed onward until they reach the last node of the decoding network, which yields the optimal recognition result; said competition means competing with the scores and paths of the same frame;
(3) the winners retained on the node are expanded onto the node's successor arcs when the next three frames arrive;
(4) after the last frame of speech feature vectors, the winning path delivered to the last node of the decoding network is the optimal path;
(5) backtracking the optimal path yields the corresponding word sequence, i.e., the recognition result.
2. The speech recognition decoding efficiency optimization method according to claim 1, characterized in that the competition in step (2) proceeds as follows: each node has one or more arcs connected to it; at a given time t, one or more arcs pass paths to the node, each path carrying a score that measures the likelihood of the path; each arc can pass three paths to the node, corresponding to frames t-2, t-1 and t; the paths for the same frame passed by all arcs compete by score, the highest-scoring path is retained, and the remaining paths are deleted.
CN201210580290.2A 2012-12-27 2012-12-27 Speech recognition decoding efficiency optimization method Active CN103065633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210580290.2A CN103065633B (en) 2012-12-27 2012-12-27 Speech recognition decoding efficiency optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210580290.2A CN103065633B (en) 2012-12-27 2012-12-27 Speech recognition decoding efficiency optimization method

Publications (2)

Publication Number Publication Date
CN103065633A 2013-04-24
CN103065633B 2015-01-14

Family

ID=48108233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210580290.2A Active CN103065633B (en) 2012-12-27 2012-12-27 Speech recognition decoding efficiency optimization method

Country Status (1)

Country Link
CN (1) CN103065633B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529027B (en) * 2015-12-14 2019-05-31 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN111081226B (en) * 2018-10-18 2024-02-13 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
TW200926140A (en) * 2007-12-11 2009-06-16 Inst Information Industry Method and system of generating and detecting confusion phones of pronunciation
KR101217525B1 (en) * 2008-12-22 2013-01-18 한국전자통신연구원 Viterbi decoder and method for recognizing voice
CN102436816A (en) * 2011-09-20 2012-05-02 安徽科大讯飞信息科技股份有限公司 Method and device for decoding voice data
CN102543071B (en) * 2011-12-16 2013-12-11 安徽科大讯飞信息科技股份有限公司 Voice recognition system and method used for mobile equipment

Also Published As

Publication number Publication date
CN103065633A (en) 2013-04-24

Similar Documents

Publication Publication Date Title
CN112100349B (en) Multi-round dialogue method and device, electronic equipment and storage medium
CN106951468B (en) Talk with generation method and device
CN104616655B (en) The method and apparatus of sound-groove model automatic Reconstruction
US7529671B2 (en) Block synchronous decoding
CN102543071B (en) Voice recognition system and method used for mobile equipment
CN105336324B (en) A kind of Language Identification and device
CN109741754A (en) A kind of conference voice recognition methods and system, storage medium and terminal
CN109616108A (en) More wheel dialogue interaction processing methods, device, electronic equipment and storage medium
CN109448719A (en) Establishment of Neural Model method and voice awakening method, device, medium and equipment
CN111090727B (en) Language conversion processing method and device and dialect voice interaction system
CN102710488B (en) Method for realizing virtual network mapping
CN103093755A (en) Method and system of controlling network household appliance based on terminal and Internet voice interaction
CN108389575A (en) Audio data recognition methods and system
CN109754790A (en) A kind of speech recognition system and method based on mixing acoustic model
CN109992239A (en) Voice traveling method, device, terminal and storage medium
WO2017177484A1 (en) Voice recognition-based decoding method and device
CN112466288A (en) Voice recognition method and device, electronic equipment and storage medium
CN103065633B (en) Speech recognition decoding efficiency optimization method
CN106782513B (en) Speech recognition realization method and system based on confidence level
CN109473104A (en) Speech recognition network delay optimization method and device
CN110942763A (en) Voice recognition method and device
CN102073704A (en) Text classification processing method, system and equipment
CN109450459A (en) A kind of polarization code FNSC decoder based on deep learning
CN111081254B (en) Voice recognition method and device
CN103185599B (en) A kind of vehicle-mounted end data handling system and geographic information data processing platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee after: IFLYTEK Co.,Ltd.

Address before: 230088 Mount Huangshan Road, hi tech Development Zone, Anhui, Hefei 616

Patentee before: ANHUI USTC IFLYTEK Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20170823

Address after: 230000, Hefei province high tech Zone, 2800 innovation Avenue, 288 innovation industry park, H2 building, room two, Anhui

Patentee after: Anhui Puji Information Technology Co.,Ltd.

Address before: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee before: IFLYTEK Co.,Ltd.

TR01 Transfer of patent right
CP01 Change in the name or title of a patent holder

Address after: 230000, Hefei province high tech Zone, 2800 innovation Avenue, 288 innovation industry park, H2 building, room two, Anhui

Patentee after: Anhui Xunfei Medical Information Technology Co.,Ltd.

Address before: 230000, Hefei province high tech Zone, 2800 innovation Avenue, 288 innovation industry park, H2 building, room two, Anhui

Patentee before: Anhui Puji Information Technology Co.,Ltd.

CP01 Change in the name or title of a patent holder

Address after: 230000, Hefei province high tech Zone, 2800 innovation Avenue, 288 innovation industry park, H2 building, room two, Anhui

Patentee after: ANHUI IFLYTEK MEDICAL INFORMATION TECHNOLOGY CO.,LTD.

Address before: 230000, Hefei province high tech Zone, 2800 innovation Avenue, 288 innovation industry park, H2 building, room two, Anhui

Patentee before: Anhui Puji Information Technology Co.,Ltd.

CP03 Change of name, title or address

Address after: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Patentee after: Anhui Xunfei Medical Co.,Ltd.

Address before: Room 288, H2 / F, phase II, innovation industrial park, 2800 innovation Avenue, high tech Zone, Hefei, Anhui 230000

Patentee before: ANHUI IFLYTEK MEDICAL INFORMATION TECHNOLOGY CO.,LTD.

CP01 Change in the name or title of a patent holder

Address after: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Patentee after: IFLYTEK Medical Technology Co.,Ltd.

Address before: 230088 floor 23-24, building A5, No. 666, Wangjiang West Road, high tech Zone, Hefei, Anhui Province

Patentee before: Anhui Xunfei Medical Co.,Ltd.