CN107767873A - Fast and accurate offline speech recognition device and method - Google Patents

Fast and accurate offline speech recognition device and method

Info

Publication number
CN107767873A
Authority
CN
China
Prior art keywords
speech recognition
module
result
fast
multithreading
Prior art date
Application number
CN201710986788.1A
Other languages
Chinese (zh)
Inventor
李锐
赖蔚蔚
黄煜坤
刘伟林
黄优哲
周聪
周一聪
郭志达
Original Assignee
广东电网有限责任公司惠州供电局
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东电网有限责任公司惠州供电局
Priority to CN201710986788.1A
Publication of CN107767873A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing

Abstract

The present invention relates to the technical field of speech recognition, and more particularly to a fast and accurate offline speech recognition device and method. The device comprises a voice activity detection module, a speech segmentation module connected to the voice activity detection module, and a multithreaded speech recognizer connected to the speech segmentation module; the multithreaded speech recognizer is connected to a result merging module. The invention requires no additional equipment and no upgrade of existing computing equipment: recognition runs in parallel across multiple threads, making full use of existing computing resources. Because segments are cut with overlap, a small portion of the audio is decoded twice, but this avoids the recognition errors that arise when segments do not overlap and so preserves recognition quality. Without increasing cost, the invention allows very long recording files to be recognized quickly and accurately.

Description

Fast and accurate offline speech recognition device and method
Technical field
The present invention relates to the technical field of speech recognition, and more particularly to a fast and accurate offline speech recognition device and method.
Background
Speech recognition is a technology that converts digital speech into text that a computer can process. In recent years speech recognition has made remarkable breakthroughs and has gradually entered everyday life, bringing convenience to life and work. It is now being applied in many fields, including industry, household appliances, communications, automotive electronics, healthcare, home services, and consumer electronics. The present invention focuses on offline speech recognition (that is, recognition of recording files), such as meeting transcription and the recognition and analysis of telephone customer service speech. Meeting transcription and telephone customer service speech are used as examples below.
Most important meetings require a record of what key speakers or participants said and of the outcomes of their discussion, so that a meeting summary and minutes can be produced and archived promptly. Originally a person acted as note-taker, but this is time-consuming and laborious. As a meeting runs long, human memory becomes less accurate, and for some meetings it is inconvenient for a stenographer to attend. Meetings are therefore increasingly recorded with microphones and summarized afterwards by manual transcription or automatic speech recognition. In recent years, as the accuracy of automatic speech recognition has improved, it has gradually been able to replace manual transcription, saving labor costs, freeing people from repetitive mechanical work, and improving the allocation of human resources. In some meeting scenarios the minutes must be sent promptly to the relevant organizations and personnel, which requires the speech recognition system to be both fast and accurate.
Telephone customer service is another natural-speech scenario. On the one hand, the operating organization needs to extract customer needs from customer service calls and feed them back to management promptly, so that those needs can be analyzed, the organization can improve, users' practical problems can be solved, and the user experience improves; this forms a positive feedback loop that ultimately raises the company's service level and competitiveness. On the other hand, the service quality of customer service staff must also be collected so that staff can be assessed objectively.
Customer service calls and meetings typically last a long time. In a customer service call, the agent usually guides the user through several rounds of dialogue to pin down the specific problem and its cause and finally resolve it; calls typically last from a few minutes to tens of minutes. Telephone customer service needs speech recognition to transcribe the customer's speech into text quickly and accurately so that it can be archived promptly. Important meetings also tend to run long, and their recording files are large (tens of minutes to several hours). For both telephone customer service and meetings, producing accurate recognition results in a short time is a great challenge. There are two traditional solutions:
1. Because speech recognition requires substantial computing resources, a better processor and more memory can be used, which increases recognition speed and reduces recognition time.
2. The recording file is cut at random into several small files, the small files are recognized in parallel using multiple threads, and the recognition results are then spliced back together.
The drawback of the first solution is that upgrading the machine configuration increases cost significantly. The main problem with random cutting is that a cut may fall where speech is present, severing normal speech and reducing recognition accuracy. A cut may also fall on silence that lies in the middle of a complete sentence (for example, a brief pause between words), which likewise reduces recognition accuracy.
Summary of the invention
To overcome at least one of the defects of the prior art described above, the present invention provides a fast and accurate offline speech recognition device and method, so that a speech recognition system can recognize very large recording files both quickly and accurately.
The technical solution of the invention is a fast and accurate offline speech recognition device comprising a voice activity detection module, a speech segmentation module connected to the voice activity detection module, and a multithreaded speech recognizer connected to the speech segmentation module, the multithreaded speech recognizer being connected to a result merging module.
The invention mainly improves the second of the traditional solutions: a voice activity detection algorithm accurately locates the silent portions of the recording, and overlapping segmentation combined with result merging makes recognition of very large recording files both fast and accurate.
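The source does not say which voice activity detection algorithm is used. As a minimal sketch, assuming a simple short-time energy threshold (the function name `detect_speech`, the 20 ms frame length, and the threshold value are all hypothetical choices, not taken from the patent), speech and silence could be separated like this:

```python
from typing import List, Tuple


def detect_speech(samples: List[float], sample_rate: int,
                  frame_ms: int = 20,
                  threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals whose mean frame energy exceeds threshold.

    Assumption: an energy threshold stands in for the unspecified detector.
    Everything outside the returned intervals is treated as silence.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    intervals: List[Tuple[float, float]] = []
    start = None  # start time of the speech run currently open, if any
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        t = i * frame_len / sample_rate  # start time of this frame in seconds
        if energy > threshold and start is None:
            start = t                      # silence -> speech transition
        elif energy <= threshold and start is not None:
            intervals.append((start, t))   # speech -> silence transition
            start = None
    if start is not None:                  # recording ends while speech is active
        intervals.append((start, n_frames * frame_len / sample_rate))
    return intervals
```

A production detector would usually add smoothing (for example a minimum speech or silence duration) so that single noisy frames do not open or close an interval.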
Using the method for described fast and accurately offline speech recognition equipment, wherein:Comprise the following steps:
S1. the signal of input is divided into voice and Jing Yin by voice activation detection module;
S2. phonetic segmentation module is according to the result of voice activation detection module, by the long phonetic segmentation of input into overlapping voice Fragment;
S3. the sound bite after cutting is admitted to multithreading speech recognition device;
S4. speech recognition device utilizes the resource of machine in the case where not increasing machine configuration as far as possible, is known using multithreading Method for distinguishing, the sound bite of input is identified as word and exported to give result merging module;
S5. result merging module suitably abandons the knowledge of redundancy according to the temporal information of cutting and the result of identification at cutting Other result, and result is merged into final output.
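Step S2 can be sketched as follows. The source gives no rule for choosing cut points or the overlap width, so this sketch assumes that cuts are placed at the midpoints of silent gaps once a target segment length has been reached, and that each segment is then widened by a fixed overlap on both sides. The names `cut_with_overlap`, `target_len`, and `overlap` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def cut_with_overlap(speech: List[Interval], total: float,
                     target_len: float, overlap: float) -> List[Interval]:
    """Cut a recording of `total` seconds into overlapping segments.

    `speech` holds the speech intervals reported by voice activity detection,
    in time order. Cuts are only placed in the silence between two intervals.
    """
    cuts = [0.0]
    for (_, prev_end), (next_start, _) in zip(speech, speech[1:]):
        gap_mid = (prev_end + next_start) / 2  # middle of the silent gap
        if gap_mid - cuts[-1] >= target_len:   # segment is long enough: cut here
            cuts.append(gap_mid)
    cuts.append(total)

    segments: List[Interval] = []
    for start, end in zip(cuts, cuts[1:]):
        # Widen each segment so neighbours share 2 * overlap seconds of audio.
        segments.append((max(0.0, start - overlap), min(total, end + overlap)))
    return segments
```

With speech at 0-4 s, 6-10 s, 12-16 s, and 18-20 s, a 20 s recording, a target length of 8 s, and a 1 s overlap, the single cut falls at 11 s and the two segments cover 0-12 s and 10-20 s, so the audio between 10 s and 12 s is decoded twice.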
Compared with the prior art, the beneficial effects are as follows. The invention requires no additional equipment and no upgrade of existing computing equipment: recognition runs in parallel across multiple threads, making full use of existing computing resources. Because segments are cut with overlap, a small portion of the audio is decoded twice, but this avoids the recognition errors that arise when segments do not overlap and so preserves recognition quality. Without increasing cost, the invention allows very long recording files to be recognized quickly and accurately.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall flow of the invention.
Detailed description of the embodiments
The drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate this embodiment, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent.
As shown in Fig. 1, a fast and accurate offline speech recognition device comprises a voice activity detection module, a speech segmentation module connected to the voice activity detection module, and a multithreaded speech recognizer connected to the speech segmentation module, the multithreaded speech recognizer being connected to a result merging module.
The invention mainly improves the second of the traditional solutions: a voice activity detection algorithm accurately locates the silent portions of the recording, and overlapping segmentation combined with result merging makes recognition of very large recording files both fast and accurate.
A method using the fast and accurate offline speech recognition device comprises the following steps:
S1. The voice activity detection module divides the input signal into speech and silence;
S2. Based on the output of the voice activity detection module, the speech segmentation module cuts the long input speech into overlapping speech segments;
S3. The cut speech segments are fed into the multithreaded speech recognizer;
S4. The speech recognizer makes the fullest possible use of the machine's resources without upgrading the machine's configuration, recognizes the input speech segments as text using multithreaded recognition, and outputs the text to the result merging module;
S5. Based on the timing information of the cuts and the recognition results at the cut points, the result merging module discards redundant recognition results as appropriate and merges the remaining results into the final output.
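The merging step S5 is described only in general terms. One possible reading, shown here as a sketch under stated assumptions, is that the recognizer returns words with absolute timestamps and that, inside each overlap region, words before the midpoint of the overlap are kept from the earlier segment and words after it from the later segment. The names `merge_results` and `Word` and the midpoint rule are hypothetical, not taken from the source.

```python
from typing import List, Tuple

Interval = Tuple[float, float]     # (start_s, end_s) of a cut segment
Word = Tuple[str, float, float]    # (text, start_s, end_s) in absolute time


def merge_results(segments: List[Interval],
                  results: List[List[Word]]) -> List[str]:
    """Merge per-segment word lists, dropping duplicates from overlap regions.

    Assumption: each word is assigned to exactly one segment by comparing the
    word's temporal midpoint with the midpoint of the overlap between
    neighbouring segments.
    """
    merged: List[str] = []
    for i, ((seg_start, seg_end), words) in enumerate(zip(segments, results)):
        # Lower bound: middle of the overlap shared with the previous segment.
        lo = (segments[i - 1][1] + seg_start) / 2 if i > 0 else float("-inf")
        # Upper bound: middle of the overlap shared with the next segment.
        hi = ((seg_end + segments[i + 1][0]) / 2
              if i + 1 < len(segments) else float("inf"))
        for text, w_start, w_end in words:
            if lo <= (w_start + w_end) / 2 < hi:
                merged.append(text)
    return merged
```

Words truncated at a segment edge (such as a partial word at the very end of the earlier segment) lie on the far side of the overlap midpoint and are discarded, while the complete version decoded in the neighbouring segment is kept.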
In this embodiment there are 4 speech recognizers, so multiple speech segments can be recognized in parallel.
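A minimal sketch of dispatching segments to four recognizer threads, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The recognizer itself is not specified in the source, so `fake_recognize` is a hypothetical stand-in that only labels each segment; `recognize_parallel` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of a cut segment


def recognize_parallel(segments: List[Interval],
                       recognize: Callable[[Interval], str],
                       workers: int = 4) -> List[str]:
    """Run `recognize` on every segment using a pool of `workers` threads.

    `pool.map` returns results in the order of `segments`, regardless of
    which thread finishes first, so the output can be merged by position.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize, segments))


def fake_recognize(segment: Interval) -> str:
    """Placeholder recognizer (assumption): returns a label, decodes nothing."""
    start, end = segment
    return f"text[{start:.1f}-{end:.1f}]"
```

For example, `recognize_parallel([(0.0, 12.0), (10.0, 20.0)], fake_recognize)` returns one result per segment in input order. In CPython, threads give a real speed-up only when the recognizer releases the global interpreter lock during decoding; whether a given engine does so is outside what the source states.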
Obviously, the above embodiment is merely an example given to illustrate the invention clearly and does not limit the ways in which the invention may be implemented. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list every embodiment exhaustively. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the claims of the invention.

Claims (3)

1. A fast and accurate offline speech recognition device, characterized in that it comprises a voice activity detection module, a speech segmentation module connected to the voice activity detection module, and a multithreaded speech recognizer connected to the speech segmentation module, the multithreaded speech recognizer being connected to a result merging module.
2. A method using the fast and accurate offline speech recognition device of claim 1, characterized in that it comprises the following steps:
S1. The voice activity detection module divides the input signal into speech and silence;
S2. Based on the output of the voice activity detection module, the speech segmentation module cuts the long input speech into overlapping speech segments;
S3. The cut speech segments are fed into the multithreaded speech recognizer;
S4. The speech recognizer makes the fullest possible use of the machine's resources without upgrading the machine's configuration, recognizes the input speech segments as text using multithreaded recognition, and outputs the text to the result merging module;
S5. Based on the timing information of the cuts and the recognition results at the cut points, the result merging module discards redundant recognition results as appropriate and merges the remaining results into the final output.
CN201710986788.1A 2017-10-20 2017-10-20 Fast and accurate offline speech recognition device and method CN107767873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710986788.1A CN107767873A (en) 2017-10-20 2017-10-20 Fast and accurate offline speech recognition device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710986788.1A CN107767873A (en) 2017-10-20 2017-10-20 Fast and accurate offline speech recognition device and method

Publications (1)

Publication Number Publication Date
CN107767873A (en) 2018-03-06

Family

ID=61268507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710986788.1A CN107767873A (en) 2017-10-20 2017-10-20 Fast and accurate offline speech recognition device and method

Country Status (1)

Country Link
CN (1) CN107767873A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109727595A (en) * 2018-12-29 2019-05-07 神思电子技术股份有限公司 A kind of software design approach of speech recognition server
CN110351445A (en) * 2019-06-19 2019-10-18 成都康胜思科技有限公司 A kind of high concurrent VOIP recording service system based on intelligent sound identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199956A (en) * 2014-09-16 2014-12-10 成都博智维讯信息技术有限公司 Method for searching erp (enterprise resource planning) data voice
CN104538033A (en) * 2014-12-29 2015-04-22 江苏科技大学 Parallelized voice recognizing system based on embedded GPU system and method
CN104916283A (en) * 2015-06-11 2015-09-16 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN105827417A (en) * 2016-05-31 2016-08-03 安徽声讯信息技术有限公司 Voice quick recording device capable of performing modification at any time in conference recording

Similar Documents

Publication Publication Date Title
US9747925B2 (en) Speaker association with a visual representation of spoken content
US10546595B2 (en) System and method for improving speech recognition accuracy using textual context
US20200105278A1 (en) Diarization using linguistic labeling
US20190333118A1 (en) Cognitive product and service rating generation via passive collection of user feedback
JP6667504B2 (en) Orphan utterance detection system and method
McLaren et al. The Speakers in the Wild (SITW) speaker recognition database.
US9304657B2 (en) Audio tagging
EP3254453B1 (en) Conference segmentation based on conversational dynamics
Albanie et al. Emotion recognition in speech using cross-modal transfer in the wild
US20180102126A1 (en) System and method for semantically exploring concepts
US9837072B2 (en) System and method for personalization of acoustic models for automatic speech recognition
US8977573B2 (en) System and method for identifying customers in social media
Heldner et al. Pauses, gaps and overlaps in conversations
US9171547B2 (en) Multi-pass speech analytics
Tur et al. The CALO meeting speech recognition and understanding system
Waibel et al. Advances in automatic meeting record creation and access
Anguera et al. Speaker diarization: A review of recent research
US9070369B2 (en) Real time generation of audio content summaries
CN107911646B (en) Method and device for sharing conference and generating conference record
EP2609588B1 (en) Speech recognition using language modelling
CN1333363C (en) Audio signal processing apparatus and audio signal processing method
CN106686339B (en) Electronic meeting intelligence
Morgan et al. The meeting project at ICSI
EP3254454B1 (en) Conference searching and playback of search results
CN101099147B (en) Dialogue supporting apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination