CN107230478A - Voice information processing method and system - Google Patents

Publication number
CN107230478A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710302993.1A
Other languages
Chinese (zh)
Inventor
王泓喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taizhou Jiji Intellectual Property Operation Co.,Ltd.
Original Assignee
Shanghai Feixun Data Communication Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Feixun Data Communication Technology Co Ltd
Priority to CN201710302993.1A
Publication of CN107230478A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/64: Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65: Recording arrangements for recording a message from the calling party
    • H04M1/656: Recording arrangements for recording a message from the calling party for recording conversations

Abstract

The invention provides a voice information processing method and system. The method comprises: S100, acquiring the user's voice information; S200, intercepting the voice information to obtain multiple voice segments; S300, recognizing the voice segments to obtain the corresponding speech recognition fragments; S400, processing the speech recognition fragments to obtain a speech recognition result. The system comprises: an acquisition module, which acquires the user's voice information; an interception module, which intercepts the voice information to obtain multiple voice segments; an identification module, which recognizes the voice segments to obtain the corresponding speech recognition fragments; and a first processing module, which processes the speech recognition fragments to obtain a speech recognition result. The invention performs speech recognition during voice recording, so the user no longer has to wait for recording to complete before recognition can run and a result can be output. This reduces the waiting time, shortens the recording delay without affecting the normal recognition result, and improves the user experience.

Description

Voice information processing method and system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a voice information processing method and system.
Background
With the rapid development of communication technology, speech recognition is applied ever more widely, and network communication tools such as WeChat and Tencent QQ have gradually become one of the main means by which the public communicates. Among their features, voice messages are widely liked by users for their simplicity and convenience. On current intelligent terminals such as mobile phones and computers, communication tools provide voice input and output functions.
In the prior art, speech recognition schemes give no consideration to the length of time before recognition starts. When the recording is short, the user's waiting time is comparatively long; when the recording is long, the waiting time is not only very long but recognition is also incomplete, which seriously affects the user's needs. Moreover, the prior art sends the recording to the speech recognition module only after voice recording has ended, so the total time is the recording length plus the recognition time. This causes unnecessary waiting, wastes time, and harms the user experience.
Summary of the invention
The object of the present invention is to provide a voice information processing method and system that perform speech recognition during voice recording, reducing the time the user waits after voice recording completes.
The technical solution provided by the present invention is as follows:
A voice information processing method, comprising the steps of: S100, during the user's recording, periodically collecting and recognizing the user's voice information to obtain speech recognition fragments; S200, processing the speech recognition fragments to obtain a speech recognition result.
The present invention performs speech recognition during voice recording. The user no longer has to wait for voice recording to complete before speech recognition can be carried out and a result output, so the waiting time is reduced; the recording delay is shortened without affecting the normal recognition result, and the user experience is improved.
Further, step S100 comprises the steps of: S110, during the user's recording, collecting the user's voice information according to a preset collection rule to obtain a current voice segment; S120, recognizing the current voice segment according to a speech recognition library to obtain a speech recognition fragment; S130, obtaining the next voice segment and performing steps S110 to S130 until the user ends the recording; wherein the preset collection rule is collection at equal time intervals.
Further, step S110 also comprises the steps of: S111, judging whether the current voice segment is a blank voice segment; if so, performing step S112; otherwise, performing step S120; S112, deleting the current voice segment and performing step S130.
Further, step S200 comprises the step of: S210, sorting and integrating the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
Further, step S200 also comprises the step of: S220, outputting the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
The present invention also provides a voice information processing system, comprising a control module and a processing module. The processing module is communicatively connected to the control module. The control module periodically collects and recognizes the user's voice information during the user's recording to obtain speech recognition fragments. The processing module processes the speech recognition fragments obtained by the control module to obtain a speech recognition result.
Further, the control module comprises a collection submodule and an identification submodule, which are communicatively connected to each other. The collection submodule, during the user's recording, collects the user's voice information according to a preset collection rule to obtain a current voice segment and sends the current voice segment to the identification submodule. The identification submodule receives the current voice segment sent by the collection submodule and recognizes it according to a speech recognition library to obtain a speech recognition fragment. The collection submodule also obtains the next voice segment and sends it to the identification submodule, until the user ends the recording. The identification submodule also receives the next voice segment sent by the collection submodule and recognizes it according to the speech recognition library to obtain a speech recognition fragment, until the user ends the recording. The preset collection rule is collection at equal time intervals.
Further, the control module also comprises a judging submodule and a deletion submodule. The judging submodule is communicatively connected to the collection submodule, the deletion submodule, and the identification submodule respectively. The judging submodule judges whether the current voice segment is a blank voice segment; if so, it sends the result that the current voice segment is a blank voice segment to the deletion submodule; otherwise, it sends the result that the current voice segment is not a blank voice segment to the identification submodule. The deletion submodule receives the judgment result sent by the judging submodule and deletes the current voice segment.
Further, the processing module comprises a sorting submodule, which is communicatively connected to the control module. The sorting submodule sorts and integrates the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
Further, the processing module also comprises an output submodule, which is communicatively connected to the control module. The output submodule outputs the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
The voice information processing method and system provided by the present invention can bring at least one of the following beneficial effects:
1. During recording, the present invention performs speech recognition on the voice segments collected from the recording. Compared with traditional speech recognition, it produces the speech recognition result faster and reduces the time the user waits for voice entry and speech recognition.
2. The present invention acquires voice information through a FIFO queue and performs speech recognition through the FIFO queue. (FIFO is the abbreviation of First Input First Output. A FIFO queue is a traditional in-order execution method in which the instruction that enters first is completed and retired first, and the second instruction is executed immediately afterwards; it is a first-in, first-out data buffer.) For a long recording, this not only effectively reduces the waiting time for voice recording and speech recognition but also yields a complete speech recognition.
3. The present invention performs speech recognition during voice recording, which solves the problem that speech recognition can be carried out only after voice recording completes.
4. The present invention shortens the recording delay without affecting the normal recognition result and improves the user experience.
5. The present invention can delete invalid voice segments, which helps the user obtain speech recognition faster.
Brief description of the drawings
The preferred embodiments are described below in a clear and easily understood manner with reference to the drawings, to further explain the above characteristics, technical features, advantages, and implementation of the voice information processing method and system.
Fig. 1 is a flow chart of one embodiment of the voice information processing method of the present invention;
Fig. 2 is a flow chart of another embodiment of the voice information processing method of the present invention;
Fig. 3 is a flow chart of another embodiment of the voice information processing method of the present invention;
Fig. 4 is a flow chart of another embodiment of the voice information processing method of the present invention;
Fig. 5 is a structural diagram of one embodiment of the voice information processing system of the present invention;
Fig. 6 is a structural diagram of another embodiment of the voice information processing system of the present invention;
Fig. 7 is a structural diagram of another embodiment of the voice information processing system of the present invention;
Fig. 8 is a structural diagram of another embodiment of the voice information processing system of the present invention;
Fig. 9 is a flow chart of an example of the voice information processing method of the present invention.
Detailed description
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, specific embodiments of the present invention are described below with reference to the drawings. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings, and other embodiments, from these drawings without creative effort.
To keep the drawings concise, each figure only schematically shows the parts relevant to the present invention; they do not represent its actual structure as a product. In addition, to keep the drawings concise and easy to understand, where a figure contains components with the same structure or function, only one of them is schematically drawn or labelled. Herein, "one" does not only mean "only one"; it can also mean "more than one".
Referring to Fig. 1, the present invention provides one embodiment of a voice information processing method, comprising:
S110, during the user's recording, periodically collecting and recognizing the user's voice information to obtain speech recognition fragments;
S120, processing the speech recognition fragments to obtain a speech recognition result.
In this embodiment, speech recognition is performed during voice recording. The user no longer has to wait for voice recording to complete before speech recognition can be carried out and a result output, so the waiting time is reduced; the recording delay is shortened without affecting the normal recognition result, and the user experience is improved.
Referring to Fig. 2, the present invention provides another embodiment of a voice information processing method, comprising:
S210, during the user's recording, collecting the user's voice information according to a preset collection rule to obtain a current voice segment;
S220, recognizing the current voice segment according to a speech recognition library to obtain a speech recognition fragment;
S230, obtaining the next voice segment and performing steps S210 to S230 until the user ends the recording;
S240, sorting and integrating the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
Wherein, the preset collection rule is collection at equal time intervals.
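The loop in steps S210 to S240 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: `split_fixed`, `recognize_stub`, and `process_recording` are hypothetical names, the recording is modelled as a list of samples that is already available, and the recognizer is a stand-in for whatever speech recognition library is used.

```python
from typing import Callable, List, Sequence, Tuple


def split_fixed(samples: Sequence[int], segment_len: int) -> List[Sequence[int]]:
    """S210: cut the recording into equal-length segments (the last may be shorter)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]


def recognize_stub(segment: Sequence[int]) -> str:
    """S220 stand-in (assumption): a real system would query a recognition library."""
    return f"<{len(segment)} samples>"


def process_recording(
    samples: Sequence[int],
    segment_len: int,
    recognize: Callable[[Sequence[int]], str] = recognize_stub,
) -> str:
    fragments: List[Tuple[int, str]] = []
    # S210-S230: in a live system this loop would run while recording continues.
    for index, segment in enumerate(split_fixed(samples, segment_len)):
        fragments.append((index, recognize(segment)))
    # S240: sort by acquisition order and integrate into one result.
    fragments.sort(key=lambda item: item[0])
    return " ".join(text for _, text in fragments)
```

Keeping the acquisition index next to each fragment is what makes the final sort in S240 possible even if fragments were to arrive out of order.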
In this embodiment, there are many prior-art ways to build the speech recognition library, so they are not described in detail here. During recording, speech recognition is performed on the voice segments collected from the recording; compared with traditional speech recognition, the speech recognition result is produced faster and the time the user waits for voice entry and speech recognition is reduced. Voice information is acquired through a FIFO queue and speech recognition is performed through the FIFO queue. For a short recording, the speech recognition module does not have to wait until the recognition start time is reached before recognizing, which avoids adding unnecessary waiting time; for a long recording, this not only effectively reduces the waiting time for voice recording and speech recognition but also yields a complete speech recognition. The user can set the preset collection rule according to personal preference and need. Unnecessary waiting is avoided, time is saved, and the user experience improves. For example, user A sets the collection rule to intercept voice information every 1 s during recording. After the user starts recording, the first 1 s voice segment Y1, the second 1 s voice segment Y2, ..., and the n-th 1 s voice segment Yn are collected according to that rule. After voice segment Y1 is collected, the speech recognition module recognizes it to obtain speech recognition fragment S1; after voice segment Y2 is obtained, the module recognizes it to obtain speech recognition fragment S2; and so on. During recording, as soon as a voice segment is collected it is recognized immediately to obtain the corresponding speech recognition fragment, which is saved. The fragments are arranged in the chronological order of acquisition, so an almost complete speech recognition result is available almost as soon as recording ends, which improves the efficiency of speech recognition.
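To make the waiting-time argument concrete, the timeline can be simulated. The sketch below is illustrative only: the function names are hypothetical, recognition is assumed to take a fixed time per segment, and the cost of recognizing a whole recording is assumed to be the sum of the per-segment costs. None of these figures come from the patent.

```python
def wait_after_recording(num_segments: int, segment_s: float, recog_s: float) -> float:
    """Seconds the user waits after recording ends when segments are
    recognized one at a time, in FIFO order, while recording continues."""
    finished = 0.0
    for i in range(num_segments):
        available = (i + 1) * segment_s    # segment i is complete at this time
        start = max(available, finished)   # wait for the previous segment to finish
        finished = start + recog_s
    return finished - num_segments * segment_s


def wait_batch(num_segments: int, recog_s: float) -> float:
    """Wait when recognition starts only after recording ends
    (assumption: total cost is the sum of per-segment costs)."""
    return num_segments * recog_s
```

With ten 1 s segments and an assumed 0.3 s of recognition per segment, `wait_after_recording(10, 1.0, 0.3)` is about 0.3 s, whereas `wait_batch(10, 0.3)` is about 3 s. If recognition is slower than collection, the queue backs up and the advantage shrinks, which the `max` term captures.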
The technology in this embodiment can be applied to areas including indoor equipment control and voice dialogue robots. Because speech is recognized while it is being recorded, the problem that recognition can start only after recording completes is solved, and the recording delay is shortened without affecting the normal recognition result. The user's voice command is quickly converted into a speech recognition instruction and input to smart home devices or intelligent robots, which are then controlled more conveniently and quickly according to the recognized instruction without the user operating them by hand; voice operation is faster than manual operation, which improves the user experience. This also avoids the situation, on shopping platforms such as Taobao, in which users prefer to transfer to a human agent because speech recognition is inefficient; it raises the utilization of speech recognition, reduces the waste of voice service resources, reduces the workload of human customer service staff, and lowers labour cost. The embodiment can also be applied to voice search systems. Baidu voice search, for example, is a new search mode in which the user speaks the search intent, such as "what will the weather be like tomorrow" or "how to make spicy diced chicken with peanuts". While the user is speaking, the voice information is acquired and recognized at the same time, so the embodiment can obtain the desired result immediately and output the text version of the query. Voice search spares the user the trouble of typing and makes the whole search process smoother and more convenient.
Referring to Fig. 3, the present invention provides another embodiment of a voice information processing method, comprising:
S310, during the user's recording, collecting the user's voice information according to a preset collection rule to obtain a current voice segment;
S320, recognizing the current voice segment according to a speech recognition library to obtain a speech recognition fragment;
S330, outputting the speech recognition fragment in the chronological order of collection to obtain the speech recognition result;
S340, obtaining the next voice segment and performing steps S310 to S330 until the user ends the recording.
Wherein, the preset collection rule is collection at equal time intervals.
In this embodiment, during recording, speech recognition is performed on the voice segments collected from the recording, so recognition is fast and the user's waiting time is reduced. Voice information is acquired through a FIFO queue and speech recognition is performed through the FIFO queue; for a long recording, this not only effectively reduces the waiting time for voice recording and speech recognition but also yields a complete speech recognition. For example, suppose the effective duration of typical speech recognition is 30 s. If user B speaks without pause and records 60 s, the long recording not only makes the wait long; because the voice information is too long, the speech recognition module also cannot recognize the full content of user B's recording.
The embodiment can also be applied to fields such as voice dialing, voice navigation, and dictation data entry. For example, in dictation data entry, as the user speaks, the speech recognition module immediately outputs the spoken content in the entry field. Specifically, after recording starts, the first 0.5 s voice segment X1, the second 0.5 s voice segment X2, ..., and the n-th 0.5 s voice segment Xn are collected according to the collection rule set by user B. After voice segment X1 is collected, the speech recognition module recognizes it to obtain speech recognition fragment B1, and so on. During recording, as soon as a voice segment is collected it is recognized immediately to obtain the corresponding speech recognition fragment, and the fragments are output in the chronological order of collection to obtain the speech recognition result. If user B finds that part of the text in the entry field differs from what was said, the wrongly recognized part can be located by its time order and recognized again.
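Because segments are collected at equal intervals, a time offset maps directly to a segment index, which is what makes selective re-recognition possible. The following is a minimal sketch of that idea; `SegmentLog` and its methods are hypothetical names, and the recognizer is passed in as a plain callable rather than taken from any real library.

```python
from typing import Callable, List


class SegmentLog:
    """Keeps audio segments and their recognized text in acquisition order."""

    def __init__(self, segment_s: float, recognize: Callable[[bytes], str]):
        if segment_s <= 0:
            raise ValueError("segment_s must be positive")
        self.segment_s = segment_s
        self.recognize = recognize
        self.audio: List[bytes] = []
        self.text: List[str] = []

    def add(self, segment: bytes) -> str:
        """Recognize a newly collected segment and return its text at once."""
        result = self.recognize(segment)
        self.audio.append(segment)
        self.text.append(result)
        return result

    def index_at(self, offset_s: float) -> int:
        """Map a time offset from the start of recording to a segment index."""
        index = int(offset_s // self.segment_s)
        if not 0 <= index < len(self.audio):
            raise IndexError("no segment recorded at that offset")
        return index

    def redo(self, offset_s: float, recognize: Callable[[bytes], str]) -> str:
        """Re-recognize only the segment that covers offset_s."""
        index = self.index_at(offset_s)
        self.text[index] = recognize(self.audio[index])
        return self.text[index]
```

With 0.5 s segments, an error noticed 0.7 s into the recording falls in segment index 1, so only that segment's audio is passed to the recognizer again.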
Referring to Fig. 4, the present invention provides another embodiment of a voice information processing method, comprising:
S410, during the user's recording, collecting the user's voice information according to a preset collection rule to obtain a current voice segment;
S420, judging whether the current voice segment is a blank voice segment; if so, performing step S430; otherwise, performing step S440;
S430, deleting the current voice segment and performing step S450;
S440, recognizing the current voice segment according to a speech recognition library to obtain a speech recognition fragment;
S450, obtaining the next voice segment and performing steps S410 to S450 until the user ends the recording;
Wherein, the preset collection rule is collection at equal time intervals.
In this embodiment, invalid voice segments can be deleted, which helps the user obtain speech recognition faster. In the preprocessing before speech recognition, techniques based on, for example, the frequency and fluctuation of sound-wave changes while the user speaks can identify which parts of the user's voice information are valid speech and which are invalid, mark the time points of the user's blank speech, and remove the invalid parts, that is, the blank voice segments. For example, suppose user C's voice information is intercepted according to a 2 s collection rule, user C starts speaking at 14:30, and the user does not speak during 14:33-14:36, so 3 s of silence is detected. Under the collection rule of this embodiment, the voice segment intercepted for 14:33-14:35 is blank, and this segment is marked. The initial voice information in it can then be considered invalid, and the speech recognition module need not perform speech recognition on it.
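The blank-segment check in steps S420 to S450 can be sketched as follows. The patent refers to sound-wave change frequency and fluctuation without giving a formula; this sketch swaps in a simple RMS-amplitude threshold as an assumption. The function names and the threshold value of 0.01 are hypothetical, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import Callable, Iterable, List, Sequence


def is_blank(segment: Sequence[float], rms_threshold: float = 0.01) -> bool:
    """S420: treat a segment as blank when its RMS amplitude is below a threshold."""
    if not segment:
        return True
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    return rms < rms_threshold


def recognize_non_blank(
    segments: Iterable[Sequence[float]],
    recognize: Callable[[Sequence[float]], str],
    rms_threshold: float = 0.01,
) -> List[str]:
    results: List[str] = []
    for segment in segments:
        if is_blank(segment, rms_threshold):
            continue  # S430: drop the blank segment; S450: move on to the next
        results.append(recognize(segment))  # S440: recognize valid speech only
    return results
```

Skipping blank segments before recognition means the recognizer is never invoked on silence, which is where the time saving described above would come from.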
By using speech recognition technology, this embodiment can reduce key input and enhance interactivity with the user; by using a FIFO queue, multiple microphones share one speech recognition engine, which improves engine utilization.
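One way several microphones could share a single recognition engine through a FIFO queue is sketched below with Python's standard `queue` and `threading` modules. This is an illustration under assumptions, not the patent's design: `run_shared_engine` is a hypothetical name, each microphone is modelled as a list of pre-recorded segments, and the engine is a plain callable.

```python
import queue
import threading
from typing import Callable, Dict, List


def run_shared_engine(
    streams: Dict[str, List[bytes]],
    recognize: Callable[[bytes], str],
) -> Dict[str, List[str]]:
    """Feed segments from several microphones through one FIFO queue
    to a single recognizer thread."""
    fifo: "queue.Queue" = queue.Queue()
    results: Dict[str, List[str]] = {mic: [] for mic in streams}

    def engine() -> None:
        while True:
            item = fifo.get()
            if item is None:  # sentinel: every microphone has finished
                break
            mic, segment = item
            results[mic].append(recognize(segment))

    def microphone(mic: str, segments: List[bytes]) -> None:
        for segment in segments:
            fifo.put((mic, segment))  # enqueue in acquisition order

    worker = threading.Thread(target=engine)
    worker.start()
    producers = [
        threading.Thread(target=microphone, args=(mic, segments))
        for mic, segments in streams.items()
    ]
    for producer in producers:
        producer.start()
    for producer in producers:
        producer.join()
    fifo.put(None)
    worker.join()
    return results
```

Since each microphone enqueues its own segments in order and a single consumer dequeues them first in, first out, the per-microphone order of results is preserved even though segments from different microphones interleave.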
Referring to Fig. 5, the present invention provides one embodiment of a voice information processing system 1000, comprising a control module and a processing module; the processing module is communicatively connected to the control module;
The control module periodically collects and recognizes the user's voice information during the user's recording to obtain speech recognition fragments;
The processing module processes the speech recognition fragments obtained by the control module to obtain a speech recognition result.
In this embodiment, speech recognition is performed during voice recording. The user no longer has to wait for voice recording to complete before speech recognition can be carried out and a result output, so the waiting time is reduced; the recording delay is shortened without affecting the normal recognition result, and the user experience is improved.
Referring to Fig. 6, the parts identical to the previous embodiment are not repeated here. The present invention provides another embodiment of a voice information processing system 1000, in which the control module comprises a collection submodule and an identification submodule that are communicatively connected to each other, and the processing module comprises a sorting submodule that is communicatively connected to the control module;
The collection submodule, during the user's recording, collects the user's voice information according to a preset collection rule to obtain a current voice segment and sends the current voice segment to the identification submodule;
The identification submodule receives the current voice segment sent by the collection submodule and recognizes it according to a speech recognition library to obtain a speech recognition fragment;
The collection submodule also obtains the next voice segment and sends it to the identification submodule, until the user ends the recording;
The identification submodule also receives the next voice segment sent by the collection submodule and recognizes it according to the speech recognition library to obtain a speech recognition fragment, until the user ends the recording;
The sorting submodule sorts and integrates the speech recognition fragments in the chronological order of collection to obtain the speech recognition result;
Wherein, the preset collection rule is collection at equal time intervals.
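The division of labour among the three submodules can be sketched as small classes wired together. The class names mirror the submodules named in this embodiment, but the interfaces are assumptions made for illustration: the recording is modelled as a list of samples, and the recognizer is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Fragment:
    index: int  # acquisition order of the source voice segment
    text: str   # recognized text for that segment


class CollectionSubmodule:
    """Cuts the recording into equal-length voice segments."""

    def __init__(self, segment_len: int):
        if segment_len <= 0:
            raise ValueError("segment_len must be positive")
        self.segment_len = segment_len

    def segments(self, samples: List[int]) -> Iterator[List[int]]:
        for start in range(0, len(samples), self.segment_len):
            yield samples[start:start + self.segment_len]


class IdentificationSubmodule:
    """Recognizes each segment as soon as it is received."""

    def __init__(self, recognize: Callable[[List[int]], str]):
        self.recognize = recognize

    def fragments(self, segments: Iterable[List[int]]) -> Iterator[Fragment]:
        for index, segment in enumerate(segments):
            yield Fragment(index, self.recognize(segment))


class SortingSubmodule:
    """Sorts fragments by acquisition order and integrates them."""

    def merge(self, fragments: Iterable[Fragment]) -> str:
        ordered = sorted(fragments, key=lambda fragment: fragment.index)
        return " ".join(fragment.text for fragment in ordered)
```

A possible wiring is `SortingSubmodule().merge(identifier.fragments(collector.segments(samples)))`. Because both upstream submodules are generators, each segment is recognized as it is produced rather than after the whole recording has been split.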
In this embodiment, there are many prior-art ways to build the speech recognition library, so they are not described in detail here. During recording, speech recognition is performed on the voice segments collected from the recording; compared with traditional speech recognition, the speech recognition result is produced faster and the time the user waits for voice entry and speech recognition is reduced. Voice information is acquired through a FIFO queue and speech recognition is performed through the FIFO queue; for a long recording, this not only effectively reduces the waiting time for voice recording and speech recognition but also yields a complete speech recognition. The user can set the preset collection rule according to personal preference and need. Unnecessary waiting is avoided, time is saved, and the user experience improves. The technology in this embodiment can be applied to areas including indoor equipment control and voice dialogue robots. Because speech is recognized while it is being recorded, the problem that recognition can start only after recording completes is solved, and the recording delay is shortened without affecting the normal recognition result. The user's voice command is quickly converted into a speech recognition instruction and input to smart home devices or intelligent robots, which are then controlled more conveniently and quickly according to the recognized instruction without the user operating them by hand; voice operation is faster than manual operation, which improves the user experience. For specific examples, see the corresponding method embodiment.
Referring to Fig. 7, the parts identical to the previous embodiment are not repeated here. The present invention provides another embodiment of a voice information processing system 1000, in which the processing module also comprises an output submodule that is communicatively connected to the control module;
The output submodule outputs the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
Specifically, in this embodiment, during recording, as soon as a voice segment is collected it is recognized immediately to obtain the corresponding speech recognition fragment, and the fragments are output in the chronological order of collection to obtain the speech recognition result. If user B finds that part of the text in the entry field differs from what was said, then because collection times are regular, the corresponding voice segment can be located by the chronological order of collection and recognized again, which greatly improves the user experience. Speech recognition is performed during voice recording, so the user no longer has to wait for voice recording to complete before speech recognition can be carried out and a result output; the recording delay is shortened without affecting the normal recognition result, and the user experience is improved.
With reference to Fig. 8, the present invention provides another embodiment of a voice information processing system 1000, in which the control module includes a collection submodule, a recognition submodule, a judging submodule, and a deletion submodule; the judging submodule is communicatively connected to the collection submodule, the deletion submodule, and the recognition submodule, respectively;
The collection submodule, during the user's recording, collects the user's voice information according to a preset collection rule, obtains the current voice segment, and sends the current voice segment to the judging submodule;
The judging submodule judges whether the current voice segment is a blank voice segment; if so, it sends the result that the current voice segment is a blank voice segment to the deletion submodule; otherwise, it sends the result that the current voice segment is not a blank voice segment to the recognition submodule;
The deletion submodule receives the judgment result sent by the judging submodule and deletes the current voice segment;
The recognition submodule receives the current voice segment sent by the collection submodule, recognizes the current voice segment according to a speech recognition library, and obtains a speech recognition fragment;
The collection submodule also obtains the next voice segment and sends it to the judging submodule, until the user ends the recording;
The recognition submodule also receives the next voice segment sent by the collection submodule, recognizes the next voice segment according to the speech recognition library, and obtains a speech recognition fragment, until the user ends the recording.
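The interaction of the four submodules can be sketched as a single loop. This is a minimal illustration, not the patented implementation: `process_recording`, `is_blank`, and `recognize` are hypothetical names, and the two callables stand in for the judging logic and the speech recognition library.

```python
from typing import Callable, Iterable, List


def process_recording(
    segments: Iterable[bytes],
    is_blank: Callable[[bytes], bool],
    recognize: Callable[[bytes], str],
) -> List[str]:
    """Collect, judge, then delete or recognize, one voice segment at a time."""
    fragments: List[str] = []
    for segment in segments:                  # collection submodule: yields segments until recording ends
        if is_blank(segment):                 # judging submodule
            continue                          # deletion submodule: the blank segment is dropped
        fragments.append(recognize(segment))  # recognition submodule
    return fragments
```

For example, with `is_blank=lambda s: not s` and `recognize=lambda s: s.decode()`, the input `[b"hi", b"", b"there"]` yields two fragments and the empty segment is skipped.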
In this embodiment of the invention, invalid voice segments can be deleted, which helps the user complete speech recognition more quickly. In the preprocessing stage before speech recognition, techniques based on features such as how frequently the sound wave changes and how much it fluctuates while the user speaks can identify which parts of the user's voice information are valid speech and which are invalid, and the invalid parts, that is, the blank voice segments, are removed. Because speech recognition is performed during voice recording, the user no longer has to wait until recording is complete for recognition to run and the result to be output; the recording delay is shortened without affecting the normal recognition result, which improves the user experience.
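The original does not specify how the sound wave features are computed. One possible reading, shown here purely as an assumption, treats "fluctuation" as the peak-to-peak amplitude of the segment and "change frequency" as the number of sign changes (zero crossings). The function name and both thresholds are hypothetical and would need tuning on real audio.

```python
from typing import Sequence


def is_blank_segment(
    samples: Sequence[float],
    min_peak_to_peak: float = 0.02,   # hypothetical fluctuation threshold
    min_crossings: int = 2,           # hypothetical change-frequency threshold
) -> bool:
    """Return True when the segment shows too little waveform activity to be speech."""
    if len(samples) < 2:
        return True
    # Fluctuation: how far the waveform swings within the segment.
    peak_to_peak = max(samples) - min(samples)
    # Change frequency: how often the waveform changes sign.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return peak_to_peak < min_peak_to_peak or crossings < min_crossings
```

A production system would more likely rely on a dedicated voice activity detector; the sketch only shows the kind of decision the judging submodule has to make.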
With reference to Fig. 9, the present invention provides an example of a voice information processing method, including:
1. Recording starts.
2. The recording module stays in the recording process and intercepts the audio successively, once every 2 s.
3. The file is intercepted.
4. The recording result is sent to the speech recognition module for voice dictation.
5. The voice dictation result is put into a FIFO queue.
6. The semantic recognition module continuously performs semantic recognition on the sentences in the queue and analyzes them to understand their meaning.
7. According to the semantic recognition result, the corresponding instruction or answer is sent, completing the whole speech recognition process.
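The seven steps above can be sketched sequentially as follows. The names `run_pipeline`, `dictate`, and `interpret` are hypothetical: `dictate` stands in for the speech recognition module and `interpret` for the semantic recognition module. In a real system the recording side and the semantic side would presumably run concurrently; the sketch runs them one after the other to keep the data flow visible.

```python
import queue
from typing import Callable, Iterable, List


def run_pipeline(
    chunks: Iterable[bytes],
    dictate: Callable[[bytes], str],
    interpret: Callable[[str], str],
) -> str:
    """Dictate each intercepted chunk, queue the results, then interpret the spliced sentence."""
    fifo: "queue.Queue[str]" = queue.Queue()
    for chunk in chunks:                 # steps 2-3: one intercepted chunk per interval
        fifo.put(dictate(chunk))         # steps 4-5: the dictation result enters the FIFO queue
    pieces: List[str] = []
    while not fifo.empty():              # step 6: results leave the queue in arrival order
        pieces.append(fifo.get())
    return interpret("".join(pieces))    # step 7: instruction or answer for the spliced sentence
```

Because the queue is first-in, first-out, the spliced sentence preserves the order in which the chunks were recorded.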
In this embodiment of the invention, intercepting once every 2 s is only an example; the interception interval can be set according to the user's preferences and needs. By using a first-in, first-out (FIFO) queue, multiple microphones can share one speech recognition engine, which improves engine utilization. For shorter recordings, the speech recognition module does not have to wait for a designated recognition start time before it can begin recognizing, which reduces the waiting time for speech recognition; for longer recordings, the scheme not only effectively reduces the waiting time for voice recording and speech recognition but also still produces a complete recognition result. In this scheme the recording interval is two seconds: a recording is taken every two seconds, the recording result is sent to the speech recognition module for recognition, and the recognition result is put into the FIFO queue, so that the results of the continuous recording are all in the queue; the semantic recognition module then recognizes the spliced sentence, which achieves fast speech recognition. Because speech recognition is performed during voice recording, the user no longer has to wait until recording is complete for recognition to run and the result to be output; the recording delay is shortened without affecting the normal recognition result, which improves the user experience.
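How several microphones might share one recognition engine through a FIFO queue can be sketched with one consumer thread and one producer thread per microphone. All names here (`engine_worker`, `transcribe_all`, the microphone ids, the `recognize` callable) are hypothetical, and the original does not say that threads are used; this is one plausible arrangement, not the patented design.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


def engine_worker(fifo: queue.Queue, recognize: Callable[[bytes], str],
                  results: Dict[str, List[Tuple[int, str]]]) -> None:
    """Single shared engine: takes chunks from the FIFO queue until it sees None."""
    while True:
        item = fifo.get()
        if item is None:                      # sentinel: all recordings have ended
            break
        mic, index, audio = item
        results.setdefault(mic, []).append((index, recognize(audio)))


def transcribe_all(streams: Dict[str, List[bytes]],
                   recognize: Callable[[bytes], str]) -> Dict[str, str]:
    fifo: queue.Queue = queue.Queue()
    results: Dict[str, List[Tuple[int, str]]] = {}
    worker = threading.Thread(target=engine_worker, args=(fifo, recognize, results))
    worker.start()

    def record(mic: str, chunks: List[bytes]) -> None:
        # Each microphone queues its chunks tagged with id and collection index.
        for index, audio in enumerate(chunks):
            fifo.put((mic, index, audio))

    recorders = [threading.Thread(target=record, args=(m, c)) for m, c in streams.items()]
    for r in recorders:
        r.start()
    for r in recorders:
        r.join()
    fifo.put(None)                            # tell the engine that no more chunks will arrive
    worker.join()
    # Reassemble each microphone's text in collection order.
    return {mic: "".join(text for _, text in sorted(parts)) for mic, parts in results.items()}
```

Only the worker thread writes to `results`, so no extra locking is needed; the queue itself handles synchronization between the recorders and the engine.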
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A voice information processing method, characterized by comprising the steps of:
S100: during the user's recording, periodically collecting and recognizing the user's voice information to obtain speech recognition fragments;
S200: processing the speech recognition fragments to obtain a speech recognition result.
2. The voice information processing method according to claim 1, characterized in that step S100 comprises the steps of:
S110: during the user's recording, collecting the user's voice information according to a preset collection rule to obtain a current voice segment;
S120: recognizing the current voice segment according to a speech recognition library to obtain a speech recognition fragment;
S130: obtaining the next voice segment and performing steps S110-S130 until the user ends the recording;
wherein the preset collection rule is collection at equal time intervals.
3. The voice information processing method according to claim 2, characterized in that step S110 further comprises the steps of:
S111: judging whether the current voice segment is a blank voice segment; if so, performing step S112; otherwise, performing step S120;
S112: deleting the current voice segment and performing step S130.
4. The voice information processing method according to claim 1, characterized in that step S200 comprises the step of:
S210: sorting and integrating the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
5. The voice information processing method according to any one of claims 1-4, characterized in that step S200 further comprises the step of:
S220: outputting the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
6. A voice information processing system, characterized by comprising: a control module and a processing module; the processing module being communicatively connected to the control module;
the control module, during the user's recording, periodically collects and recognizes the user's voice information to obtain speech recognition fragments;
the processing module processes the speech recognition fragments obtained by the control module to obtain a speech recognition result.
7. The voice information processing system according to claim 6, characterized in that the control module comprises: a collection submodule and a recognition submodule; the collection submodule being communicatively connected to the recognition submodule;
the collection submodule, during the user's recording, collects the user's voice information according to a preset collection rule, obtains a current voice segment, and sends the current voice segment to the recognition submodule;
the recognition submodule receives the current voice segment sent by the collection submodule, recognizes the current voice segment according to a speech recognition library, and obtains a speech recognition fragment;
the collection submodule also obtains the next voice segment and sends it to the recognition submodule, until the user ends the recording;
the recognition submodule also receives the next voice segment sent by the collection submodule, recognizes the next voice segment according to the speech recognition library, and obtains a speech recognition fragment, until the user ends the recording;
wherein the preset collection rule is collection at equal time intervals.
8. The voice information processing system according to claim 7, characterized in that the control module further comprises: a judging submodule and a deletion submodule, the judging submodule being communicatively connected to the collection submodule, the deletion submodule, and the recognition submodule, respectively;
the judging submodule judges whether the current voice segment is a blank voice segment; if so, it sends the result that the current voice segment is a blank voice segment to the deletion submodule; otherwise, it sends the result that the current voice segment is not a blank voice segment to the recognition submodule;
the deletion submodule receives the judgment result sent by the judging submodule and deletes the current voice segment.
9. The voice information processing system according to claim 7, characterized in that the processing module comprises: a sorting submodule; the sorting submodule being communicatively connected to the control module;
the sorting submodule sorts and integrates the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
10. The voice information processing system according to any one of claims 6-9, characterized in that the processing module further comprises: an output submodule, the output submodule being communicatively connected to the control module;
the output submodule outputs the speech recognition fragments in the chronological order of collection to obtain the speech recognition result.
CN201710302993.1A 2017-05-03 2017-05-03 A kind of voice information processing method and system Pending CN107230478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710302993.1A CN107230478A (en) 2017-05-03 2017-05-03 A kind of voice information processing method and system

Publications (1)

Publication Number Publication Date
CN107230478A true CN107230478A (en) 2017-10-03

Family

ID=59933174

Country Status (1)

Country Link
CN (1) CN107230478A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201102

Address after: 318015 no.2-3167, zone a, Nonggang City, no.2388, Donghuan Avenue, Hongjia street, Jiaojiang District, Taizhou City, Zhejiang Province

Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd.

Address before: 201616 Shanghai city Songjiang District Sixian Road No. 3666

Applicant before: Phicomm (Shanghai) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20171003