CN108847237A

CN108847237A - continuous speech recognition method and system

Info

Publication number: CN108847237A
Application number: CN201810847817.0A
Authority: CN
Inventors: 潘晓明 (Pan Xiaoming)
Original assignee: Chongqing Pomelo Technology Co Ltd
Current assignee: Chongqing Pomelo Technology Co Ltd
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2018-11-20

Abstract

The present invention relates to the technical field of speech recognition, and in particular to a continuous speech recognition method and system. The continuous speech recognition system includes a speech signal analysis module and a speech analysis module. The speech signal analysis module takes the speech information that the user inputs intermittently and arranges it into a voice sequence on a time list, ordered by acquisition time. It then clips out the parts of the voice sequence that correspond to the user's pauses, combines the remaining speech into continuous speech information, and sends it to the speech analysis module for speech recognition and parsing. The solution suits users who can speak continuously and also users who cannot.

Description

Continuous speech recognition method and system

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a continuous speech recognition method and system.

Background art

With the development of voice technology, automatic speech recognition has been widely applied in all areas of life. Converting speech into text greatly meets everyday needs: for example, a meeting recording can be transcribed into minutes and sent to the participants, or an interview recording can be transcribed and used as the basis for a news release. However, the text produced when speech is converted into text often contains errors.

To convert speech information into text information, Chinese patent document CN107305541A discloses a method and device for segmenting speech recognition text. The method includes: performing endpoint detection on voice data to obtain each voice segment together with its start frame number and end frame number; performing speech recognition on each voice segment to obtain the recognized text of that segment; extracting segmentation features from the recognized text of each voice segment; using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text of the voice data, so as to determine where segmentation is needed; and segmenting the recognized text according to the detection result. This scheme segments the recognized text automatically and gives the recognized text a clearer structure.

However, the above scheme collects speech in the form of voice segments. Some users cannot organize their speech fluently, for example young children, people who stutter, or frail elderly people who wheeze. What they have in common is that their speech is halting rather than continuous: a single sentence may contain many pauses, the pauses follow no pattern, and their durations vary. If voice segments are delimited by pause duration, one sentence may be split across several voice segments. Each segment is then recognized as text on its own, and because the user had not finished the sentence within any one segment, the error rate of the recognized text is high.

Summary of the invention

The technical problem solved by the present invention is to provide a continuous speech recognition system that addresses the high error rate of the text recognized from the speech of users who cannot speak continuously.

The basic scheme provided by the invention is a continuous speech recognition system including a speech signal analysis module and a speech analysis module. The speech signal analysis module arranges the speech information that the user inputs intermittently into a voice sequence on a time list, ordered by acquisition time. It then clips out the parts of the voice sequence corresponding to the user's pauses, combines the clipped speech information into continuous speech information, and sends it to the speech analysis module for speech recognition and parsing.
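The arrangement step can be pictured with a minimal Python sketch. It assumes that each intermittently captured fragment carries its acquisition timestamp and its PCM samples; `VoiceFragment`, `arrange_on_time_list` and `splice` are hypothetical names used for illustration, not part of the patent. Sorting by timestamp builds the time list, and concatenating the sorted fragments drops the silent gaps between them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceFragment:
    """One intermittently captured piece of speech (hypothetical structure)."""
    acquired_at: float                                   # acquisition time, seconds
    samples: List[float] = field(default_factory=list)   # PCM samples


def arrange_on_time_list(fragments: List[VoiceFragment]) -> List[VoiceFragment]:
    """Order fragments by acquisition time to form the voice sequence."""
    return sorted(fragments, key=lambda f: f.acquired_at)


def splice(fragments: List[VoiceFragment]) -> List[float]:
    """Join the ordered fragments into one continuous sample stream.

    The silent gaps between fragments are never stored, so the result plays
    back as if the utterance had been spoken without pausing.
    """
    continuous: List[float] = []
    for fragment in arrange_on_time_list(fragments):
        continuous.extend(fragment.samples)
    return continuous


captured = [VoiceFragment(2.5, [0.3, 0.4]), VoiceFragment(0.0, [0.1, 0.2])]
print(splice(captured))  # [0.1, 0.2, 0.3, 0.4]
```

Fragments may arrive out of order over the network, which is why the sketch sorts before splicing rather than appending in arrival order.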

The principle of the invention is as follows: after the user inputs speech haltingly, the speech signal analysis module arranges the intermittently input speech information into a voice sequence on a time list according to acquisition time, clips out the parts of the sequence that correspond to the user's pauses, combines the clipped speech information into continuous speech information, and sends it to the speech analysis module for speech recognition and parsing.

The advantage of the invention is that the collected speech information is reassembled into continuous speech information before it is parsed. This avoids stopping collection before the user has finished a sentence, and so solves the problem that recognizing only part of the speech information raises the error rate of the recognized text. Compared with the prior art, the method suits users who can only input speech intermittently, and equally users who can speak continuously.

Further, the system also includes a chat module and a friend name remark module. The chat module lets the user add friends and chat with different friends in different chat interfaces; when the user chats with friends, each friend corresponds to one chat interface. The friend name remark module lets the user assign a different name remark to each friend.

The chat module makes it convenient for the user to add friends and chat with them; the friend name remark module makes it convenient for the user to find a friend and start a chat with that friend.

Further, the system also includes a continuous voice input jump module. When the user is chatting with several friends at once and switching between multiple chat interfaces, the user speaks the name of a friend as remarked in the friend name remark module, and the chat interface between the user and that friend is opened. The text parsed from the speech the user then inputs is displayed in that chat interface, where both the user and the friend can view it.

Because the friends a user has added may not know one another, in the prior art a user who chats with different friends during the same period has to switch manually between the chat interfaces. Switching is normally done by touch, which can select the wrong interface; a user in a hurry may then send a message to the wrong chat interface, which causes inconvenience. In this scheme, the chat interface is switched by speaking the friend's name, which makes switching easy, avoids manual switching errors, and reduces mis-sent messages. After the switch, the text parsed from the user's speech is displayed in the new chat interface, so the user can finish speaking in one chat interface and quickly switch to another to continue voice input. In addition, once friends have been set up with the help of relatives or friends, illiterate elderly people and children can also chat with their friends very easily.

Further, the system also includes a trigger button; the continuous voice input jump module is activated only while the user holds down the trigger button.

The trigger button prevents the chat interface from jumping when, while chatting with one friend, the user merely mentions another friend's name.

For the above continuous speech recognition system, the present invention also provides a continuous speech recognition method, including the following steps:

S1, voice collection: continuously collect the user's speech information that requires speech recognition;

S2, speech information arrangement: arrange the collected speech information into a voice sequence on a time list according to acquisition time, clip out the parts of the voice sequence corresponding to the user's pauses, and combine the clipped speech information into continuous speech information;

S3, speech analysis and output: perform speech recognition and parsing on the combined continuous speech information, and display the parsed text.

In step S1, the speech information input by the user is collected continuously, so the user's halting speech does not affect collection. In step S2, the collected speech information is reassembled into continuous speech information before the speech analysis of step S3, which avoids stopping collection before the user has finished a sentence and solves the problem that recognizing only part of the speech raises the error rate of the recognized text. Compared with the prior art, the method suits users who can only input speech intermittently, and equally users who can speak continuously.

Further, in step S1, while the user's speech is being collected continuously, if a pause by the user reaches a preset duration set by the user, collection of the speech the user is inputting starts afresh.

Because users set the preset pause duration themselves, collection restarts as soon as they finish a sentence. This prevents an overly long collection period from making the user wait too long for the parsed text to be output.

Description of the drawings

Fig. 1 is a logic block diagram of the continuous speech recognition system in Embodiment one of the present invention;

Fig. 2 is a flowchart of the continuous speech recognition method in Embodiment one of the present invention;

Fig. 3 is a logic block diagram of the continuous speech recognition system in Embodiment three of the present invention;

Fig. 4 is a flowchart of the continuous speech recognition method in Embodiment three of the present invention.

Specific embodiments

Embodiment one

As shown in Fig. 1, a continuous speech recognition system includes a user terminal and a server. The server and the user terminal communicate through a wireless communication module, which may be an existing Bluetooth communication module of model DX-BT18. The user terminal may be a robot, a mobile phone, or another portable electronic device available to the user.

The user terminal includes:

Voice acquisition module: collects the user's speech in real time while the user speaks, and sends the collected speech information to the server.

Text information output module is parsed, for receiving the text information of speech analysis module transmission, and is receiving text Text information is shown after this information.

Voice collection interrupt module: lets users set, according to their own situation, a preset pause duration; voice collection is interrupted when a pause in the user's speech input reaches this duration. For example, if a pause of 4 seconds usually means the user has finished a sentence, the preset duration is set to 4 seconds; when such a pause occurs, the voice acquisition module starts collecting speech information afresh and sends it to the server.
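The interrupt rule can be sketched as follows, assuming that a voice activity detector already labels each frame as speech or silence. `split_on_long_pause` and its inputs are hypothetical; only the 4-second preset comes from the example above. Pauses shorter than the preset stay inside one collection session, which is what lets a halting sentence reach the server whole.

```python
from typing import List, Optional, Tuple


def split_on_long_pause(frames: List[Tuple[float, bool]],
                        preset_pause_s: float = 4.0) -> List[List[float]]:
    """Group speech-frame timestamps into collection sessions.

    frames: (timestamp_in_seconds, is_speech) pairs in time order, e.g. from a
    voice activity detector (assumed; the patent does not specify one).
    A new session starts only when the silence since the last speech frame
    reaches preset_pause_s; shorter hesitations stay in the same session.
    """
    sessions: List[List[float]] = []
    current: List[float] = []
    last_speech: Optional[float] = None
    for t, is_speech in frames:
        if not is_speech:
            continue
        if last_speech is not None and t - last_speech >= preset_pause_s:
            sessions.append(current)  # long pause: close the current session
            current = []
        current.append(t)
        last_speech = t
    if current:
        sessions.append(current)
    return sessions


frames = [(0.0, True), (1.0, True), (2.0, False), (3.5, True), (8.0, True), (9.0, True)]
print(split_on_long_pause(frames))  # [[0.0, 1.0, 3.5], [8.0, 9.0]]
```

In this example the 2.5-second hesitation between 1.0 s and 3.5 s stays within the first session, while the 4.5-second gap before 8.0 s starts a new one.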

The server includes:

Database: stores all data held on the server.

Speech signal analysis module: receives the speech information sent by the voice acquisition module, arranges it into a voice sequence on a time list according to acquisition time, clips out the parts of the voice sequence corresponding to the user's pauses, combines the clipped speech information into continuous speech information, and sends it to the speech analysis module. In other words, the user's speech input is recorded continuously, the parts of the recording where the user is not speaking are clipped out automatically, and the resulting recording plays back as coherent speech.
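The patent does not say how the pause portions are detected. One simple option, assumed here purely for illustration, is a frame-energy threshold: split the recording into short frames, treat frames whose RMS energy falls below a threshold as pauses, and keep only the rest. `clip_pauses`, the 20 ms frame length, and the 0.01 threshold are hypothetical choices.

```python
import math
from typing import List


def clip_pauses(samples: List[float], sample_rate: int,
                frame_ms: int = 20, energy_threshold: float = 0.01) -> List[float]:
    """Drop low-energy frames (treated as pauses) and join the speech frames.

    Assumption for illustration: a frame whose RMS energy is below
    energy_threshold counts as a pause. frame_ms and energy_threshold are
    hypothetical defaults, not values from the patent.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= energy_threshold:
            kept.extend(frame)  # speech frame: keep it
        # otherwise the frame is a pause and is clipped out
    return kept


recording = [0.5, -0.5, 0.0, 0.0, 0.4, 0.4]
print(clip_pauses(recording, sample_rate=1000, frame_ms=2))  # [0.5, -0.5, 0.4, 0.4]
```

A fixed threshold is crude. Background noise varies between users and devices, so a practical system would more likely use an adaptive voice activity detector, but the clip-and-join structure would stay the same.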

Speech analysis module: receives the speech information sent by the speech signal analysis module, parses it into text information (the existing speech recognition technology of iFLYTEK Co., Ltd. may be used for this), and sends the parsed text information to the text information output module.

In addition, as shown in Fig. 2, this embodiment also discloses a continuous speech recognition method for the continuous speech recognition system, including the following steps:

S1, voice collection

The voice acquisition module collects the user's speech in real time while the user speaks, and sends the collected speech information to the server.

S2, speech information arrangement

After the speech signal analysis module in the server receives the speech information sent by the voice acquisition module, it arranges the information into a voice sequence on a time list according to acquisition time, clips out the parts of the voice sequence corresponding to the user's pauses, combines the clipped speech information into continuous speech information, and sends it to the speech analysis module.

S3, speech analysis and output

The speech analysis module receives the speech information sent by the speech signal analysis module, parses it into text information (the existing speech recognition technology of iFLYTEK Co., Ltd. may be used for this), and sends the parsed text information to the text information output module. After receiving the text information from the speech analysis module, the text information output module displays it.

Embodiment two

Embodiment two differs from embodiment one as follows. In embodiment two, so that the parsed text can be output quickly, after the speech signal analysis module has arranged the speech information into a voice sequence on the time list, it clips the voice sequence corresponding to the user's pauses and combines it into speech information while, at the same time, parsing the speech information that has already been combined. That is, speech information that was clipped and combined first is parsed first, and speech information combined later is parsed later. In addition, the parsed text can be quickly corrected with an existing speech error-correction method, for example the speech recognition method disclosed in publication CN103021412A.
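The parse-while-combining order of embodiment two can be sketched with Python's standard `queue` and `threading` modules. In this minimal sketch, the segment strings stand in for clipped-and-combined speech information and `recognize` stands in for whatever speech-to-text engine is used; both are assumptions. A single worker consumes a FIFO queue, so the segment combined first is parsed first, while the producer may still be handing over later segments.

```python
import queue
import threading
from typing import Callable, List, Optional


def pipelined_parse(combined_segments: List[str],
                    recognize: Callable[[str], str]) -> List[str]:
    """Parse each combined segment as soon as it is handed over.

    combined_segments: stand-ins for clipped-and-combined speech information,
    in the order they are produced. recognize: hypothetical speech-to-text
    callable. One worker thread reads a FIFO queue, so results keep the
    order in which the segments were combined.
    """
    ready: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []

    def parser() -> None:
        while True:
            segment = ready.get()
            if segment is None:  # sentinel: the producer has finished
                break
            results.append(recognize(segment))

    worker = threading.Thread(target=parser)
    worker.start()
    for segment in combined_segments:
        ready.put(segment)  # parsing of earlier segments may already be under way
    ready.put(None)
    worker.join()
    return results


print(pipelined_parse(["first phrase", "second phrase"], str.upper))
# ['FIRST PHRASE', 'SECOND PHRASE']
```

Using a single consumer keeps the output order trivially correct. Running several parsers in parallel would need the results to be re-sorted by segment index.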

Embodiment three

As shown in Fig. 3, embodiment three differs from embodiment two in that the user terminal further includes:

Chat module: lets the user add friends and chat with them in different chat interfaces. When the user chats with friends, each friend corresponds to one chat interface. The user can send voice messages and text messages in a chat interface, and the user's friends can likewise send text and voice messages.

Friend name remark module: lets the user assign a name remark to each friend; each friend corresponds to one name.

Continuous voice input jump module: when the user is chatting with several friends at once and switching between multiple chat interfaces, the user speaks, through the voice acquisition module, the name of a friend as remarked in the friend name remark module, and the chat interface between the user and that friend is opened. The text parsed from the speech the user then inputs is displayed in that chat interface, where both the user and the friend can view it. When the user needs to chat with another friend, the user speaks that friend's name, and the display jumps from the previous friend's chat interface to this friend's chat interface; the text parsed from the user's subsequent speech is displayed there. For example, suppose the user is chatting with friends A, B, and C at the same time, where A, B, and C are the names of the three friends. If the user is in A's chat interface but needs to reply to B immediately, the user speaks B's name into the voice acquisition module, and the display jumps from A's chat interface to B's. When the user speaks again, the parsed text is displayed in the chat interface with B. If C then sends a message that needs a reply, the user speaks C's name into the voice acquisition module to enter the chat interface with C.
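The switching behavior in the A/B/C example can be sketched as a small router. This is a hypothetical model rather than the patent's implementation: `ChatRouter` treats parsed text that exactly matches a remarked friend name as a switch command, and appends any other text to the active chat interface, modeled here as a list of messages.

```python
from typing import Dict, List, Optional


class ChatRouter:
    """Routes parsed text to chat interfaces; a spoken friend name switches.

    Hypothetical sketch: friend names come from the friend name remark
    module, and each friend has exactly one chat interface (a message list).
    """

    def __init__(self, friend_names: List[str]) -> None:
        self.interfaces: Dict[str, List[str]] = {n: [] for n in friend_names}
        self.active: Optional[str] = None

    def handle_text(self, text: str) -> None:
        name = text.strip()
        if name in self.interfaces:
            self.active = name  # jump to this friend's chat interface
        elif self.active is not None:
            self.interfaces[self.active].append(text)  # show in current interface


router = ChatRouter(["A", "B", "C"])
for spoken in ["A", "hello", "B", "on my way"]:
    router.handle_text(spoken)
print(router.interfaces)  # {'A': ['hello'], 'B': ['on my way'], 'C': []}
```

Exact matching is the simplest assumption. Without a gate, a message consisting only of a friend's name would also trigger a jump, which is the problem the trigger button addresses.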

Trigger operation module: includes a trigger button. The continuous voice input jump module is activated only while the user holds down the trigger button, which prevents the chat interface from jumping when the user mentions another friend's name while chatting with one friend.

As shown in Fig. 4, embodiment three further differs from embodiment two in that it also discloses a continuous speech recognition method including the following steps:

S1, adding friends and remarking friend names

The user adds friends through the chat module and then assigns name remarks to them through the friend name remark module.

S2, chatting between the user and friends

The user chats with friends through the chat module and inputs speech through the voice acquisition module. A speaking habit analysis module analyzes the user's speaking habits, based on the collected speech of this user and the general speaking habits of all users, to obtain the user's speaking habit information. The speech analysis module then parses the user's speech according to this speaking habit information and displays the parsed text in the chat interface where the user and the friend are chatting, for both of them to view.

S3, automatic chat interface switching

When another friend sends the user a message that needs a timely reply, the user holds down the trigger button and speaks, into the voice acquisition module, the name of the friend to chat with. The continuous voice input jump module then switches to the chat interface between the user and that friend, and the user releases the trigger button. Speech the user then inputs through the voice acquisition module is parsed into text and displayed in that chat interface. To switch to yet another friend's chat interface, the user repeats step S3.
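Step S3 with the trigger button can be sketched by making the name check depend on the button state. `GatedChatRouter`, `press`, and `release` are hypothetical names. The sketch assumes that speech recognized while the button is held is treated only as a friend-name command, and that everything said after release is ordinary chat text. This is what keeps a name mentioned in conversation from causing a jump.

```python
from typing import Dict, List, Optional


class GatedChatRouter:
    """Hypothetical sketch: friend-name switching only while the trigger button is held."""

    def __init__(self, friend_names: List[str]) -> None:
        # One chat interface (modeled as a message list) per remarked friend name.
        self.interfaces: Dict[str, List[str]] = {n: [] for n in friend_names}
        self.active: Optional[str] = None
        self.button_held = False

    def press(self) -> None:
        self.button_held = True

    def release(self) -> None:
        self.button_held = False

    def handle_text(self, text: str) -> None:
        if self.button_held:
            # Button held: the utterance is a switch command, not a message.
            name = text.strip()
            if name in self.interfaces:
                self.active = name
            return
        # Button released: ordinary chat text goes to the active interface,
        # even if it happens to be a friend's name.
        if self.active is not None:
            self.interfaces[self.active].append(text)


chat = GatedChatRouter(["A", "B", "C"])
chat.press()
chat.handle_text("A")
chat.release()
chat.handle_text("B")  # said without the button: stays a message to A
chat.press()
chat.handle_text("C")
chat.release()
chat.handle_text("Replying now")
print(chat.interfaces)  # {'A': ['B'], 'B': [], 'C': ['Replying now']}
```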

What has been described above is only an embodiment of the present invention; common knowledge such as well-known specific structures and characteristics is not described in detail here. A person of ordinary skill in the art knows all the common technical knowledge in the field of the invention as of the filing date or priority date, can access all the prior art in the field, and is able to apply routine experimental means as of that date. Under the teaching of this application, such a person can improve and implement this scheme using their own abilities, and typical known structures or known methods should not be an obstacle to implementing this application. It should be noted that a person skilled in the art may also make several modifications and improvements without departing from the structure of the invention; these should likewise be regarded as falling within the protection scope of the invention, and none of them affects the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application is determined by the content of the claims, and the specific embodiments and other content of the description may be used to interpret the claims.

Claims

1. A continuous speech recognition system, characterized in that it includes a speech signal analysis module and a speech analysis module; the speech signal analysis module arranges the speech information that the user inputs intermittently into a voice sequence on a time list according to acquisition time, then clips out the parts of the voice sequence corresponding to the user's pauses, combines the clipped speech information into continuous speech information, and sends it to the speech analysis module for speech recognition and parsing.

2. The continuous speech recognition system according to claim 1, characterized in that it further includes a chat module and a friend name remark module; the chat module lets the user add friends and chat with different friends in different chat interfaces; when the user chats with friends, each friend corresponds to one chat interface; the friend name remark module lets the user assign different name remarks to different friends.

3. The continuous speech recognition system according to claim 2, characterized in that it further includes a continuous voice input jump module; when the user is chatting with different friends at the same time and switching between multiple chat interfaces, after the user speaks the name of a friend as remarked in the friend name remark module, the chat interface between the user and that friend is opened, and the text parsed from the speech the user then inputs is displayed in that chat interface for the user and the friend to view.

4. The continuous speech recognition system according to claim 3, characterized in that it further includes a trigger button; the continuous voice input jump module is activated when the user holds down the trigger button.

5. A continuous speech recognition method, including the following steps:

S1, voice collection: continuously collect the user's speech information that requires speech recognition;

S2, speech information arrangement: arrange the collected speech information into a voice sequence on a time list according to acquisition time, clip out the parts of the voice sequence corresponding to the user's pauses, and combine the clipped speech information into continuous speech information;

S3, speech analysis and output: perform speech recognition and parsing on the combined continuous speech information, and display the parsed text.

6. The continuous speech recognition method according to claim 5, characterized in that, in step S1, while the user's speech is being collected continuously, if a pause by the user reaches a preset duration set by the user, collection of the speech the user is inputting starts afresh.