CN102956231A

CN102956231A - Voice key information recording device and method based on semi-automatic correction

Info

Publication number: CN102956231A
Application number: CN2011102433795A
Authority: CN
Inventors: 叶英; 孔吉; 刘佩林
Original assignee: Shanghai Jiaotong University; Fujitsu Ltd
Current assignee: Shanghai Jiaotong University; Fujitsu Ltd
Priority date: 2011-08-23
Filing date: 2011-08-23
Publication date: 2013-03-06
Anticipated expiration: 2031-08-23
Also published as: CN102956231B

Abstract

Disclosed are a voice key information recording device and method based on semi-automatic correction, in the technical field of speech recognition. The device comprises a key information extraction unit and an information correction unit connected to it. The key information extraction unit obtains uncorrected text information, extracts key information from it, and outputs the key information to the information correction unit; the information correction unit outputs the text information confirmed through user feedback. The semi-automatic information correction unit reduces the workload of manual correction; special nouns such as place names and names of professional tools are corrected using a database, which reduces the impact of the operator's limited knowledge on manual correction; and key information in the voice information is extracted, which increases the amount of effective information in the record.

Description

Voice key information recording device and method based on semi-automatic correction

Technical field

The present invention relates to a device and method in the field of speech recognition technology, specifically a voice key information recording device and method based on semi-automatic correction. The device recognizes a voice signal and records it in text form, so that when it is inconvenient for the user to take written notes of voice information, the device completes the written record in the user's place.

Background technology

Because of the limits of speech recognition technology, known recording devices automatically recognize the received voice signal and then rely on manual correction by an operator to guarantee the accuracy of the recorded information. As a result, normal operation of such a device requires a large amount of manual correction work. Moreover, some information mentioned in the voice signal, such as place names or names of professional tools, may not be corrected effectively because of the limits of the operator's own knowledge.

Known recording devices recognize and record the entire voice information. However, voice information contains a large amount of meaningless content, such as the greetings, auxiliary words, and modal particles used in conversation. Genuinely useful information is usually only part of the whole. Recognizing and recording everything increases the workload of text correction on the one hand; on the other hand, the user does not need a record of content such as greetings. A record of voice information should be as concise as possible while providing as much effective information as possible.

A search of the prior art found British patent document GB2323693A, "Speech to text conversion". That technique comprises at least one user terminal for recording speech, at least one automatic speech recognition processor for generating text from the recorded speech, and communication means for returning the text to the terminal; a server between the user terminal and the automatic speech recognition processor remotely and selectively controls the transmission of the recorded voice files. The technique also includes a selectable corrector, through which a service operator corrects errors in the recognized text; the corrected text is finally stored and fed back to the user.

However, this prior art requires the service operator to review the full text, and that text is the direct output of the speech recognition device. On the one hand, the full text contains a large amount of meaningless content, such as greetings, auxiliary words, and modal particles, and genuinely useful information is only part of it, so reviewing the full text increases the service operator's workload. On the other hand, the direct recognition result contains two kinds of information that are poorly served by manual correction. The first kind is special nouns and proper nouns: when the service operator is responsible for correcting them, the accuracy of the correction depends on the operator's knowledge, and there is a risk of incorrect correction. The second kind is information with a fixed format, such as time information, which can be corrected automatically by an algorithm; handing its correction to the service operator also increases the operator's correction workload.

Summary of the invention

In view of the above shortcomings of the prior art, the present invention provides a voice key information recording device and method based on semi-automatic correction. A semi-automatic information correction unit reduces the workload of manual correction; a database is used to correct special nouns such as place names and names of professional tools, which reduces the impact of the operator's limited knowledge on manual correction; and key information in the voice information is extracted, which increases the amount of effective information in the record.

The present invention is achieved by the following technical solutions:

The present invention relates to a voice key information recording device based on semi-automatic correction, comprising a key information extraction unit and an information correction unit connected to it, wherein: the key information extraction unit obtains uncorrected text information, extracts key information from it, and outputs the key information to the information correction unit; the information correction unit outputs the text information confirmed through user feedback.

The information correction unit comprises a redundant information correction module, a time information correction module, a special noun correction module, and a user feedback confirmation module, wherein: the input of the redundant information correction module is connected to the key information extraction unit, and its output is connected to the time information correction module, to which it passes the key information after redundant information correction; the input of the time information correction module is connected to the redundant information correction module, and its output is connected to the special noun correction module, to which it passes the key information after redundant information correction and time information correction; the input of the special noun correction module is connected to the time information correction module, and its output is connected to the user feedback confirmation module, to which it passes the key information after redundant information correction, time information correction, and special noun correction; the input of the user feedback confirmation module is connected to the special noun correction module, and it outputs the key information after all three corrections once the user has confirmed it.

The key information extraction unit comprises a parser module and a classifier module, wherein: the input of the parser module is connected to the speech recognition unit, and its output is connected to the classifier module, to which it passes the words and phrases produced by syntactic analysis; the input of the classifier module is connected to the parser module, and its output is connected to the information correction unit, to which it passes the key information found in the classified information.

The present invention also relates to a voice key information recording method based on semi-automatic correction, comprising the following steps:

Step 1: the voice information obtained from the user is processed by speech recognition software to obtain the uncorrected text information expressed by the voice signal.

The speech recognition software is Sphinx, the open-source speech recognition software from CMU, which performs the automatic speech recognition.

Step 2: the key information extraction unit performs syntactic analysis and then classification analysis on the uncorrected text information to obtain time information, the key-information parts of speech, and special noun information, and passes these to its output as key information.

The syntactic analysis refers to: a parser analyzes the uncorrected text information and segments the sentences in it into words, so that the continuous sentences in the text information are converted into individual words and/or phrases.

The parser is the Stanford Parser, the open-source parser from Stanford University.

The classification analysis refers to: after the classifier has been trained on a part-of-speech tagging corpus, the classifier first sorts time information and the key-information parts of speech out of the above words and/or phrases; the classifier is then trained again with a dictionary of common words and used to further classify the nouns among the key-information parts of speech into common word information and non-common word information; the non-common word information is the special noun information.

The classifier is a Bayesian text classifier.

The key-information parts of speech are: nouns, verbs, numerals, adjectives, adverbs, prepositions, and pronouns.

The special noun information is the non-common word information among the nouns.

Step 3: the information correction unit performs redundant information correction, time information correction, and special noun correction on the key information in turn and obtains the corrected key information, so as to eliminate recognition errors caused during speech recognition by accent and by the performance of the recognition unit itself, and to guarantee the accuracy of the record.
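The fixed order of the three corrections in step 3 can be pictured as a chain of stages. The following Python fragment is a minimal sketch of that ordering only, not the patented implementation: `run_corrections` and the `Stage` type are hypothetical names, and each stage is assumed to take and return a list of key-information strings.

```python
from typing import Callable, List, Sequence

# Hypothetical stage type: takes key information, returns corrected key information.
Stage = Callable[[List[str]], List[str]]


def run_corrections(key_info: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Apply the correction stages in the order given.

    In the method described here the order would be: redundant information
    correction, time information correction, special noun correction.
    """
    result = list(key_info)
    for stage in stages:
        result = stage(result)
    return result
```

The result of the last stage is what would then be handed to the user for confirmation in step 4.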

The redundant information correction refers to:

1) Calculate the coding distance d(A, B) between any two pieces of key information A and B:

d(A, B) = \max\left\{ \left| \operatorname{sizeof}(A) - \operatorname{sizeof}(B) \right|,\ \frac{\sum_{i} (a_{i} - b_{i})^{2}}{\max\left\{ \sum_{i} a_{i}^{2},\ \sum_{i} b_{i}^{2} \right\}} \right\}

where sizeof(X) is the number of bytes in the ASCII encoding of key information X, x_i is the value of the i-th byte of the ASCII encoding of key information X, and x_i = 0 if i > sizeof(X). Key information A and key information B with d(A, B) = 0 are identical key information; key information A and key information B with 0 < d(A, B) < T are similar key information, where T is the similarity threshold.
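The coding distance defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the inventors' code: the function name `coding_distance` is hypothetical, and the byte encoding is a parameter because the text speaks of ASCII while Chinese key information needs a multi-byte encoding (UTF-8 is used here as an assumed default).

```python
def coding_distance(a: str, b: str, encoding: str = "utf-8") -> float:
    """Coding distance d(A, B) between two pieces of key information.

    The encoding is an assumption; the description only says "ASCII encoding".
    """
    xa = a.encode(encoding)
    xb = b.encode(encoding)
    n = max(len(xa), len(xb))
    # Bytes beyond the end of the shorter item count as 0 (x_i = 0 for i > sizeof(X)).
    pa = xa.ljust(n, b"\0")
    pb = xb.ljust(n, b"\0")
    size_gap = abs(len(xa) - len(xb))
    numerator = sum((p - q) ** 2 for p, q in zip(pa, pb))
    denominator = max(sum(p * p for p in pa), sum(q * q for q in pb))
    # Two empty items are treated as identical to avoid dividing by zero.
    ratio = numerator / denominator if denominator else 0.0
    return max(size_gap, ratio)
```

Because the first term of the maximum is an integer byte-count difference, any threshold T below 1 means that only items of equal encoded length can be classed as similar.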

2) Identical key information and similar key information are grouped into a similar key information set. When key information C is identical to any key information in an existing similar key information set, or is similar to more than half of the key information in that set, C is added to that set.

3) Once a similar key information set has been determined, the key information M that occurs most frequently in the set is chosen, and all the remaining key information in that set is replaced with M.
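Steps 2) and 3) can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the function names are hypothetical, the distance function for d(A, B) is passed in as a parameter, and the rule that an item matching no existing set starts a new set is an assumption, since the text does not say how sets are first created.

```python
from collections import Counter
from typing import Callable, List, Sequence


def group_similar(items: Sequence[str],
                  distance: Callable[[str, str], float],
                  threshold: float = 0.1) -> List[List[int]]:
    """Group item indices into similar key information sets (step 2)."""
    groups: List[List[int]] = []
    for idx, item in enumerate(items):
        for group in groups:
            dists = [distance(item, items[j]) for j in group]
            identical = any(d == 0 for d in dists)
            similar = sum(1 for d in dists if 0 < d < threshold)
            # Join if identical to any member, or similar to more than half.
            if identical or similar * 2 > len(group):
                group.append(idx)
                break
        else:
            # Assumption: an item that fits no existing set starts a new one.
            groups.append([idx])
    return groups


def replace_with_most_frequent(items: Sequence[str],
                               groups: Sequence[Sequence[int]]) -> List[str]:
    """Replace every member of a set with its most frequent member (step 3)."""
    out = list(items)
    for group in groups:
        winner, _ = Counter(items[j] for j in group).most_common(1)[0]
        for j in group:
            out[j] = winner
    return out
```

The groups hold indices rather than strings, which mirrors the way the embodiment stores key information numbers in each set.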

The time information correction refers to:

a) First, extract the time triple from the time information and verify whether each of its elements, namely H (hours), M (minutes), and S (seconds), conforms to the 24-hour, 60-minute, and 60-second conventions.

The extraction refers to: the key information is split at the words "点" (o'clock), "分" (minutes), and "秒" (seconds); the content before "点" is taken as element H of the time triple, the content between "点" and "分" as element M, and the content between "分" and "秒" as element S. When a corresponding element is not found, that element of the time triple is set to zero.

b) When an element does not conform to the time conventions, automatic correction is attempted. If it cannot be corrected automatically, the time information is treated as incorrectly recognized time information and is passed to the user feedback confirmation module for manual correction and confirmation.
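The extraction and validation in a) can be sketched as below. This is a hedged Python illustration: the function names are hypothetical, and it assumes the numerals have already been normalized to Arabic digits, which the text does not specify. The hour range of 0 to 24 follows the range given later for element H in the embodiment.

```python
import re
from typing import Optional, Tuple

# H before "点", M between "点" and "分", S between "分" and "秒"; each part is optional.
_TIME = re.compile(r"(?:(\d+)点)?(?:(\d+)分)?(?:(\d+)秒)?")


def extract_time_triple(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (H, M, S); an element that is not found is set to zero."""
    match = _TIME.fullmatch(text)
    if match is None or not any(match.groups()):
        return None  # not recognizable as time information in this sketch
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h, m, s


def conforms(triple: Tuple[int, int, int]) -> bool:
    """Check the 24-hour, 60-minute, and 60-second conventions."""
    h, m, s = triple
    return 0 <= h <= 24 and 0 <= m < 60 and 0 <= s < 60
```

A triple for which `conforms` returns False would be the one that step b) tries to correct automatically or, failing that, forwards to the user.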

The special noun correction refers to: the special noun information in the recognized and classified key information is submitted as a search term to an external database resource, and is then corrected using the large data resources of the external database and the error-correction strategy it provides.

The external database comprises the Internet and electronic dictionaries.

Step 4: the information correction unit passes the corrected key information to the user feedback confirmation module, and the user gives the final confirmation to guarantee the accuracy of the information.

For the final confirmation, the information is sent to the user by short message, voice call, and mobile network for feedback and confirmation, and the confirmed information is stored on the user's terminal as a memo.

The beneficial effects of the present invention include:

1) Key information extraction and automatic correction reduce the workload of manually correcting the recognized text information.

2) Using external database resources to correct special noun information reduces the effect of the operator's knowledge on correction quality during manual correction, and improves the accuracy with which special noun information is recorded as text.

3) With key information extraction, only the important core information is submitted to the user, and unimportant or even meaningless information is filtered out, which increases the amount of effective information in the recorded text.

Description of drawings

Fig. 1 is a schematic diagram of the embodiment in use.

Fig. 2 is a structural diagram of the key information extraction unit.

Fig. 3 shows the key information data structure.

Fig. 4 is a flow chart of redundant information correction.

Fig. 5 is a flow chart of time information correction.

Embodiment

An embodiment of the invention is described in detail below. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and a concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.

Embodiment

As shown in Fig. 1, this embodiment introduces a speech recognition unit 1100 to provide a complete speech-to-text recognition application. The voice key information recording device based on semi-automatic correction in this embodiment comprises a key information extraction unit 1200 and an information correction unit 1300, wherein the speech recognition unit 1100 performs automatic speech recognition on the voice signal to obtain the uncorrected text information that the voice signal expresses.

The speech recognition unit is implemented with speech recognition software; this embodiment uses Sphinx, the open-source speech recognition software from CMU.

The key information extraction unit 1200 extracts key information from the text information obtained by the speech recognition unit.

As shown in Fig. 2, the key information extraction unit 1200 can be implemented as a cascade of a parser 1210 and a classifier 1220. The text information recognized by the speech recognition unit 1100 is the input data of the parser 1210, and the output data of the parser 1210 is the input data of the classifier 1220.

The parser 1210 is implemented with the Chinese word segmentation command chinesesegmenter provided in the Stanford Parser, the open-source parser from Stanford University. The input of the parser is the text information recognized by the speech recognition software; the output of the parser is still stored as text and is passed to the classifier at its output for classification.

The classifier 1220 is implemented with a Bayesian text classifier whose training corpus comes from a part-of-speech tagging corpus. After this training, the Bayesian text classifier in this embodiment can classify the words and phrases supplied by the Stanford Parser at its input by time information and part of speech. The classifier is then reused to subdivide the nouns in the classification result: with a dictionary of common words as the training corpus, it separates the nouns into common word information and non-common word information. After this two-stage classification, time information is passed to the output as time information; non-common word information is passed to the output as special noun information; and the common nouns, verbs, numerals, adjectives, adverbs, prepositions, and pronouns are passed to the output as other information.

The output of the key information extraction unit 1200 follows the data structure defined in Fig. 3. Each key information structure KeyInfo_Struct[i] is a structure made up of three fields: the key information InfoData, the class to which the key information belongs InfoClass, and the tag InfoTag. The set of all these key information structures, that is, the structure array KeyInfo_Struct[], is the output of the key information extraction unit.
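As a rough picture of this data structure, the three fields can be written as a small Python data class. The field types, the class labels used in the example, and the empty default for the tag are assumptions made for illustration; Fig. 3 is not reproduced in the text, so the actual field definitions may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyInfo:
    """One key information structure, modeled on KeyInfo_Struct[i]."""
    info_data: str        # InfoData: the key information itself
    info_class: str       # InfoClass: class label (label values here are assumed)
    info_tag: str = ""    # InfoTag: marker, e.g. for items needing user review


# The extraction unit's output corresponds to a list of such structures.
key_info_struct: List[KeyInfo] = [
    KeyInfo("15点30分", "time"),
    KeyInfo("上海", "special_noun"),
]
```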

As shown in Fig. 1, the information correction unit 1300 comprises a redundant information correction module 1310, a time information correction module 1320, a special noun correction module 1330, and a user feedback confirmation module 1340.

The redundant information correction module 1310 applies a speech-recognition probability error-correction algorithm to perform redundant information correction on the key information. A two-dimensional array SimInfo[M][N] is defined as the similar key information array: its first index is the number of a similar key information set, and its second dimension stores the numbers of the key information items contained in that set. For example, SimInfo[2] = {1, 4, 5} means that similar key information set 2 contains key information items 1, 4, and 5.

As shown in Fig. 4, the redundant information correction module 1310 first assigns each key information item received at its input to a similar key information set, following the speech-recognition probability error-correction algorithm described above. When judging whether an item can be added to a similar key information set (step 1311), this embodiment uses a similarity threshold T of 0.1. The module then traverses each similar key information set and replaces the remaining key information in the set with the key information that occurs most frequently in that set.

The time information correction module 1320 takes the time information obtained by the classification in the key information extraction unit 1200 and corrects the items that do not conform to the time conventions. In this embodiment, the hour value of time information in Chinese speech is corrected. The time information correction module 1320 first extracts the time triple (H, M, S) from the time information produced by the key information extraction unit 1200. In this embodiment, the content before the Chinese words "点" or "时" is taken as element H of the time triple, the content between "点" or "时" and "分" is taken as element M, and the content after "分" is taken as element S. The embodiment checks each element H, M, and S of the time triple to judge whether it conforms to the time conventions, and corrects element H.

Because the value of element H can only be 0 to 24, H can only follow one of three patterns: a single digit, such as "five"; "ten-something", such as "fifteen"; or "twenty-something", such as "twenty-one". In the latter two patterns, the lowest position can only be 0 to 9.

Fig. 5 shows the correction process for element H. When the hour value of the recognized time information has only one position, that is, H follows the single-digit pattern, correction of this item is skipped. Otherwise, if the second-lowest position of the hour value is not "ten" (十), a recognition error has occurred, and that position is automatically corrected to "ten"; likewise, if the lowest position of the hour value is "ten" (十), it is automatically corrected to "four" (四).
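The correction of element H can be sketched directly from these rules. The Python fragment below is an illustration only: the function name is hypothetical, it assumes the hour is given as a string of Chinese numerals, and it leaves a bare "二十" (twenty) unchanged. That last rule is an assumption added here, because the text does not say how an hour of exactly twenty is treated and a literal application of the two rules would alter it.

```python
def correct_hour(hour: str) -> str:
    """Apply the two hour-correction rules to a Chinese-numeral hour string."""
    if len(hour) <= 1:
        return hour  # single-position pattern: correction is skipped
    if hour == "二十":
        return hour  # assumption: a bare "twenty" is already well formed
    chars = list(hour)
    if chars[-2] != "十":
        chars[-2] = "十"  # second-lowest position must be "ten"
    if chars[-1] == "十":
        chars[-1] = "四"  # a lowest position of "ten" is corrected to "four"
    return "".join(chars)
```

An hour that still fails the time conventions after these rules would be tagged and left for the user to confirm.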

When the time information correction module 1320 in this embodiment detects incorrectly recognized time information that does not conform to the time conventions but cannot be corrected automatically, it marks the InfoTag field of that item's key information structure KeyInfo_Struct and passes the item to its output.

The special noun correction module 1330 uses external database resources to correct the special nouns obtained by the key information extraction unit 1200. In this embodiment, an Internet database is used, and special nouns are corrected through the error-correction mechanism of a search engine. That is, the extracted special noun information is submitted to the search API provided by an external search engine such as Google; after the device connects to the Internet and runs the search, the returned page contains a corrected form of the submitted key information, and the search engine's correction is taken as the correction result. The search engine's correction can be extracted by filtering the text of the returned page with text-processing tools such as grep and sed.
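The control flow of special noun correction can be sketched without committing to any particular search service. In the Python fragment below, `suggest` is a hypothetical callable standing in for the external lookup (a search engine's spelling correction or an electronic dictionary); no real search API is shown, and the rule of keeping the original term when no suggestion comes back is an assumption.

```python
from typing import Callable, List, Optional, Sequence


def correct_special_nouns(nouns: Sequence[str],
                          suggest: Callable[[str], Optional[str]]) -> List[str]:
    """Replace each special noun with the external suggestion, if there is one."""
    corrected: List[str] = []
    for noun in nouns:
        suggestion = suggest(noun)  # hypothetical external lookup
        # Assumption: with no suggestion, the recognized term is kept as is.
        corrected.append(suggestion if suggestion else noun)
    return corrected
```

For a quick trial, `suggest` can be a dictionary's `get` method, for example `{"Shangahi": "Shanghai"}.get`.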

After these three automatic corrections, the key information is finally fed back to the user for the final user feedback correction 1340. In this embodiment, this part is implemented by a UI program that interacts with the user through a display screen and input devices.

According to analysis, with the word as the statistical unit, the key information defined in this embodiment accounts for 61.8% of all information in a voice environment. That is, with the key information extraction unit proposed in this embodiment, the information passed on for speech recognition correction is only 61.8% of all information. Combined with the three types of automatic correction of the voice information, this further reduces the workload of manual correction.

Claims

1. A voice key information recording device based on semi-automatic correction, characterized in that it comprises a key information extraction unit and an information correction unit connected to it, wherein: the key information extraction unit obtains uncorrected text information, extracts key information from it, and outputs the key information to the information correction unit; the information correction unit outputs the text information confirmed through user feedback.

2. The voice key information recording device based on semi-automatic correction according to claim 1, characterized in that the information correction unit comprises a redundant information correction module, a time information correction module, a special noun correction module, and a user feedback confirmation module, wherein: the input of the redundant information correction module is connected to the key information extraction unit, and its output is connected to the time information correction module, to which it passes the key information after redundant information correction; the input of the time information correction module is connected to the redundant information correction module, and its output is connected to the special noun correction module, to which it passes the key information after redundant information correction and time information correction; the input of the special noun correction module is connected to the time information correction module, and its output is connected to the user feedback confirmation module, to which it passes the key information after redundant information correction, time information correction, and special noun correction; the input of the user feedback confirmation module is connected to the special noun correction module, and it outputs the key information after all three corrections once the user has confirmed it.

3. The voice key information recording device based on semi-automatic correction according to claim 1, characterized in that the key information extraction unit comprises a parser module and a classifier module, wherein: the input of the parser module is connected to the speech recognition unit, and its output is connected to the classifier module, to which it passes the words and phrases produced by syntactic analysis; the input of the classifier module is connected to the parser module, and its output is connected to the information correction unit, to which it passes the key information found in the classified information.

4. A voice key information recording method based on semi-automatic correction, characterized in that it comprises the following steps:

Step 1: the voice information obtained from the user is processed by speech recognition software to obtain the uncorrected text information expressed by the voice signal;

Step 2: the key information extraction unit performs syntactic analysis and then classification analysis on the uncorrected text information to obtain time information, the key-information parts of speech, and special noun information, and passes these to its output as key information;

Step 3: the information correction unit performs redundant information correction, time information correction, and special noun correction on the key information in turn and obtains the corrected key information;

Step 4: the information correction unit passes the corrected key information to the user feedback confirmation module, and the user gives the final confirmation.

5. The voice key information recording method according to claim 4, characterized in that the syntactic analysis refers to: a parser analyzes the uncorrected text information and segments the sentences in it into words, so that the continuous sentences in the text information are converted into individual words and/or phrases.

6. The voice key information recording method according to claim 4, characterized in that the classification analysis refers to: after the classifier has been trained on a part-of-speech tagging corpus, the classifier first sorts time information and the key-information parts of speech out of the above words and/or phrases; the classifier is then trained again with a dictionary of common words and used to further classify the nouns among the key-information parts of speech into common word information and non-common word information; the non-common word information is the special noun information.

7. The voice key information recording method according to claim 6, characterized in that the classifier is a Bayesian text classifier.

8. The voice key information recording method according to claim 6, characterized in that the key-information parts of speech are: nouns, verbs, numerals, adjectives, adverbs, prepositions, and pronouns; and the special noun information is the non-common word information among the nouns.

9. The voice key information recording method according to claim 4, characterized in that the redundant information correction refers to:

d(A, B) = \max\left\{ \left| \operatorname{sizeof}(A) - \operatorname{sizeof}(B) \right|,\ \frac{\sum_{i} (a_{i} - b_{i})^{2}}{\max\left\{ \sum_{i} a_{i}^{2},\ \sum_{i} b_{i}^{2} \right\}} \right\}

where sizeof(X) is the number of bytes in the ASCII encoding of key information X, x_i is the value of the i-th byte of the ASCII encoding of key information X, and x_i = 0 if i > sizeof(X); key information A and key information B with d(A, B) = 0 are identical key information, and key information A and key information B with 0 < d(A, B) < T are similar key information, where T is the similarity threshold;

10. The voice key information recording method according to claim 4, characterized in that the time information correction refers to:

a) first, extracting the time triple from the time information and verifying whether each of its elements, namely H, M, and S, conforms to the 24-hour, 60-minute, and 60-second conventions;

11. The voice key information recording method according to claim 10, characterized in that the extraction refers to: the key information is split at the words "点" (o'clock), "分" (minutes), and "秒" (seconds); the content before "点" is taken as element H of the time triple, the content between "点" and "分" as element M, and the content between "分" and "秒" as element S; when a corresponding element is not found, that element of the time triple is set to zero.

12. The voice key information recording method according to claim 4, characterized in that the special noun correction refers to: the special noun information in the recognized and classified key information is submitted as a search term to an external database resource, and is then corrected using the large data resources of the external database and the error-correction strategy it provides.

13. The voice key information recording method according to claim 4, characterized in that, for the final confirmation, the information is sent to the user by short message, voice call, and mobile network for feedback and confirmation, and the confirmed information is stored on the user's terminal as a memo.