CN105513586A - Speech recognition result display method and speech recognition result display device

Info

Publication number
CN105513586A
Authority
CN
China
Prior art keywords
word
sequence
determined
speech recognition
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510958817.4A
Other languages
Chinese (zh)
Inventor
李世龙
贺利强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510958817.4A
Publication of CN105513586A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/28 Constructional details of speech recognition systems

Abstract

The invention discloses a speech recognition result display method and a speech recognition result display device. The speech recognition result display method comprises the following steps: receiving a speech signal to be recognized; carrying out speech recognition on the speech signal to get an intermediate recognition result, wherein the intermediate recognition result includes non-determined words and determined words; and displaying the non-determined words and the determined words on a screen in the process of speech recognition. With the method, the intermediate recognition result can be displayed during speech recognition, and display of the intermediate recognition result can be accelerated.

Description

Display method and display device for speech recognition results
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a display method and a display device for speech recognition results.
Background
Speech recognition converts speech into text. Conventionally, only the final recognition result is shown to the user, and no feedback is given while recognition is in progress. To let the user perceive the recognition progress and content during recognition, intermediate recognition results can be displayed continuously on the screen (referred to as on-screen display). For example, suppose the user intends to ask how the weather in Beijing will be tomorrow (in the word order of the original, "Beijing tomorrow's weather how"). During recognition the speech arrives incrementally in segments, and the display of the intermediate recognition results is likewise cumulative: "Beijing" -> "Beijing tomorrow" -> "Beijing tomorrow's weather" -> "Beijing tomorrow's weather how".
In the related art, only determined words are displayed as intermediate recognition results. Because only determined words are returned, the first on-screen result appears only after speech of a certain duration has been received, and the on-screen display speed depends on how quickly determined words are produced. Both the time of the first display and the display speed therefore lag.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present invention is to propose a display method for speech recognition results. The method can display intermediate recognition results during speech recognition and can speed up their display.
Another object of the present invention is to propose a display device for speech recognition results.
To achieve the above objects, the display method for speech recognition results proposed by embodiments of the first aspect of the present invention comprises: receiving a speech signal to be recognized; performing speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising non-determined words and determined words; and displaying the non-determined words and the determined words on a screen during speech recognition.
By displaying both non-determined words and determined words on the screen during speech recognition, the display method proposed by embodiments of the first aspect can display intermediate recognition results while recognition is in progress and can speed up their display.
To achieve the above objects, the display device for speech recognition results proposed by embodiments of the second aspect of the present invention comprises: a receiver module for receiving a speech signal to be recognized; an identification module for performing speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising non-determined words and determined words; and a display module for displaying the non-determined words and the determined words on a screen during speech recognition.
By displaying both non-determined words and determined words on the screen during speech recognition, the display device proposed by embodiments of the second aspect can display intermediate recognition results while recognition is in progress and can speed up their display.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a display method for speech recognition results according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of one way of obtaining the intermediate recognition result in an embodiment of the present invention;
Fig. 3 is a schematic diagram of identifying determined words and non-determined words from the first word sequence and the second word sequence in an embodiment of the present invention;
Fig. 4 is a schematic diagram of another way of obtaining the intermediate recognition result in an embodiment of the present invention;
Fig. 5 is a schematic diagram of displaying determined words and non-determined words in an embodiment of the present invention;
Fig. 6 is a schematic diagram of on-screen display during speech recognition and of initiating a search in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a display device for speech recognition results according to another embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar modules or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and shall not be construed as limiting it. On the contrary, the embodiments of the present invention cover all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a schematic flowchart of a display method for speech recognition results according to an embodiment of the present invention. The method comprises:
S11: Receive a speech signal to be recognized.
For example, the speech signal continuously input by the user can be received and used as the speech signal to be recognized.
S12: Perform speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising non-determined words and determined words.
While the user continuously inputs the speech signal, the signal can be recognized continuously, and each time recognition is performed, the intermediate recognition result for that moment is obtained.
During speech recognition, recognizing at the current moment the speech signal received up to the current moment yields the recognition result of the current moment. Because the speech signal is input continuously, recognition at the next moment likewise yields the recognition result of the next moment. The results obtained at different moments may differ for the same stretch of speech: for example, "ferry-boat" may be recognized at the current moment and, as new speech arrives, the same speech may be recognized as "Baidu" at the next moment.
A determined word is a word that can be determined from the speech signal of the current moment; a non-determined word is a word that cannot be determined from the speech signal of the current moment. Here, the speech signal of the current moment comprises the speech signal from the start of speech input up to the current moment. A determined word has a high probability of appearing in the final recognition result, whereas a non-determined word may be corrected later in the recognition process.
For example, suppose the speech the user intends to input corresponds to "using Baidu.com, you just know". When the speech signal being recognized at the current moment corresponds to "Baidu", no other reference information is available yet, so the result is in an uncertain state and is a non-determined word; if the result recognized at the current moment is "ferry-boat", then "ferry-boat" is called a non-determined word. At the next moment, suppose the speech obtained corresponds to "using Baidu.com" (that is, "Baidu" followed by "once"). It can then be determined, for example according to a preset rule, that the word combining with "once" is "Baidu" rather than "ferry-boat", so the determined word of the next moment is "Baidu". Similarly, "once" is the non-determined word of the next moment.
In some embodiments, referring to Fig. 2, the process of performing speech recognition on the speech signal to obtain the intermediate recognition result may comprise:
S21: For each current moment, perform speech recognition on the speech signal of the current moment to obtain a first word sequence and a second word sequence, where the first word sequence is the word sequence corresponding to the optimal state transition sequence and the second word sequence is the word sequence corresponding to the suboptimal state transition sequence.
Speech recognition is continuous, so each moment at which recognition is performed can be taken as a current moment, and recognition is carried out at each current moment.
When recognition is performed at the current moment, after the speech signal of the current moment is received, multiple candidate state transition sequences and their corresponding probability scores can be obtained with a dynamic programming algorithm based on an acoustic model and a language model. The candidate state transition sequence with the highest probability score is called the optimal state transition sequence, and the one with the second-highest probability score is called the suboptimal state transition sequence.
For example, the acoustic model is a hidden Markov model (HMM) and the dynamic programming algorithm is the Viterbi algorithm.
During speech recognition, a recognition network can be built from the acoustic model, the language model and so on, and for the input speech signal the optimal path is searched for in the recognition network as the recognition result. During the search, the probability score of each path can be determined from the acoustic model score and the language model score, and multiple paths are selected as candidate paths according to their probability scores. The states on each candidate path form a candidate state transition sequence, and each candidate state transition sequence corresponds to one word sequence, so the word sequence corresponding to the optimal state transition sequence is taken as the recognition result.
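The ranking of candidate paths described above can be illustrated with a minimal Python sketch. It is not the decoder of the embodiment: the class `CandidatePath`, the function `two_best`, and the rule of summing an acoustic log score and a language model log score are assumptions made for illustration, and the scores are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidatePath:
    """One candidate path: its word sequence and two hypothetical log scores."""
    words: Tuple[str, ...]
    acoustic_logp: float   # assumed acoustic model score (log domain)
    language_logp: float   # assumed language model score (log domain)

    @property
    def score(self) -> float:
        # Assumption: the path score is the sum of the two log scores.
        return self.acoustic_logp + self.language_logp


def two_best(paths: List[CandidatePath]) -> Tuple[CandidatePath, CandidatePath]:
    """Return the highest-scoring and second-highest-scoring candidate paths."""
    if len(paths) < 2:
        raise ValueError("need at least two candidate paths")
    ranked = sorted(paths, key=lambda p: p.score, reverse=True)
    return ranked[0], ranked[1]


candidates = [
    CandidatePath(("ferry-boat",), acoustic_logp=-4.0, language_logp=-3.0),
    CandidatePath(("Baidu",), acoustic_logp=-4.2, language_logp=-1.5),
    CandidatePath(("by", "do"), acoustic_logp=-6.0, language_logp=-4.0),
]
best, second = two_best(candidates)
print(best.words, second.words)
```

In a real recognizer the candidates would come from the Viterbi search over the recognition network rather than from a hand-written list; the sketch only shows how the first and second word sequences are picked once scored candidates exist.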
For example, referring to Fig. 3, the word sequence corresponding to the optimal state transition sequence (the first word sequence) comprises the words W1, W2, W3, W4 and W5, and the word sequence corresponding to the suboptimal state transition sequence (the second word sequence) comprises the words W1, W2, W3, W4 and W6.
S22: Determine the first word sequence as the intermediate recognition result of the current moment, where the words in the first word sequence that are the same as in the second word sequence are determined as the determined words of the current moment, and the words in the first word sequence that differ from the second word sequence are determined as the non-determined words of the current moment.
For example, as shown in Fig. 3, the first word sequence comprises W1, W2, W3, W4 and W5, so the intermediate recognition result of the current moment is W1, W2, W3, W4 and W5.
The words of the first word sequence that are the same as in the second word sequence are W1, W2, W3 and W4, and the differing word is W5; therefore W1, W2, W3 and W4 are the determined words of the current moment, and W5 is the non-determined word.
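Step S22 can be sketched in a few lines of Python. The function name `split_by_second_best` is hypothetical, and the sketch assumes that "the same" means the same word at the same position in both sequences, which the text does not spell out.

```python
from typing import List, Sequence, Tuple


def split_by_second_best(
    first: Sequence[str], second: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split the first (best) word sequence into determined and non-determined words.

    Assumption: a word is determined when the second-best sequence has the
    same word at the same position; otherwise it is non-determined.
    """
    determined: List[str] = []
    non_determined: List[str] = []
    for position, word in enumerate(first):
        same = position < len(second) and second[position] == word
        (determined if same else non_determined).append(word)
    return determined, non_determined


first_sequence = ["W1", "W2", "W3", "W4", "W5"]
second_sequence = ["W1", "W2", "W3", "W4", "W6"]
determined, non_determined = split_by_second_best(first_sequence, second_sequence)
print(determined, non_determined)  # ['W1', 'W2', 'W3', 'W4'] ['W5']
```

With the sequences of Fig. 3 this reproduces the result stated above: W1 to W4 are determined and W5 is non-determined.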
In some embodiments, referring to Fig. 4, the process of performing speech recognition on the speech signal to obtain the intermediate recognition result may comprise:
S41: Obtain a first output result, the first output result comprising the word identifier and the output time point of each word in a first word sequence, where the first word sequence is the word sequence corresponding to the optimal state transition sequence obtained by performing speech recognition on the speech signal of a first moment.
The first moment can be denoted T1 and can be any moment at which speech recognition is performed.
When speech recognition is performed at the first moment, the optimal state transition sequence of the first moment and its corresponding word sequence can be obtained. In addition, each time speech recognition is performed, the output time point of each word can also be recorded. The word identifier is, for example, the ID of the word or the word itself.
For example, the first output result corresponding to moment T1 comprises {w1, t1}, {w2, t2}, {w3, t3} and {w4, t4}, where w1 to w4 are word identifiers and t1 to t4 are output time points.
S42: Obtain a second output result, the second output result comprising the word identifier and the output time point of each word in a second word sequence, where the second word sequence is the word sequence corresponding to the optimal state transition sequence obtained by performing speech recognition on the speech signal of a second moment.
The second moment is the moment that follows the first moment by a preset time interval. Denoting the second moment T2, T2 = T1 + t, where t is the preset time interval. The interval t can be determined from experience and/or experimental results, for example t = 200 ms.
For example, the second output result corresponding to moment T2 comprises {w1, t1}, {w2, t2}, {w3, t3}, {w4, t4'} and {w5, t5}.
S43: Determine the second word sequence as the intermediate recognition result of the second moment, where the words in the second output result that correspond to the part identical to the first output result are determined as the determined words of the second moment, and the words in the second output result that correspond to the part different from the first output result are determined as the non-determined words of the second moment.
For example, the word sequence corresponding to w1 to w5 is determined as the intermediate recognition result of moment T2. In the first and second output results, w1 to w3 and their output time points are identical, the output time point of w4 differs between the two output results, and w5 is newly added in the second output result. Therefore, the determined words of moment T2 comprise the words corresponding to w1 to w3, and the non-determined words comprise the words corresponding to w4 and w5.
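The comparison in S43 can be sketched as follows. The names `WordOutput` and `split_by_stability` and the concrete time values are illustrative assumptions; the sketch also assumes that an entry counts as identical only when both the word identifier and the output time point match at the same position.

```python
from typing import List, NamedTuple, Sequence, Tuple


class WordOutput(NamedTuple):
    word_id: str       # word identifier (an ID or the word itself)
    time_point: float  # output time point; seconds are an assumed unit


def split_by_stability(
    first_output: Sequence[WordOutput], second_output: Sequence[WordOutput]
) -> Tuple[List[str], List[str]]:
    """Split the T2 output into determined and non-determined word identifiers.

    Assumption: an entry of the second output is determined when the first
    output holds the same (word_id, time_point) pair at the same position.
    """
    determined: List[str] = []
    non_determined: List[str] = []
    for position, item in enumerate(second_output):
        stable = position < len(first_output) and first_output[position] == item
        (determined if stable else non_determined).append(item.word_id)
    return determined, non_determined


# Output at T1 and at T2 = T1 + 200 ms; the time values are made up.
at_t1 = [WordOutput("w1", 0.30), WordOutput("w2", 0.60),
         WordOutput("w3", 0.90), WordOutput("w4", 1.20)]
at_t2 = [WordOutput("w1", 0.30), WordOutput("w2", 0.60),
         WordOutput("w3", 0.90), WordOutput("w4", 1.25),
         WordOutput("w5", 1.50)]
determined, non_determined = split_by_stability(at_t1, at_t2)
print(determined, non_determined)  # ['w1', 'w2', 'w3'] ['w4', 'w5']
```

The output matches the example in the text: w4 is non-determined because its output time point changed, and w5 because it is new.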
In addition, determined words and non-determined words may be obtained using the process shown in Fig. 2 alone, the process shown in Fig. 4 alone, or a combination of the two. When the two processes are combined, one may be performed first and then the other, or they may be combined according to other rules.
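The text leaves the combination rule open. Purely as one hypothetical rule, the sketch below treats a word as determined only when both processes mark it as determined; the function `combine_conservatively` and the per-position boolean flags are assumptions for illustration and are not taken from the embodiment.

```python
from typing import List, Sequence, Tuple


def combine_conservatively(
    words: Sequence[str],
    determined_by_fig2: Sequence[bool],
    determined_by_fig4: Sequence[bool],
) -> List[Tuple[str, bool]]:
    """Label each word; it is determined only if both processes agree (assumed rule)."""
    if not (len(words) == len(determined_by_fig2) == len(determined_by_fig4)):
        raise ValueError("all inputs must have the same length")
    return [
        (word, flag2 and flag4)
        for word, flag2, flag4 in zip(words, determined_by_fig2, determined_by_fig4)
    ]


labels = combine_conservatively(
    ["W1", "W2", "W3", "W4", "W5"],
    [True, True, True, True, False],   # flags from the Fig. 2 style comparison
    [True, True, True, False, False],  # flags from the Fig. 4 style comparison
)
print(labels)
```

A stricter rule like this one would show fewer determined words but would be less likely to highlight a word that later changes; other combinations are equally compatible with the text.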
S13: Display the non-determined words and the determined words on the screen.
In this embodiment, not only determined words but also non-determined words can be displayed.
Generating a determined word requires subsequent speech to be collected: for example, only once the speech corresponding to "using Baidu.com" has been obtained can it be determined that what was received earlier is "Baidu" rather than "ferry-boat". Determined words are therefore produced slowly. Generating a non-determined word requires no subsequent speech: for example, if the speech obtained at the current moment corresponds to "Baidu", then whether the recognition result at the current moment is "Baidu" or "ferry-boat", it can be taken as a non-determined word. Non-determined words are therefore produced comparatively quickly.
Displaying non-determined words on the screen gives the user content feedback sooner.
In some embodiments, referring to Fig. 5, the process of displaying non-determined words and determined words on the screen may comprise:
S51: After a non-determined word is identified, display the non-determined word on the screen immediately.
For example, after the non-determined word "ferry-boat" is identified, it is displayed on the screen immediately, without waiting for a determined word to be produced.
S52: After a determined word is identified, replace the corresponding non-determined word with the determined word, and display the determined word on the screen.
For example, referring to Fig. 6, the intermediate recognition result during speech recognition can be divided into determined words and non-determined words. A non-determined word is displayed immediately after it is identified; after a determined word is identified, it replaces the corresponding non-determined word, so that non-determined words are continually replaced by determined words during display. In Fig. 6, lowercase letters represent non-determined words and capital letters represent determined words.
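Steps S51 and S52 can be sketched as a small piece of display state. The class `ScreenText` is hypothetical; it borrows the Fig. 6 convention of lowercase for non-determined words and capitals for determined words, and assumes each new intermediate result simply replaces what was shown before.

```python
from typing import List, Sequence


class ScreenText:
    """Holds what is currently on screen: determined words plus a provisional tail."""

    def __init__(self) -> None:
        self.determined: List[str] = []
        self.non_determined: List[str] = []

    def update(self, determined: Sequence[str], non_determined: Sequence[str]) -> str:
        # S51: non-determined words are shown as soon as they arrive.
        # S52: once a word is determined it replaces its provisional version.
        self.determined = list(determined)
        self.non_determined = list(non_determined)
        return self.render()

    def render(self) -> str:
        # Fig. 6 convention: capitals for determined, lowercase for non-determined.
        shown = [w.upper() for w in self.determined]
        shown += [w.lower() for w in self.non_determined]
        return " ".join(shown)


screen = ScreenText()
print(screen.update([], ["a"]))         # a
print(screen.update(["a"], ["b"]))      # A b
print(screen.update(["a", "b"], []))    # A B
```

Replacing the whole line on every update keeps the sketch short; a real display might instead patch only the words that changed.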
In addition, non-determined words and determined words may use different display styles. For example, non-determined words are shown in grey and determined words are highlighted.
Taking "using Baidu.com, you just know" as an example, the content displayed on the screen may change as follows:
1) {"uncertain_word": "ferry-boat"}
2) {"certain_word": "Baidu", "uncertain_word": "once"}
3) {"certain_word": "using Baidu.com"}
4) {"certain_word": "using Baidu.com", "uncertain_word": "you just know"}
5) {"certain_word": "using Baidu.com, you just know"}
Accordingly, the effect on the screen is as follows:
ferry-boat (grey) → Baidu (highlighted) once (grey) → using Baidu.com (highlighted) → using Baidu.com (highlighted) you just know (grey) → using Baidu.com, you just know (highlighted)
Here, the content in parentheses indicates the display style of the word before it; for example, "ferry-boat (grey)" means that the word "ferry-boat" is shown in grey on the screen.
For the above content, the screen shows 3 updates if only determined words are displayed, whereas 5 updates can be shown once non-determined words are displayed as well.
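The five updates and the count of 3 versus 5 can be checked with a short Python sketch. The helper names `render` and `determined_only_frames` are hypothetical; the field names `certain_word` and `uncertain_word` are those of the example above, and the style labels stand in for actual grey or highlighted rendering.

```python
from typing import Dict, List

updates: List[Dict[str, str]] = [
    {"uncertain_word": "ferry-boat"},
    {"certain_word": "Baidu", "uncertain_word": "once"},
    {"certain_word": "using Baidu.com"},
    {"certain_word": "using Baidu.com", "uncertain_word": "you just know"},
    {"certain_word": "using Baidu.com, you just know"},
]


def render(update: Dict[str, str]) -> str:
    """Describe one screen update, tagging each part with its display style."""
    parts = []
    if update.get("certain_word"):
        parts.append(f'{update["certain_word"]} (highlighted)')
    if update.get("uncertain_word"):
        parts.append(f'{update["uncertain_word"]} (grey)')
    return " ".join(parts)


def determined_only_frames(all_updates: List[Dict[str, str]]) -> List[str]:
    """Screen contents if only determined words were shown (repeats are skipped)."""
    frames: List[str] = []
    for update in all_updates:
        text = update.get("certain_word")
        if text and (not frames or frames[-1] != text):
            frames.append(text)
    return frames


print(" → ".join(render(u) for u in updates))
print(len(determined_only_frames(updates)), "updates with determined words only")
print(len(updates), "updates with non-determined words as well")
```

Counting distinct determined-only contents gives 3, against 5 updates when non-determined words are also shown, which agrees with the figures stated above.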
Further, after the final recognition result is obtained, a search can be performed according to it. For example, the weather in Beijing tomorrow is searched for and the search results are shown to the user.
In this embodiment, by displaying non-determined words and determined words on the screen during speech recognition, intermediate recognition results can be displayed while recognition is in progress and their display can be sped up. Further, because determined words and non-determined words can be identified by different algorithms, an algorithm can be selected according to actual needs to suit different scenarios, and combining the algorithms can improve recognition accuracy. Further, replacing the corresponding non-determined words with determined words allows the content on the screen to change in real time. Further, displaying determined words and non-determined words in different styles makes them easier for the user to read and improves the user experience.
Fig. 7 is a schematic structural diagram of a display device for speech recognition results according to another embodiment of the present invention. The device 70 comprises a receiver module 71, an identification module 72 and a display module 73.
The receiver module 71 is configured to receive a speech signal to be recognized.
For example, the speech signal continuously input by the user can be received and used as the speech signal to be recognized.
The identification module 72 is configured to perform speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising non-determined words and determined words.
While the user continuously inputs the speech signal, the signal can be recognized continuously, and each time recognition is performed, the intermediate recognition result for that moment is obtained.
During speech recognition, recognizing at the current moment the speech signal received up to the current moment yields the recognition result of the current moment. Because the speech signal is input continuously, recognition at the next moment likewise yields the recognition result of the next moment. The results obtained at different moments may differ for the same stretch of speech: for example, "ferry-boat" may be recognized at the current moment and, as new speech arrives, the same speech may be recognized as "Baidu" at the next moment.
A determined word is a word that can be determined from the speech signal of the current moment; a non-determined word is a word that cannot be determined from the speech signal of the current moment. Here, the speech signal of the current moment comprises the speech signal from the start of speech input up to the current moment. A determined word has a high probability of appearing in the final recognition result, whereas a non-determined word may be corrected later in the recognition process.
For example, suppose the speech the user intends to input corresponds to "using Baidu.com, you just know". When the speech signal being recognized at the current moment corresponds to "Baidu", no other reference information is available yet, so the result is in an uncertain state and is a non-determined word; if the result recognized at the current moment is "ferry-boat", then "ferry-boat" is called a non-determined word. At the next moment, suppose the speech obtained corresponds to "using Baidu.com" (that is, "Baidu" followed by "once"). It can then be determined, for example according to a preset rule, that the word combining with "once" is "Baidu" rather than "ferry-boat", so the determined word of the next moment is "Baidu". Similarly, "once" is the non-determined word of the next moment.
Optionally, the identification module 72 is specifically configured to:
for each current moment, perform speech recognition on the speech signal of the current moment to obtain a first word sequence and a second word sequence, where the first word sequence is the word sequence corresponding to the optimal state transition sequence and the second word sequence is the word sequence corresponding to the suboptimal state transition sequence;
determine the first word sequence as the intermediate recognition result of the current moment, where the words in the first word sequence that are the same as in the second word sequence are determined as the determined words of the current moment, and the words in the first word sequence that differ from the second word sequence are determined as the non-determined words of the current moment.
Optionally, the identification module 72 is specifically configured to:
obtain a first output result, the first output result comprising the word identifier and the output time point of each word in a first word sequence, where the first word sequence is the word sequence corresponding to the optimal state transition sequence obtained by performing speech recognition on the speech signal of a first moment;
obtain a second output result, the second output result comprising the word identifier and the output time point of each word in a second word sequence, where the second word sequence is the word sequence corresponding to the optimal state transition sequence obtained by performing speech recognition on the speech signal of a second moment;
determine the second word sequence as the intermediate recognition result of the second moment, where the words in the second output result that correspond to the part identical to the first output result are determined as the determined words of the second moment, and the words in the second output result that correspond to the part different from the first output result are determined as the non-determined words of the second moment.
The display module 73 is configured to display the non-determined words and the determined words on the screen during speech recognition.
In this embodiment, not only determined words but also non-determined words can be displayed.
Generating a determined word requires subsequent speech to be collected: for example, only once the speech corresponding to "using Baidu.com" has been obtained can it be determined that what was received earlier is "Baidu" rather than "ferry-boat". Determined words are therefore produced slowly. Generating a non-determined word requires no subsequent speech: for example, if the speech obtained at the current moment corresponds to "Baidu", then whether the recognition result at the current moment is "Baidu" or "ferry-boat", it can be taken as a non-determined word. Non-determined words are therefore produced comparatively quickly.
Displaying non-determined words on the screen gives the user content feedback sooner.
Optionally, the display module 73 is specifically configured to:
after a non-determined word is identified, display the non-determined word on the screen immediately;
after a determined word is identified, replace the corresponding non-determined word with the determined word, and display the determined word on the screen.
The non-determined words and the determined words use different display styles.
For example, non-determined words are shown in grey and determined words are highlighted.
For the specific details of the modules in this embodiment, refer to the related description in the method embodiment above; they are not repeated here.
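The three-module structure of the device 70 can be sketched in Python as follows. The class names mirror modules 71 to 73, but everything else is an assumption for illustration: the recognizer is a stub that depends only on how many audio chunks have arrived, and the display records text frames instead of drawing on a screen.

```python
from typing import Callable, List, Tuple

# Hypothetical recognizer signature: all audio so far -> (determined, non_determined).
Recognizer = Callable[[List[bytes]], Tuple[List[str], List[str]]]


class ReceiverModule:
    """Counterpart of receiver module 71: accumulates the incoming speech signal."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def receive(self, chunk: bytes) -> List[bytes]:
        self._chunks.append(chunk)
        return list(self._chunks)


class IdentificationModule:
    """Counterpart of identification module 72: wraps a recognizer function."""

    def __init__(self, recognize: Recognizer) -> None:
        self._recognize = recognize

    def identify(self, signal: List[bytes]) -> Tuple[List[str], List[str]]:
        return self._recognize(signal)


class DisplayModule:
    """Counterpart of display module 73: records each screen update as text."""

    def __init__(self) -> None:
        self.frames: List[str] = []

    def show(self, determined: List[str], non_determined: List[str]) -> None:
        parts = [f"{word} (highlighted)" for word in determined]
        parts += [f"{word} (grey)" for word in non_determined]
        self.frames.append(" ".join(parts))


class DisplayDevice:
    """Wires the three modules together, one screen update per audio chunk."""

    def __init__(self, recognize: Recognizer) -> None:
        self.receiver = ReceiverModule()
        self.identifier = IdentificationModule(recognize)
        self.display = DisplayModule()

    def feed(self, chunk: bytes) -> str:
        signal = self.receiver.receive(chunk)
        determined, non_determined = self.identifier.identify(signal)
        self.display.show(determined, non_determined)
        return self.display.frames[-1]


def stub_recognizer(signal: List[bytes]) -> Tuple[List[str], List[str]]:
    # Stand-in for a real recognizer: the result depends only on the chunk count.
    if len(signal) == 1:
        return [], ["ferry-boat"]
    return ["Baidu"], ["once"]


device = DisplayDevice(stub_recognizer)
print(device.feed(b"\x00"))
print(device.feed(b"\x01"))
```

Passing the recognizer in as a function keeps the sketch independent of any particular speech recognition engine.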
In this embodiment, by displaying non-determined words and determined words on the screen during speech recognition, intermediate recognition results can be displayed while recognition is in progress and their display can be sped up. Further, because determined words and non-determined words can be identified by different algorithms, an algorithm can be selected according to actual needs to suit different scenarios, and combining the algorithms can improve recognition accuracy. Further, replacing the corresponding non-determined words with determined words allows the content on the screen to change in real time. Further, displaying determined words and non-determined words in different styles makes them easier for the user to read and improves the user experience.
It should be noted that, in describing the invention, term " first ", " second " etc. only for describing object, and can not be interpreted as instruction or hint relative importance.In addition, in describing the invention, except as otherwise noted, the implication of " multiple " refers at least two.
Describe and can be understood in process flow diagram or in this any process otherwise described or method, represent and comprise one or more for realizing the module of the code of the executable instruction of the step of specific logical function or process, fragment or part, and the scope of the preferred embodiment of the present invention comprises other realization, wherein can not according to order that is shown or that discuss, comprise according to involved function by the mode while of basic or by contrary order, carry out n-back test, this should understand by embodiments of the invention person of ordinary skill in the field.
Should be appreciated that each several part of the present invention can realize with hardware, software, firmware or their combination.In the above-described embodiment, multiple step or method can with to store in memory and the software performed by suitable instruction execution system or firmware realize.Such as, if realized with hardware, the same in another embodiment, can realize by any one in following technology well known in the art or their combination: the discrete logic with the logic gates for realizing logic function to data-signal, there is the special IC of suitable combinational logic gate circuit, programmable gate array (PGA), field programmable gate array (FPGA) etc.
Those skilled in the art are appreciated that realizing all or part of step that above-described embodiment method carries is that the hardware that can carry out instruction relevant by program completes, described program can be stored in a kind of computer-readable recording medium, this program perform time, step comprising embodiment of the method one or a combination set of.
In addition, each functional unit in each embodiment of the present invention can be integrated in a processing module, also can be that the independent physics of unit exists, also can be integrated in a module by two or more unit.Above-mentioned integrated module both can adopt the form of hardware to realize, and the form of software function module also can be adopted to realize.If described integrated module using the form of software function module realize and as independently production marketing or use time, also can be stored in a computer read/write memory medium.
The storage medium mentioned above may be a read-only memory (ROM), a magnetic disk, an optical disc, or the like.
In this specification, a description that refers to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (10)

1. A method for displaying a speech recognition result, characterized by comprising:
receiving a speech signal to be recognized;
performing speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising a non-determined word and a determined word;
displaying the non-determined word and the determined word on a screen during the speech recognition process.
2. The method according to claim 1, characterized in that performing speech recognition on the speech signal to obtain an intermediate recognition result comprises:
for each current moment, performing speech recognition on the speech signal at the current moment to obtain a first word sequence and a second word sequence, the first word sequence being the word sequence corresponding to the optimal state transition sequence, and the second word sequence being the word sequence corresponding to the second-best state transition sequence;
determining the first word sequence as the intermediate recognition result at the current moment, wherein words in the first word sequence that are the same as in the second word sequence are determined as the determined words at the current moment, and words in the first word sequence that differ from the second word sequence are determined as the non-determined words at the current moment.
3. The method according to claim 1, characterized in that performing speech recognition on the speech signal to obtain an intermediate recognition result comprises:
obtaining a first output result, the first output result comprising the word identifier and the output time point of each word in a first word sequence, the first word sequence being the word sequence corresponding to the optimal state transition sequence obtained after performing speech recognition on the speech signal at a first moment;
obtaining a second output result, the second output result comprising the word identifier and the output time point of each word in a second word sequence, the second word sequence being the word sequence corresponding to the optimal state transition sequence obtained after performing speech recognition on the speech signal at a second moment;
determining the second word sequence as the intermediate recognition result at the second moment, wherein words in the second output result that correspond to the part identical to the first output result are determined as the determined words at the second moment, and words in the second output result that correspond to the part differing from the first output result are determined as the non-determined words at the second moment.
4. The method according to claim 1, characterized in that displaying the non-determined word and the determined word on the screen comprises:
after a non-determined word is recognized, immediately displaying the non-determined word on the screen;
after a determined word is recognized, replacing the corresponding non-determined word with the determined word and displaying the determined word on the screen.
5. The method according to claim 4, characterized in that the non-determined word and the determined word are displayed in different formats.
6. The method according to claim 5, characterized in that displaying the non-determined word on the screen comprises: displaying the non-determined word in gray on the screen;
and displaying the determined word on the screen comprises: displaying the determined word highlighted on the screen.
7. A device for displaying a speech recognition result, characterized by comprising:
a receiving module, configured to receive a speech signal to be recognized;
a recognition module, configured to perform speech recognition on the speech signal to obtain an intermediate recognition result, the intermediate recognition result comprising a non-determined word and a determined word;
a display module, configured to display the non-determined word and the determined word on a screen during the speech recognition process.
8. The device according to claim 7, characterized in that the recognition module is specifically configured to:
for each current moment, perform speech recognition on the speech signal at the current moment to obtain a first word sequence and a second word sequence, the first word sequence being the word sequence corresponding to the optimal state transition sequence, and the second word sequence being the word sequence corresponding to the second-best state transition sequence;
determine the first word sequence as the intermediate recognition result at the current moment, wherein words in the first word sequence that are the same as in the second word sequence are determined as the determined words at the current moment, and words in the first word sequence that differ from the second word sequence are determined as the non-determined words at the current moment.
9. The device according to claim 7, characterized in that the recognition module is specifically configured to:
obtain a first output result, the first output result comprising the word identifier and the output time point of each word in a first word sequence, the first word sequence being the word sequence corresponding to the optimal state transition sequence obtained after performing speech recognition on the speech signal at a first moment;
obtain a second output result, the second output result comprising the word identifier and the output time point of each word in a second word sequence, the second word sequence being the word sequence corresponding to the optimal state transition sequence obtained after performing speech recognition on the speech signal at a second moment;
determine the second word sequence as the intermediate recognition result at the second moment, wherein words in the second output result that correspond to the part identical to the first output result are determined as the determined words at the second moment, and words in the second output result that correspond to the part differing from the first output result are determined as the non-determined words at the second moment.
10. The device according to claim 7, characterized in that the display module is specifically configured to:
after a non-determined word is recognized, immediately display the non-determined word on the screen;
after a determined word is recognized, replace the corresponding non-determined word with the determined word and display the determined word on the screen.
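
The labeling and display rules recited in the claims above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Word`, `label_by_second_best`, `label_by_previous_output`, and `render` are hypothetical, the position-by-position comparison is an assumed reading of "same" and "different", and ANSI terminal codes stand in for the gray and highlighted display formats.

```python
from dataclasses import dataclass
from typing import List, Tuple

# ANSI escape codes used here as stand-ins for the "gray" and
# "highlighted" display formats (an assumption for terminal output).
GRAY = "\033[90m"
BOLD = "\033[1m"
RESET = "\033[0m"


@dataclass(frozen=True)
class Word:
    text: str
    determined: bool  # True: determined word; False: non-determined word


def label_by_second_best(best: List[str], second_best: List[str]) -> List[Word]:
    """Sketch of claims 2 and 8: label the best hypothesis against the second best.

    Assumption: a word counts as "the same" when the second-best sequence
    has the same text at the same position.
    """
    return [
        Word(text, i < len(second_best) and second_best[i] == text)
        for i, text in enumerate(best)
    ]


TimedWord = Tuple[str, float]  # (word identifier, output time point)


def label_by_previous_output(
    previous: List[TimedWord], current: List[TimedWord]
) -> List[Word]:
    """Sketch of claims 3 and 9: label the current output against the previous one.

    Assumption: a word belongs to the "identical part" when its
    (identifier, time point) pair matches the previous output at the
    same position.
    """
    return [
        Word(word_id, i < len(previous) and previous[i] == (word_id, time_point))
        for i, (word_id, time_point) in enumerate(current)
    ]


def render(words: List[Word]) -> str:
    """Sketch of claims 4 to 6: gray for non-determined, bold for determined words."""
    return " ".join(
        f"{BOLD if w.determined else GRAY}{w.text}{RESET}" for w in words
    )
```

For instance, `render(label_by_second_best(["turn", "on", "the", "light"], ["turn", "on", "the", "night"]))` marks the first three words as determined and "light" as non-determined. Re-rendering the whole line at each moment is one simple way to let a determined word replace the non-determined word previously shown in its place.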

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510958817.4A CN105513586A (en) 2015-12-18 2015-12-18 Speech recognition result display method and speech recognition result display device

Publications (1)

Publication Number Publication Date
CN105513586A (en) 2016-04-20

Family

ID=55721515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510958817.4A Pending CN105513586A (en) 2015-12-18 2015-12-18 Speech recognition result display method and speech recognition result display device

Country Status (1)

Country Link
CN (1) CN105513586A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1764944A (en) * 2003-03-26 2006-04-26 Koninklijke Philips Electronics N.V. Speech recognition system
CN101042866A (en) * 2006-03-22 2007-09-26 Fujitsu Limited Speech recognition apparatus, speech recognition method, and recording medium recorded a computer program
CN101567189A (en) * 2008-04-22 2009-10-28 NTT Docomo, Inc. Device, method and system for correcting voice recognition result
CN103035243A (en) * 2012-12-18 2013-04-10 Institute of Automation, Chinese Academy of Sciences Real-time feedback method and system of long voice continuous recognition and recognition result
CN104424944A (en) * 2013-08-19 2015-03-18 Lenovo (Beijing) Co., Ltd. Information processing method and electronic device
US20150255066A1 (en) * 2013-07-10 2015-09-10 Datascription Llc Metadata extraction of non-transcribed video and audio streams
CN105117034A (en) * 2015-08-31 2015-12-02 任文 Method for inputting Chinese speeches, positioning statements and correcting errors

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782560A (en) * 2017-03-06 2017-05-31 Hisense Group Co., Ltd. Method and device for determining target recognition text
CN106782560B (en) * 2017-03-06 2020-06-16 Hisense Group Co., Ltd. Method and device for determining target recognition text
CN106910503A (en) * 2017-04-26 2017-06-30 Hisense Group Co., Ltd. Method and device for displaying a user's control instruction on an intelligent terminal, and intelligent terminal
CN107632980A (en) * 2017-08-03 2018-01-26 Beijing Sogou Technology Development Co., Ltd. Speech translation method and device, and device for speech translation
WO2021249323A1 (en) * 2020-06-09 2021-12-16 Beijing ByteDance Network Technology Co., Ltd. Information processing method, system and apparatus, and electronic device and storage medium
US11900945B2 (en) 2020-06-09 2024-02-13 Beijing Bytedance Network Technology Co., Ltd. Information processing method, system, apparatus, electronic device and storage medium
JP7448672B2 (en) 2020-06-09 2024-03-12 Beijing ByteDance Network Technology Co., Ltd. Information processing methods, systems, devices, electronic devices and storage media

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160420
