US11223878B2 - Electronic device, speech recognition method, and recording medium - Google Patents
Electronic device, speech recognition method, and recording medium
- Publication number
- US11223878B2 US16/756,382 US201816756382A
- Authority
- US
- United States
- Prior art keywords
- text
- ratio
- electronic device
- sets
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4314—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the disclosure relates to an electronic device, a speech recognition method, and a recording medium, and more particularly to an electronic device capable of recognizing an entire text from an utterance of only some of that text by using a ratio of included words extracted from text converted from an input speech, a speech recognition method, and a recording medium.
- An electronic device is a device that electrically executes specific functions according to a user's instructions, and technologies by which an electronic device recognizes a user's speech and, upon receiving an instruction, executes the function intended by the user have been provided.
- An object of the disclosure is to provide an electronic device capable of recognizing an entire text from an utterance of only some of that text by using a ratio of included words extracted from text converted from an input speech, a speech recognition method, and a recording medium.
- an electronic device including a microphone for receiving a speech, a memory for storing a plurality of text sets, and a processor configured to convert the speech received via the microphone into text, search for words in each of the plurality of text sets common to words in the converted text, and determine at least one text set among the plurality of text sets based on a ratio of the searched common words.
- the processor may be configured to determine at least one text set based on a first ratio of the searched common words in the text set and a second ratio of the searched common words in the converted text.
- the processor may be configured to determine a text set having at least one ratio of the first ratio and the second ratio that is higher than a predetermined ratio as the at least one text set.
- the processor may be configured to, based on the number of text sets having a ratio higher than the predetermined ratio being more than one, display a UI in which the plurality of text sets are arranged in the order of at least one ratio of the first ratio and the second ratio of each of the plurality of text sets.
- the processor may be configured to determine one text set among the plurality of text sets based on a ratio of the extracted words in the plurality of text sets and an order of the extracted words.
- the processor may be configured to search for candidate text sets among the plurality of text sets based on the extracted words and an order of the extracted words, and determine at least one text set based on a ratio of the extracted words in the searched candidate text sets.
- the processor may be configured to execute an event corresponding to the determined text set.
- the electronic device may further include a communicator for receiving EPG information, and the processor may be configured to store broadcast program information included in the EPG information in the memory as the text set.
- the processor may be configured to, based on the determined text set corresponding to the EPG information, generate an event regarding a broadcast program corresponding to the EPG information.
- the event regarding the broadcast program may be at least one of channel changing, recording, scheduled recording, and scheduled viewing of the broadcast program.
- a speech recognition method of an electronic device including converting an input speech into text, extracting a plurality of words based on the converted text, searching for words in each of a plurality of text sets stored in advance common to the plurality of extracted words and determining at least one text set among the plurality of text sets based on a ratio of the searched common words, and executing an event corresponding to the determined text set.
- the determining may include calculating a first ratio of the searched common words in the text set, calculating a second ratio of the searched common words in the converted text, and selecting at least one text set based on the calculated first ratio and the calculated second ratio.
- the selecting may include selecting a text set having at least one ratio of the first ratio and the second ratio that is higher than a predetermined ratio as at least one text set.
- the speech recognition method may further include, based on the number of text sets having a ratio higher than the predetermined ratio being more than one, displaying a UI in which the plurality of text sets are arranged in the order of at least one ratio of the first ratio and the second ratio of each of the plurality of text sets.
- the determining may include determining one text set among the plurality of text sets based on a ratio of the extracted words in the plurality of text sets and an order of the extracted words.
- the determining may include a first step of searching for candidate text sets based on the extracted words in the plurality of text sets and an order of the extracted words, and a second step of determining at least one text set based on a ratio of the extracted words in the searched candidate text sets.
- the speech recognition method may further include receiving EPG information, and storing broadcast program information included in the EPG information in a memory as the text set.
- the executing an event may include, based on the determined text set corresponding to the EPG information, executing an event regarding a broadcast program corresponding to the EPG information.
- the event regarding the broadcast program is at least one of channel changing, recording, scheduled recording, and scheduled viewing of the broadcast program.
- a computer-readable recording medium including a program for executing a speech recognition method of an electronic device, the speech recognition method including converting an input speech into text, extracting a plurality of words based on the converted text, determining at least one text set among a plurality of text sets stored in advance based on a ratio of the extracted words in the plurality of text sets, and executing an event corresponding to the determined text set.
- FIG. 1 is a block diagram showing a configuration of an electronic device of an embodiment
- FIG. 2 is a block diagram showing a configuration of an electronic device according to various embodiments for speech recognition result matching
- FIG. 3 is a block diagram showing, in more detail than FIG. 1 , a configuration of the processor and the text set stored in the memory
- FIG. 4 is a view for describing an example of an electronic device which recognizes an entire text from utterance of some text of a user and executes a function corresponding to the entire text
- FIG. 5 is a view for describing an example of an electronic device which shares a speech signal and text information with a server for the speech recognition result matching and outputs the recognition result
- FIG. 6 is a view for describing an example in which the speech recognition result matching is applied to an EPG
- FIG. 7 is a flowchart showing steps of a speech recognition method performed by an electronic device as an embodiment
- FIG. 8 is a flowchart for describing another embodiment showing a determination process for speech recognition performed by the electronic device.
- Ordinal numbers such as “first”, “second”, and the like may be used for distinguishing one element from another in the specification and claims.
- Such ordinal numbers are used for distinguishing the same or similar elements from each other and the meaning of the term should not be limitedly interpreted by using such ordinal numbers.
- an element with such an ordinal number should not be limited in terms of a usage order or arrangement order due to the number thereof. If necessary, the ordinal numbers may be interchangeably used.
- a term such as “module”, “unit”, or “part” in embodiments of the disclosure indicates an element performing at least one function or operation, and such an element may be implemented as hardware, software, or a combination of hardware and software. Further, except when each of a plurality of “modules”, “units”, “parts”, and the like needs to be realized as individual specific hardware, the components may be integrated into at least one module or chip and implemented as at least one processor (not shown).
- When a certain part is connected to another part, this includes not only direct connection but also indirect connection through still another medium.
- the expression “a certain part includes another element” does not exclude other elements not disclosed, but means that other elements may be further included, unless otherwise noted.
- FIG. 1 is a block diagram showing a configuration of an electronic device of an embodiment of the disclosure.
- an electronic device 100 includes a microphone 110 , a memory 120 , and a processor 130 .
- the electronic device 100 may be implemented as, for example, an analog TV, a digital TV, a 3D TV, a smart TV, an LED TV, an OLED TV, a plasma TV, a monitor, a curved TV with a screen having a fixed curvature, a flexible TV with a screen having a fixed curvature, a bended TV with a screen having a fixed curvature, and/or a bendable TV whose current screen curvature is adjustable by a received user input, but there is no limitation thereto.
- the microphone 110 may receive a user's speech and generate a speech signal corresponding to the received speech.
- the microphone 110 is shown as provided in the electronic device 100 , but may instead be an external microphone configured separately outside of the electronic device 100 and electrically connected to it.
- the memory 120 may be implemented as a non-volatile memory (e.g., hard disk, solid state drive (SSD), or flash memory), a volatile memory, and the like, and may store text information, image contents, and information regarding the functions of the electronic device 100 .
- the memory 120 may store a plurality of text sets.
- each of the text sets may include a sentence regarding daily information such as “What time is it now?” or “How is the weather today?”, an instruction sentence regarding a function executable by the electronic device 100 , a sentence requesting for specific information, and the like, and may also include electronic program guide (EPG) information.
- the text set is not limited to the above-mentioned elements.
- the information regarding the plurality of text sets stored in the memory 120 may be added, removed, or changed by the control of the processor 130 .
- the processor 130 may control the microphone 110 to receive a speech, control the memory 120 to store data, or receive data from the memory 120 .
- the processor 130 may be an element configured to control elements included in the electronic device 100 .
- the processor 130 may convert a speech signal obtained by receiving utterance of a user via the microphone 110 into text, and then compare this with each of the text sets stored in the memory 120 in a word unit.
- the comparison is not limitedly performed in the word unit and may be performed in a unit of syllable, consonant, vowel, letter, or alphabet.
- the processor 130 may determine a text set having a highest degree of coincidence by considering a ratio or an order of words coinciding between the text converted from the speech signal and each of the text sets stored in the memory.
- the processor 130 may determine one or more candidates among the text sets based on the order of words, and determine a final text set by considering a ratio of words coinciding between the determined candidates and the text converted from the speech signal.
- the processor 130 may calculate the ratio of words in each of the text sets coinciding with the text converted from the speech signal, set only text sets having a ratio higher than a predetermined threshold value as candidates, and select a text set having a highest ratio of words coinciding with the text converted from the speech signal among the candidates.
- If no text set satisfies the predetermined threshold value, the processor 130 may control the electronic device 100 to execute functions such as notifying the user that no results were obtained, requesting additional utterance, or inquiring about resetting the predetermined threshold value.
- the processor 130 may search for words in each of the plurality of stored text sets common to the words in the text converted from the speech signal, and determine at least one text set based on a first ratio of common words searched in each of the text sets and a second ratio of common words searched in the text converted from the speech signal.
- the processor 130 may determine a text set having at least one ratio of the first ratio and the second ratio that is higher than the predetermined ratio as a result text set. If the number of text sets having a ratio higher than the predetermined ratio is more than one, the processor 130 may select a text set having a highest first ratio or second ratio among the plurality of text sets as the result text set, or display a UI in which the plurality of text sets satisfying the predetermined ratio are arranged in the order of at least one ratio of the first ratio and the second ratio thereof and allow a user to select them.
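- As an illustration only, the determination based on the first ratio and the second ratio might be sketched as follows. This is a minimal Python sketch, not the claimed implementation: the function and parameter names are hypothetical, a simple lowercase tokenization stands in for the word-unit comparison, and duplicate words are ignored.

```python
import re

def words(text):
    # Word units via a simple lowercase tokenization (an assumption)
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def match_text_sets(converted_text, text_sets, threshold=0.5):
    """Return (text_set, first_ratio, second_ratio) for every stored text set
    whose first or second ratio exceeds the threshold, best match first."""
    query = words(converted_text)
    results = []
    for text_set in text_sets:
        stored = words(text_set)
        common = query & stored                  # the searched common words
        if not common:
            continue
        first_ratio = len(common) / len(stored)  # ratio of common words in the text set
        second_ratio = len(common) / len(query)  # ratio of common words in the converted text
        if first_ratio > threshold or second_ratio > threshold:
            results.append((text_set, first_ratio, second_ratio))
    # Arrange by the higher of the two ratios, e.g. for a selection UI
    results.sort(key=lambda r: max(r[1], r[2]), reverse=True)
    return results
```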
- The threshold ratios for the first ratio and the second ratio may be set differently from each other. In regard to each of the first ratio and the second ratio, if the number of text sets having a ratio higher than the predetermined ratio is more than necessary, the threshold ratio may be increased, and if there is no text set having a ratio higher than the predetermined ratio, the threshold ratio may be decreased, as sketched below.
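- A hypothetical sketch of such an adjustment (the candidate bound and step size are assumptions, not values from the disclosure):

```python
def adjust_threshold(ratios, threshold, max_candidates=5, step=0.05):
    """Raise the threshold when too many text sets pass it, lower it when
    none do; `ratios` holds one matching ratio per stored text set."""
    passing = sum(r > threshold for r in ratios)
    if passing > max_candidates:
        return min(threshold + step, 1.0)  # too many candidates: be stricter
    if passing == 0:
        return max(threshold - step, 0.0)  # no candidates: be more lenient
    return threshold
```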
- the predetermined threshold value may be set in accordance with a kind of language, a frequency of use of each word included in the text converted from the speech signal, a kind of electronic device, the number of standard texts including each word included in the text converted from the speech signal, sentence completeness of the text converted from the speech signal, a degree of noise of the speech signal, and the like.
- Regarding the coincidence of words, the unit to be used for the comparison may be determined by using a length of time of the speech signal, a data volume of the text converted from the speech signal, a kind of language, a kind of electronic device, a frequency of use of each word included in the text converted from the speech signal, an error possibility of each word included in the text converted from the speech signal as determined by the electronic device, and the like.
- the processor 130 that has determined the text set according to the various determination methods described above may execute an event corresponding to the determined text set.
- FIG. 2 is a block diagram showing a configuration in a case where the electronic device 100 is implemented as a TV as one of various embodiments of the disclosure for speech recognition result matching.
- The description of the configuration overlapping with FIG. 1 will be omitted.
- the electronic device 100 may include the microphone 110 , the memory 120 , the processor 130 , a display 140 , an audio output unit 150 , and a communicator 160 as hardware elements.
- the electronic device 100 may further include a broadcast receiver 170 that receives a broadcast signal.
- the processor 130 may include a RAM 131 , a ROM 132 , a CPU 133 , and a system bus 134 .
- the RAM 131 , the ROM 132 , the CPU 133 , and the like may be connected to each other via the system bus 134 and transmit and receive various pieces of data or signals.
- the ROM 132 stores instruction sets for system booting, and the CPU 133 copies the operating system (O/S) stored in a storage (not shown) of the electronic device 100 according to the instructions stored in the ROM 132 and boots up the system by executing the O/S.
- the CPU 133 may copy various applications stored in the storage of the electronic device 100 to the RAM 131 and execute various operations by executing the applications. Only one CPU 133 is shown in FIG. 2 , but in the implementation, it may be implemented as a plurality of CPUs, DSPs, or SoCs.
- the CPU 133 accesses the storage (not shown) and executes the booting by using the O/S stored in the storage.
- the CPU 133 executes various operations by using various programs, contents, pieces of data, and the like stored in the storage.
- the memory 120 may be an element included in the storage or the storage may be an element included in the memory 120 .
- the display 140 may be implemented as a liquid crystal display (LCD), a cathode-ray tube (CRT), plasma display panel (PDP), organic light emitting diodes (OLED), a transparent OLED (TOLED), and the like. If the display 140 is formed of the LCD, the display 140 may include a driving circuit or a backlight unit which may be implemented in a form of an a-si TFT, a low temperature poly silicon (LTPS) TFT, an organic TFT (OTFT), and the like. The display 140 may be implemented in a form of touch screen that is able to detect a touch manipulation of a user.
- the audio output unit 150 is an element for outputting sounds, and may output, for example, a sound included in a broadcast signal received via a tuner (not shown), a sound input via the communicator 160 or the microphone 110 , or a sound included in an audio file stored in the memory 120 .
- the audio output unit 150 may include a speaker 151 and a headphone output terminal 152 .
- the audio output unit 150 may receive a result signal from the processor 130 and may output a sound corresponding to the control of the processor 130 .
- the communicator 160 is an element executing communication with various kinds of external devices according to communication systems.
- the communicator 160 may be connected to an external device via a local area network (LAN) or the Internet, or may be connected to an external device in a wireless communication system (for example, wireless communication such as Z-Wave, 6LoWPAN, RFID, LTE D2D, BLE, GPRS, Weightless, EDGE, Zigbee, ANT+, NFC, IrDA, DECT, WLAN, Bluetooth, Wi-Fi, Wi-Fi Direct, GSM, UMTS, LTE, or WiBro).
- the communicator 160 includes various communication chips such as a Wi-Fi chip 161 , a Bluetooth chip 162 , an NFC chip 163 , and a wireless communication chip 164 .
- the Wi-Fi chip 161 , the Bluetooth chip 162 , and the NFC chip 163 execute communication in the Wi-Fi system, the Bluetooth system, and the NFC system, respectively.
- the communicator 160 may include an optical receiver 165 that is able to receive a control signal (for example, an IR pulse) from an external device (not shown) or a server (not shown).
- the wireless communication chip 164 indicates a chip executing the communication based on various communication standards such as IEEE, Zigbee, 3rd Generation (3G), 3rd Generation Partnership Project (3GPP), or Long Term Evolution (LTE).
- the CPU 133 determines the text set corresponding to the speech signal among the plurality of text sets stored in the memory 120 in advance and then executes a function corresponding to the determined text set.
- the CPU 133 may control the display 140 to display the text converted from the input speech signal, the determined text set, or the result of the execution of the function corresponding to the determined text set on the display 140 .
- the CPU 133 may control the audio output unit 150 to output the determined text set or the result of the execution of the function corresponding to the determined text set as sounds via the speaker 151 or the headphone 152 .
- the communicator 160 that is able to transmit and receive data to and from an external device (not shown) may transmit the speech signal, the converted text of the speech signal, or the determined text set information to an external device or receive additional text set information from an external device under the control of the CPU 133 .
- the CPU 133 may add, change, or remove the text set information stored in the memory 120 .
- the broadcast receiver 170 may tune to and select only the frequency of a channel to be received by the electronic device 100 from among many radio wave components, through amplification, mixing, and resonance of a broadcast signal received in a wired or wireless manner.
- the broadcast signal may include videos, sounds, and additional data (e.g., electronic program guide (EPG)).
- the broadcast receiver 170 may receive videos, sounds, and data in a frequency band corresponding to a channel number corresponding to a user input.
- the broadcast receiver 170 may receive a broadcast signal from various sources such as terrestrial broadcast, cable broadcast, satellite broadcast, and the like.
- the broadcast receiver 170 may also receive a broadcast signal from a source such as an analogue broadcast or a digital broadcast.
- the broadcast receiver may be implemented as a component of the all-in-one electronic device 100 , or may be implemented as a separate device including a broadcast receiving unit electrically connected to the electronic device 100 (e.g., a set-top box, or a tuner connected to an input and output unit (not shown)).
- FIG. 3 is a block diagram showing a configuration of the processor and an embodiment of including the memory storing the text set.
- the processor 130 may include a speech recognition unit 310 and a matching unit 320 .
- the speech recognition unit 310 may convert a speech signal obtained by an input of utterance of a user from the microphone 110 into text data and transmit the text data obtained by converting the speech signal to the matching unit 320 .
- the matching unit 320 may receive each of the text sets stored in the memory 120 from the memory 120 and determine a text set having a highest degree of coincidence with the text converted from the input speech signal.
- the matching unit 320 may divide the text converted from the input speech signal in a unit of words, calculate the numbers of divided words and coinciding words included in each of the text sets, and determine a text set having a highest degree of coincidence based on the ratio of the numbers of words included.
- FIG. 4 is a view for describing an example of an electronic device which recognizes the entire text from utterance of some text of a user and executes a function corresponding to the entire text.
- FIG. 4 shows a user 410 uttering a voice and an electronic device 420 .
- a user 410 who wants to know the current time may wish to receive information regarding the current time from the electronic device 420 visually or acoustically by uttering a sentence such as “What time is it now?”.
- the sentence “What time is it now?” is an example of the text set mentioned above.
- FIG. 4 shows a case where the user 410 utters only “What time now”, instead of the entire sentence such as “What time is it now?”.
- the electronic device 420 may acquire a text “What time now” by converting a speech signal based on the utterance of the user and then, detect “What”, “time”, and “now” from “What time now”.
- the electronic device 420 may select text sets including the words coinciding with “What”, “time”, or “now” at a predetermined ratio or higher among the plurality of text sets stored in advance as candidates, and determine “What time is it now?” which is a text set having a highest ratio as a sentence intended by a user.
- the electronic device 420 may display that it has recognized “What time is it now?”, based on the result of determining that, among the plurality of stored text sets, “What time is it now?” includes words coinciding with any one of “What”, “time”, or “now” at the highest ratio, as the utterance of the user. Since the text “What time is it now?” is a question, the electronic device 420 may display “It's 2 PM” as the answer together with “What time is it now?” and output the answer as a sound.
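- To make the arithmetic of this example concrete, under the tokenization assumed in the earlier match_text_sets sketch: the converted text “What time now” contains three words, all of which also appear among the five words of the stored text set “What time is it now?”, so the first ratio is 3/5 = 0.6 and the second ratio is 3/3 = 1.0, and the set passes, e.g., a 0.5 threshold and ranks first.

```python
stored_sets = [
    "What time is it now?",
    "How is the weather today?",
    "Change the channel to channel 999",
]
print(match_text_sets("What time now", stored_sets, threshold=0.5))
# [('What time is it now?', 0.6, 1.0)] under the sketch's assumptions
```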
- the embodiment of the disclosure is not limited to FIG. 4 .
- For example, if the electronic device 420 is a TV and the content of the user's utterance is determined to be a text set for changing the channel to channel 999 , the electronic device 420 may itself execute the corresponding function of the electronic device 100 , such as changing the channel of the TV to channel 999 .
- FIG. 5 is a view for describing embodiments of an electronic device and a server outputting a recognition result by transmitting and receiving a speech signal and text information for speech recognition result matching.
- the electronic device 500 may transmit a speech signal converted from a speech of the user to a server 520 and convert the speech signal into corresponding text at the same time.
- the server 520 which has received the speech signal converts the speech signal into the corresponding text, extracts words in each of a plurality of text sets stored in the server 520 in advance common to words in the text converted from the speech signal, and selects text sets as candidates by considering the order or ratio of the common words included.
- the electronic device 500 may extract words in each of the candidate text sets selected by the server 520 that are common to words in the text converted from the speech signal, and determines one text set from the candidate text sets by considering the order or ratio of the common words included.
- the electronic device 500 may transmit data regarding the text converted from the speech signal to the server 520 , not the speech signal.
- the server 520 need not necessarily convert the speech signal into text autonomously.
- the server 520 may provide feedback on the information regarding the text converted from the speech signal by the electronic device 500 .
- the server 520 may determine whether or not each of all of the text sets stored in advance becomes a candidate text and continuously transmit the determined candidate texts to the electronic device 500 in real time; at the same time, the electronic device 500 may calculate the ratio for each of the candidate text sets received from the server 520 in real time, thereby reducing the period of time for deriving the final result.
- the electronic device 500 may execute an input of a speech signal and an output of a result mainly in a relationship with the user, and the server 520 may select a result text set by using data corresponding to the speech signal received from the electronic device 500 and a plurality of text sets stored in the server 520 in advance, and then transmit the selected text set to the electronic device 500 again.
- the server 520 may determine one or more candidates from the text sets based on the order of words, and finally determine one text set by considering the ratio of words in only the determined candidates coinciding with words in the text converted from the speech signal.
- the server 520 may calculate a ratio of words in each of the stored text sets coinciding with words in the text converted from the speech signal, set only the text sets having a ratio higher than a predetermined threshold value as candidates, and select a text set having a highest ratio of the words coinciding with the text converted from the speech signal among the candidates.
- the server 520 may search for a word in each of the plurality of stored text sets common to a word in the text converted from the speech signal, and determine at least one text set based on a first ratio of the common word searched in each of the text sets and a second ratio of the common word searched in the text converted from the speech signal.
- the server 520 may determine a text set having at least one ratio of the first ratio and the second ratio that is higher than the predetermined ratio as the at least one text set. If the number of text sets having a ratio higher than the predetermined ratio is more than one, the server 520 may select a text set having a highest first ratio or second ratio among the plurality of text sets as the result text set and transmit the result text set to the electronic device 500 , or transmit data in which the plurality of text sets satisfying the predetermined ratio are arranged in the order of at least one ratio of the first ratio and the second ratio thereof to the electronic device 500 and display a UI corresponding to the data on the electronic device 500 to allow a user to directly select the text set.
- the server 520 may set the threshold ratios differently. In regard to each of the first ratio and the second ratio, if the number of text sets having a ratio higher than the predetermined ratio is more than necessary, the threshold ratio may be increased, and if there is no text set having a ratio higher than the predetermined ratio, the threshold ratio may be decreased.
- the predetermined threshold value may be set in accordance with a kind of language, a frequency of use of each word included in the text converted from the speech signal, a kind of electronic device, the number of standard texts including each word included in the text converted from the speech signal, sentence completeness of the text converted from the speech signal, a degree of noise of the speech signal, and the like.
- the server 520 may transmit an instruction to allow the electronic device 500 to execute functions of notifying a user that no results are obtained, requesting additional utterance, or inquiring about resetting of the predetermined threshold value, and the like.
- the information regarding the plurality of text sets stored in the server 520 may be updated, removed, or changed by using data transmitted to the server 520 from another external device (not shown) connectable to the server 520 .
- the electronic device 500 may transmit the text converted from the speech signal to the server 520 or transmit data in a form other than the speech signal or the text to the server 520 . If the data is received in a form other than the text, the text corresponding to the speech signal may be directly extracted in the server 520 .
- the server 520 may transmit the data regarding the determined text set to the electronic device 500 and also transmit an output instruction of the determined text set or an execution instruction of an event corresponding to the text set to the electronic device 500 .
- the electronic device 500 may display the text set determined by the server 520 through a display or output a sound via a speaker or a headphone according to the instruction of the server 520 .
- the electronic device 500 may display an answer of this question through a display or output a sound via a speaker or a headphone according to an instruction of the server 520 .
- the electronic device 500 may execute a function or an event corresponding to the text set determined by the server 520 according to the instruction of the server 520 .
- the electronic device 500 may output the text set determined by the server 520 or execute the function according to the instruction of the server 520 and then transmit report data reporting the output or execution to the server. In such a case, the server 520 , if it has not received the report data even after a predetermined period of time, may transmit the instruction again.
- the updating, removing, or changing of the plurality of pieces of text set information stored in the server 520 may be performed by considering the number of times or the ratio of the selection of each of the text sets stored as the speech recognition result of external devices connectable to the server.
- the server 520 may store the plurality of pieces of the text set information by dividing these for each external device connectable to the server and capable of performing speech recognition.
- For example, if the electronic device 500 is a navigation system, a ratio of the text sets corresponding to place names or traffic information may be high in the plurality of text sets stored in the server 520 for speech recognition of the navigation system.
- If the electronic device 500 is a TV, a ratio of text sets regarding broadcast programs or TV functions may be high in the plurality of text sets stored in the server 520 for speech recognition of the TV.
- If the electronic device 500 is a smartphone, the plurality of text sets stored in the server 520 for speech recognition of the smartphone may be stored separately for each application used in the smartphone.
- the functions executed separately by the electronic device 500 and the server 520 may also be performed by two corresponding components within one electronic device.
- FIG. 6 is a view for describing an example in which the speech recognition result matching is applied to an EPG.
- FIG. 6 shows speech recognition and matching between a user 610 , an electronic device 600 , and a server 620 as an embodiment in which the electronic device 600 is a TV.
- the electronic device 600 may receive electronic program guide (EPG) information from the server 620 by using a communicator (not shown), or receive the EPG information in the form of a broadcast signal including the EPG information received from the outside via a broadcast receiver (not shown). FIG. 6 shows a state in which the electronic device 600 extracts broadcast program information and information regarding an EPG function included in the received EPG information as text sets and stores the extracted text sets in a memory (not shown).
- When the determined text set corresponds to the EPG information, the electronic device 600 may generate an event regarding the corresponding EPG information.
- the communicator (not shown) and the memory (not shown) may be provided in the electronic device 600 or separately provided and electrically connected thereto.
- Among the pieces of received EPG information, the EPG information 605 of the corresponding time may be displayed on the electronic device 600 by an instruction of the user 610 , and the pieces of broadcast program information included in the EPG 605 of the corresponding time are converted into text sets and stored in the memory in advance.
- the electronic device 600 may recognize that the “Now Testing Broadcast” 606 , which has the highest degree of coincidence of words among the broadcast programs or functions included in the EPG information 605 of the corresponding time, corresponds to the intention of the utterance of the user, by considering that the EPG is currently displayed.
- the electronic device 600 may change the channel to the channel on which the “Now Testing Broadcast” 606 is broadcast, display specific information regarding the “Now Testing Broadcast” 606 , or perform recording, scheduled recording, scheduled viewing, and the like with regard to the corresponding channel.
- FIG. 7 is a flowchart showing steps of a speech recognition method performed by the electronic device as an embodiment of the disclosure.
- a speech input through utterance of a user is converted into a speech signal via a microphone and the converted speech signal is converted again into text (S 710 ).
- the user's speech to be converted may have pauses equal to or shorter than a predetermined period of time, or an entire time length not longer than another predetermined period of time.
- words included in the text converted from the speech signal are extracted (S 720 ).
- the number of extracted words may be stored in the electronic device.
- the parts not extracted as words may be words not stored in advance, and therefore, data regarding those parts may be transmitted to an external device connectable to the electronic device to acquire the necessary information regarding the parts not extracted as words.
- the matters to be extracted from the text converted from the speech signal are not necessarily in the unit of words. That is, the extraction may be performed in units of syllables, letters, consonants, vowels, or alphabets according to the kind of language, and the extracted units may be compared with the plurality of text sets stored in advance, as in the sketch below.
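- For instance, the same inclusion ratio can be computed over different comparison units; the following is a hypothetical sketch (the unit names and tokenization are assumptions):

```python
def units(text, unit="word"):
    # Decompose text into comparison units; only two units are sketched here
    text = text.lower()
    if unit == "word":
        return set(text.split())
    if unit == "letter":
        return set(ch for ch in text if not ch.isspace())
    raise ValueError(f"unsupported unit: {unit}")

def inclusion_ratio(recognized, stored, unit="word"):
    # Share of the stored text set's units that also occur in the recognized text
    stored_units = units(stored, unit)
    return len(units(recognized, unit) & stored_units) / len(stored_units)
```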
- the words in each of the text sets stored in advance that are common to the words in the text converted from the speech signal may be searched for and extracted (S 730 ).
- the number of words in each of the text sets stored in advance common to the words in the text converted from the speech signal may be stored.
- one text set is finally determined by using a ratio of the words in each of the text sets stored in advance common to the words in the text converted from the speech signal (S 740 ). At this time, not only the ratio, but also the order of the common words included may be considered.
- Candidates may be selected among the plurality of text sets stored in advance based on the common words and the order of the common words included, and one text set may be finally determined by using the ratio of the common words among the plurality of text sets.
- Here, to use the ratio of the words in each of the text sets stored in advance common to the words in the converted text may mean to calculate and use at least one of a first ratio between the number of common words and the number of words extracted from each of the text sets stored in advance, and a second ratio between the number of common words and the number of words extracted from the text converted from the speech signal.
- a threshold value may be set for the first ratio and the second ratio, and only text sets having a ratio higher than the threshold value may be set as candidates or the candidates may be set based on the order of the common words included. Then, one text set may be determined among the text sets stored as the candidates, by using at least any one of the first ratio or the second ratio.
- a UI in which the plurality of text sets are arranged in the order of at least one ratio of the first ratio and the second ratio thereof may be displayed.
- an event corresponding to the determined text set may be executed (S 750 ). Specifically, the determined text set may be simply displayed or output as a sound as it is, or if the determined text set is a question, an answer of the question may be displayed or output as a sound. If the determined text set relates to a specific function of the electronic device, the corresponding function may be executed by the electronic device.
- information regarding a speech signal not matched may be transmitted to an external device such as a server, and additional information regarding this may be received again and stored in the electronic device.
- the function corresponding to the intention of the user may be executed by using the additional information stored in advance.
- a step of receiving the EPG information and storing broadcast program information included in the EPG information in a memory as text sets may be performed in advance.
- In step S 750 of executing an event corresponding to the determined text set, when the determined text set corresponds to the EPG information stored in advance, functions such as channel changing, recording, scheduled recording, scheduled viewing, information providing, and the like for the broadcast program corresponding to the EPG information may be provided to the user.
- FIG. 8 is a flowchart for describing another specific embodiment showing a determination process for speech recognition performed by the electronic device.
- In FIG. 8 , the number of candidate sentences is denoted as n, a recognition result sentence as A, and a candidate sentence currently being processed as Si. The candidate sentences correspond to the text sets stored in advance, and the recognition result sentence corresponds to the text converted from the speech signal. It is assumed that 1 ≤ i ≤ n.
- all of the sentences stored in advance may be simply selected as candidates, or the selection may be performed based on at least one of the order or the ratio of words included therein that are common to words in the recognition result sentence.
- First, a speech recognition result sentence A based on the user's speech may be extracted (S 810 ).
- this is a process of obtaining text by converting a speech signal.
- Next, words ASi commonly included in the recognition result sentence A and each of the candidate sentences Si may be extracted (S 815 ). Based on this, a recognition result inclusion degree of ASi with respect to A, that is, Ratio(Ai), which is the ratio of ASi included in A, may be calculated (S 820 ). In addition, a candidate sentence inclusion degree of ASi with respect to Si, that is, Ratio(Si), which is the ratio of ASi included in Si, may be calculated (S 825 ).
- Steps S 815 to S 825 may be repeated with respect to all of the candidate sentences stored in advance (S 830 ); accordingly, this process may be repeated n times, n being the number of candidate sentences.
- Next, the candidate sentences having Ratio(Ai) equal to or higher than a threshold value TH(A) of the recognition result inclusion degree, or Ratio(Si) equal to or higher than a threshold value TH(S) of the candidate sentence inclusion degree, may be extracted (S 835 ). Then, among the candidate sentences extracted in S 835 , a candidate sentence Sk having the maximum recognition result inclusion degree may be extracted (S 840 ).
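- Under the same assumptions as the earlier sketches (hypothetical names, word units taken by whitespace splitting), the S 810 to S 840 loop might look as follows:

```python
def select_candidate(A, candidates, TH_A=0.5, TH_S=0.5):
    """Filter candidate sentences by Ratio(Ai)/Ratio(Si) against TH(A)/TH(S),
    then return the Sk with the maximum recognition result inclusion degree."""
    A_words = set(A.lower().split())            # recognition result sentence A (S 810)
    passed = []
    for S_i in candidates:                      # repeated n times over all candidates (S 830)
        S_words = set(S_i.lower().split())
        AS_i = A_words & S_words                # common words ASi (S 815)
        ratio_A = len(AS_i) / len(A_words)      # recognition result inclusion degree (S 820)
        ratio_S = len(AS_i) / len(S_words)      # candidate sentence inclusion degree (S 825)
        if ratio_A >= TH_A or ratio_S >= TH_S:  # threshold filtering (S 835)
            passed.append((ratio_A, S_i))
    if not passed:
        return None                             # e.g. request additional utterance
    return max(passed)[1]                       # Sk with maximum Ratio(Ai) (S 840)
```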
- In the above description, the number of words of a sentence is extracted and used for calculating the ratio, but there is no limitation thereto; the process may also be performed based on any one of the number of letters, the number of phrases, or the number of syllables, not only the number of words.
- the language which is a reference of the speech recognition may be a language set as a default language, manually set by a user, or automatically set based on the language set for objects of the electronic device 100 .
- the language set for objects may be confirmed by applying optical character recognition (OCR) to objects displayed on the electronic device 100 .
- the embodiments described above may be implemented in a recording medium readable by a computer or a similar device using software, hardware, or a combination thereof.
- the embodiments described in the disclosure may be implemented by using at least one of Application Specific Integrated Circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and an electrical unit for executing other functions.
- the embodiments described in the disclosure may be implemented as the processor 130 itself.
- the embodiments such as the procedures and functions described in the disclosure may be implemented as separate software modules. Each of the software modules described above may execute one or more functions and operations described in the disclosure.
- Computer instructions for executing the processing operations of the electronic device 100 according to the embodiments of the disclosure described above may be stored in a non-transitory computer-readable medium.
- When the computer instructions stored in such a non-transitory computer-readable medium are executed by a processor of a specific machine, the computer instructions may enable the specific machine to execute the processing operations of the electronic device 100 according to the embodiments described above.
- the non-transitory computer-readable medium is not a medium storing data for a short period of time such as a register, a cache, or a memory, but means a medium that semi-permanently stores data and is readable by a machine.
- Specific examples of the non-transitory computer-readable medium may include a CD, a DVD, a hard disk, a Blu-ray disc, a USB, a memory card, and a ROM.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- User Interface Of Digital Computer (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (13)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2017-0143213 | 2017-10-31 | ||
| KR1020170143213A KR102452644B1 (en) | 2017-10-31 | 2017-10-31 | Electronic apparatus, voice recognition method and storage medium |
| PCT/KR2018/012750 WO2019088571A1 (en) | 2017-10-31 | 2018-10-25 | Electronic device, speech recognition method, and recording medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200280767A1 US20200280767A1 (en) | 2020-09-03 |
| US11223878B2 true US11223878B2 (en) | 2022-01-11 |
Family
ID=66332139
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/756,382 Active US11223878B2 (en) | 2017-10-31 | 2018-10-25 | Electronic device, speech recognition method, and recording medium |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11223878B2 (en) |
| EP (1) | EP3678131B1 (en) |
| KR (1) | KR102452644B1 (en) |
| WO (1) | WO2019088571A1 (en) |
Citations (54)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4570232A (en) * | 1981-12-21 | 1986-02-11 | Nippon Telegraph & Telephone Public Corporation | Speech recognition apparatus |
| US5598557A (en) * | 1992-09-22 | 1997-01-28 | Caere Corporation | Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files |
| US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
| US20020055950A1 (en) * | 1998-12-23 | 2002-05-09 | Arabesque Communications, Inc. | Synchronizing audio and text of multimedia segments |
| US20020093591A1 (en) * | 2000-12-12 | 2002-07-18 | Nec Usa, Inc. | Creating audio-centric, imagecentric, and integrated audio visual summaries |
| US6442518B1 (en) * | 1999-07-14 | 2002-08-27 | Compaq Information Technologies Group, L.P. | Method for refining time alignments of closed captions |
- 2017
  - 2017-10-31 KR KR1020170143213A patent/KR102452644B1/en active Active
- 2018
  - 2018-10-25 EP EP18874744.8A patent/EP3678131B1/en active Active
  - 2018-10-25 WO PCT/KR2018/012750 patent/WO2019088571A1/en not_active Ceased
  - 2018-10-25 US US16/756,382 patent/US11223878B2/en active Active
Patent Citations (59)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4570232A (en) * | 1981-12-21 | 1986-02-11 | Nippon Telegraph & Telephone Public Corporation | Speech recognition apparatus |
| US5598557A (en) * | 1992-09-22 | 1997-01-28 | Caere Corporation | Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files |
| US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
| US20020055950A1 (en) * | 1998-12-23 | 2002-05-09 | Arabesque Communications, Inc. | Synchronizing audio and text of multimedia segments |
| US6473778B1 (en) * | 1998-12-24 | 2002-10-29 | At&T Corporation | Generating hypermedia documents from transcriptions of television programs using parallel text alignment |
| US20060015339A1 (en) * | 1999-03-05 | 2006-01-19 | Canon Kabushiki Kaisha | Database annotation and retrieval |
| US6442518B1 (en) * | 1999-07-14 | 2002-08-27 | Compaq Information Technologies Group, L.P. | Method for refining time alignments of closed captions |
| US7047191B2 (en) * | 2000-03-06 | 2006-05-16 | Rochester Institute Of Technology | Method and system for providing automated captioning for AV signals |
| US20020093591A1 (en) * | 2000-12-12 | 2002-07-18 | Nec Usa, Inc. | Creating audio-centric, imagecentric, and integrated audio visual summaries |
| US7065524B1 (en) * | 2001-03-30 | 2006-06-20 | Pharsight Corporation | Identification and correction of confounders in a statistical analysis |
| US20030206717A1 (en) * | 2001-04-20 | 2003-11-06 | Front Porch Digital Inc. | Methods and apparatus for indexing and archiving encoded audio/video data |
| US20040096110A1 (en) * | 2001-04-20 | 2004-05-20 | Front Porch Digital Inc. | Methods and apparatus for archiving, indexing and accessing audio and video data |
| US20030004716A1 (en) | 2001-06-29 | 2003-01-02 | Haigh Karen Z. | Method and apparatus for determining a measure of similarity between natural language sentences |
| US20030025832A1 (en) * | 2001-08-03 | 2003-02-06 | Swart William D. | Video and digital multimedia aggregator content coding and formatting |
| US20030061028A1 (en) * | 2001-09-21 | 2003-03-27 | Knumi Inc. | Tool for automatically mapping multimedia annotations to ontologies |
| US7092888B1 (en) * | 2001-10-26 | 2006-08-15 | Verizon Corporate Services Group Inc. | Unsupervised training in natural language call routing |
| US20050227614A1 (en) * | 2001-12-24 | 2005-10-13 | Hosking Ian M | Captioning system |
| US20030169366A1 (en) * | 2002-03-08 | 2003-09-11 | Umberto Lenzi | Method and apparatus for control of closed captioning |
| US7962331B2 (en) * | 2003-12-01 | 2011-06-14 | Lumenvox, Llc | System and method for tuning and testing in a speech recognition system |
| US20070124788A1 (en) * | 2004-11-25 | 2007-05-31 | Erland Wittkoter | Appliance and method for client-sided synchronization of audio/video content and external data |
| US7873654B2 (en) * | 2005-01-24 | 2011-01-18 | The Intellection Group, Inc. | Multimodal natural language query system for processing and analyzing voice and proximity-based queries |
| US7739253B1 (en) * | 2005-04-21 | 2010-06-15 | Sonicwall, Inc. | Link-based content ratings of pages |
| US8121432B2 (en) * | 2005-08-24 | 2012-02-21 | International Business Machines Corporation | System and method for semantic video segmentation based on joint audiovisual and text analysis |
| US7801910B2 (en) * | 2005-11-09 | 2010-09-21 | Ramp Holdings, Inc. | Method and apparatus for timed tagging of media content |
| US20070124147A1 (en) * | 2005-11-30 | 2007-05-31 | International Business Machines Corporation | Methods and apparatus for use in speech recognition systems for identifying unknown words and for adding previously unknown words to vocabularies and grammars of speech recognition systems |
| US20080270134A1 (en) * | 2005-12-04 | 2008-10-30 | Kohtaroh Miyamoto | Hybrid-captioning system |
| KR100764247B1 (en) | 2005-12-28 | 2007-10-08 | Korea University Industry-Academic Cooperation Foundation | Apparatus and method for speech recognition with two-step search |
| US20070214164A1 (en) * | 2006-03-10 | 2007-09-13 | Microsoft Corporation | Unstructured data in a mining model language |
| US7729917B2 (en) * | 2006-03-24 | 2010-06-01 | Nuance Communications, Inc. | Correction of a caption produced by speech recognition |
| US20080066138A1 (en) * | 2006-09-13 | 2008-03-13 | Nortel Networks Limited | Closed captioning language translation |
| KR100825690B1 (en) | 2006-09-15 | 2008-04-29 | Pohang University of Science and Technology Foundation | Method for correcting recognition errors in a speech recognition system |
| US20080255844A1 (en) * | 2007-04-10 | 2008-10-16 | Microsoft Corporation | Minimizing empirical error training and adaptation of statistical language models and context free grammar in automatic speech recognition |
| US20080266449A1 (en) * | 2007-04-25 | 2008-10-30 | Samsung Electronics Co., Ltd. | Method and system for providing access to information of potential interest to a user |
| US20090171662A1 (en) * | 2007-12-27 | 2009-07-02 | Sehda, Inc. | Robust Information Extraction from Utterances |
| US7509385B1 (en) * | 2008-05-29 | 2009-03-24 | International Business Machines Corporation | Method of system for creating an electronic message |
| KR20100006089A (en) | 2008-07-08 | 2010-01-18 | LG Electronics Inc. | Mobile terminal and method for inputting a text thereof |
| US20100009720A1 (en) | 2008-07-08 | 2010-01-14 | Sun-Hwa Cha | Mobile terminal and text input method thereof |
| KR101502003B1 (en) | 2008-07-08 | 2015-03-12 | LG Electronics Inc. | Mobile terminal and text input method thereof |
| US8498670B2 (en) | 2008-07-08 | 2013-07-30 | Lg Electronics Inc. | Mobile terminal and text input method thereof |
| US8131545B1 (en) * | 2008-09-25 | 2012-03-06 | Google Inc. | Aligning a transcript to audio data |
| US20100091187A1 (en) * | 2008-10-15 | 2010-04-15 | Echostar Technologies L.L.C. | Method and audio/video device for processing caption information |
| KR101495183B1 (en) | 2008-12-01 | 2015-02-24 | LG Electronics Inc. | Terminal and method for controlling the same |
| US8423363B2 (en) * | 2009-01-13 | 2013-04-16 | CRIM (Centre de Recherche Informatique de Montréal) | Identifying keyword occurrences in audio data |
| US20110022386A1 (en) * | 2009-07-22 | 2011-01-27 | Cisco Technology, Inc. | Speech recognition tuning tool |
| US20110040559A1 (en) * | 2009-08-17 | 2011-02-17 | At&T Intellectual Property I, L.P. | Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment |
| US8572488B2 (en) * | 2010-03-29 | 2013-10-29 | Avid Technology, Inc. | Spot dialog editor |
| JP5474723B2 (en) | 2010-09-30 | 2014-04-16 | KDDI Corporation | Speech recognition apparatus and control program therefor |
| US20120101817A1 (en) * | 2010-10-20 | 2012-04-26 | At&T Intellectual Property I, L.P. | System and method for generating models for use in automatic speech recognition |
| US20120253799A1 (en) * | 2011-03-28 | 2012-10-04 | At&T Intellectual Property I, L.P. | System and method for rapid customization of speech recognition models |
| KR20120135855A (en) | 2011-06-07 | 2012-12-17 | Samsung Electronics Co., Ltd. | Display apparatus and method for executing hyperlink and method for recognizing voice thereof |
| US9183832B2 (en) | 2011-06-07 | 2015-11-10 | Samsung Electronics Co., Ltd. | Display apparatus and method for executing link and method for recognizing voice thereof |
| US20170186422A1 (en) | 2012-12-29 | 2017-06-29 | Genesys Telecommunications Laboratories, Inc. | Fast out-of-vocabulary search in automatic speech recognition systems |
| JP2015052743A (en) | 2013-09-09 | 2015-03-19 | NEC Personal Computers, Ltd. | Information processing apparatus, information processing apparatus control method, and program |
| KR20150089145A (en) | 2014-01-27 | 2015-08-05 | Samsung Electronics Co., Ltd. | Display apparatus for performing voice control and method therefor |
| US9711149B2 (en) | 2014-01-27 | 2017-07-18 | Samsung Electronics Co., Ltd. | Display apparatus for performing voice control and voice controlling method thereof |
| KR20160060405A (en) | 2014-11-20 | 2016-05-30 | Samsung Electronics Co., Ltd. | Apparatus and method for registration of user command |
| US9830908B2 (en) | 2014-11-20 | 2017-11-28 | Samsung Electronics Co., Ltd. | Display apparatus and method for registration of user command |
| US10573302B2 (en) * | 2015-03-13 | 2020-02-25 | Lg Electronics Inc. | Terminal and home appliance system including the same |
| US9392324B1 (en) | 2015-03-30 | 2016-07-12 | Rovi Guides, Inc. | Systems and methods for identifying and storing a portion of a media asset |
Non-Patent Citations (6)
| Title |
|---|
| European Office Action dated Apr. 21, 2021 from European Application No. 18874744.8, 6 pages. |
| Extended European Search Report dated Sep. 21, 2020 in corresponding European Patent Application No. 18874744.8, 8 pages. |
| International Search Report dated Feb. 13, 2019, in corresponding International Patent Application No. PCT/KR2018/012750. |
| Korean Office Action dated Oct. 20, 2021 from Korean Application No. 10-2017-0143213. |
| So et al., "Implementation of Search Method based on Sequence and Adjacency Relationship of User Query", Journal of the Korean Institute of Intelligent Systems vol. 21, No. 6, pp. 724-729, 2011. |
| Written Opinion of the International Searching Authority dated Feb. 13, 2019, in corresponding Patent Application No. PCT/KR2018/012750. |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3678131A4 (en) | 2020-10-21 |
| EP3678131B1 (en) | 2023-05-24 |
| KR20190048334A (en) | 2019-05-09 |
| KR102452644B1 (en) | 2022-10-11 |
| WO2019088571A1 (en) | 2019-05-09 |
| US20200280767A1 (en) | 2020-09-03 |
| EP3678131A1 (en) | 2020-07-08 |
Similar Documents
| Publication | Title |
|---|---|
| CN112216281B (en) | Display apparatus and method for registering user command |
| US10134387B2 (en) | Image display apparatus, method for driving the same, and computer readable recording medium |
| US11726806B2 (en) | Display apparatus and controlling method thereof |
| US11437046B2 (en) | Electronic apparatus, controlling method of electronic apparatus and computer readable medium |
| KR102594022B1 (en) | Electronic device and method for updating channel map thereof |
| US11615780B2 (en) | Electronic apparatus and controlling method thereof |
| US12094460B2 (en) | Electronic device and voice recognition method thereof |
| US20210074299A1 (en) | Electronic apparatus for selecting AI assistant and response providing method thereof |
| US20200043493A1 (en) | Translation device |
| KR20250002061A (en) | Display apparatus and controlling method thereof |
| US11223878B2 (en) | Electronic device, speech recognition method, and recording medium |
| US11107459B2 (en) | Electronic apparatus, controlling method and computer-readable medium |
| US12430155B2 (en) | Display apparatus and controlling method thereof |
| KR102449181B1 (en) | Electronic device and its control method |
| US11455990B2 (en) | Electronic device and control method therefor |
Legal Events
| Code | Title | Description |
|---|---|---|
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| AS | Assignment | Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BAE, JAE HYUN;REEL/FRAME:052468/0136; Effective date: 20200406 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| STPP | Information on status: patent application and granting procedure in general | Free format text: WITHDRAW FROM ISSUE AWAITING ACTION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| CC | Certificate of correction | |
| FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |