KR20140022320A - Method for operating an image display apparatus and a server - Google Patents
- Publication number: KR20140022320A
- Application number: KR1020120089061A
- Authority
- KR
- South Korea
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
- H04N21/4415—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
Abstract
Description
The present invention relates to an image display apparatus, a server, and an operation method thereof, and more particularly, to an image display apparatus, a server, and an operation method capable of providing efficient voice recognition and resource management.
A video display device is a device having a function of displaying an image that a user can view. The user can view the broadcast through the video display device. A video display device displays a broadcast selected by a user among broadcast signals transmitted from a broadcast station on a display. Currently, broadcasting is shifting from analog broadcasting to digital broadcasting worldwide.
Digital broadcasting refers to broadcasting in which digital video and audio signals are transmitted. Digital broadcasting is more resistant to external noise than analog broadcasting, so it has less data loss, is advantageous for error correction, has a higher resolution, and provides a clearer picture. Also, unlike analog broadcasting, digital broadcasting is capable of bidirectional service.
On the other hand, as the functions and services of the image display apparatus diversify, research is increasing on voice recognition technology that can recognize user speech for operations such as menu selection, text input, command input, and channel switching.
SUMMARY OF THE INVENTION
An object of the present invention is to provide an image display apparatus, a server, and a method of operating the same, which can be operated accurately and conveniently through voice recognition technology and can efficiently manage internal resources.
Another object of the present invention is to provide an image display apparatus, a server, and an operation method thereof, which may improve user convenience.
According to an aspect of the present invention, there is provided a method of operating an image display device, the method comprising: requesting data related to a language model from an external electronic device connected through a network; receiving the data related to the language model from the external electronic device; and updating the language model stored in the voice database based on the received data.
In addition, to achieve the above object, an operation method of the server according to an embodiment of the present invention includes: receiving, from the image display device, a request for data related to a language model for speech recognition; transmitting response data to the image display device in accordance with the request; and updating the database based on the request and response data.
According to the present invention, it is possible to operate accurately and conveniently through the voice recognition technology and to efficiently manage internal resources, thereby improving user convenience.
1A and 1B illustrate an image display system according to an exemplary embodiment of the present invention.
2 is an internal block diagram of an image display apparatus according to an embodiment of the present invention.
3 is an example of an internal block diagram of the control unit of FIG.
4 is an example of an internal block diagram of the server of FIG.
5 is a diagram illustrating a control method of the remote controller of FIG. 1.
6 is a perspective view of a remote control apparatus according to an embodiment of the present invention.
7 is an internal block diagram of a remote control apparatus according to an embodiment of the present invention.
8 and 9 are views referred to for describing a speech recognition process according to an embodiment of the present invention.
10A to 10B illustrate various examples of a platform structure diagram of the image display apparatus of FIG. 1.
11 is a diagram showing a platform structure according to an embodiment of the present invention.
12 is a flowchart illustrating a method of operating an image display apparatus according to an exemplary embodiment of the present invention.
13 is a flowchart illustrating a method of operating a server according to an exemplary embodiment of the present invention.
14 is a flowchart illustrating a method of operating an image display apparatus system according to an exemplary embodiment of the present invention.
15 is a diagram referred to describe an example of an operating method of an image display device system according to an exemplary embodiment of the present invention.
16 is a flowchart illustrating a method of operating an image display apparatus system according to an exemplary embodiment of the present invention.
17 to 24 are views referred to for describing various examples of an operating method of an image display device system according to an embodiment of the present invention.
Hereinafter, the present invention will be described in more detail with reference to the drawings.
The suffixes "module", "engine" and "part" for components used in the following description are given merely for ease of drafting the present specification, and do not by themselves carry any particular meaning or role. Therefore, "module", "engine" and "unit" may be used interchangeably.
1A and 1B illustrate an image display system according to an exemplary embodiment of the present invention.
Referring to FIG. 1A, an
The
The
On the other hand, the
The voice database may include an acoustic model database and a language model database, which store an acoustic model and a language model, respectively.
The voice database may further include a pronunciation dictionary database for storing vocabularies and corresponding pronunciation symbols. According to an embodiment, the speech recognition engine may further comprise a pronunciation symbol generation module for generating a pronunciation symbol from the received text data.
The speech recognition engine processes the received voice signal, and outputs the voice recognition result data by comparing the data with data stored in the voice database. On the other hand, the
Meanwhile, the
The
In addition, data for content reproduction, data including data related to the content or the
Referring to FIG. 1B, the
Meanwhile, in the present specification, the
For example, the storage means stores and manages data related to speech recognition, and a computer, another video display device, a smart phone, a tablet PC, or the like may be connected through a network to the
2 is an internal block diagram of an image display apparatus according to an embodiment of the present invention.
Referring to FIG. 2, the
The
The
For example, if the selected RF broadcast signal is a digital broadcast signal, it is converted into a digital IF signal (DIF). If the selected RF broadcast signal is an analog broadcast signal, it is converted into an analog baseband image or voice signal (CVBS / SIF). That is, the
The
Meanwhile, the
On the other hand, the
The
The
The stream signal output from the
The external
The external
The A / V input / output unit can receive video and audio signals from an external device. Meanwhile, the wireless communication unit can perform short-range wireless communication with other electronic devices.
The
The
In addition, the
Meanwhile, the
In addition, the
Although the
The user
(Not shown), such as a power key, a channel key, a volume key, and a set value, from the
The
The video signal processed by the
The audio signal processed by the
Although not shown in FIG. 2, the
In addition, the
In addition, the
Meanwhile, the
Meanwhile, the
Such a 3D object may be processed to have a different depth than the image displayed on the
On the other hand, the
Although not shown in the drawing, a channel browsing processing unit for generating a thumbnail image corresponding to a channel signal or an external input signal may be further provided. The channel browsing processing unit receives the stream signal TS output from the
At this time, the thumbnail list may be displayed in a simple view mode displayed on a partial area in a state where a predetermined image is displayed on the
The
The
In order to view the three-dimensional image, the
The single display method can implement a 3D image only on the
Meanwhile, the additional display method may implement a 3D image by using an additional display as a viewing device in addition to the
On the other hand, the glasses type can be further divided into a passive type such as a polarizing glasses type and an active type such as a shutter glass type. Also, the head mount display type can be divided into a passive type and an active type.
Meanwhile, the
The
A photographing unit (not shown) photographs the user. The photographing unit (not shown) may be implemented by a single camera, but the present invention is not limited thereto, and may be implemented by a plurality of cameras. On the other hand, the photographing unit (not shown) may be embedded in the
The
The
The
Meanwhile, the
Meanwhile, a block diagram of the
On the other hand, the
In the following, an embodiment of the present invention will be described with reference to the
3 is an example of an internal block diagram of the control unit of FIG.
The
The
The
The video decoder 225 decodes the demultiplexed video signal and the scaler 235 performs scaling so that the resolution of the decoded video signal can be output from the
The video decoder 225 may include a decoder of various standards.
On the other hand, the image signal decoded by the
For example, when an external video signal input from the external device 190 or a broadcast video signal of a broadcast signal received from the
Meanwhile, the image signal decoded by the
The
In addition, the
In addition, the
In addition, the
The
The
The
The
A frame rate converter (FRC) 350 can convert the frame rate of an input image. On the other hand, the
The
The
In the present specification, a 3D video signal means a video signal including a 3D object. Examples of the 3D object include a picture-in-picture (PIP) image (still image or moving picture), an EPG indicating broadcast program information, icons, text, objects in an image, people, backgrounds, and web screens (newspapers, magazines, etc.).
On the other hand, the
Meanwhile, the
Although not shown in the drawing, it is also possible that a 3D processor (not shown) for 3-dimensional effect signal processing is further disposed after the
Meanwhile, the audio processing unit (not shown) in the
In addition, the audio processing unit (not shown) in the
The data processing unit (not shown) in the
In FIG. 3, the signals from the
Meanwhile, the block diagram of the
In particular, the
4 is an example of an internal block diagram of the server of FIG.
Referring to FIG. 4, the
The
The
The data related to speech recognition may be an acoustic model, a language model, or pronunciation dictionary data used in the speech recognition process, or a voice signal received by the
The broadcast program related data may include detailed information of the broadcast program and data including additional information, transport stream data for reproduction of the broadcast program, or may include data transcoded in another manner.
On the other hand, the
The
The
Meanwhile, the
In addition, the
The
Meanwhile, the
5 is a diagram illustrating a control method of the remote controller of FIG. 1.
5A illustrates that the
The user can move or rotate the
5B illustrates that when the user moves the
Information on the motion of the
5C illustrates a case in which the user moves the
On the other hand, when the specific button in the
On the other hand, the moving speed and moving direction of the
6 is a perspective view of a remote control apparatus according to an embodiment of the present invention, and FIG. 7 is an internal block diagram of the remote control apparatus.
Referring to FIG. 6, the spatial remote control 201 according to an embodiment of the present invention may include various input keys or input buttons.
For example, the spatial remote controller 201 may include an Okay key 291, a
For example, the Okay key 291 may be used to select a menu or item, the
In addition, the spatial remote controller 201 may further include a
On the other hand, as shown in the figure, the
In detail, when an image larger than the size of the display is displayed on the
This scroll function may be provided with a separate key other than the
On the other hand, the four
Referring to FIG. 7, the
The
Meanwhile, the coordinate
In the present embodiment, the
Further, the
The
Also, the
On the other hand, according to the present embodiment, the
On the other hand, the
The
The
For example, the
The
For example, the
The
The
In addition, the
The
The
Meanwhile, the voice signal and data may be transmitted to the
8 and 9 are views referred to for describing a speech recognition process according to an embodiment of the present invention.
Referring to FIG. 8A, when the user enters the voice channel switching mode, the
The
On the other hand, the voice channel switching mode may be entered in a variety of ways, such as pressing one of the hard keys provided on the
When the user inputs the voice "Kbc" through the microphone of the remote controller as shown in FIG. 8 (b), the apparatus may recognize the input voice signal and switch the channel to the matching "Kbc" channel.
9 briefly illustrates a configuration and operation of an example of a speech recognition engine.
Referring to FIG. 9, a preprocessing such as noise processing is performed on a first received
A commonly used method determines the speech interval and the silence interval by computing the energy value (or log energy value) of the input signal at every interval and comparing it with a statistically predetermined threshold value.
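As an illustration, the energy-threshold endpoint detection described above can be sketched as follows; the frame length and the −30 dB threshold are illustrative assumptions, not values from the patent:

```python
import math

def detect_speech_frames(samples, frame_len=160, threshold_db=-30.0):
    """Label each frame as speech (True) or silence (False) by log energy.

    Hypothetical sketch of the energy-threshold endpoint detection
    described in the text; frame length and threshold are example values.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        log_energy = 10.0 * math.log10(energy + 1e-12)  # avoid log(0)
        flags.append(log_energy > threshold_db)
    return flags
```

Consecutive speech-labeled frames would then be merged into the speech interval passed on to feature extraction.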
In addition, the preprocessing process may include noise processing to remove noise.
Meanwhile, the
Thereafter, a feature vector (parameter) effective for recognition is extracted from the input speech signal.
Here, a method based on LPC (Linear Predictive Coefficients) or an MFCC (Mel-Frequency Cepstral Coefficients) extraction method can be used.
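For illustration, a simplified MFCC computation for a single frame might look like the following sketch; the sample rate, FFT size, and filter counts are assumed example values, and a production implementation would add pre-emphasis, liftering, and delta features:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    """Simplified MFCCs for one speech frame (illustrative sketch)."""
    n_fft = 512
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)

    log_energies = np.log(fbank @ power + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1))
                 / (2 * n_filters))
    return dct @ log_energies
```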
Thereafter, the pattern of the voice signal and the extracted feature parameters may be recognized (940), and an
This is a method of modeling and comparing the signal characteristics of speech; a direct comparison method that sets the recognition object as a feature vector model and compares it with the feature vector of the input signal can be used. In the direct comparison method, a unit of the recognition target, such as a word or phoneme, is set as a feature vector model, and how similar the input speech is to that model can be measured.
Alternatively, a statistical method that statistically processes and uses the feature vectors of the recognition object can be used. This statistical method constructs the unit of the recognition target as a state sequence and uses the relations between the state sequences. Examples include the DTW (Dynamic Time Warping) method, which uses the temporal alignment relation, and the HMM (Hidden Markov Model) method, which uses probability values.
In more detail, DTW (Dynamic Time Warping) is a method of obtaining the distance between the reference speech signal and the input speech signal using dynamic programming, and is mainly used for constructing a speaker-dependent isolated word recognition system and has a high recognition rate.
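The DTW comparison described above can be sketched with a simple dynamic-programming table; the one-dimensional toy "templates" below are assumptions for illustration (a real recognizer compares sequences of feature vectors):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """DTW distance between two sequences via dynamic programming."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Best of insertion, deletion, and match steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(templates, query):
    """Return the label of the reference template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(templates[label], query))
```

Because DTW warps the time axis, a query that repeats a value (a slower utterance) still aligns with the template at low cost.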
HMM (Hidden Markov Model) is a method of expressing a speech unit as transition probabilities from one state to the next, constructing representative models from training data by using the temporal statistical characteristics of the speech signal, and adopting the probability model with the highest similarity to the input as the recognition result.
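As a minimal sketch of the HMM evaluation step, the forward algorithm below computes the likelihood of an observation sequence under a discrete-emission model; the recognizer would adopt the word model with the highest likelihood. The tiny model used in the test is purely illustrative:

```python
def forward_likelihood(init, trans, emit, observations):
    """Forward-algorithm likelihood P(observations | model) for a discrete HMM.

    init[i]: initial probability of state i; trans[i][j]: transition
    probability from state i to state j; emit[i][o]: probability that
    state i emits symbol o.
    """
    n_states = len(init)
    # Initialization with the first observation
    alpha = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    # Recursion over the remaining observations
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)
```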
Thereafter, data corresponding to the received voice signal may be determined, and a recognition result (960) may be output.
On the other hand, the data determination process may include an operation of the
The language model is generally used to find probability values for all possible word sequences. A grammar-based method considers only word sequences that are grammatically correct in a given situation among the possible combinations of words, while a statistics-based scheme statistically estimates the probability value of a possible word sequence from a database of speech uttered in the given situation.
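The statistics-based scheme can be illustrated with a maximum-likelihood bigram model; the toy corpus and the `<s>`/`</s>` sentence-boundary markers are assumptions for the sketch:

```python
from collections import defaultdict

def train_bigram(sentences):
    """Estimate bigram probabilities P(w2 | w1) from a small corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
        for w1, nxt in counts.items()
    }

def sentence_prob(model, sentence):
    """Probability of a word sequence under the bigram model (0 if unseen)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= model.get(w1, {}).get(w2, 0.0)
    return p
```

In a recognizer, these sequence probabilities are combined with the acoustic scores so that likely word orders are preferred.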
10A to 10B illustrate various examples of a platform structure diagram of the image display apparatus of FIG. 1.
The platform of the
First, referring to FIG. 10A, the platform of the
The
The
The hardware driver in the
In addition, the hardware driver in the
The
The
In addition, the
The
Examples of the
The
In addition, the
Meanwhile, the
The
The virtual machine (VM) may have a plurality of instances; that is, it may be a virtual machine capable of multitasking. Meanwhile, each virtual machine (VM) may be allocated and executed according to each application in the
The binder driver and the runtime 1032 may connect a Java-based application and a C-based library.
The
The
The
In addition, the
Through the application in this
Next, referring to FIG. 10B, the platform of the
The platform of FIG. 10B is different from that of the
On the other hand, the
The
On the other hand, the
The platform of FIGS. 10A and 10B described above may be used in a variety of electronic devices as well as an image display device.
Meanwhile, the platform of FIGS. 10A and 10B may be loaded in the
11 is a diagram illustrating a platform structure according to an embodiment of the present invention.
In more detail, the configuration of the
The embedded voice engine 1110 receives the channel name and the channel information, and may be configured to include a channel table 1114 that stores information necessary for channel switching, for example, the physical and logical addresses of each channel (Physical Number, Major Number, Minor Number).
Meanwhile, the channel table 1114 may be provided inside the voice engine 1110 or separately provided.
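A channel table of the kind described can be sketched as a simple mapping from recognized channel names to tuning addresses; the class and field names here are illustrative assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChannelEntry:
    physical: int  # Physical Number (RF channel actually tuned)
    major: int     # Major Number (logical channel shown to the user)
    minor: int     # Minor Number (sub-channel)

class ChannelTable:
    """Maps recognized channel names to physical/logical channel addresses."""

    def __init__(self):
        self._entries = {}

    def register(self, name: str, physical: int, major: int, minor: int):
        self._entries[name.lower()] = ChannelEntry(physical, major, minor)

    def lookup(self, recognized_name: str) -> Optional[ChannelEntry]:
        """Return the tuning address for a recognized channel name, or None."""
        return self._entries.get(recognized_name.lower())
```

On a successful recognition such as "Kbc", the engine would look up the entry and hand the physical/major/minor numbers to the tuner.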
The
Meanwhile, unlike FIG. 11, the
The
Meanwhile, upon entering the voice channel switching mode, the
Thereafter, the
12 is a flowchart illustrating a method of operating an image display apparatus according to an exemplary embodiment of the present invention, and FIG. 13 is a flowchart illustrating a method of operating a server according to an exemplary embodiment of the present invention.
Referring to FIG. 12, the
Meanwhile, the
As described above with reference to FIG. 9, the voice recognition compares the input voice with data of a stored acoustic model, a language model, and the like, and outputs the most similar data as the recognition result data.
For example, the embedded speech engine included in the
The
However, for accurate speech recognition, the amount of data in the speech database should be large. In this case, not only is storage space consumed, but the search area also grows, which unnecessarily consumes the resources of the image display device and may slow down the recognition speed.
In particular, with the increase of natural language search and the like, the language model database has a significant effect on speech recognition performance. Whether a statistics-based language model such as an n-gram model or a grammar-based language model such as a context-free grammar is used, a sufficiently rich language model is required for accurate and fast speech recognition.
Therefore, the present invention does not always store and maintain the voice database used for speech recognition at its maximum size, but requests necessary data from the server when needed and performs an update that replaces or adds a part of the voice database with the received data.
Accordingly, it is possible to efficiently manage memory and resources, minimize the database search area and time in the speech recognition process, and improve the speech recognition speed.
Meanwhile, the updating step (S1230) may be characterized in that grammar or context data is dynamically generated based on the received data. That is, the received data may be at least a part of the grammar or context data, or data required for its generation, and the
The grammar or context data is data included in a language model. The grammar may define the rules and ordering inherent in a sentence; for example, the grammar may be defined as a BNF (Backus-Naur Form) grammar. The context is a general term for the semantic and logical relations established between the components of a sentence, and can represent the preceding and following connections of the vocabulary.
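As an illustration of how a small BNF-style grammar might constrain the recognizable utterances for channel switching, consider the sketch below; the rule names, action phrases, and channel vocabulary are assumptions for the example, not content from the patent:

```python
# Hypothetical BNF-style grammar for voice channel switching, roughly:
#   <command> ::= <action> <channel> | <channel>
#   <action>  ::= "switch to" | "change to" | "go to"
#   <channel> ::= "kbc" | "abc" | "nbc"
GRAMMAR = {
    "<action>": ["switch to", "change to", "go to"],
    "<channel>": ["kbc", "abc", "nbc"],
}

def matches_command(utterance: str) -> bool:
    """Check whether an utterance fits <command> ::= [<action>] <channel>."""
    text = utterance.lower().strip()
    for action in GRAMMAR["<action>"]:
        if text.startswith(action + " "):
            text = text[len(action) + 1:]
            break
    return text in GRAMMAR["<channel>"]
```

Dynamically regenerating `GRAMMAR` from server-provided data corresponds to the context update the text describes: only utterances covered by the current grammar are candidates for recognition.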
Meanwhile, the speech recognition engine may analyze syntax and semantic analysis of the input speech using the grammar and context, and determine the most similar data as the speech recognition result.
Meanwhile, in the case of speech recognition using an embedded speech engine, a word or sentence that attempts speech recognition may not be recognized unless it is previously defined in a language model, in particular, a context. Thus, the embedded speech engine can use pre-generated (compiled) binary context data at speech mode entry or prior to system startup or boot.
According to the present invention, when the image display apparatus is driven, various kinds of information suitable for the characteristics of the image display apparatus may be taken to dynamically update a language model, particularly context data.
The context data may be generated in a language model or compiled into binary data based on the grammar data and the pronunciation dictionary data.
Meanwhile, the context data may be updated when the voice signal is input or when the voice mode is entered.
Alternatively, the updating of the context data may be performed at a predetermined time when the image display apparatus is turned on by setting.
The method of operating an image display device according to an exemplary embodiment of the present invention may include receiving a voice signal, extracting a feature vector based on the received voice signal, and comparing the feature vector with the voice database. The method may further include determining data corresponding to the received voice signal. Here, the voice signal may be received through a remote control device.
The data request step associated with the language model may include requesting a language model associated with data corresponding to the feature vector or the received voice signal. That is, when requesting data related to the language model to the server, the received voice signal may be transmitted to the server, or the data generated during the voice recognition process may be transmitted to the server to request a language model for speech recognition.
Alternatively, the data request step associated with the language model may be a request for a language model related to content currently in use.
Through the
The probability of the vocabulary related to the image or the content displayed on the screen is higher than the probability of the vocabulary not related to the screen. As such, by using a language model having a high probability of the vocabulary related to the content, it is possible to increase the accuracy of speech recognition and to improve the speech recognition speed.
For example, when a user accesses a shopping site using the image display apparatus, the image display apparatus transmits the URL information and the displayed image or text information of the shopping site to a server, and may receive grammar or context data including predefined vocabulary and sentences related to shopping, such as product search, ordering, or payment.
Alternatively, the
In this case, at least some of voice information such as a language model and a word dictionary related to the additional information may be received together and stored in the embedded voice engine. Thereafter, the user may input a command by voice while watching a broadcast.
Meanwhile, the present invention may grasp content by using various well-known ACR (Auto content recognition) technology and request voice information related to the content from the server.
The
That is, a language model for speech recognition may be dynamically generated according to the user's intention, and the speed of speech recognition and the accuracy of the speech recognition result may be improved.
The method of operating an image display device according to an exemplary embodiment of the present invention may further include selecting one or more external electronic devices to request the data from among a plurality of external electronic devices connected through the network.
That is, the
Meanwhile, if the image display apparatus fails to receive the necessary data, it may request the data again from another server.
In addition to the electronic devices connected through the network, the update may be performed through an external storage medium connected through the external
Meanwhile, the operation method of the image display apparatus according to an embodiment of the present invention may further include receiving a voice signal and recognizing the voice signal using the voice database, and the data request step associated with the language model may include requesting, from the external electronic device, a language model including data corresponding to the voice signal when the recognition of the voice signal fails or the confidence value of the recognition result is lower than a reference value.
That is, data related to the language model may be requested, and the language model updated, only when speech recognition by the embedded speech recognition engine fails or its result is judged unsatisfactory.
The confidence value may be determined, for example, as a difference or distance between a feature vector of the input voice signal and the nearest vector corresponding to the voice recognition result data.
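As an illustrative sketch only, the distance-based confidence value described above could be computed as follows. The mapping from distance to a 0..1 score, the threshold `REFERENCE_VALUE`, and all function names are assumptions, not details given in the text.

```python
import math

REFERENCE_VALUE = 0.5  # hypothetical reference value R from the text


def confidence(feature_vec, reference_vecs):
    """Score a recognition hypothesis via the Euclidean distance between
    the input feature vector and the nearest reference vector: a smaller
    distance means a closer match, i.e. higher confidence."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best = min(dist(feature_vec, r) for r in reference_vecs)
    # Map distance to a 0..1 confidence; this scaling is illustrative.
    return 1.0 / (1.0 + best)


c = confidence([1.0, 2.0], [[1.0, 2.1], [5.0, 5.0]])
needs_server_fallback = c < REFERENCE_VALUE
```

A result whose confidence falls below the reference value would then trigger the server request described in the surrounding text.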
Alternatively, the operating method of the image display apparatus according to an embodiment of the present invention may include receiving a voice signal; recognizing the voice signal using the voice database; transmitting data based on the voice signal to the external electronic device when recognition fails or the confidence value of the recognition result is at or below a reference value; and receiving voice recognition result data from the external electronic device.
That is, when speech recognition by the embedded speech recognition engine fails or its result is determined to be unsatisfactory, recognition may be requested from the external electronic device and the result data received from it.
The data based on the voice signal may be the voice signal itself, a feature vector extracted from it, or a partial recognition result. That is, the received voice signal may be transmitted to the server as-is, or the feature vector or recognition result produced by the embedded speech recognition engine may be transmitted instead.
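To illustrate why a client might send feature vectors instead of raw audio, here is a toy per-frame log-energy extractor. A real engine would compute MFCCs or similar; the frame size and the log-energy feature are assumptions made purely for illustration.

```python
import math
import struct


def extract_features(pcm_bytes, frame_size=160):
    """Toy per-frame log-energy features from little-endian 16-bit PCM.
    Each frame of `frame_size` samples is reduced to a single number,
    so the transmitted data is far smaller than the raw signal."""
    samples = struct.unpack("<%dh" % (len(pcm_bytes) // 2), pcm_bytes)
    feats = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        feats.append(math.log(energy + 1.0))  # +1 avoids log(0) on silence
    return feats
```

The resulting list of floats would stand in for "data based on the voice signal" when the raw waveform is not sent.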
Alternatively, the operating method of the image display apparatus according to an embodiment of the present invention may include receiving a voice signal; recognizing it using the voice database while, in parallel, transmitting data based on the voice signal to the external electronic device; receiving voice recognition result data from the external electronic device; and using the received result data when recognition using the voice database fails or the confidence value of the recognition result is at or below a reference value.
In this embodiment, the received voice signal is processed by the embedded speech recognition engine while related data is transmitted to the external electronic device in parallel. If the confidence value of the embedded engine's result is at or below the reference value, the result data received from the external electronic device may be used instead.
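The parallel embedded/server flow above can be sketched with a thread pool. The two recognizer stubs and the threshold are placeholders, not actual components of the described system.

```python
import concurrent.futures

REFERENCE_VALUE = 0.7  # hypothetical confidence threshold


def embedded_recognize(signal):
    # Stand-in for the built-in engine: returns (text, confidence).
    return ("volume up", 0.4)


def server_recognize(signal):
    # Stand-in for the network round trip to the external server.
    return ("volume up please", 0.95)


def recognize(signal):
    """Start the server request, run the embedded engine meanwhile, and
    fall back to the server result only when the embedded confidence is
    at or below the reference value."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        server_future = pool.submit(server_recognize, signal)
        text, conf = embedded_recognize(signal)
        if conf > REFERENCE_VALUE:
            server_future.cancel()  # best-effort; may already be running
            return text
        return server_future.result()[0]
```

Because both paths start at once, the added latency of the fallback is only the remainder of the server round trip, not a fresh request.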
Meanwhile, in the data request step (S1210) related to the language model, data including sentence-pattern information of the voice database may be transmitted. Here, the sentence-pattern information classifies the arrangement and combination of language elements in a sentence into types, and may be part of the context data of the embedded voice database.
Meanwhile, the data received in the receiving step (S1220) may include words that are frequently used together with the words in the sentence-pattern information, or words with high semantic similarity to them.
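As a rough sketch of how a server might select such co-occurring words, consider counting, over a text corpus, which words appear in the same sentences as the seed words sent by the display device. The function name, the toy corpus, and the whitespace tokenization are all assumptions for illustration.

```python
from collections import Counter


def cooccurring_words(corpus_sentences, seed_words, top_n=3):
    """Return the words that most frequently co-occur with the given
    seed words in the server-side corpus (a toy stand-in for a real
    language-model training database)."""
    counts = Counter()
    seeds = set(seed_words)
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        if seeds & set(tokens):  # sentence shares a word with the request
            counts.update(t for t in tokens if t not in seeds)
    return [w for w, _ in counts.most_common(top_n)]
```

Semantic similarity (as opposed to raw co-occurrence frequency) would require word embeddings or a thesaurus, which is beyond this sketch.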
Referring to FIG. 13, in the method of operating the external electronic device, in particular the server, a data request related to the language model is received from the image display apparatus (S1310), response data is transmitted to the image display apparatus according to the request, and a database is updated based on the request and response data (S1330).
Meanwhile, in the update step (S1330), the request details and response details may be stored in association with the identification information of the image display apparatus.
The response data may be a language model including data corresponding to the data included in the request. The request data may take various forms depending on the embodiment: the voice signal itself, a noise-removed signal, or a signal at various processing stages such as a feature vector.
Meanwhile, the method of operating the server according to an exemplary embodiment of the present invention may further include receiving data based on the voice signal from the image display apparatus, determining voice recognition result data corresponding to that data, and storing at least one of a confidence value, a frequency of use, and a retry rate of the voice recognition result data.
Meanwhile, in the request receiving step (S1310), data including sentence-pattern information of the voice database provided in the image display apparatus may be received; in this case, the server may search for words whose frequency of use with, or semantic similarity to, the words in the sentence-pattern information exceeds a reference value.
FIG. 14 is a flowchart illustrating a method of operating an image display device system according to an exemplary embodiment of the present invention, and FIG. 15 is a view referred to for describing an example of an operating method of an image display device system according to an exemplary embodiment of the present invention.
Referring to the drawings, the image display apparatus may request data related to the language model from the server and receive the corresponding data, for example a list of strings, in response.
In this case, the image display apparatus stores the received data, generates a pronunciation symbol and a pronunciation dictionary for the strings included in the received data, and then generates grammar or context data based on the generated pronunciation symbols and pronunciation dictionary.
FIG. 15A illustrates a part of a string received from a server, and FIG. 15B illustrates phonetic symbols and pronunciation dictionary data for the string illustrated in FIG. 15A.
Meanwhile, grammar or context data may be generated as binary data and updated as shown in FIG. 15C.
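The update steps above (string list in, pronunciation dictionary and grammar out) can be sketched as follows. The letter-to-sound table, the phoneme labels, and the alternation-style "grammar" are all hypothetical simplifications; a real system would use a full grapheme-to-phoneme model and a compiled binary format as suggested by FIG. 15(c).

```python
# Hypothetical letter-to-sound table standing in for a real
# grapheme-to-phoneme model for the target language.
G2P = {"m": "M", "b": "B", "c": "K", "n": "N", "e": "EH", "w": "W", "s": "S"}


def to_phonemes(word):
    return " ".join(G2P.get(ch, ch.upper()) for ch in word.lower())


def build_pronunciation_dict(strings):
    """Generate a pronunciation entry for every string received from
    the server, mirroring the update step described above."""
    return {w: to_phonemes(w) for w in strings}


def build_grammar(strings):
    # Toy "grammar": an alternation over the received vocabulary.
    return "(" + " | ".join(sorted(strings)) + ")"
```

The resulting dictionary and grammar would replace or extend part of the embedded voice database rather than rebuilding it wholesale.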
FIG. 16 is a flowchart illustrating a method of operating an image display apparatus system according to an exemplary embodiment of the present invention.
When the image display apparatus receives a voice signal, the embedded speech engine first performs recognition on it.
Meanwhile, when the confidence value of the recognition result output by the speech engine is greater than the predetermined reference value R, the corresponding operation, such as text input, channel switching, or execution of a predetermined command, can be performed using the output data of the speech engine as the speech recognition result (S1640).
Meanwhile, when speech recognition fails, or when the confidence value of the recognition result output by the speech engine is equal to or less than the predetermined reference value R (S1650), the image display apparatus may transmit data based on the voice signal to the external server.
The server may then perform speech recognition on the received data and return voice recognition result data, after which the image display apparatus performs the corresponding operation based on that result.
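The overall S1610-S1680 flow just described can be condensed into a single dispatch function. The engine, server, and executor callables, the `Result` tuple, and the threshold value are all assumptions used to make the control flow concrete.

```python
from collections import namedtuple

Result = namedtuple("Result", "text confidence")
REFERENCE_VALUE = 0.7  # hypothetical reference value R


def handle_voice_input(signal, embedded_engine, server_recognize, execute):
    """Try the embedded engine first; fall back to the external server
    when recognition fails or the confidence is at or below R."""
    result = embedded_engine(signal)                 # embedded recognition
    if result is not None and result.confidence > REFERENCE_VALUE:
        execute(result.text)                         # S1640: run the command
        return result.text
    # S1650-S1670: failed or low confidence -> ask the server
    text = server_recognize(signal)
    execute(text)                                    # S1680: run the command
    return text
```

Either path ends in `execute`, so the user-visible behavior (channel change, text input, etc.) is the same regardless of which engine produced the result.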
FIGS. 17 to 24 are views referred to for describing various examples of an operating method of an image display device system according to an embodiment of the present invention.
Referring to FIG. 17, the image display apparatus may use both the embedded speech engine and a server-based speech recognition process.
Meanwhile, when speech recognition by the embedded speech engine fails, the server speech recognition process may proceed. Alternatively, when the confidence value of the embedded recognition result is less than the reference value, the embedded process may be terminated and the result of the server speech recognition process awaited. The confidence value may be understood as the system's own evaluation of the speech recognition result.
Since the embedded voice engine avoids a network round trip to the server, its response time can be expected to be shorter.
For example, a language model related to control of the video display device, such as channel switching and volume control, and a language model related to broadcast content, such as channel names and popular program names, may be stored in the internal voice database, so that the corresponding voice inputs can be recognized and the commands executed quickly. Other voice inputs may be handled by the external server.
While watching a predetermined channel, for example, the user may switch channels by voice, and such an input can be handled quickly by the embedded engine.
Meanwhile, as shown in (b) of FIG. 18, when a search word is input, it is difficult to know in advance what text the user will enter, so a language model or voice recognition may be requested from the server after the user's voice input.
Meanwhile, natural language processing (NLP) is a technology for understanding and analyzing human language, typically applied to speech-to-text (STT) results, and it can require a substantial amount of computation.
Therefore, it is more effective to use an external server for such natural language processing.
Referring to FIGS. 19 to 21, the image display apparatus may transmit a speech recognition result to a natural language processing server and receive the analysis result, for example as an XML document.
Meanwhile, the XML parser of the speech engine parses only the portion of the received XML document that requires an operation. As shown in FIG. 20, channel switching may be performed after the TV determines that switching to the MBC channel corresponds to the user's voice input intention.
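The selective parsing described above might look like the following sketch. The XML schema shown is entirely hypothetical, since the text does not specify the NLP server's response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response format; the actual schema is not given in the text.
RESPONSE = """<result>
  <command type="channel_change">
    <channel name="MBC" number="11"/>
  </command>
</result>"""


def parse_command(xml_text):
    """Parse only the portion of the document the TV needs to act on,
    ignoring any other elements the server may include."""
    root = ET.fromstring(xml_text)
    cmd = root.find("command")
    if cmd is not None and cmd.get("type") == "channel_change":
        ch = cmd.find("channel")
        return ("channel_change", ch.get("name"), int(ch.get("number")))
    return None  # unrecognized or unsupported command type
```

A dispatcher in the display device could map the returned tuple to the actual channel-switching operation.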
Meanwhile, to schedule recording of a broadcast program, the user would normally have to select the program name, date, start time, end time, and so on; instead, the system can understand and analyze the meaning of a voice input requesting the recording and execute the operation the user intends at once.
When the user performs a voice input such as "Record this week's Infinite Challenge," the embedded speech engine alone may have difficulty interpreting the user's intention.
However, according to an embodiment of the present invention, as shown in FIG. 21, the voice recognition result processed by the internal speech engine or an external STT server may be transmitted to the natural language processing server, and the type of command determined by the natural language processing server, together with the broadcast program information retrieved in association with it, may be received.
The image display apparatus may then schedule the recording based on the received command type and broadcast program information.
In this case, at least some voice information related to the broadcast-related information, such as a language model and a word dictionary, may be received together and stored in the embedded voice engine. Thereafter, the user may input commands by voice while watching a broadcast, and the embedded voice engine may perform recognition more quickly and efficiently based on the received and updated data.
Alternatively, when the content currently being used through the image display apparatus is identified, additional information related to that content, together with the associated voice information, may be requested from the server.
For example, while receiving, as additional information, details about an actor appearing in the broadcast program being watched, such as the actor's other works, the user may also receive the related word dictionaries and language models.
Alternatively, when the user runs a specific program, such as a word processor, through the image display apparatus, a language model related to that program may be requested and received.
As described above, the content may also be identified using various known automatic content recognition (ACR) techniques, and voice information related to the content requested from the server.
Although FIGS. 19 to 21 illustrate an example of using a natural language processing server, the present invention is not limited thereto; as in the above-described embodiments, recognition may also be performed with the embedded engine alone or with a speech-to-text server.
According to an exemplary embodiment of the present invention, the pointer displayed on the screen may be moved using the remote control device.
Referring to FIG. 22A, the user may designate a predetermined region or object on the screen with the pointer.
As shown in (b) of FIG. 22, when the user then inputs the voice command "search," the image display apparatus may perform a search related to the designated region.
Alternatively, as shown in FIG. 23, a similar operation may be performed while the user is viewing a predetermined image.
Also in this case, similarly to the above-described embodiment, when the user is watching an image or designates a predetermined region, the image display apparatus may request and receive from the server a language model corresponding to the image attributes, for example one including vocabulary and sentences such as "edit," "save," "search," "upload," "cut," "paste," and "note."
In addition, as shown in FIG. 24, another program may be driven based on a voice input of a user.
If the user inputs the voice command "edit" while viewing the Internet screen 2410 through the image display apparatus, a word processor program may be launched.
Also in this case, similarly to the above-described embodiments, while running the word processor program the image display apparatus may request and receive from the server a language model related to that program.
According to the present invention, the voice database used for speech recognition need not always be stored and maintained in full. Instead, a dynamic update may be performed in which the apparatus requests the data it needs from the server when necessary and replaces or supplements part of the voice database with the received data.
Accordingly, memory and resources can be managed efficiently, the database search area and time in the speech recognition process can be minimized, and the speech recognition speed improved.
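One simple way to realize the bounded, dynamically updated voice database described above is a least-recently-used store for language-model fragments. The class, its capacity, and the eviction policy are illustrative assumptions; the text only requires that part of the database be replaced or supplemented on demand.

```python
from collections import OrderedDict


class VoiceDatabase:
    """Toy bounded store for language-model fragments: when a new
    fragment arrives from the server and the store is full, the least
    recently used entry is evicted instead of keeping everything
    resident in memory."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.models = OrderedDict()  # domain -> model data, LRU order

    def update(self, domain, model_data):
        if domain in self.models:
            self.models.move_to_end(domain)
        self.models[domain] = model_data
        if len(self.models) > self.capacity:
            self.models.popitem(last=False)  # evict least recently used

    def lookup(self, domain):
        if domain in self.models:
            self.models.move_to_end(domain)  # mark as recently used
            return self.models[domain]
        return None
```

Keeping only the most recently useful fragments both bounds memory and shrinks the search space the recognizer must scan, matching the efficiency claims above.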
Therefore, it is possible to operate the video display device accurately and conveniently through the voice recognition technology, and to efficiently manage internal resources, thereby improving user convenience.
The method of operating the image display device and the server according to the embodiments of the present invention is not limited to the configurations and methods of the embodiments described above; rather, all or some of the embodiments may be selectively combined so that various modifications can be made.
Meanwhile, the operating method of the image display device and the server of the present invention can be implemented as processor-readable code on a processor-readable recording medium provided in the image display device and the server. The processor-readable recording medium includes all kinds of recording apparatuses in which data that can be read by the processor is stored. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and the code may also be implemented in the form of a carrier wave, such as transmission over the Internet. In addition, the processor-readable recording medium may be distributed over network-connected computer systems so that processor-readable code can be stored and executed in a distributed fashion.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.
Claims (21)
Requesting data associated with the language model from an external electronic device connected through a network;
Receiving data related to the language model from the external electronic device; And
Updating the language model stored in the voice database based on the received data.
And wherein said updating step dynamically generates grammar or context data based on the received data.
Receiving a voice signal;
Extracting a feature vector based on the received speech signal; And
And comparing the feature vector with the speech database to determine data corresponding to the received speech signal.
And the voice signal is received through a remote control device.
The requesting data related to the language model may include requesting a language model related to data corresponding to the feature vector or the received voice signal.
The requesting data related to the language model may include requesting a language model related to content currently being used.
Storing the received data;
Generating a phonetic symbol and a phonetic dictionary for a string included in the received data;
And generating grammar or context data based on the generated phonetic symbols and phonetic dictionaries.
Selecting one or more external electronic devices to request the data from among a plurality of external electronic devices connected through the network.
Receiving a voice signal; and
Recognizing the voice signal using the voice database;
The data request step related to the language model may include requesting, from the external electronic device, a language model including data corresponding to the voice signal when recognition of the voice signal fails or the confidence value of the recognition result is equal to or less than a reference value.
Receiving a voice signal;
Recognizing the speech signal using the speech database;
Transmitting data based on the voice signal to the external electronic device when the recognition of the voice signal fails or the confidence value of the recognition result is equal to or less than a reference value; And
And receiving voice recognition result data from the external electronic device.
And the data based on the voice signal is the voice signal itself, a feature vector extracted from the voice signal, or the recognition result.
Receiving a voice signal;
Recognizing the speech signal using the speech database;
Transmitting data based on the voice signal to the external electronic device;
Receiving voice recognition result data from the external electronic device; And
Using the received voice recognition result data when recognition of the voice signal using the voice database fails or the confidence value of the recognition result is lower than a reference value.
And requesting data related to the language model comprises transmitting data including sentence structure information of the speech database.
And the received data is data including words frequently used together with the words included in the sentence information, or words having high semantic similarity to them.
Transmitting response data to the video display device according to the request; And
Updating a database based on the request and response data.
The updating may include storing request details and response details in association with identification information of the image display apparatus.
And the response data is a language model including data included in the request.
Receiving data based on the audio signal from the image display device;
And determining voice recognition result data corresponding to the data based on the voice signal.
And storing at least one of a confidence value, a frequency of use, and a retry rate of the speech recognition result data.
The request receiving step may include receiving data including sentence structure information of a voice database provided in the image display apparatus.
And searching for a word whose frequency of use with, or semantic similarity to, the words included in the sentence information is higher than a reference value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020120089061A KR20140022320A (en) | 2012-08-14 | 2012-08-14 | Method for operating an image display apparatus and a server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020120089061A KR20140022320A (en) | 2012-08-14 | 2012-08-14 | Method for operating an image display apparatus and a server |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20140022320A true KR20140022320A (en) | 2014-02-24 |
Family
ID=50268347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020120089061A KR20140022320A (en) | 2012-08-14 | 2012-08-14 | Method for operating an image display apparatus and a server |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20140022320A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106131692A (en) * | 2016-07-14 | 2016-11-16 | 广州华多网络科技有限公司 | Interactive control method based on net cast, device and server |
US10134387B2 (en) | 2014-11-12 | 2018-11-20 | Samsung Electronics Co., Ltd. | Image display apparatus, method for driving the same, and computer readable recording medium |
KR20190096856A (en) | 2019-07-30 | 2019-08-20 | 엘지전자 주식회사 | Method and apparatus for recognizing a voice |
WO2020096073A1 (en) * | 2018-11-05 | 2020-05-14 | 주식회사 시스트란인터내셔널 | Method and device for generating optimal language model using big data |
WO2020122274A1 (en) * | 2018-12-11 | 2020-06-18 | 엘지전자 주식회사 | Display device |
WO2020122271A1 (en) * | 2018-12-11 | 2020-06-18 | 엘지전자 주식회사 | Display device |
US11205415B2 (en) | 2018-11-15 | 2021-12-21 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
WO2023167399A1 (en) * | 2022-03-04 | 2023-09-07 | 삼성전자주식회사 | Electronic device and control method therefor |
2012-08-14: KR application KR1020120089061A filed; published as KR20140022320A (status: not active, Application Discontinuation).
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10134387B2 (en) | 2014-11-12 | 2018-11-20 | Samsung Electronics Co., Ltd. | Image display apparatus, method for driving the same, and computer readable recording medium |
CN106131692A (en) * | 2016-07-14 | 2016-11-16 | 广州华多网络科技有限公司 | Interactive control method based on net cast, device and server |
WO2020096073A1 (en) * | 2018-11-05 | 2020-05-14 | 주식회사 시스트란인터내셔널 | Method and device for generating optimal language model using big data |
CN112997247A (en) * | 2018-11-05 | 2021-06-18 | 株式会社赛斯特安国际 | Method for generating optimal language model using big data and apparatus therefor |
US11205415B2 (en) | 2018-11-15 | 2021-12-21 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
US11615780B2 (en) | 2018-11-15 | 2023-03-28 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
US11961506B2 (en) | 2018-11-15 | 2024-04-16 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
WO2020122274A1 (en) * | 2018-12-11 | 2020-06-18 | 엘지전자 주식회사 | Display device |
WO2020122271A1 (en) * | 2018-12-11 | 2020-06-18 | 엘지전자 주식회사 | Display device |
KR20190096856A (en) | 2019-07-30 | 2019-08-20 | 엘지전자 주식회사 | Method and apparatus for recognizing a voice |
US11250843B2 (en) | 2019-07-30 | 2022-02-15 | Lg Electronics Inc. | Speech recognition method and speech recognition device |
WO2023167399A1 (en) * | 2022-03-04 | 2023-09-07 | 삼성전자주식회사 | Electronic device and control method therefor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR20140022320A (en) | Method for operating an image display apparatus and a server | |
CN115145529B (en) | Voice control device method and electronic device | |
JP5746111B2 (en) | Electronic device and control method thereof | |
JP5819269B2 (en) | Electronic device and control method thereof | |
JP6111030B2 (en) | Electronic device and control method thereof | |
JP6603754B2 (en) | Information processing device | |
KR102527082B1 (en) | Display apparatus and the control method thereof | |
US20130041665A1 (en) | Electronic Device and Method of Controlling the Same | |
US11449307B2 (en) | Remote controller for controlling an external device using voice recognition and method thereof | |
CN110737840A (en) | Voice control method and display device | |
US20130169524A1 (en) | Electronic apparatus and method for controlling the same | |
JP2013037689A (en) | Electronic equipment and control method thereof | |
JP2014532933A (en) | Electronic device and control method thereof | |
KR20130018464A (en) | Electronic apparatus and method for controlling electronic apparatus thereof | |
CN112163086B (en) | Multi-intention recognition method and display device | |
CN112000820A (en) | Media asset recommendation method and display device | |
CN112885354B (en) | Display device, server and display control method based on voice | |
CN111625716B (en) | Media asset recommendation method, server and display device | |
CN111866568B (en) | Display device, server and video collection acquisition method based on voice | |
CN112511882A (en) | Display device and voice call-up method | |
CN112182196A (en) | Service equipment applied to multi-turn conversation and multi-turn conversation method | |
US20230282209A1 (en) | Display device and artificial intelligence server | |
EP3660841B1 (en) | Multimedia device for processing voice command | |
US10755475B2 (en) | Display apparatus and method of displaying content including shadows based on light source position | |
CN112256232B (en) | Display device and natural language generation post-processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |