WO2017146454A1 - Method and device for content recognition - Google Patents
Method and device for content recognition
- Publication number: WO2017146454A1
- Application: PCT/KR2017/001933
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- text
- screen
- information
- recognizing
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/251—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/252—Processing of multiple end-users' preferences to derive collaborative data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
- H04N21/25891—Management of end-user data being end-user preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44222—Analytics of user selections, e.g. selection of programs or purchase activity
- H04N21/44224—Monitoring of user activity on external systems, e.g. Internet browsing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/658—Transmission by the client directed to the server
- H04N21/6582—Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
- H04N21/8405—Generation or processing of descriptive data, e.g. content descriptors represented by keywords
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4826—End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score
Definitions
- The disclosed embodiments are directed to a method for a device to identify content, a method for a server to identify content, a device for identifying content, and a server for identifying content.
- To provide customized advertisements that meet the needs of various consumers, advertisers need to know what content those consumers are watching.
- Conventional fingerprint-based content recognition technology extracts a fingerprint from the image or audio of the content currently being played on a display device such as a TV, transmits the fingerprint to a server, and matches it against reference data in the server's database to recognize which content is being played. Based on this result, the content consumption or viewing pattern of the display device can be analyzed, and advertisers can effectively provide customized advertisements based on the analysis result.
- The disclosed embodiment provides a method of controlling a device that provides content, together with such a device and a server, capable of obtaining information about the content watched by the user of the device more efficiently by capturing the screen of the device.
- When a control signal for controlling at least one content provided by the device is received, the screen of the device is captured. When the captured screen corresponds to a template screen, a character string including content information is extracted from a preset area of the captured screen, the extracted character string is compared with at least one text included in a preset semantic recognition model to detect text corresponding to the content information, and the content displayed on the screen of the device is recognized based on the detected text.
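- The overall flow can be summarized in code. The sketch below is a minimal, non-authoritative rendering of the disclosed pipeline, assuming hypothetical helpers capture_screen, matches_template, and ocr_region (none of these names come from the patent); the model texts are illustrative examples only.

```python
# Minimal sketch of the disclosed flow: capture on a control signal,
# match against the template screen, OCR the preset area, then snap
# the OCR output to the closest text in the semantic recognition model.
import difflib

SEMANTIC_MODEL = ["Kung Fu OO 2", "Gag Concert"]  # illustrative titles only

def recognize_content(capture_screen, matches_template, ocr_region,
                      template, region, model=SEMANTIC_MODEL):
    while True:
        screen = capture_screen()               # captured at a preset period
        if matches_template(screen, template):  # stop capturing on a match
            break
    raw_string = ocr_region(screen, region)     # string with content info
    # detect the model text corresponding to the (possibly misread) string
    match = difflib.get_close_matches(raw_string, model, n=1, cutoff=0.0)
    return match[0] if match else raw_string
```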
- FIG. 1 is a conceptual diagram illustrating a system for recognizing content according to an exemplary embodiment.
- FIG. 2 is a flowchart illustrating a method of controlling a device for providing content according to an exemplary embodiment.
- FIGS. 3A to 3C are diagrams for describing template screens provided for each content service that provides content.
- FIG. 4 is a flowchart for describing a method of extracting content information from a captured screen by a device providing content according to another exemplary embodiment.
- FIG. 5 is a diagram for describing a method of extracting content information from a screen captured by a device according to another exemplary embodiment.
- FIG. 6 is a flowchart illustrating a method of recognizing a content using a preset semantic recognition model according to an embodiment.
- FIG. 7 is a flowchart illustrating a method of recognizing content using a template screen corresponding to a content service, according to an exemplary embodiment.
- FIG. 8 is a flowchart illustrating a method of recognizing content by a device according to an exemplary embodiment.
- FIGS. 9A to 9D are diagrams for describing a method of recognizing content by a device using a semantic recognition model, according to an exemplary embodiment.
- FIGS. 10A and 10B are block diagrams of devices for recognizing content according to an embodiment.
- FIG. 11 is a flowchart illustrating a method of recognizing content provided to a device by a semantic recognition server according to an embodiment.
- FIG. 12 is a block diagram illustrating a semantic recognition server recognizing content provided to a device, according to an exemplary embodiment.
- FIG. 13 is a conceptual diagram illustrating a system for recognizing content displayed on a device, according to an exemplary embodiment.
- FIG. 14 is a block diagram illustrating in more detail an operation of a semantic recognizer included in a processor of a semantic recognition server, according to an exemplary embodiment.
- FIG. 15 is a block diagram illustrating in more detail an operation of a content data management module included in a semantic recognizer of a semantic recognition server, according to an exemplary embodiment.
- FIG. 16 is a diagram for describing a method in which a semantic recognition server processes text using a semantic recognition model built from text data, according to an exemplary embodiment.
- A method of recognizing content may include: capturing a screen of a device as a control signal for controlling at least one content provided by the device is received; if the captured screen corresponds to a template screen, extracting a string including content information from a preset area of the captured screen; comparing the extracted string with at least one text included in a preset semantic recognition model and detecting text corresponding to the content information; and recognizing the content displayed on the screen of the device based on the detected text.
- A method of recognizing content by a server may include: receiving, from a device, a string including content information extracted from a captured screen, as the screen captured by the device corresponds to a template screen; comparing the received string with at least one text included in a preset semantic recognition model and detecting text corresponding to the content information; and recognizing the content displayed on the screen of the device based on the detected text.
- The method of recognizing content by the server may further include receiving voice data of a user viewing the content of the device, and the recognizing of the content may include recognizing the content displayed on the screen of the device based on the detected text and the received voice data of the user.
- The method of recognizing content by the server may further include obtaining content data from an external server at a predetermined period, and the semantic recognition model is updated based on the content data acquired at the predetermined period.
- In the method of recognizing content by the server, additional information may be acquired, including at least one of: information about the user viewing the at least one content, information about the device, the viewing time of the at least one content to be recognized, recognition information of the content service providing the at least one content, and size information of each character in the string including the content information.
- The detecting of the text may include: comparing the extracted content information with the at least one text to calculate a probability value for each of the at least one text corresponding to the extracted content information; and detecting any one of the at least one text based on the calculated probability values.
- The method may further include receiving, from the device, a string including content information extracted from another captured screen corresponding to the template screen.
- The recognizing of the content may include changing, based on the detected text, any information in the extracted content information that does not correspond to the detected text.
- The method according to an embodiment of the present disclosure further includes transmitting the content recognition result to a viewing pattern analysis server.
- The method according to an embodiment of the present disclosure further includes receiving, from the viewing pattern analysis server, viewing pattern history information of the user of the device generated by the viewing pattern analysis server based on the content recognition result.
- A device for recognizing content includes: a communication unit configured to receive at least one content; a display unit configured to display any one of the at least one content; and a processor configured to capture a screen of the device as a control signal for controlling the at least one content is received, to extract a character string including content information from a predetermined area of the captured screen when the captured screen corresponds to a template screen, to compare the extracted character string with at least one text included in a preset semantic recognition model and detect text corresponding to the content information, and to recognize the content displayed on the screen of the device based on the detected text.
- A server for recognizing content may include: a communication unit configured to receive, from a device, a string including content information extracted from a captured screen, as the screen captured by the device corresponds to a template screen; and a processor configured to compare the received content information with at least one text included in a preset semantic recognition model, detect text corresponding to the content information, and recognize the content displayed on the screen of the device based on the detected text.
- The communication unit receives voice data of the user viewing the content of the device, and the processor recognizes the content displayed on the screen of the device based on the detected text and the received voice data of the user.
- The communication unit obtains content data from an external server at a predetermined period, and the semantic recognition model is updated based on the content data obtained at the predetermined period.
- The communication unit may acquire additional information including at least one of: information about the user viewing the at least one content, information about the device, the viewing time of the at least one content to be recognized, recognition information of the content service providing the at least one content, and size information of each character in the string including the content information.
- The processor compares the extracted content information with the at least one text, computes a probability value for each of the at least one text corresponding to the extracted content information, and detects any one of the at least one text based on the calculated probability values.
- The communication unit may further receive, from the device, a string including content information extracted from another captured screen corresponding to the template screen.
- The processor changes, based on the detected text, any information in the extracted content information that does not correspond to the detected text.
- The communication unit transmits the content recognition result to a viewing pattern analysis server.
- The communication unit receives, from the viewing pattern analysis server, viewing pattern history information of the user of the device generated by the viewing pattern analysis server based on the content recognition result.
- When a part of the specification is described as "including" a component, this means that the part may further include other components, rather than excluding other components, unless specifically stated otherwise.
- The terms "...unit", "module", and the like described in the specification refer to a unit for processing at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.
- FIG. 1 is a conceptual diagram illustrating a system for recognizing content according to an exemplary embodiment.
- The device 100 may be a TV, but this is only an example, and the device 100 may be implemented as any electronic device including a display.
- For example, the device 100 may be implemented as any of various electronic devices, such as a mobile phone, a tablet PC, a digital camera, a camcorder, a laptop computer, a desktop computer, an e-book device, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, or a wearable device.
- Embodiments may be easily implemented in a display device having a large display, such as a TV, but are not limited thereto.
- The device 100 may be fixed or mobile, and may be a digital broadcast receiver capable of receiving digital broadcasts.
- The device 100 may receive a content service through the set-top box 10.
- The content service may include not only real-time broadcast content services provided through terrestrial broadcast signals, cable broadcast set-top boxes, and IPTV set-top boxes, but also content services in the form of web applications, such as Netflix and YouTube.
- However, these content services are only examples, and the present invention is not limited thereto.
- The device 100 may receive at least one content from the service providing apparatus 10.
- The device 100 may display, on its screen, the content selected by the user of the device 100 from among the at least one received content.
- A user of the device 100 may transmit, to the device 100, a control signal for controlling the at least one content provided by the device 100.
- The control signal may include a remote control signal, a touch signal of a smartphone or wearable device, a voice command, a gesture recognition signal, a complex sensor signal, and the like, but these are only examples, and the control signal of the present invention is not limited thereto.
- The device 100 may capture the screen of the device 100 on which content is displayed.
- The device 100 may capture the screen of the device 100 on which content is displayed at predetermined time intervals.
- The device 100 may determine whether the pre-stored template screen and the captured screen correspond to each other.
- In the template screen, information about the text area in which information about the content is displayed on the screen may be preset.
- The template screen may differ according to the type of content service transmitting content to the device 100. For example, the template screen for content service A and the template screen for content service B may be different from each other.
- The device 100 checks the type of content service and selects the template of the identified content service. The operation of capturing the screen on which content is displayed may be performed repeatedly until a screen corresponding to the selected template screen is captured. When a screen corresponding to the template screen is captured, the device 100 may stop capturing the screen of the device 100.
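- One way to implement the template-correspondence check is image template matching, as in the sketch below. This is an illustrative assumption using OpenCV, which the patent does not name; the matching method and threshold are placeholders.

```python
# Sketch of checking whether a captured screen corresponds to a stored
# template screen, using normalized cross-correlation template matching.
import cv2

def screen_matches_template(screen_bgr, template_bgr, threshold=0.9):
    """Return True when some region of the captured screen correlates
    strongly with the stored template image (e.g. a playback info bar)."""
    result = cv2.matchTemplate(screen_bgr, template_bgr, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val >= threshold
```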
- The device 100 may extract a character string including content information from a preset area of the captured screen corresponding to the template screen.
- The preset area is an area including content information, such as the title or channel name of the content.
- The content information may be displayed in text form.
- The device 100 may recognize the text displayed in the preset area and extract a string including the content information.
- The device 100 may extract information about the type, title, and genre of the content by reading the text in the predetermined area of the captured screen using an optical character reader (OCR).
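- As a concrete illustration of the OCR step, the sketch below crops the preset content-information area and reads it. Here pytesseract and Pillow are assumed libraries, and the region coordinates are illustrative stand-ins for the area preset in the template screen.

```python
# Sketch of reading the preset content-information area with OCR.
import pytesseract
from PIL import Image

def extract_content_string(capture_path, region=(0, 0, 400, 60)):
    """Crop the preset area (left, upper, right, lower) from the captured
    screen image and return the recognized string, e.g. a title or channel."""
    screen = Image.open(capture_path)
    return pytesseract.image_to_string(screen.crop(region)).strip()
```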
- The device 100 may transmit the extracted character string to the semantic recognition server 200.
- The semantic recognition server 200 may recognize the content viewed by the user of the device 100 based on the character string received from the device 100.
- The device 100 may detect text corresponding to the content information by comparing the string including the content information with at least one text included in a preset semantic recognition model.
- The device 100 may recognize the content displayed on the screen of the device 100 based on the detected text.
- The device 100 may increase the accuracy of content recognition by using the preset semantic recognition model.
- The semantic recognition server 200 may recognize content by comparing a preset semantic recognition model with the received string. For example, the semantic recognition server 200 may select, from among the at least one text included in the preset semantic recognition model, the text corresponding to the content information included in the received string. The semantic recognition server 200 may recognize the content using the selected text.
- The selected text may be, for example, text indicating at least one of the title of the content and the type of channel through which the content is provided.
- The semantic recognition server 200 may transmit the content recognition result to the viewing pattern analysis server 300.
- The viewing pattern analysis server 300 may determine the viewing pattern of the user by obtaining content recognition results from the semantic recognition server 200 over a predetermined period and analyzing the obtained results.
- FIG. 2 is a flowchart illustrating a method of controlling a device 100 (hereinafter, referred to as a device) that provides content according to an embodiment.
- The device 100 receives a control signal for controlling the at least one content provided by the device 100.
- The device 100 may receive the control signal for controlling the at least one content provided by the device 100 from an input device.
- For example, the device 100 may receive the control signal for controlling the at least one content provided by the device 100 from a remote controller.
- The control signal may be one of a channel change signal of the device 100, a power-on signal of the device 100, a connection signal between another device and the device 100, a menu selection signal of the device 100, and a request signal for information about the at least one content.
- The device 100 captures the screen of the device 100 at a predetermined cycle.
- The device 100 may capture the screen of the device 100 on which content is displayed.
- The screen of the device 100 may display, for the at least one content provided by the device 100, at least one of a title, a channel name, a content list, additional content-related information, and a user interface for selecting content.
- However, this is only an example, and the information displayed on the screen of the device 100 is not limited thereto.
- The device 100 may repeatedly capture the screen of the device 100 according to the preset cycle.
- For example, the device 100 may capture the screen of the device 100 at two-second intervals from the time when the control signal is received.
- The device 100 extracts a string including content information from a preset area of the captured screen.
- The template screen may differ according to the type of content service that provides the at least one content to the device 100.
- In the template screen, information about the area in which the content information provided by a specific content service is displayed may be preset. For example, in the case of the template screen for content service A, information indicating that content information is displayed in the upper left of the screen may be preset.
- When a screen corresponding to the template screen is captured, the device 100 may stop capturing the screen.
- The screen corresponding to the template screen may be a screen on which at least one of an image and text is displayed at a position corresponding to the position of at least one of the image and text displayed on the template screen.
- The device 100 may extract a string including the content information displayed in the predetermined area of the captured screen corresponding to the template screen.
- The device 100 recognizes the content displayed on the screen of the device 100 based on the content information included in the extracted string.
- For example, the device 100 may read the text in the extracted string using OCR.
- The content information may include information about the title, type, and genre of the content.
- However, this is merely an example, and the content information is not limited thereto.
- FIGS. 3A to 3C are diagrams for describing template screens provided for each content service that provides content.
- When the device 100 (hereinafter, referred to as the device) providing content receives a control signal from an input device, the device 100 may capture the screen at a predetermined cycle until a screen 310 corresponding to the template screen 320 is captured.
- The device 100 may select the template screen 320 stored in advance for content service A, which provides the content.
- The bounding box may serve as the template for content service A. Accordingly, as the device 100 periodically matches the captured screen against the bounding-box template, when the captured screen matches the template, the device 100 may extract the text 312 from the preset content information display area 322 to obtain a string containing the content information.
- The device 100 may capture the screen according to the preset cycle until a screen 330 corresponding to the template screen 340 is captured.
- The device 100 may select the template screen 340 stored in advance for content service B, which provides the content.
- A web-application-type content service, such as Netflix on a smart TV or smartphone, may present a screen having the same UI layout.
- A template may be created from the UI layout of the screen displayed immediately before content is played.
- The template may be periodically matched against the captured screen, and when they match, text may be recognized from the preset content information display area 342 to extract a string including the content information.
- For example, the device 100 may extract the text 332 from the upper-left area corresponding to the content information display area 342.
- The device 100 may recognize the content displayed on the device 100 by reading the extracted text 332.
- The device 100 may capture the screen according to the predetermined cycle until a screen 350 corresponding to the template screen 360 is captured.
- The device 100 may select the template screen 360 stored in advance for content service C, which provides the content.
- The device 100 may create a template from the UI layout of the screen at a specific point in time before content is played back.
- The template screen 360 thus created may be stored in the device 100. The template screen 360 may be periodically matched against the captured screen 350, and when they match, text may be extracted from the preset content information display area 362 to recognize the content.
- FIG. 4 is a flowchart illustrating a method of extracting content information from a captured screen by a device 100 (hereinafter, referred to as a device) that provides content according to another embodiment.
- The device 100 may receive a control signal for controlling at least one content provided by the device 100.
- The device 100 may determine whether the received control signal is a control signal pointing to the screen of the device 100. As the device 100 determines that the received control signal is a control signal pointing to the screen, the device 100 may detect a peripheral area located within a preset range from the pointed point. In operation S430, the device 100 may extract a string including content information from the detected area.
- The device 100 may recognize the content displayed on the screen of the device 100 based on the extracted string.
- For example, the device 100 may read the text in the extracted string using a text reading technique such as OCR.
- However, this is only an example, and the method of recognizing content based on the string extracted by the device 100 is not limited to the above-described example.
- FIG. 5 is a diagram for describing a method of extracting content information from a captured screen by the device 100 according to another exemplary embodiment.
- The device 100 may receive a control signal 512 for controlling at least one content provided by the device 100.
- The device 100 may detect a peripheral area 522 around the pointed point.
- The device 100 may determine whether the received control signal 512 is a control signal pointing to the screen of the device 100. As the device 100 determines that the received control signal 512 is a control signal pointing to the screen, the device 100 may detect the peripheral area 522 located within a preset range from the pointed point.
- The device 100 may read the content information extracted from the detected peripheral area 522 to recognize text representing information about the content. For example, the device 100 may recognize the text indicating the information about the content and confirm that the title of the content is Kung Fu OO 2.
- FIG. 6 is a flowchart illustrating a method of recognizing content by using a preset semantic recognition model, according to an exemplary embodiment.
- The device 100 captures the screen of the device 100 as a control signal for controlling at least one content provided by the device 100 is received.
- The device 100 may receive a control signal for controlling the at least one content. Also, the device 100 may determine the type of content service that provides the at least one content to the device 100 based on the received control signal.
- The type of content service may be any one of a web-based video on demand (VOD) service, a live service, and an application-based service.
- However, the type of content service is not limited thereto. A template screen may be selected according to the type of content service, and matching between the template screen and the captured screen may be performed.
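- The selection of a template per service type can be as simple as a lookup table, as in the sketch below; the service-type keys and template file paths are invented for illustration and do not come from the patent.

```python
# Sketch of selecting a pre-stored template screen by content service type.
TEMPLATES = {
    "web_vod": "templates/web_vod_info_bar.png",          # web-based VOD service
    "live": "templates/live_channel_banner.png",          # live broadcast service
    "application": "templates/app_playback_overlay.png",  # app-based service
}

def select_template(service_type):
    """Return the template image path registered for the given service."""
    try:
        return TEMPLATES[service_type]
    except KeyError:
        raise ValueError(f"no template registered for {service_type!r}")
```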
- The device 100 may capture the screen of the device 100 at a predetermined cycle.
- The device 100 may determine whether the captured screen corresponds to the template screen.
- The template screen may differ according to the type of content service.
- The device 100 extracts a string including content information from a preset area of the captured screen.
- The device 100 may extract the string including the content information from the area of the captured screen corresponding to the area where content information is displayed on the template screen.
- The content information may include, for example, text such as the title of the content and the name or number of the channel on which the content is provided.
- The device 100 compares the extracted character string with at least one text included in a preset semantic recognition model and detects text corresponding to the content information.
- The device 100 may preset the semantic recognition model.
- The semantic recognition model may include at least one text for recognizing content.
- For example, the semantic recognition model may include at least one text indicating the title of currently provided content and the name or number of the channel on which content is provided.
- The semantic recognition model may be set differently according to the ID of the device and the ID of the user. For example, if the user is a woman in her 20s, the device 100 may select, from among a plurality of semantic recognition models, a semantic recognition model including at least one text indicating the titles, channel types, and the like of content preferred by women in their 20s.
- The device 100 may detect the text included in the extracted string by using a format pattern preset for the template screen.
- The preset format pattern may be included in the semantic recognition model.
- For example, the device 100 may detect text corresponding to the channel name and the title from the extracted string.
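- A format pattern of this kind can be expressed as a regular expression, as in the sketch below; the "number, channel, title" banner layout is an assumed example for illustration, not a pattern defined by the patent.

```python
# Sketch of a preset format pattern for a broadcast banner string.
import re

BANNER = re.compile(r"^(?P<number>\d{1,3})\s+(?P<channel>\S+)\s+(?P<title>.+)$")

def parse_banner(text):
    """Return (channel number, channel name, title) when the string follows
    the preset format pattern, else None so a fallback model can be used."""
    m = BANNER.match(text.strip())
    return (m.group("number"), m.group("channel"), m.group("title")) if m else None

# parse_banner("041 KBSjoy Gag Concert") -> ("041", "KBSjoy", "Gag Concert")
```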
- When the text included in the extracted string does not match the format pattern preset for the template, the device 100 may detect the text from the string by using a probability model that probabilistically evaluates the relationships between surrounding words in the string.
- For example, from the string 'B, an exclusive broadcast starring A', the device 100 may determine, based on the probability model, that the actor's name is A and the program name is B.
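- The sketch below illustrates that idea in a toy form: each candidate name is assigned the role (actor or title) that makes its neighbouring word most probable. The probability table is invented purely for illustration.

```python
# Toy probability model: P(neighbouring word | role of the adjacent name).
NEIGHBOR_GIVEN_ROLE = {
    "actor": {"starring": 0.9, "broadcast": 0.1},
    "title": {"broadcast": 0.8, "starring": 0.2},
}

def most_likely_role(neighbor_word):
    """Pick the role that maximizes P(neighbor_word | role)."""
    return max(NEIGHBOR_GIVEN_ROLE,
               key=lambda role: NEIGHBOR_GIVEN_ROLE[role].get(neighbor_word, 0.0))

# For "A starring" and "B ... broadcast":
# most_likely_role("starring")  -> "actor"  (so A is the actor)
# most_likely_role("broadcast") -> "title"  (so B is the program name)
```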
- The device 100 recognizes the content displayed on the screen of the device 100 based on the detected text.
- For example, the device 100 may determine the detected text to be the title of the content displayed on the screen of the device 100.
- The device 100 may verify the accuracy of the detected text by comparing it with the text having the highest similarity among the at least one text included in the preset semantic recognition model.
- The similarity may be determined according to the types of consonants and vowels in the texts and their combination ratio.
- For example, the device 100 may detect 'Kung Fu' as the text having the highest similarity among the at least one text included in the semantic recognition model. By comparing the content information extracted from the captured screen with the semantic recognition model and detecting text from the semantic recognition model, the device 100 can correct misrecognized characters included in the extracted content information.
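- The correction step can be sketched with a generic string-similarity measure, as below; difflib's ratio is used here only as a stand-in for the consonant/vowel similarity described above, and the model texts are illustrative.

```python
# Sketch of snapping a misread OCR string to the closest model text.
import difflib

def correct_with_model(ocr_text, model_texts):
    """Return the model text most similar to the OCR output, together with
    the similarity score that can be used for verification."""
    score, best = max((difflib.SequenceMatcher(None, ocr_text, t).ratio(), t)
                      for t in model_texts)
    return best, score

# correct_with_model("Hong Fu", ["Kung Fu", "Gag Concert"])
# -> ("Kung Fu", 0.71...)  # mirrors the 'Hong Fu' -> 'Kung Fu' example
```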
- The device 100 may verify the detected text based on the received additional information. For example, when 'Kung Fu' is detected as the title of the content, the device 100 may verify the detected text by determining, based on the information about the viewing time included in the additional information, whether 'Kung Fu' is broadcast at that viewing time.
- The device 100 may verify the detected text based on sensed voice data. For example, while 'Kung Fu' is detected as the title of the content, the device 100 may verify the detected text by determining whether the voice data sensed by the device 100 indicates 'Kung Fu'.
- The device 100 may repeatedly perform the above-described step S1120. According to another example, the device 100 may request that the screen of the device 100 be recaptured.
- FIG. 7 is a flowchart illustrating a method of recognizing content by using a template screen corresponding to a content service, according to an exemplary embodiment.
- The device 100 may receive a control signal for controlling at least one content provided by the device 100.
- The device 100 may determine the type of content service that provides the at least one content to the device 100 based on the received control signal.
- The type of content service may be any one of a web-based video on demand (VOD) service, a live service, and an application-based service.
- The device 100 may capture the screen of the device 100 on which content is displayed as the control signal is received.
- The device 100 may select a template screen according to the determined type of content service.
- The template screen may differ according to the type of content service that provides the at least one content to the device 100.
- Depending on the type of content service, the area in which the content information is displayed on the template screen, and the size, color, and shape of the text constituting the content information, may differ.
- Steps S730 and S740 are described as parallel processes for convenience of description, but the steps may also be performed in a temporal sequence.
- The device 100 may determine whether the captured screen corresponds to the template screen.
- The device 100 may repeatedly perform the above-described steps S710 to S740.
- The device 100 may extract string data including content information from a preset area of the captured screen.
- The device 100 may detect the text corresponding to the extracted content information by comparing the extracted string data with at least one text included in a preset semantic recognition model.
- Step S770 may correspond to step S630 described above with reference to FIG. 6.
- The device 100 may recognize the content displayed on the screen of the device 100 based on the detected text.
- Step S780 may correspond to step S640 described above with reference to FIG. 6.
- FIG. 8 is a flowchart illustrating a method of recognizing content by the device 100 according to an exemplary embodiment.
- The device 100 may receive a control signal for controlling at least one content provided by the device 100.
- Step S810 may correspond to step S210 described above with reference to FIG. 2.
- The device 100 may capture the screen of the device 100, on which content is displayed, at a predetermined cycle.
- The device 100 may determine whether the captured screen corresponds to the template screen.
- The device 100 may repeatedly capture the screen of the device 100 in step S810.
- The device 100 may extract a string including content information from a preset area of the captured screen.
- The device 100 may extract the string including the content information from the predetermined area of the captured screen. For example, the device 100 may extract text from the area of the captured screen corresponding to the content information display area preset in the template screen.
- The device 100 may determine whether there is text corresponding to the content information among the at least one text included in the semantic recognition model. Meanwhile, when text corresponding to the content information is not detected, the device 100 according to an exemplary embodiment may repeatedly perform the operation of capturing the screen of the device 100 in operation S720.
- The device 100 may detect the text corresponding to the content information among the at least one text included in the semantic recognition model.
- Step S860 may correspond to step S630 described above with reference to FIG. 6.
- The device 100 may recognize the content displayed on the screen of the device 100 based on the detected text.
- Step S870 may correspond to step S640 described above with reference to FIG. 6.
- FIGS. 9A to 9D are diagrams for describing a method in which the device 100 recognizes content using a semantic recognition model, according to an exemplary embodiment.
- The device 100 may receive a control signal for controlling the content played by the device 100.
- The device 100 may determine the template screen used to identify the content to be the first template screen, based on the received control signal.
- The device 100 may capture the screen 910a corresponding to the determined first template screen.
- The device 100 may detect an image 912a including content information from the captured screen 910a using the first template screen.
- The device 100 may read the detected image 912a to recognize the text 914a indicating information about the content.
- The device 100 may determine the information necessary for identifying the content from the recognized text 914a using a preset semantic recognition model. For example, the device 100 may select the text representing the title of the content included in the recognized text 914a.
- The device 100 may compare the recognized text 914a with the preset semantic recognition model and correct the misrecognized 'Hong Fu' to 'Kung Fu'.
- The device 100 may determine the type of the channel and the title of the content based on the error-corrected text 916a.
- The device 100 may capture a screen 910b corresponding to the second template screen determined based on the control signal.
- The device 100 may detect an image 912b including content information from the captured screen 910b using the second template screen.
- The device 100 may read the detected image 912b to recognize the text 914b representing information about the content.
- The device 100 may determine the information necessary for identifying the content from the recognized text 914b using a preset semantic recognition model. For example, the device 100 may select the text representing the title of the content included in the recognized text 914b.
- The device 100 may compare the recognized text 914b with the preset semantic recognition model and correct the characters misrecognized as 'jiko' and 'ZI' to 'high' and 'ki'. The device 100 may determine the title of the content based on the error-corrected text 916b.
- The device 100 may capture a screen 910c corresponding to the third template screen determined based on the control signal.
- The device 100 may detect an image 912c including content information from the captured screen 910c using the third template screen.
- The device 100 may read the detected image 912c to recognize the text 914c representing information about the content.
- The device 100 may determine the information necessary for identifying the content from the recognized text 914c by using a preset semantic recognition model. For example, the device 100 may select 'descriptive items F', which is the text indicating the title of the content included in the recognized text 914c.
- The device 100 may compare the recognized text 914c with the preset semantic recognition model and correct the misrecognized 'Joe F' to 'za'.
- The device 100 may determine the title of the content based on the error-corrected text 916c.
- The device 100 may capture a screen 910d corresponding to the fourth template screen determined based on the control signal.
- The device 100 may detect an image 912d including content information from the captured screen 910d using the fourth template screen.
- The device 100 may read the detected image 912d to recognize the text 914d indicating information about the content.
- The device 100 may determine the information necessary for identifying the content from the recognized text 914d by using a preset semantic recognition model. For example, the device 100 may select '041', 'K E35joy', and 'Gag Concert', which are the texts indicating the title and channel information of the content included in the recognized text 914d.
- The device 100 may compare the recognized text 914d with the preset semantic recognition model and correct the misrecognized 'K E35' to 'KBS'.
- The device 100 may determine the type of the channel and the title of the content based on the error-corrected text 916d.
- As described above with reference to FIGS. 9A to 9D, the device 100 may recognize content more accurately by using a preset semantic recognition model.
- FIGS. 10A and 10B are block diagrams of the device 100 for recognizing content according to an embodiment.
- The device 100 may include a communication unit 110, a controller 130, and a display unit 120.
- However, the device 100 may be implemented with more components than those illustrated, or with fewer components.
- For example, in addition to the communication unit 110, the display unit 120, and the controller 130, the device 100 may further include an audio processor 115, an audio output unit 125, a detector 140, a tuner 150, a power supply 160, an input/output unit 170, a video processor 180, and a storage 190.
- The communication unit 110 may connect the device 100 to an external device (for example, an input device, a service providing device, or a server) under the control of the controller 130.
- The controller 130 may transmit/receive content to/from a service providing device connected through the communication unit 110, download an application from the service providing device, or perform web browsing.
- The communication unit 110 may include one of a wireless LAN module 111, a Bluetooth module 112, and a wired Ethernet module 113, depending on the performance and structure of the device 100.
- The communication unit 110 may include a combination of the wireless LAN module 111, the Bluetooth module 112, and the wired Ethernet module 113.
- The communication unit 110 may receive a control signal of the input device under the control of the controller 130.
- The control signal may be implemented as a Bluetooth signal, an RF signal, or a Wi-Fi signal.
- The communication unit 110 may further include modules for other short-range communication (for example, near field communication (NFC), not shown) and Bluetooth Low Energy (BLE), in addition to Bluetooth.
- The communication unit 110 receives a control signal for controlling at least one content provided by the device 100.
- The communication unit 110 may perform a function corresponding to that of the detector 140, which is described later.
- The communication unit 110 may transmit the extracted content information to the server.
- The communication unit 110 may receive, from the server 200, content viewing pattern information of the user of the device 100 determined based on the extracted content information.
- The display unit 120 generates a driving signal by converting an image signal, a data signal, an OSD signal, a control signal, and the like processed by the controller 130.
- The display unit 120 may be implemented as a PDP, an LCD, an OLED, a flexible display, or a 3D display.
- The display unit 120 may also be configured as a touch screen and used as an input device in addition to an output device.
- The display unit 120 displays content.
- The display unit 120 may correspond to the aforementioned screen in that content is displayed on it.
- The controller 130 typically controls the overall operation of the device 100.
- The controller 130 executes programs stored in the storage 190 to control the communication unit 110, the display unit 120, the audio processor 115, the audio output unit 125, the detector 140, the tuner 150, the power supply 160, the input/output unit 170, the video processor 180, and the storage 190 overall.
- The controller 130 captures the screen of the device 100 at predetermined intervals. When the captured screen corresponds to the template screen, the controller 130 extracts a string including content information from a preset area of the captured screen. The controller 130 recognizes the content displayed on the display unit 120 based on the content information included in the extracted string.
- The controller 130 may determine the type of content service that provides the at least one content to the device 100.
- The controller 130 may select a template screen according to the determined type of content service.
- The controller 130 may stop capturing the screen on which the content is displayed. In addition, the controller 130 may determine the type of content service and the type of the control signal. The controller 130 may capture the screen of the device 100 on which content is displayed at a predetermined cycle based on the determined type of content service and the type of the control signal.
- The controller 130 may determine whether the received control signal is a signal pointing to the screen of the device 100 and detect a preset area around the pointed point on the screen. In addition, the controller 130 may extract a string including content information from the detected area. The controller 130 may recognize the content displayed on the display unit 120 based on the content information.
- The controller 130 detects text corresponding to the extracted content information by comparing the extracted content information with at least one text included in a preset semantic recognition model.
- The controller 130 may preset the semantic recognition model.
- The controller 130 may detect the text having the highest similarity to the text included in the extracted content information among the at least one text included in the semantic recognition model.
- The controller 130 recognizes the content displayed on the screen of the device 100 based on the detected text. For example, the controller 130 may determine the detected text to be the title of the content displayed on the screen of the device 100. According to another example, the device 100 may verify the detected text based on the additional information. According to yet another example, the device 100 may verify the detected text based on sensed voice data.
- The audio processor 115 performs audio data processing.
- The audio processor 115 may perform various kinds of processing, such as decoding, amplification, and noise filtering, on the audio data.
- The audio processor 115 may include a plurality of audio processing modules to process audio corresponding to a plurality of contents.
- The audio output unit 125 outputs audio included in the broadcast signal received through the tuner 150 under the control of the controller 130.
- The audio output unit 125 may output audio (for example, voice or sound) input through the communication unit 110 or the input/output unit 170.
- The audio output unit 125 may output audio stored in the storage 190 under the control of the controller 130.
- The audio output unit 125 may include at least one of a speaker 126, a headphone output terminal 127, and a Sony/Philips Digital Interface (S/PDIF) output terminal 128, or a combination thereof.
- The detector 140 may detect a user input and transmit the detected signal to the controller 130. The detector 140 may detect user inputs for power on/off, channel selection, channel up/down, and screen settings. The detector 140 may also detect a user input for moving a cursor displayed on the display unit 120 and a direction-key input for moving the focus between candidate items. In addition, the detector 140 detects a user's voice, a user's image, or a user's interaction.
- The microphone 141 receives the user's uttered voice.
- The microphone 141 may convert the received voice into an electrical signal and output it to the controller 130.
- The microphone 141 may be implemented integrally with or separately from the device 100.
- The separated microphone 141 may be electrically connected to the image display device 100b through the communication unit 110 or the input/output unit 170. It will be readily understood by those skilled in the art that the microphone 141 may be omitted depending on the performance and structure of the device 100.
- The camera unit 142 may convert a received image into an electrical signal and output it to the controller 130, under the control of the controller 130.
- The light receiver 143 receives an optical signal (including a control signal) from an external input device through a light window (not shown) in the bezel of the display unit 120.
- The light receiver 143 may receive an optical signal corresponding to a user input (for example, a touch, a press, a touch gesture, a voice, or a motion) from the input device.
- The control signal may be extracted from the received optical signal under the control of the controller 130.
- The tuner 150 may select, by tuning, only the frequency of the channel to be received by the device 100 from among many radio wave components, through amplification, mixing, and resonance of a broadcast signal received by wire or wirelessly.
- The broadcast signal includes audio, video, and additional information (for example, an electronic program guide (EPG)).
- The tuner 150 may receive a broadcast signal in a frequency band corresponding to a channel number (for example, cable broadcast channel 506) according to a user input (for example, a control signal received from the control apparatus 200, such as a channel number input, a channel up/down input, or a channel input on an EPG screen).
- the tuner unit 150 may receive a broadcast signal from various sources such as terrestrial broadcast, cable broadcast, satellite broadcast, and internet broadcast.
- the tuner unit 150 may receive a broadcast signal from a source such as analog broadcast or digital broadcast.
- the broadcast signal received through the tuner unit 150 is decoded (for example, audio decoding, video decoding, or additional information decoding) and separated into audio, video, and/or additional information.
- the separated audio, video and / or additional information may be stored in the storage 190 under the control of the controller 130.
- the power supply unit 160 supplies power input from an external power source to components inside the device 100 under the control of the controller 130.
- the power supply unit 160 may supply power output from one or more batteries (not shown) located in the device 100 to the internal components under the control of the controller 130.
- the input/output unit 170 receives video (for example, a moving image), audio (for example, voice or music), and additional information (for example, an EPG) from the outside of the device 100 under the control of the controller 130.
- the input/output unit 170 may include one of an HDMI port (High-Definition Multimedia Interface port) 171, a component jack 172, a PC port 173, and a USB port 174.
- the input / output unit 170 may include a combination of an HDMI port 171, a component jack 172, a PC port 173, and a USB port 174.
- the video processor 180 processes the video data received by the device 100.
- the video processor 180 may perform various image processing such as decoding, scaling, noise filtering, frame rate conversion, resolution conversion, and the like on the video data.
- the controller 130 may include a RAM 181, which stores a signal or data input from the outside of the device 100 or is used as a storage area corresponding to various operations performed by the device 100, a ROM 182, in which a control program for controlling the image display device 100b is stored, and a processor 183.
- the processor 183 may include a graphic processor (not shown) for graphic processing corresponding to video.
- the processor 183 may be implemented as a system on chip (SoC) integrating a core (not shown) and a GPU (not shown).
- the processor 183 may include a single core, a dual core, a triple core, a quad core, or multiples thereof.
- the processor 183 may include a plurality of processors.
- the processor 183 may be implemented as a main processor (not shown) and a sub processor (not shown) that operates in a sleep mode.
- the graphic processor 184 generates a screen including various objects such as an icon, an image, and a text by using a calculator (not shown) and a renderer (not shown).
- the calculator calculates attribute values such as the coordinates, shape, size, and color with which each object is to be displayed according to the layout of the screen, using the user input sensed by the detector 140.
- the renderer generates screens of various layouts including objects based on the attribute values calculated by the calculator. The screen generated by the renderer is displayed in the display area of the display 120.
- the first to n interfaces 185-1 to 185-n are connected to the aforementioned various components.
- One of the interfaces may be a network interface connected to an external device via a network.
- the RAM 181, the ROM 182, the processor 183, the graphic processor 184, and the first through n-th interfaces 185-1 through 185-n may be interconnected through an internal bus 186.
- the controller 130 includes the processor 183, the ROM 182, and the RAM 181.
- the storage unit 190 may store various data, programs, or applications for driving and controlling the device 100 under the control of the controller 130.
- the storage 190 may store a control program for controlling the device 100 and the controller 130, an application initially provided by a manufacturer or downloaded from the outside, a graphical user interface (GUI) related to the application, objects for providing the GUI (for example, images, text, icons, and buttons), user information, documents, databases, or related data.
- the term “storage unit” includes the storage 190, the ROM 182 and the RAM 181 of the controller, and a memory card (for example, a micro SD card or USB memory, not shown) mounted on the device 100.
- the storage 190 may include a nonvolatile memory, a volatile memory, a hard disk drive (HDD), or a solid state drive (SSD).
- the storage 190 may include a broadcast receiving module (not shown), a channel control module, a volume control module, a communication control module, a voice recognition module, a motion recognition module, an optical reception module, a display control module, an audio control module, an external input control module, a power control module, a power control module of an external device connected wirelessly (for example, via Bluetooth), a voice database (DB), or a motion database (DB).
- modules and databases (not shown) of the storage 190 may be implemented in the form of software to perform, in the device 100, a broadcast reception control function, a channel control function, a volume control function, a communication control function, a voice recognition function, a motion recognition function, an optical reception control function, a display control function, an audio control function, an external input control function, a power control function, or a power control function of an external device connected wirelessly.
- the controller 130 may perform each function by using the software stored in the storage 190.
- FIG. 11 is a flowchart illustrating a method, performed by the semantic recognition server 200, of recognizing content provided to the device 100, according to an exemplary embodiment.
- the semantic recognition server 200 receives, from the device 100, a character string including content information extracted from a captured screen.
- the character string received by the semantic recognition server 200 may include text data displayed in a preset area of the captured screen corresponding to the template screen.
- for example, the content information may include the title of content A and information about the type of the channel on which content A is provided.
- this is merely an example, and content information is not limited to the above-described example.
- the semantic recognition server 200 may receive additional information together with a string from the device 100.
- the additional information may include information about a time when content is displayed on the device 100, a user of the device 100, a type of content service, and a size, location, and color of text data displayed on the captured screen.
- the semantic recognition server 200 may receive voice data detected by the device 100 together with content information from the device 100.
- the voice data may include voice data of a user who watches the content displayed on the device 100.
- the semantic recognition server 200 may receive the character string and the voice data together with the aforementioned additional information.
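- as an illustration only, the character string, additional information, and voice data described above could be bundled as follows. This is a minimal sketch; every field name is a hypothetical assumption, not something defined by this disclosure:

```python
# Hypothetical payload the device 100 might send to the semantic
# recognition server 200. All field names are illustrative assumptions.
payload = {
    "string": "11-1 MBC B exclusive broadcast of A starring",
    "additional_info": {
        "timestamp": "2017-02-22T20:15:00",  # time the content was displayed
        "user_id": "user-01",                # user of the device 100
        "service_type": "cable",             # type of content service
        "text_size": 24,                     # size of the text on the captured screen
        "text_position": [120, 640],         # location of the text on the screen
        "text_color": "#FFFFFF",             # color of the text
    },
    "voice_data": "voice-sample.pcm",        # optional sensed user speech
}
```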
- in step S1120, the semantic recognition server 200 compares the received character string with at least one text included in a preset semantic recognition model and detects text corresponding to the content information.
- the semantic recognition server 200 may preset a semantic recognition model.
- the semantic recognition model may include at least one text for recognizing content from content information received from the device 100.
- the semantic recognition model may include at least one text indicating a title of content currently provided and a type of channel on which content is provided.
- the semantic recognition model may include at least one of a format pattern preset for each template screen and a probability model that probabilistically calculates relations between words in a character string, in order to extract content information from the string.
- the semantic recognition model may be set differently according to the ID of the device and the ID of the user.
- for example, the semantic recognition server 200 may select, from among a plurality of semantic recognition models, a semantic recognition model including at least one text indicating the title, the channel type, and the like of content preferred by women in their twenties.
- the semantic recognition server 200 may detect text included in a character string from the extracted string by using a format pattern preset for the template screen.
- the preset format pattern may be included in the semantic recognition model.
- for example, the semantic recognition server 200 may detect text corresponding to a channel name and a title from the extracted character string.
- according to another example, at least one text included in the extracted character string may not correspond to the format pattern preset for the template screen. In this case, the semantic recognition server 200 may detect text from the string by using a probability model that probabilistically calculates the relations of surrounding words in the string.
- for example, from the string “B exclusive broadcast of A starring”, the semantic recognition server 200 may extract, based on the probability model, text indicating that the actor's name is A and the broadcast title is B.
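- a minimal sketch of this two-stage detection, assuming a regular-expression stand-in for the preset format pattern and a simple keyword heuristic standing in for the probability model (a real model would score word relations probabilistically):

```python
import re

# Hypothetical format pattern for one template screen, of the form
# "<channel number> <channel name> <title>"; the pattern itself is an
# assumption, the disclosure only says such patterns are preset per template.
FORMAT_PATTERN = re.compile(
    r"^(?P<channel>\d+(-\d+)?)\s+(?P<channel_name>\S+)\s+(?P<title>.+)$"
)

def detect_by_format(string):
    """First stage: try the format pattern preset for the template screen."""
    match = FORMAT_PATTERN.match(string)
    return match.groupdict() if match else None

def detect_by_context(string):
    """Second stage, standing in for the probability model: use surrounding
    words such as 'starring' and 'broadcast' as cues."""
    words = string.split()
    result = {}
    for i, word in enumerate(words):
        if word == "broadcast" and i >= 2 and words[i - 1] == "exclusive":
            result["title"] = words[i - 2]   # "B" in "B exclusive broadcast"
        if word == "starring" and i > 0:
            result["actor"] = words[i - 1]   # "A" in "A starring"
    return result or None

extracted = "B exclusive broadcast of A starring"
print(detect_by_format(extracted) or detect_by_context(extracted))
# -> {'title': 'B', 'actor': 'A'}
```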
- the semantic recognition server 200 recognizes the content displayed on the screen of the device 100 based on the detected text.
- the semantic recognition server 200 may determine the detected text as the title of the content displayed on the screen of the device 100.
- the semantic recognition server 200 may verify the detected text based on the additional information received from the device 100.
- for example, text having the highest similarity with the text included in the extracted content information may be detected from among the at least one text included in the semantic recognition model.
- the similarity may be determined according to, for example, the matching ratio of the types and combination relationships of consonants and vowels between the text included in the content information and the at least one text included in the semantic recognition model.
- for example, the semantic recognition server 200 may detect “kung fu”, which has the highest similarity to the extracted text, from among the at least one text included in the semantic recognition model.
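- the similarity matching could be approximated, for illustration, with a character-level matching ratio; an actual implementation comparing Korean text would first decompose syllables into consonants and vowels (jamo), which is omitted in this sketch:

```python
from difflib import SequenceMatcher

# Illustrative entries; a real semantic recognition model would hold many
# titles and channel names.
MODEL_TEXTS = ["kung fu", "kung fu panda", "king kong"]

def most_similar(extracted_text, candidates):
    """Return the model text with the highest matching ratio to the
    (possibly OCR-corrupted) extracted text."""
    return max(candidates,
               key=lambda t: SequenceMatcher(None, extracted_text, t).ratio())

print(most_similar("kumg fu", MODEL_TEXTS))  # -> "kung fu"
```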
- the semantic recognition server 200 compares the content information received from the device 100 with the semantic recognition model and detects text from the semantic recognition model, thereby correcting an error included in the received content information.
- the semantic recognition server 200 may verify the detected text based on the voice data received from the device 100. For example, when “kung fu” is detected as the title of the content, the semantic recognition server 200 may verify the detected text by determining whether the received voice data indicates “kung fu”.
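- a sketch of this verification step, assuming an external speech recognizer whose transcript can be compared against the detected title (the transcribe call below is hypothetical):

```python
def verify_with_voice(detected_title, transcript):
    """Accept the detected title only if the user's utterance mentions it.
    A real system would compare speech recognition results more robustly
    than this plain substring check."""
    return detected_title.lower() in transcript.lower()

# transcript = transcribe(voice_data)  # hypothetical speech-to-text call
print(verify_with_voice("kung fu", "Let's watch Kung Fu tonight"))  # -> True
```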
- when it is determined that the detected text is not information suitable for recognizing the content, the semantic recognition server 200 may repeatedly perform step S1120 described above. According to another example, the semantic recognition server 200 may request the device 100 to recapture the screen of the device 100.
- the semantic recognition server 200 may transmit the content recognition result to the viewing pattern analysis server 300.
- the viewing pattern analysis server 300 may determine a viewing pattern including information on the channels and content types that the user of the device 100 frequently watches, content viewing times, and the like.
- the information about the viewing pattern may be transmitted to the server of the content service provider or the advertisement provider.
- FIG. 12 is a block diagram illustrating a semantic recognition server 200 that recognizes content provided to the device 100, according to an exemplary embodiment.
- the semantic recognition server 200 may include a communication unit 210, a processor 220, and a storage unit 230. However, not all illustrated components are essential components.
- the semantic recognition server 200 may be implemented by more components than those illustrated, or by fewer components.
- the communication unit 210 may connect the semantic recognition server 200 to an external device (for example, the device 100 or the viewing pattern analysis server 300) under the control of the processor 220.
- the processor 220 may receive a string including content information from the device 100 connected through the communication unit 210.
- the communication unit 210 may receive content data at a predetermined cycle from an external web server (not shown).
- the content data can be used to generate a semantic recognition model.
- the processor 220 may transmit the identification information of the content displayed on the screen of the device 100 to the viewing pattern analysis server (not shown) through the communication unit 210.
- the communication unit 210 may receive voice data of the user sensed by the device 100.
- the voice data may include voice data sensed while the user watches the content displayed on the device 100.
- the processor 220 typically controls the overall operation of the semantic recognition server 200.
- the processor 220 may control the communication unit 210, the storage unit 230, and the like as a whole by executing programs stored in the storage unit 230.
- the processor 220 compares the character string received through the communication unit 210 with at least one text included in a preset semantic recognition model. In addition, the processor 220 detects text corresponding to the content information from the at least one text based on the comparison result. For example, the processor 220 may detect text included in the character string from the extracted string by using a format pattern preset for the template screen. According to another example, at least one text included in the extracted character string may not correspond to the format pattern preset for the template screen. In this case, the processor 220 may detect text from the string by using a probability model that probabilistically calculates the relations of surrounding words in the string.
- the processor 220 recognizes content displayed on the screen of the device 100 based on the detected text. According to another embodiment, the processor 220 may verify the detected text based on the additional information received through the communication unit 210. According to another example, when at least one candidate text having a similarity equal to or greater than a threshold is selected by comparing the extracted content information with the at least one text, the processor 220 may increase the accuracy of content recognition by comparing the at least one candidate text with the additional information and selecting one of them.
- the processor 220 may recognize content displayed on the screen of the device 100 based on the detected text and the voice data of the user received through the communication unit 210.
- the processor 220 may update the semantic recognition model based on at least one piece of content data acquired at a predetermined cycle.
- the processor 220 may select the semantic recognition model according to the user's profile including at least one of the user's age, gender, and occupation.
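- selecting a model per user profile might look like the following sketch; the profile keys and model names are assumptions for illustration only:

```python
# Hypothetical registry keyed by profile attributes; neither the keys nor
# the model names are defined by this disclosure.
MODELS = {
    ("20s", "female"): "model_female_20s",  # e.g. titles preferred by women in their twenties
    ("20s", "male"): "model_male_20s",
}
DEFAULT_MODEL = "model_general"

def select_model(age_group, gender):
    """Pick the semantic recognition model matching the user profile."""
    return MODELS.get((age_group, gender), DEFAULT_MODEL)

print(select_model("20s", "female"))  # -> "model_female_20s"
```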
- the storage unit 230 may store various data, programs, or applications for driving and controlling the semantic recognition server 200 under the control of the processor 220.
- the storage 230 may store at least one semantic recognition model.
- the storage unit 230 may store content data received from a web server (not shown).
- FIG. 13 is a conceptual diagram illustrating a system for recognizing content displayed on the device 100 (hereinafter, referred to as a content recognition system) according to an embodiment.
- the content recognition system may include the device 100 and the semantic recognition server 200.
- the device 100 may include a controller 130 and a detector 140.
- however, these are only some of the components necessary to describe the present embodiment; the components included in the device 100 are not limited to the above-described example.
- the controller 130 may capture a screen of the device 100 on which content is displayed when a control signal is received by the device 100. When the captured screen corresponds to a pre-stored template screen, the controller 130 may extract a character string including content information from the captured screen.
- the controller 130 may transmit the extracted character string to the semantic recognition server 200.
- the detector 140 may detect voice data of a user who views at least one content received by the device 100. For example, the detector 140 may sense voice data of a user selecting any one of the at least one content received by the device 100, or voice data of a user evaluating the content displayed on the device 100.
- the detector 140 may transmit the voice data to the semantic recognition server 200.
- the semantic recognition server 200 may include a communication unit 210, a processor 220, and a storage unit 230. However, these are only some of the components necessary to describe the present embodiment; the components included in the semantic recognition server 200 are not limited to the above-described example.
- the communication unit 210 may receive the extracted character string and the voice data from the device 100.
- the processor 220 may include a semantic recognizer 222 and a speech recognizer 224.
- the semantic recognizer 222 may detect text corresponding to content information from at least one text included in a preset semantic recognition model.
- the speech recognizer 224 may provide a result of analyzing the received voice data to the semantic recognizer 222. Accordingly, the semantic recognizer 222 may verify the detected text by comparing it with the voice data analysis result provided by the speech recognizer 224. According to another example, when a plurality of texts corresponding to the received character string are detected, the semantic recognizer 222 may select one of them by comparing the voice data analysis result with the detected plurality of texts.
- the text detected by the semantic recognizer 222 may be transmitted to the viewing pattern analysis server 300 through the communication unit 210.
- the viewing pattern analysis server 300 may analyze the viewing pattern of the user of the device 100 by using the text received from the semantic recognition server 200 over a preset period.
- FIG. 14 is a block diagram illustrating in more detail an operation of the semantic recognizer 1400 included in the processor 220 of the semantic recognition server 200, according to an exemplary embodiment.
- the semantic recognizer 1400 may include a content data management module 1410, an update module 1420, and a semantic recognition engine 1430.
- however, not all illustrated components are essential components. For example, the illustrated operations may be performed in a single module, or in more modules than those illustrated.
- the content data management module 1410 may receive content data from an external web server 50.
- the content data management module 1410 may receive, from the web server 50, content data such as the type of a content providing service, the title of content provided by the content providing service, the details of the content, and the type of the channel on which the content is provided.
- the content data management module 1410 may transmit, to the update module 1420, content data regarding the title of the content, the type of the channel, and the like, from among the received content data.
- the update module 1420 may generate a semantic recognition model based on content data regarding a title, a channel type, and the like of the content received from the content data management module 1410.
- the update module 1420 may generate a semantic recognition model corresponding to the user of the device 100 using the additional information received from the device 100.
- the update module 1420 may generate a personalized semantic recognition model based on information about the gender, age, and the like of the user received from the device 100.
- the semantic recognition engine 1430 may recognize content displayed on a screen of the device 100 by using a character string received from the device 100, based on the semantic recognition model provided by the update module 1420.
- the semantic recognition engine 1430 may extract a word predicted as a title of the content from the received character string, and transmit the word to the content data management module 1410.
- the content data management module 1410 may verify whether the extracted word contains an error and whether the extracted word is suitable for recognizing the title of the content.
- the word verified by the content data management module 1410 may be transmitted to the viewing pattern analysis server 300.
- according to another example, the extracted word may be transmitted to the viewing pattern analysis server 300 without going through the verification process of the content data management module 1410.
- the viewing pattern analysis server 300 may analyze the viewing pattern of the user based on the content recognition result received from the semantic recognition engine 1430.
- the viewing pattern analysis server 300 may provide an analysis result to the update module 1420. Accordingly, the update module 1420 may update the semantic recognition model corresponding to the user of the device 100 based on the received viewing pattern analysis result.
- FIG. 15 is a block diagram illustrating in more detail an operation of the content data management module 1410 included in the semantic recognizer 1400 of the semantic recognition server 200, according to an exemplary embodiment.
- the content data management module 1410 may include a web crawler 1412, a text processing module 1414, a content database management module 1416, and a content database 1418. However, not all illustrated components are essential components. For example, the illustrated operations may be performed in a single module, or in more modules than those illustrated.
- the web crawler 1412 may acquire content data among data provided from a web server. For example, the web crawler 1412 may visit a web page provided by a content service or a web page of a portal site, and may acquire content data recorded in the web page.
- the text processing module 1414 may process the content data acquired by the web crawler 1412 into a text form. For example, the text processing module 1414 may extract text from an image of a web page acquired by the web crawler 1412.
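- a minimal crawling-and-text-extraction sketch using only the Python standard library; the target URL and the tag/class rule are placeholder assumptions, and a production crawler would also honor robots.txt and the predetermined fetch cycle:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleExtractor(HTMLParser):
    """Collects text inside elements assumed to carry program titles.
    The h3/"program-title" rule below is illustrative; real pages differ."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3" and ("class", "program-title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

def crawl(url):
    """Fetch one web page and return the content titles found on it."""
    parser = TitleExtractor()
    with urlopen(url) as response:
        parser.feed(response.read().decode("utf-8", errors="ignore"))
    return parser.titles

# titles = crawl("https://example.com/tv-guide")  # placeholder URL
```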
- the content database management module 1416 may classify the content data in text form obtained from the text processing module 1414 according to the type of the content service and the viewing time, and store the classified content data in the content database 1418.
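- for illustration, this classification and storage step could be sketched with SQLite standing in for the content database 1418 (the schema is an assumption):

```python
import sqlite3

# In-memory stand-in for the content database 1418.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE content
                (service_type TEXT, viewing_time TEXT, title TEXT)""")

def store(service_type, viewing_time, title):
    """Classify by content service type and viewing time, then persist."""
    conn.execute("INSERT INTO content VALUES (?, ?, ?)",
                 (service_type, viewing_time, title))
    conn.commit()

store("cable", "2017-02-22 20:00", "kung fu")
print(conn.execute(
    "SELECT title FROM content WHERE service_type = ?", ("cable",)).fetchall())
# -> [('kung fu',)]
```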
- the content database management module 1416 may provide content data in text form to the update module 1420 and the semantic recognition engine 1430 described above with reference to FIG. 14.
- in FIG. 15, the content database 1418 is illustrated as being included in the content data management module 1410. However, this is only an example; according to another example, the content database 1418 may be included in the storage unit 230 described above with reference to FIG. 12.
- FIG. 16 is a diagram for describing a method, performed by the semantic recognition server 200, of generating a semantic recognition model based on content data in text form, according to an exemplary embodiment.
- the semantic recognition server 200 may detect at least one text capable of recognizing content from the content data 1610 in a text form.
- the semantic recognition server 200 may extract at least one text usable for recognizing content from the content data in text form by using a preset template corpus 1620.
- the template corpus 1620 may be composed of words that can be used to recognize content. For example, a movie title, a drama title, a movie channel, and a broadcast time may be included in the template corpus according to an embodiment.
- the semantic recognition server 200 may classify the detected text according to a template corpus item.
- the classified text 1630 may be stored together with an index for each template corpus item.
- the semantic recognition server 200 may generate a semantic recognition model based on the classified text 1630.
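- a sketch of the classification and indexing described above, with corpus items following the examples in this document; the index layout is an assumption:

```python
# Corpus items follow the examples in the text; the index layout is assumed.
TEMPLATE_CORPUS = ("movie_title", "drama_title", "movie_channel", "broadcast_time")

def classify(detected):
    """detected: (text, corpus_item) pairs produced upstream.
    Returns corpus_item -> list of (index, text) entries."""
    classified = {}
    for text, item in detected:
        if item in TEMPLATE_CORPUS:
            classified.setdefault(item, []).append(text)
    return {item: list(enumerate(texts)) for item, texts in classified.items()}

print(classify([("kung fu", "movie_title"), ("OCN", "movie_channel")]))
# -> {'movie_title': [(0, 'kung fu')], 'movie_channel': [(0, 'OCN')]}
```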
- FIG. 16 is only one example of methods for generating a semantic recognition model, and the method for generating a semantic recognition model in the present invention is not limited to using a corpus.
- the method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium.
- the computer readable medium may include program instructions, data files, data structures, etc. alone or in combination.
- Program instructions recorded on the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts.
- examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical media such as CD-ROMs and DVDs, and magneto-optical media such as floptical disks.
- Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
- a device may include a processor, a memory for storing and executing program data, a persistent storage such as a disk drive, a communication port for communicating with an external device, a touch panel, a key, a user interface such as a button, and the like.
- Methods implemented by software modules or algorithms may be stored on a computer readable recording medium as computer readable codes or program instructions executable on the processor.
- the computer-readable recording medium may be a magnetic storage medium (for example, a read-only memory (ROM), a random-access memory (RAM), a floppy disk, or a hard disk) or an optical reading medium (for example, a CD-ROM or a DVD (Digital Versatile Disc)).
- the computer readable recording medium can be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
- the medium can be read by the computer, stored in the memory, and executed by the processor.
- an embodiment may be represented by functional block configurations and various processing steps. Such functional blocks may be implemented in various numbers of hardware and/or software configurations that perform particular functions.
- for example, an embodiment may employ integrated circuit configurations, such as memory, processing, logic, and look-up tables, that can execute various functions under the control of one or more microprocessors or other control devices.
- an embodiment may employ cores of the same kind or of different kinds, or different types of CPUs.
- similarly to how the components may be implemented in software programming or software elements, an embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with various algorithms implemented in combinations of data structures, processes, routines, or other programming constructs.
- the functional aspects may be implemented with an algorithm running on one or more processors.
- an embodiment may employ conventional techniques for electronic environment configuration, signal processing, and/or data processing.
- terms such as “mechanism”, “element”, “means”, and “configuration” may be used broadly and are not limited to mechanical and physical configurations. These terms may include the meaning of a series of software routines in conjunction with a processor or the like.
- the connections or connection members of lines between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device, they may be represented as various replaceable or additional functional connections, physical connections, or circuit connections.
- unless a component is specifically described with a term such as “essential” or “important”, it may not be a necessary component for the application of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Graphics (AREA)
- Computing Systems (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Disclosed is a device that captures a screen of the device when a control signal for controlling one or more items of content provided to the device is received, and, if the captured screen corresponds to a template screen, extracts text including content information from a preset region of the captured screen, compares the extracted text with at least one text included in a preset semantic recognition model, detects text corresponding to the content information, and recognizes content displayed on the screen of the device on the basis of the detected text.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17756792.2A EP3399765A4 (fr) | 2016-02-26 | 2017-02-22 | Procédé et dispositif de reconnaissance de contenu |
CN201780013189.XA CN108702550A (zh) | 2016-02-26 | 2017-02-22 | 用于识别内容的方法及设备 |
US16/078,558 US20190050666A1 (en) | 2016-02-26 | 2017-02-22 | Method and device for recognizing content |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2016-0023639 | 2016-02-26 | ||
KR20160023639 | 2016-02-26 | ||
KR1020160073214A KR102561711B1 (ko) | 2016-02-26 | 2016-06-13 | 컨텐트를 인식하는 방법 및 장치 |
KR10-2016-0073214 | 2016-06-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017146454A1 true WO2017146454A1 (fr) | 2017-08-31 |
Family
ID=59686400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2017/001933 WO2017146454A1 (fr) | 2016-02-26 | 2017-02-22 | Procédé et dispositif de reconnaissance de contenu |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017146454A1 (fr) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20030083906A (ko) * | 2002-04-23 | 2003-11-01 | 엘지전자 주식회사 | 텍스트 정보를 이용한 티브이 프로그램 추천 방법 |
JP2007013320A (ja) * | 2005-06-28 | 2007-01-18 | Funai Electric Co Ltd | 映像記録装置、コンテンツ記録装置、コンテンツ検索制御方法、および、コンテンツ検索プログラム |
JP2008154200A (ja) * | 2006-12-14 | 2008-07-03 | Samsung Electronics Co Ltd | 動画像の字幕検出装置およびその方法 |
KR20150060801A (ko) * | 2012-09-19 | 2015-06-03 | 구글 인코포레이티드 | 현재 재생되는 텔레비젼 프로그램들과 연관된 인터넷-액세스가능 컨텐츠의 식별 및 제시 |
US20140282668A1 (en) * | 2013-03-14 | 2014-09-18 | Samsung Electronics Co., Ltd. | Viewer behavior tracking using pattern matching and character recognition |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107977652A (zh) * | 2017-12-21 | 2018-05-01 | 维沃移动通信有限公司 | 一种屏幕显示内容的提取方法及移动终端 |
EP3748982A4 (fr) * | 2018-05-21 | 2020-12-09 | Samsung Electronics Co., Ltd. | Dispositif électronique et acquisition d'informations de reconnaissance de contenu associée |
US11575962B2 (en) | 2018-05-21 | 2023-02-07 | Samsung Electronics Co., Ltd. | Electronic device and content recognition information acquisition therefor |
US11190837B2 (en) | 2018-06-25 | 2021-11-30 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
US11184670B2 (en) | 2018-12-18 | 2021-11-23 | Samsung Electronics Co., Ltd. | Display apparatus and control method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020251283A1 (fr) | Sélection d'un modèle d'intelligence artificielle sur la base de données d'entrée | |
WO2020101453A1 (fr) | Dispositif électronique et procédé de reconnaissance d'une scène audio | |
WO2014003283A1 (fr) | Dispositif d'affichage, procédé de commande de dispositif d'affichage, et système interactif | |
WO2018043895A1 (fr) | Dispositif d'affichage et procédé de commande de dispositif d'affichage | |
WO2017111252A1 (fr) | Dispositif électronique et procédé de balayage de canaux dans un dispositif électronique | |
WO2017146454A1 (fr) | Procédé et dispositif de reconnaissance de contenu | |
WO2017099331A1 (fr) | Dispositif électronique et procédé pour dispositif électronique fournissant une interface utilisateur | |
WO2016117836A1 (fr) | Appareil et procédé de correction de contenu | |
WO2015194693A1 (fr) | Dispositif d'affichage de vidéo et son procédé de fonctionnement | |
WO2018155859A1 (fr) | Dispositif d'affichage d'image et procédé de fonctionnement dudit dispositif | |
WO2020017930A1 (fr) | Procédé de fourniture d'une liste de canaux recommandés et dispositif d'affichage associé | |
WO2019013447A1 (fr) | Dispositif de commande à distance et procédé de réception de voix d'un utilisateur associé | |
WO2019146844A1 (fr) | Appareil d'affichage et procédé permettant d'afficher un écran d'un appareil d'affichage | |
WO2017119708A1 (fr) | Appareil d'affichage d'image et son procédé de fonctionnement | |
WO2020145615A1 (fr) | Procédé de fourniture d'une liste de recommandations et dispositif d'affichage l'utilisant | |
WO2019054791A1 (fr) | Procédé et appareil d'exécution de contenu | |
WO2018124842A1 (fr) | Procédé et dispositif de fourniture d'informations sur un contenu | |
WO2019135433A1 (fr) | Dispositif d'affichage et système comprenant ce dernier | |
WO2020130262A1 (fr) | Dispositif informatique et procédé de fonctionnement associé | |
WO2015178716A1 (fr) | Procédé et dispositif de recherche | |
WO2019088627A1 (fr) | Appareil électronique et procédé de commande associé | |
WO2017160062A1 (fr) | Procédé et dispositif de reconnaissance de contenu | |
WO2019198951A1 (fr) | Dispositif électronique et procédé de fonctionnement de celui-ci | |
WO2020071816A1 (fr) | Dispositif d'affichage et serveur de communication avec le dispositif d'affichage | |
WO2023058835A1 (fr) | Dispositif électronique et son procédé de commande |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 2017756792; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2017756792; Country of ref document: EP; Effective date: 20180730 |
| NENP | Non-entry into the national phase | Ref country code: DE |